The Shape of the Vote: Exploring Congressional Districts with Computation
In the past few decades, the process of redistricting has moved squarely into the computational realm, and with it the political practice of gerrymandering. But how can one solve the problem of equal representation mathematically? And what can be done to test the fairness of districts? In this post I’ll take a deeper dive with the Wolfram Language—using data exploration with Import and Association, built-in knowledge through the Entity framework and various GeoGraphics visualizations to better understand how redistricting works, where issues can arise and how to identify the effects of gerrymandering.
Rules of Apportionment
In the US House of Representatives, each state is assigned a number of representatives based on its population through the process of apportionment (or reapportionment). On the surface, the rules for this process are simple: each state gets at least one representative, and representative seats must be redistributed at least once per decennial census.
Apportionment has been tried using various mathematical methods throughout history. Since the 1940 Census, representatives have been assigned using the method of equal proportions (the Huntington–Hill method). This means that the next available slot goes to the state with the highest priority value A_n, defined as:
TraditionalForm[Subscript[A, n]==P/GeometricMean[{n,n-1}]]
… where P is the population of the state and n is the seat under consideration, so that n − 1 seats have already been assigned to the state. You might recognize the denominator as the geometric mean of n and n − 1. It’s straightforward to implement symbolically:
Priority[pop_,n_]:=N[pop/GeometricMean[{n,n-1}]]
Formula in hand, I’d like to run a simulation to compare to the current apportionment plan. First I’ll pull the 2010 population data from the Wolfram Knowledgebase (excluding the District of Columbia):
states=Complement[EntityList[EntityClass["AdministrativeDivision", "AllUSStatesPlusDC"]],{Entity["AdministrativeDivision", {"DistrictOfColumbia", "UnitedStates"}]}];
statenames=StringDrop[#["Name"],-15]&/@states;
popdata=AssociationThread[statenames->Table[QuantityMagnitude[Dated[s,2010]["Population"]],{s,states}]];
RandomChoice[Normal@popdata]
It’s worth noting that these population counts are slightly different from the official reapportionment numbers, which include overseas residents for each state. The discrepancy is too small to make a difference in my apportionment computations, but it could be a topic for a more detailed exploration.
To start my simulation, I give each state one representative. The initial 50 are actually assigned before applying the formula, so I’ll set those initial priority values at Infinity:
init=Thread[statenames->∞];
From there, districts are assigned based on successively smaller priority values. Historically, no state has received more than 55 seats, so I’ll set the upper limit at 60:
pvalues=Flatten@Table[Normal@Priority[popdata,i],{i,2,60}];
app=TakeLargestBy[Join[init,pvalues],Values[#]&,435];
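To see the mechanics on a small scale, here is a minimal sketch of the same procedure for three hypothetical states sharing eight seats. The state labels, populations, seat total and the names toyPop, toyPriority, toyInit, toyValues and toyApp are all made up for illustration and are not part of the real computation:

```wolfram
(* Hypothetical populations; labels and numbers are invented for illustration *)
toyPop = <|"A" -> 21000, "B" -> 14000, "C" -> 5000|>;

(* Same priority formula as above, under a separate name *)
toyPriority[pop_, n_] := N[pop/GeometricMean[{n, n - 1}]];

(* Every state starts with one seat, represented by an infinite priority *)
toyInit = Thread[Keys[toyPop] -> Infinity];

(* Priority values for a potential 2nd through 8th seat in each state *)
toyValues = Flatten@Table[Normal[toyPriority[#, n] & /@ toyPop], {n, 2, 8}];

(* Hand out 8 seats in total, largest priority first *)
toyApp = TakeLargestBy[Join[toyInit, toyValues], Values[#] &, 8];

(* Seats per state *)
KeySort@Counts[Keys[toyApp]]
```

With these numbers, state A ends up with four seats, B with three and C with just its guaranteed one, close to the ideal shares of 4.2, 2.8 and 1 at 5,000 people per seat.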
DistrictWeightMap[apportionment_]:=GeoRegionValuePlot[KeyMap[Interpreter["USState"],apportionment]//Normal,GeoRange->Entity["Country", "UnitedStates"],GeoProjection->"Mercator",GeoLabels->(Text[Style[#4,FontFamily->"Arabic Transparent",White,Medium],#3]&), ImageSize->1200,ColorFunction->(Which[#<.02,GrayLevel[0.6],.02<#<=.15,Darker@Blue,.15<#<=.4,Darker@Orange,#>.4,Darker@Red]&),PlotRange->{0,Length@apportionment}, PlotLegends->Histogram]
DistrictWeightMap[app]
(I’ve left off Alaska and Hawaii here for easier viewing, but they have one and two districts, respectively.)
Position[Normal[app],"Illinois"][[-1,1]]
Priority[#,17]&/@{#,#-40000.}&@popdata["Illinois"]
Select[pvalues,-10000<Values[#]-%[[2]]<10000&] (* the upper bound of 10000 is an assumed symmetric window *)
Priority[#,38]&/@{#,#+3000000}&@popdata["Texas"]
Take[app,-10]
DistrictDifferenceMap[newapp_,oldapp_]:=GeoRegionValuePlot[Quiet[Normal@KeyMap[Interpreter["USState"],Merge[{newapp,oldapp},Subtract@@#&]]/.{Subtract[a__]:>a}], GeoProjection->"Mercator", GeoRange->Entity["Country", "UnitedStates"], ImageSize->540, GeoLabels->(Text[Style[#4,"Text",White,10,FontFamily->"Arabic Transparent"],#3]&),ColorRules->({_?Positive->Green,_?Negative->Red,_->Gray})]
latestpopdata=AssociationThread[statenames->Table[QuantityMagnitude[s["Population"]],{s,states}]];
latestpvalues=Flatten@Table[Normal@Priority[latestpopdata,i],{i,2,60}];
latestapp=TakeLargestBy[Join[init,latestpvalues],Values[#]&,435];
DistrictDifferenceMap[ReverseSort@Counts[Keys@latestapp],ReverseSort@Counts[Keys@app]]
uspophistory=Dated[Entity["Country", "UnitedStates"],All]["Population"];
DateListPlot[TimeSeriesWindow[uspophistory,{"1918",Today}]/435.,ColorFunction->"DarkRainbow",PlotRange->Full,PlotTheme->"Detailed"]
popperdist=ReverseSort@Association@Table[Interpreter["USState"][s]->N[popdata[s]/Counts[Keys@app][s]],{s,statenames}];
GeoRegionValuePlot[popperdist,GeoProjection->"Mercator",GeoRange->Entity["Country", "UnitedStates"],ColorFunction->"TemperatureMap"]
newapp=TakeLargestBy[Join[init,Flatten@Table[Normal[Priority[#,i]&/@popdata],{i,2,1000}]],Values[#]&,Floor[Total[popdata]/40000.]];
DistrictWeightMap[newapp]
newpopperdist=ReverseSort@Association@Table[Interpreter["USState"][s]->N[popdata[s]/Counts[Keys@newapp][s]],{s,statenames}];
GeoRegionValuePlot[newpopperdist,GeoProjection->"Mercator",GeoRange->Entity["Country", "UnitedStates"],ColorFunction->"TemperatureMap"]
Of course, apportionment is just the first step. Adding more seats would also mean adding more districts—and that would likely make the next stage a lot more complicated.
Redistricting by the Numbers
Since populations migrate and fluctuate, government officials are constitutionally required to redraw congressional districts following reapportionment. On its surface, this seems straightforward: divide each state into areas of equal population. But the reality can be deceptively complex.
Times@@Binomial[Range[50.,10,-10],10]/2
This issue scales up with the size of the population; with the current population of the US, the number of ways to divide it into 435 equal districts (ignoring all other constraints) is truly astounding:
(Times@@Binomial[Range[#1,#2,-#2],#2]/#2!)&@@{QuantityMagnitude[Entity["Country", "UnitedStates"]["Population"]],435}
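For a sense of how this kind of count behaves, here is a minimal sketch on numbers small enough to check by hand. It assumes the groups are interchangeable (so the product of binomials is divided by the factorial of the number of groups) and that the population divides evenly; equalPartitions and closedForm are hypothetical helper names, not part of the computation above:

```wolfram
(* Ways to split n labeled people into k unlabeled groups of equal size n/k *)
equalPartitions[n_Integer, k_Integer] /; Divisible[n, k] :=
  Times @@ Binomial[Range[n, n/k, -n/k], n/k]/k!

(* The same count written with factorials, for comparison *)
closedForm[n_Integer, k_Integer] /; Divisible[n, k] := n!/(((n/k)!)^k k!)

(* Six people into three pairs: Binomial[6,2]*Binomial[4,2]*Binomial[2,2]/3! *)
equalPartitions[6, 3]

(* Both forms agree *)
equalPartitions[6, 3] == closedForm[6, 3]
```

Six people can be paired off in 15 ways; the count grows factorially from there, which is why exhaustive search stops being an option almost immediately.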
The latest district maps are available through the Wolfram Knowledgebase:
current=KeyDrop[GroupBy[EntityList["USCongressionalDistrict"],#["USState"]&],{"DistrictOfColumbia",Missing["NotApplicable"],Missing["NotAvailable"]}];
distpop=Table[DeleteMissing[#["Population"]&/@current[s]],{s,statenames}];
Mean@Table[If[Length[v]>1,N@StandardDeviation[v]/Mean@v,0.],{v,distpop}]
iacounties=EntityClass["AdministrativeDivision", "USCountiesIowa"];
Show[GeoListPlot[List/@current["Iowa"],PlotLegends->None],GeoListPlot[iacounties,PlotStyle->Directive[EdgeForm[Blue],FaceForm[Opacity[0]]]]]
nccounties=EntityClass["AdministrativeDivision", "USCountiesNorthCarolina"];
Show[GeoListPlot[List/@current["NorthCarolina"],PlotLegends->None],GeoListPlot[nccounties,PlotStyle->Directive[EdgeForm[Blue],FaceForm[Opacity[0]]]]]
This kind of irregular shape is considered one of the main indications of deliberate manipulation of districts (and indeed, North Carolina’s map is currently being contested in court), but that’s not to say that every oddly shaped district is gerrymandered. Crooked borders often evolve slowly as the demography of an area changes subtly over time.
Drawing on Experience: Historical Maps
Import[""]
c1=Association@First@Import["","Data"];
The "LabeledData" element contains ordered information about individual districts:
ld=Association@c1["LabeledData"];
From there, I can create entries that associate each state name with its district numbers and geometry:
entries=<|#[[1]]-><|ToExpression[#[[2]]]->#[[3]]|>|>&/@Transpose[{ld["STATENAME"],ld["DISTRICT"],Polygon[Cases[#,_GeoPosition,All]]&/@c1["Geometry"]}];
statenames=Union[Keys@entries]//Flatten;
districts=Association@Table[Merge[Sort@Select[entries,StringMatchQ[First@Keys@#,s]&],Association],{s,statenames}];
GeoListPlot[List/@Values@districts["Virginia"],PlotLegends->None]
Show@@Table[GeoListPlot[List/@Values@d,PlotLegends->None],{d,districts}]
CongressionalMapData[congressnumber_]:=Module[{baseURL="",raw,ld,entries,statenames},
 raw=Association@First@Import[baseURL<>"districts"<>StringPadLeft[ToString[congressnumber],3,"0"]<>".zip","Data"];
 ld=Association@raw["LabeledData"];
 entries=<|#[[1]]-><|ToExpression[#[[2]]]->#[[3]]|>|>&/@Transpose[{ld["STATENAME"],ld["DISTRICT"],Polygon[Cases[#,_GeoPosition,All]]&/@raw["Geometry"]}];
 statenames=Union[Keys@entries]//Flatten;
 Association@Table[Merge[Sort@Select[entries,StringMatchQ[First@Keys@#,s]&],Association],{s,statenames}]
]
CongressNumber[year_]:=Floor[(year-1787)/2.]
CongressionalMapData[year_?(#>1700&)]:=CongressionalMapData[CongressNumber[year]]
DistrictMap[statedata_]:=GeoListPlot[Table[{s},{s,statedata}],GeoLabels->(Tooltip[#1,FirstPosition[statedata,#1][[1,1]]]&),PlotLegends->None]
dist1918=CongressionalMapData[1918];
N@Length@current["Illinois"]/Length@dist1918["Illinois"]
This included one “at-large” representative who represented the entire state, rather than a particular district or area. In this data, such districts are numbered “0”:
GeoGraphics[dist1918["Illinois",0]]
DistrictDifferenceMap[Length/@current,Length/@dist1918]
allmaps=Table[CongressionalMapData[cnum],{cnum,114}];
frames=Table[{DistrictWeightMap[Length/@Values/@allmaps[[i]]],1789+2 i-1},{i,114}];
ListAnimate[Labeled[#1,Style[#2,"Section"],Top]&@@@frames]
nydists=Table[{i,allmaps[[CongressNumber[i],"New York"]]},{i,1793,2013,10}];
ListAnimate[Labeled[DistrictMap[#2],Style[ToString[#1]<>": "<>Capitalize@IntegerName[Length[#2]]<>" Districts","Section"],Top]&@@@nydists,AnimationRepetitions->1,AnimationRunning->False]
nhdists=Table[{i,allmaps[[CongressNumber[i],"New Hampshire"]]},{i,1793,2013,10}];
ListAnimate[Labeled[DistrictMap[#2],Style[ToString[#1]<>": "<>Capitalize@IntegerName[Length[#2]]<>If[Length[#2]==1," District"," Districts"],"Section"],Top]&@@@nhdists,AnimationRepetitions->1,AnimationRunning->False]
GeoListPlot[{Values[CongressionalMapData[1859]["Virginia"]],Values[CongressionalMapData[1863]["West Virginia"]]}]
GeoListPlot[{Values[CongressionalMapData[1869]["Virginia"]],Values[CongressionalMapData[1869]["West Virginia"]]}]
dist1859=CongressionalMapData[1859];
dist1873=CongressionalMapData[1873];
DistrictDifferenceMap[Length/@dist1873,Length/@dist1859]
For instance, after gaining three seats in 1990, Texas attempted to draw new majority-minority districts to represent both Hispanic and African American voters. In Bush v. Vera, the court ruled that two of the new districts (the 29th and 30th) and one newly manipulated district (the 18th) violated compactness principles too severely:
dist1993=CongressionalMapData[1993];
Row@Table[Labeled[GeoGraphics[{Green,dist1993["Texas",i]},ImageSize->150],Style[i,"Text",Darker@Green,Bold],Top],{i,{18,29,30}}]
dist1997=CongressionalMapData[1997];
Row@Table[Labeled[GeoGraphics[{Green,dist1997["Texas",i]},ImageSize->150],Style[i,"Text",Darker@Green,Bold],Top],{i,{18,29,30}}]
mm=Import["","Data"];
aalist=mm[[1,1,4,3;;27]];
GeoRegionValuePlot[Table[current[[StringDelete[aalist[[d,3]]," "],aalist[[d,4]]]]->Quantity[aalist[[d,2]]],{d,Length@aalist}],GeoRange->{{40.,25.},{-95.,-75.}},GeoProjection->"Mercator"]
hisplist=mm[[1,1,6,3;;27]];
GeoRegionValuePlot[Table[current[[StringDelete[hisplist[[d,3]]," "],hisplist[[d,4]]]]->Quantity[hisplist[[d,2]]],{d,Length@hisplist}],GeoRange->{{38,25},{-120,-95}},GeoProjection->"Mercator"]
N[Entity["City", {"Chicago", "Illinois", "UnitedStates"}]["Population"]/Entity["AdministrativeDivision", {"Illinois", "UnitedStates"}]["Population"]]
A look at the map shows that the city itself sprawls across nearly half the state’s 18 districts in order to distribute that population:
Show[GeoListPlot[List/@Most[current["Illinois"]],PlotLegends->None], GeoGraphics[{FaceForm[Directive[Opacity[1.],Black]],EdgeForm[White],Entity["City", {"Chicago", "Illinois", "UnitedStates"}]["Polygon"]}],GeoRange->Entity["City", {"Chicago", "Illinois", "UnitedStates"}]]
dist1865=CongressionalMapData[1865];
Length@dist1865["Illinois"]
Show[GeoListPlot[List/@Values@dist1865["Illinois"],PlotLegends->None], GeoGraphics[{FaceForm[Directive[Opacity[1.],Black]],EdgeForm[White],Dated[Entity["City", {"Chicago", "Illinois", "UnitedStates"}],1823]["Polygon"]}],GeoRange->Entity["City", {"Chicago", "Illinois", "UnitedStates"}]]
Gerrymandering and the Supreme Court
I found comprehensive election data in PDF format from the Clerk of the House. I tried various methods for importing these; in the end I created a package that uses string patterns to sort through election information:
<<ElectionData`
The package allows me to import election data by state and year (starting in 1998) as a Dataset:
ildata=RepresentativeVotesDataset["Illinois",2014]
PartyVotes[electiondata_]:=With[{votes=GroupBy[Select[electiondata,StringMatchQ[#["Party"],"Republican"|"Democrat"]&],"District"]},Table[<|#["Party"]->(#["Votes"])&/@Normal@votes[i,All]|>,{i,Length@votes}]]
ilvotes=PartyVotes[ildata];
Total@ilvotes/Total@ildata[[All,"Votes"]]//N
Show[GeoRegionValuePlot[Thread[Most[current["Illinois"]]->(KeySort@N[#/Total[#]]&/@ilvotes)[[All,1]]],ColorFunction->(Blend[{Red,Blue},#]&), PlotRange->{0,1}], GeoGraphics[{FaceForm[Directive[Opacity[1.],Green]],EdgeForm[White],Entity["City", {"Chicago", "Illinois", "UnitedStates"}]["Polygon"]}]]
And aside from a few “purple” bi-state areas, the irregular districts in Chicago appear to tip the balance for Democrats. While no case has been brought forth in Illinois, most critics point to the earmuff-shaped fourth district as a prime example of extreme gerrymandering:
GeoGraphics[{Green,Polygon@current[["Illinois",4]]}]
The range considered acceptable for each test can be subjective, but each measure gives a value between 0 and 1. Looking at the distribution of each test among the states, you can get a good sense of what’s average:
Multicolumn[{CloudGet[""], CloudGet[""], CloudGet[""], CloudGet[""]}, ItemSize -> Full]
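As a concrete illustration of a score on this 0-to-1 scale, here is a minimal sketch of the Polsby–Popper ratio, 4πA/P², which compares a shape’s area A with that of a circle having the same perimeter P. Whether this particular measure is among the tests plotted above is an assumption, and polsbyPopper is a hypothetical helper name; the shapes are plain planar polygons rather than real districts:

```wolfram
(* Polsby-Popper ratio: 1 for a circle, approaching 0 for long or ragged shapes *)
polsbyPopper[poly_Polygon] := N[4 Pi Area[poly]/Perimeter[poly]^2]

(* A unit square scores Pi/4, about 0.785 *)
polsbyPopper[Polygon[{{0, 0}, {1, 0}, {1, 1}, {0, 1}}]]

(* A 10-by-1 strip has area 10 and perimeter 22, so it scores about 0.26 *)
polsbyPopper[Polygon[{{0, 0}, {10, 0}, {10, 1}, {0, 1}}]]
```

The strip covers ten times the area of the square yet scores far lower, which is the behavior a compactness test is after: elongation is penalized regardless of size.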
Here are some of the least compact districts in the country, according to Marco’s computations:
Application of these and similar geometric tests has led several courts to strike down district maps that lack compactness (as in Texas). But there’s no single way to measure compactness, and some odd shapes are due to natural boundaries and other non-political factors.
The first case, Gill v. Whitford, takes a practical approach to the problem: if partisan gerrymandering is the issue, the plaintiffs reason, perhaps it needs a partisan-based solution. The case originated in Wisconsin, and in October 2017 the plaintiffs presented an argument based on a new measure of partisan bias proposed by Nicholas Stephanopoulos and Eric McGhee called the efficiency gap. The formula is best summarized as the difference between the two parties’ total wasted votes (votes cast for a losing candidate plus surplus votes cast for a winning candidate) divided by the total votes cast:
TraditionalForm[EG==(HoldForm@(Subscript[lost, A]+Subscript[surplus, A])-HoldForm@(Subscript[lost, B]+Subscript[surplus, B]))/(total votes)]
By assuming equal population per district and a two-party system, this formula is conveniently reduced to the difference between a party’s seat margin (percentage of seats over 50%) and twice its vote margin:
TraditionalForm[EG=="seat margin" - 2 *"vote margin"]
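A minimal sketch with made-up numbers shows how the two forms line up. The five districts, their vote counts and the names toyVotes, wasted, egWasted, seatMarginB, voteMarginB and egSimplified are all hypothetical; surplus votes are counted as those beyond half of a district’s total, which is one common convention and an assumption here:

```wolfram
(* Hypothetical two-party results: {votes for A, votes for B} in five equal-sized districts *)
toyVotes = {{75, 25}, {60, 40}, {43, 57}, {48, 52}, {49, 51}};

(* Wasted votes in one district as {wasted by A, wasted by B}:
   all of the loser's votes, plus the winner's votes beyond half the district total *)
wasted[{a_, b_}] := If[a > b, {a - (a + b)/2, b}, {a, b - (a + b)/2}];

(* Sum the wasted votes over all districts *)
{wastedA, wastedB} = Total[wasted /@ toyVotes];

(* Wasted-vote form: (wasted by A - wasted by B) / total votes *)
egWasted = (wastedA - wastedB)/Total[toyVotes, 2]

(* Simplified form, evaluated from party B's point of view *)
seatMarginB = Count[toyVotes, {a_, b_} /; b > a]/Length[toyVotes] - 1/2;
voteMarginB = Total[toyVotes[[All, 2]]]/Total[toyVotes, 2] - 1/2;
egSimplified = seatMarginB - 2 voteMarginB
```

Both expressions give 1/5: party A wastes 175 votes to party B’s 75, and that 100-vote difference out of 500 matches B’s seat margin (1/10) minus twice its vote margin (−1/20). Note the sign convention: the wasted-vote difference taken as A minus B corresponds to the simplified form evaluated for party B.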
SeatMargin[electiondata_]:=With[{pv=PartyVotes[electiondata]},N@(Counts[Flatten@Keys[TakeLargest[#,1]&/@pv]]-Length@pv/2)/Length@pv]
VoteMargin[electiondata_]:=N@#/Total[#]&@Merge[PartyVotes[electiondata],Total]-.5
For congressional districts, the efficiency gap is given in seats. Here’s an implementation of the simplified efficiency gap formula with positive numbers indicating a Democratic advantage and negative indicating a Republican advantage:
EfficiencyGap[electiondata_]:=Length[GroupBy[electiondata,"District"]]*(KeySort[SeatMargin[electiondata]]-2 KeySort[VoteMargin[electiondata]])
Table[With[{data=GroupBy[RepresentativeVotesDataset[state,{1998,2016}],"Year"]},DateListPlot[Transpose[{DateRange[{1998},{2016},Quantity[2,"Years"]],Table[EfficiencyGap@data[[i]],{i,Length@data}][[All,1]]}],PlotTheme->"Scientific"]],{state,{"Michigan","Michigan","North Carolina","Ohio","Pennsylvania","Texas","Virginia"}}]
widata=GroupBy[RepresentativeVotesDataset["Wisconsin",{1998,2016}],"Year"];
DateListPlot[Transpose[{DateRange[{1998},{2016},Quantity[2,"Years"]],Table[EfficiencyGap@widata[[i]],{i,Length@widata}][[All,1]]}],PlotTheme->"Scientific"]
widists=Table[CongressionalMapData[i]["Wisconsin"],{i,2000,2016,4}];
wivotes=Table[PartyVotes[RepresentativeVotesDataset["Wisconsin",i]],{i,2000,2016,4}];
Grid[{Text/@Range[2000,2016,4], Table[GeoRegionValuePlot[Thread[Values[widists[[i]]]->(KeySort@N[#/Total[#]]&/@wivotes[[i]])[[All,1]]], PlotLegends->None, ColorFunction->(Blend[{Red,Blue},#]&), ImageSize->100],{i,Length@widists}]}]
mddata=GroupBy[RepresentativeVotesDataset["Maryland",{1998,2016}],"Year"];
DateListPlot[Transpose[{DateRange[{1998},{2016},Quantity[2,"Years"]],Table[KeySort@EfficiencyGap@mddata[[i]],{i,Length@mddata}][[All,1]]}],PlotTheme->"Scientific"]
mddists=Table[CongressionalMapData[i]["Maryland"],{i,2000,2016,8}];
mdvotes=Table[PartyVotes[RepresentativeVotesDataset["Maryland",i]],{i,2000,2016,8}];
Grid[{Text/@Range[2000,2016,8],GeoGraphics/@Transpose[{GeoStyling[Blend[{Red,Blue},#]]&/@(KeySort@N[#/Total[#]]&/@mdvotes[[All,6]])[[All,1]],mddists[[All,6]]}]}]
Suffice it to say, the gerrymandering issue is coming to a head. With these three cases combined—as well as recent decisions in North Carolina and Pennsylvania, a ballot initiative in Michigan and all kinds of academic discussions around the country—the stage is set for the Supreme Court to make changes in how redistricting is regulated. Unfortunately, they’ve opted to pass on both partisan gerrymandering cases on technical grounds, so we will likely have to wait until next session to get a major decision.
Gerrymandering is a complex subject with a deep history, and this post only scratches the surface. Exploring with the Wolfram Language helped me pull everything together easily and discover a lot of intricacies I wouldn’t have otherwise found. Now that I’ve collected all the data in one place, I invite you to do your own exploration. Go find out the history of your district, explore measures of fairness and partition states as you see fit—just don’t forget to go out and vote this November!