Deploying and Sharing: Web Scraping with the Wolfram Language, Part 3
So far in this series, I’ve covered the process of extracting, cleaning and structuring data from a website. So what does one do with a structured dataset? Continuing with the Election Atlas data from the previous post, this final entry will talk about how to store your scraped data permanently and deploy results to the web for universal access and sharing.
A Permanent Place for Your Data
Starting with the structured data from the previous post, we can create a data resource to stash our results. This makes it easy to reference the content quickly for later use. It also saves computational resources—rather than reevaluating all those web-scraping computations, we can retrieve them immediately in a convenient, well-defined manner.
To start creating a data resource in Version 11.3 of Mathematica, go to . This will open a resource object template with fillable slots for describing the dataset, tweaking its structure and content and customizing how it can be accessed. First we’ll fill in a bit of information about the data being submitted. Add a title and description to explain the purpose of the resource:
For private resources, this is all the information needed. But we can also add other metadata, such as info about the contributor, the original data source (in this case http://uselectionatlas.org), related resources and the type and scope of the resource:
Moving down in the template notebook, we can add our existing web-scraping code (i.e. the ElectionAtlasData function) directly under Construction Area:
In the Content Element Initialization area, we add functions to help define some of the elements we want to add. Along with a modified version of the VoteMap code from the previous post, let’s add a simple function for computing the date of each election:
ElectionDate[year_] := Interpreter["ComputedDate"]["first Tuesday after Nov 1 " <> ToString[year]]
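As a cross-check that does not depend on natural-language interpretation, the same date can be computed with plain date arithmetic. The following is only a minimal sketch: the name ElectionDateDirect is hypothetical and not part of the original code, and it assumes the same rule as above (the first Tuesday strictly after November 1), so it scans November 2 through November 8:

```wolfram
(* Hypothetical helper, not from the original post: *)
(* first Tuesday strictly after November 1 of the given year *)
ElectionDateDirect[year_Integer] := 
 First@Select[
   DateRange[DateObject[{year, 11, 2}], DateObject[{year, 11, 8}]],
   DayName[#] === Tuesday &]

ElectionDateDirect[2016]  (* the date November 8, 2016 *)
```

Because any seven consecutive days contain exactly one Tuesday, Select always finds one match and First is safe here.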
We can also augment our dataset with a quick summary of each election, extracted using WikipediaData (conveniently, Wikipedia has very consistent page titles):
ElectionSummary[year_] := First@TextCases[
   WikipediaData["U.S. presidential election, " <> ToString[year], "SummaryPlaintext"],
   "Line"]
Next, we define an Association that represents the full data to include for each entry:
ElectionDataElements[year_] := 
 With[{results = ElectionAtlasData[year]},
  year -> <|
    "Date" -> ElectionDate[year],
    "Summary" -> ElectionSummary[year],
    (* candidate names: every column key except the first *)
    "Candidates" -> Rest@Normal@Keys[results[[1]]],
    (* national totals: sum each candidate column over all rows *)
    "CandidateTotals" -> Total@results[[All, 2 ;;]],
    "VoteMap" -> VoteMap[results, year],
    (* per-state rows, keyed by state abbreviation *)
    "VoteCounts" -> Normal[First[#]["StateAbbreviation"] -> # & /@ results]
   |>];
Since we’re working with a fairly large dataset, we can speed up the final deployment by using CloudExport to generate a serialized CloudObject:
CloudExport[
 <|Table[ElectionDataElements[year], {year, 1824, 2016, 4}]|>,
 "MX", "ElectionData", Permissions -> "Public"]
The Content Elements section contains the code for building the full resource—in this case, everything contained in the previously shown CloudObject:
$$Object["FullContent"] = DataResource`$$ContentConversion@<|"Data" -> CloudObject["ElectionData"]|>;
Under Default Element Specification, we can tell the system what element to use when ResourceData is called on the object. In this case, the only top-level element is "Data":
$$Object["DefaultContentElement"] = "Data";
Jumping down to the Create Resource Object section, execute the following code to generate the resource object:
$$ResourceObject = ResourceObject[EvaluationNotebook[]]
The subheadings in the Deploy Resource Object section represent various ways of deploying a data resource; in this case we’ll deploy publicly to the Wolfram Cloud so we can connect our resource to a web deployment:
CloudDeploy[$$ResourceObject, "ElectionResource", Permissions -> "Public"]
Now that the resource has been deployed, it can be accessed directly using ResourceObject:
ResourceObject["https://www.wolframcloud.com/objects/bwood/ElectionResource"]
To access the full data (i.e. the default content element, "Data"), use ResourceData. As mentioned in the previous post, Dataset provides a convenient structure for viewing an entry:
ResourceData["https://www.wolframcloud.com/objects/bwood/ElectionDataTest"][[-1]] // Dataset
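Individual elections and fields can also be looked up by key rather than by position. The snippet below is a sketch under two assumptions: that the resource is still deployed at the ElectionResource URL used elsewhere in this post, and that ResourceData returns the Association built by ElectionDataElements, keyed by integer year:

```wolfram
(* Assumes the deployed resource is reachable and keyed by integer year *)
data = ResourceData["https://www.wolframcloud.com/objects/bwood/ElectionResource"];

Keys[data]                  (* the election years in the resource *)
data[2016, "Candidates"]    (* candidate names for the 2016 entry *)
data[2016, "Date"]          (* election date for the 2016 entry *)
```

Keyed lookup avoids depending on the order of entries, which positional parts such as [[-1]] do.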
Designing a Dashboard
Next, let’s make a nice, clean layout for displaying the information from a given election. Using DateString, we can customize the display format for showing election dates:
MDYFormat[d_] := DateString[d, {"MonthName", " ", "DayShort", ", ", "Year"}]
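To illustrate the format, here is the function applied to a sample date (the date itself is just an example input; the definition is repeated so the snippet runs on its own):

```wolfram
MDYFormat[d_] := DateString[d, {"MonthName", " ", "DayShort", ", ", "Year"}]

MDYFormat[DateObject[{2016, 11, 8}]]  (* "November 8, 2016" *)
```

"DayShort" gives the day without a leading zero, which reads more naturally in a title than the zero-padded "Day" element.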
Our summary text can be displayed neatly in a Panel:
Panel[Style[data[[-1]]["Summary"], "Text", LineIndent -> 0], ImageSize -> 500]
NumberForm is useful for formatting large numbers; we’ll set DigitBlock to 3 to insert comma delimiters:
FormatVoteTotal[total_] := Style[ToString@NumberForm[total, DigitBlock -> 3], "Text"]
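The string conversion at the core of FormatVoteTotal can be tried on its own. The number below is an arbitrary illustration, not an actual vote count:

```wolfram
(* Arbitrary sample value, chosen only to show the digit grouping *)
ToString@NumberForm[1234567, DigitBlock -> 3]  (* "1,234,567" *)
```

Converting to a string before styling means the Grid receives plain text, so the "Text" style applies uniformly to every cell.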
We can then pass everything into a Grid with custom Style settings for optimal display:
ElectionResultsGrid[data_] := Grid[
  Join[
   {
    (* header row: blank corner cell followed by candidate names *)
    Join[{""}, Style[#, "Subsection"] & /@ data["Candidates"]],
    (* national totals row *)
    Join[{Style["National", Bold, "Text"]}, FormatVoteTotal /@ Values@data["CandidateTotals"]]
   },
   (* one row per state: bold abbreviation, then formatted vote counts *)
   Flatten[{Style[Keys@#, Bold], FormatVoteTotal /@ Values[#[[2, 2 ;; All]]]}] & /@ data["VoteCounts"]
  ],
  BaseStyle -> "Text"]
Lastly, we stack the results vertically using Column:
ElectionDataGrid[totals_] := Column[{
   Style[MDYFormat[totals["Date"]], "Title"],
   totals["VoteMap"],
   Panel[Style[totals["Summary"], "Text", LineIndent -> 0], ImageSize -> 500],
   Style["Vote Totals", "Section"],
   ElectionResultsGrid[totals]}]
The result is a clean summary of a given Election Atlas entry:
ElectionDataGrid[Last@ResourceData["https://www.wolframcloud.com/objects/bwood/ElectionResource"]]
Deploying and Sharing
Finally, it’s time to create an interactive browser for sharing our results. Using FormPage, we can make a dynamic form that will import the data resource and display our information grid (adding a title using AppearanceRules):
fp = FormPage[
   <|{"Year", "Select a Year"} -> AutoSubmitting@<|
       "Interpreter" -> ResourceData["https://www.wolframcloud.com/objects/bwood/ElectionResource"],
       "Control" -> PopupMenu|>|>,
   ElectionDataGrid[#Year] &,
   AppearanceRules -> <|"Title" -> "US Presidential Election Results"|>];
For viewing and testing within a desktop notebook, a scrollable Pane is a useful way to display this form:
Pane[fp, Scrollbars -> {False, True}, Alignment -> {Center, Top}, ImageSize -> {530, 530}]
Using CloudDeploy, we can create a web version of the form that is accessible to anyone:
CloudDeploy[fp, "ElectionDataBrowser", Permissions -> "Public"]
Once this webpage is live, it provides continuous access to the new data resource. Any time the resource is updated, the deployment will pick up the new data as well.
Final Thoughts
Throughout this series, we’ve covered a full data science workflow—importing and exploring, cleaning and structuring data and finally creating a permanent cloud resource with an interactive web interface. Notably, everything here was done within the Wolfram ecosystem from start to finish, using built-in functionality and remarkably little code.
Although the result in this case is a simple display of historical information, it’s easy to apply the same strategies toward financial dashboards, image processing, linguistic analyses and other advanced deployments. With the breadth of algorithms and visualizations in the Wolfram Language, the possibilities are endless!
For more detail on the functions you read about here, see the Set Up a Personal Data Resource, Make a Grid of Output Data and Set Up a Repeated-Use Form Page workflows.
Download the data resource as a Wolfram Notebook.