Wolfram Blog
Brian Wood

Deploying and Sharing: Web Scraping with the Wolfram Language, Part 3

January 10, 2019 — Brian Wood, Lead Technical Marketing Writer, Document and Media Systems

So far in this series, I’ve covered the process of extracting, cleaning and structuring data from a website. So what does one do with a structured dataset? Continuing with the Election Atlas data from the previous post, this final entry will talk about how to store your scraped data permanently and deploy results to the web for universal access and sharing.

Deploying and Sharing with the Wolfram Language

A Permanent Place for Your Data

Starting with the structured data from the previous post, we can create a data resource to stash our results. This makes it easy to reference the content quickly for later use. It also saves computational resources—rather than reevaluating all those web-scraping computations, we can retrieve them immediately in a convenient, well-defined manner.

To start creating a data resource in Version 11.3 of Mathematica, go to . This will open a resource object template with fillable slots for describing the dataset, tweaking its structure and content and customizing how it can be accessed. First we’ll fill in a bit of information about the data being submitted. Add a title and description to explain the purpose of the resource:

Presidential Election Results, 1824-2016

For private resources, this is all the information needed. But we can also add other metadata, such as info about the contributor, the original data source (in this case http://uselectionatlas.org), related resources and the type and scope of the resource:

Metadata

Moving down in the template notebook, we can add our existing web-scraping code (i.e. the ElectionAtlasData function) directly under Construction Area:

Construction Area

In the Content Element Initialization area, we add functions to help define some of the elements we want to add. Along with a modified version of the VoteMap code from the previous post, let’s add a simple function for computing the date of each election:


ElectionDate[year_] :=
 Interpreter["ComputedDate"][
  "first Tuesday after Nov 1 " <> ToString[year]]

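As a quick sanity check on that rule, here is a minimal sketch that finds the same date without the interpreter, by scanning November 2 through 8 for a Tuesday. The name electionDateCheck is hypothetical and not part of the resource template:

```wolfram
(* Hypothetical helper: the first Tuesday after Nov 1 always falls on Nov 2-8 *)
electionDateCheck[year_Integer] :=
 SelectFirst[
  DateRange[DateObject[{year, 11, 2}], DateObject[{year, 11, 8}]],
  DayName[#] === Tuesday &]

(* Extract {year, month, day} for comparison with ElectionDate[2016] *)
DateValue[electionDateCheck[2016], {"Year", "Month", "Day"}]
```

For 2016 this should give {2016, 11, 8}, matching the date shown for that election later in this post.
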
We can also augment our dataset with a quick summary of each election, extracted using WikipediaData (conveniently, Wikipedia has very consistent page titles):


ElectionSummary[year_] :=
 First@TextCases[
   WikipediaData["U.S. presidential election, " <> ToString[year],
    "SummaryPlaintext"], "Line"]

Next, we define an Association that represents the full data to include for each entry:


ElectionDataElements[year_] :=
  With[{results = ElectionAtlasData[year]},
   year -> <|
     "Date" -> ElectionDate[year],
     "Summary" -> ElectionSummary[year],
     "Candidates" -> Rest@Normal@Keys[results[[1]]],
     "CandidateTotals" -> Total@results[[All, 2 ;;]],
     "VoteMap" -> VoteMap[results, year],
     "VoteCounts" ->
      Normal[First[#]["StateAbbreviation"] -> # & /@ results]
     |>];

Since we’re working with a fairly large dataset, we can speed up the final deployment by using CloudExport to generate a serialized CloudObject:


CloudExport[<|
  Table[ElectionDataElements[year], {year, 1824, 2016,
    4}]|>, "MX", "ElectionData", Permissions -> "Public"]
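
To see what structure gets exported, here is a small local sketch of the same pattern with no cloud calls. The toyElements function and its contents are made up for illustration; only the shape (a Table of year -> Association rules wrapped in an Association) mirrors the code above:

```wolfram
(* Toy stand-in for ElectionDataElements: returns year -> <|...|> *)
toyElements[year_Integer] :=
 year -> <|"Label" -> "Election of " <> ToString[year]|>

(* A list of rules inside <|...|> becomes one Association keyed by year *)
toyData = <|Table[toyElements[year], {year, 2008, 2016, 4}]|>;

Keys[toyData]   (* {2008, 2012, 2016} *)
toyData[2016]   (* the entry for a single year *)
```

Keying the data by year is what later lets a form look up an election directly from the selected year.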

The Content Elements section contains the code for building the full resource—in this case, everything contained in the previously shown CloudObject:


$$Object["FullContent"] =
  DataResource`$$ContentConversion@<|
    "Data" -> CloudObject["ElectionData"]|>;

Under Default Element Specification, we can tell the system what element to use when ResourceData is called on the object. In this case, the only top-level element is "Data":


$$Object["DefaultContentElement"] = "Data";

Jumping down to the Create Resource Object section, execute the following code to generate the resource object:


$$ResourceObject = ResourceObject[EvaluationNotebook[]]

The subheadings in the Deploy Resource Object section represent various ways of deploying a data resource; in this case we’ll deploy publicly to the Wolfram Cloud so we can connect our resource to a web deployment:


CloudDeploy[$$ResourceObject, "ElectionResource",
 Permissions -> "Public"]

Now that the resource has been deployed, it can be accessed directly using ResourceObject:


ResourceObject["https://www.wolframcloud.com/objects/bwood/\
ElectionResource"]

To access the full data (i.e. the element named by "DefaultContentElement"), use ResourceData. As mentioned in the previous post, Dataset provides a convenient structure for viewing an entry:


ResourceData[
   "https://www.wolframcloud.com/objects/bwood/ElectionDataTest"][[-1]\
] // Dataset
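
The [[-1]] above picks the last entry by position. A short local sketch, using a made-up association in place of the real ResourceData output (the candidate lists are placeholders), shows the equivalent lookup by key. One point worth knowing is that inside a Dataset query a bare integer is treated as a position, so a year key needs to be wrapped in Key:

```wolfram
(* Toy stand-in for the year-keyed data returned by ResourceData *)
toy = <|
   2012 -> <|"Candidates" -> {"Obama", "Romney"}|>,
   2016 -> <|"Candidates" -> {"Clinton", "Trump"}|>|>;

(* Positional and key-based access reach the same entry *)
toy[[-1]] === toy[2016]

(* In a Dataset, wrap integer keys in Key to avoid positional lookup *)
Normal[Dataset[toy][Key[2016], "Candidates"]]
```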

VoteMap

Designing a Dashboard

Next, let’s make a nice, clean layout for displaying the information from a given election. Using DateString, we can customize the display format for showing election dates:


MDYFormat[d_] :=
 DateString[d, {"MonthName", " ", "DayShort", ", ", "Year"}]
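
As a quick usage check, the definition is repeated here so the snippet runs on its own, applied to a hand-built DateObject rather than a value from the resource:

```wolfram
MDYFormat[d_] :=
 DateString[d, {"MonthName", " ", "DayShort", ", ", "Year"}]

(* "DayShort" drops the leading zero, so the 8th prints as 8 *)
MDYFormat[DateObject[{2016, 11, 8}]]   (* "November 8, 2016" *)
```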

Our summary text can be displayed neatly in a Panel (here data is assumed to hold the full association returned by ResourceData):


Panel[Style[data[[-1]]["Summary"], "Text", LineIndent -> 0],
 ImageSize -> 500]

NumberForm is useful for formatting large numbers; we’ll set DigitBlock to 3 to insert comma delimiters:


FormatVoteTotal[total_] :=
 Style[ToString@NumberForm[total, DigitBlock -> 3], "Text"]
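
To see the effect of DigitBlock in isolation, here is a small sketch on an arbitrary number (not a real vote total), leaving out the Style wrapper:

```wolfram
(* Digits are grouped in threes using the default comma separator *)
ToString[NumberForm[1234567, DigitBlock -> 3]]   (* "1,234,567" *)

(* Without DigitBlock, no separators are inserted *)
ToString[NumberForm[1234567]]
```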

We can then pass everything into a Grid with custom Style settings for optimal display:


ElectionResultsGrid[data_] := Grid[Join[{
    Join[{""}, Style[#, "Subsection"] & /@ data["Candidates"]],
    Join[{Style["National", Bold, "Text"]},
     FormatVoteTotal /@ Values@data["CandidateTotals"]]},
   Flatten[{
       Style[Keys@#, Bold],
       FormatVoteTotal /@ Values[#[[2, 2 ;; All]]]}
      ] & /@ data["VoteCounts"]],
  BaseStyle -> "Text"]
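
The row-building step is the densest part of that definition, so here is a minimal sketch of it on a single made-up entry. The field names and numbers are hypothetical; the point is that each "VoteCounts" entry is a rule, Keys gives its left-hand side, and [[2, 2 ;; All]] skips the first field of the association on the right:

```wolfram
(* One toy entry shaped like an element of data["VoteCounts"] *)
row = "AL" -> <|"State" -> "Alabama", "A" -> 10, "B" -> 20|>;

(* Row label followed by every value after the first field *)
cells = Flatten[{Keys[row], Values[row[[2, 2 ;; All]]]}]   (* {"AL", 10, 20} *)

(* A header row plus the flattened row gives a rectangular grid *)
Grid[{{"", "A", "B"}, cells}, BaseStyle -> "Text"]
```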

Lastly, we stack the results vertically using Column:


ElectionDataGrid[totals_] :=
 Column[{
   Style[MDYFormat[totals["Date"]], "Title"],
   totals["VoteMap"],
   Panel[Style[totals["Summary"], "Text", LineIndent -> 0],
    ImageSize -> 500],
   Style["Vote Totals", "Section"],
   ElectionResultsGrid[totals]
   }]

The result is a clean summary of a given Election Atlas entry:


ElectionDataGrid[
 Last@ResourceData[
   "https://www.wolframcloud.com/objects/bwood/ElectionResource"]]

November 8, 2016

Deploying and Sharing

Finally, it’s time to create an interactive browser for sharing our results. Using FormPage, we can make a dynamic form that will import the data resource and display our information grid (adding a title using AppearanceRules):


fp = FormPage[<|{"Year", "Select a Year"} ->
     AutoSubmitting@<|
       "Interpreter" ->
        ResourceData[
         "https://www.wolframcloud.com/objects/bwood/\
ElectionResource"],
       "Control" -> PopupMenu
       |>
    |>,
   ElectionDataGrid[#Year] &,
   AppearanceRules -> <|
     "Title" -> "US Presidential Election Results"|>
   ];

For viewing and testing within a desktop notebook, a scrollable Pane is a useful way to display this form:


Pane[fp, Scrollbars -> {False, True}, Alignment -> {Center, Top},
 ImageSize -> {530, 530}]

Using CloudDeploy, we can create a web version of the form that is accessible to anyone:


CloudDeploy[fp, "ElectionDataBrowser", Permissions -> "Public"]

Once this webpage is live, it provides continuous access to the new data resource. Any time the resource is updated, the deployment will pick up the new data as well.

US Presidential Election Results

Final Thoughts

Throughout this series, we’ve covered a full data science workflow—importing and exploring, cleaning and structuring data and finally creating a permanent cloud resource with an interactive web interface. Notably, everything here was done within the Wolfram ecosystem from start to finish, using built-in functionality and remarkably little code.

Although the result in this case is a simple display of historical information, it’s easy to apply the same strategies toward financial dashboards, image processing, linguistic analyses and other advanced deployments. With the breadth of algorithms and visualizations in the Wolfram Language, the possibilities are endless!

For more detail on the functions you read about here, see the Set Up a Personal Data Resource, Make a Grid of Output Data and Set Up a Repeated-Use Form Page workflows.

Download the data resource as a Wolfram Notebook.
