Wolfram Blog
Aaron Enright
Eric Weisstein

Computational Exploration of the Mathematics Genealogy Project in the Wolfram Language

August 2, 2018
Aaron Enright, Data Scientist, Wolfram|Alpha Socioeconomic Content
Eric Weisstein, Senior Researcher, Wolfram|Alpha Scientific Content

The Mathematics Genealogy Project (MGP) is a project dedicated to the compilation of information about all mathematicians of the world, storing this information in a database and exposing it via a web-based search interface. The MGP database contains more than 230,000 mathematicians as of July 2018, and has continued to grow roughly linearly in size since its inception in 1997.

In order to make this data more accessible and easily computable, we created an internal version of the MGP data using the Wolfram Language’s entity framework. Using this dataset within the Wolfram Language allows one to easily make computations and visualizations that provide interesting and sometimes unexpected insights into mathematicians and their works. Note that for the time being, these entities are defined only in our private dataset and so are not (yet) available for general use.

The search interface to the MGP is illustrated in the following image. It conveniently allows searches based on a number of common fields, such as parts of a mathematician’s name, degree year, Mathematics Subject Classification (MSC) code and so on:

Search

For a quick look at the available data from the MGP, consider a search for the prolific mathematician Paul Erdős made by specifying his first and last names in the search interface. It gives this result:

Search results

Clicking the link in the search result returns a list of available data:

Available data

Note that related mathematicians (i.e. advisors and advisees) present in the returned database results are hyperlinked. In contrast, other fields (such as school, degree years and so on) are not. Clearly, the MGP catalogs a wealth of information of interest to anyone wishing to study the history of mathematicians and mathematical research. Unfortunately, only relatively simple analyses of the underlying data are possible using a web-based search interface.

Explore Mathematicians

For those readers not familiar with the Wolfram Language entity framework, we begin by giving a number of simple examples of its use to obtain information about the "MGPPerson" entities we created. As a first simple computation, we use the EntityValue function to obtain a count of the number of people in the "MGPPerson" domain:

EntityValue["MGPPerson","EntityCount"]

Note that this number is smaller than the 230,000+ present in the database due to subsequent additions to the MGP. Similarly, we can return a random person:

person=RandomEntity["MGPPerson"]

Mousing over an “entity blob” such as in the previous example gives a tooltip showing the underlying Wolfram Language representation.

We can also explicitly look at the internal structure of the entity:

InputForm[person]

Copying, pasting and evaluating that expression gives the formatted version again:

Entity["MGPPerson","94172"]

We now extract the domain, canonical name and common name of the entity programmatically:

Through[{EntityTypeName,CanonicalName,CommonName}[person]]//InputForm

We can simultaneously obtain a set of random people from the "MGPPerson" domain:

RandomEntity["MGPPerson",10]

To obtain a list of properties available in the "MGPPerson" domain, we again use EntityValue:

properties=EntityValue["MGPPerson","Properties"]

As we did for entities, we can view the internal structure of the first property:

InputForm[First[properties]]

We can also view the canonical names of all the properties as a list of strings:

CanonicalName[properties]

The URL of the relevant MGP page is available directly as its own property, which can be queried concisely as:

EntityValue[person,"MathematicsGenealogyProjectURL"]

… with an explicit EntityProperty wrapper:

EntityValue[person,EntityProperty["MGPPerson","MathematicsGenealogyProjectURL"]]

… or using a curried syntax:

person["MathematicsGenealogyProjectURL"]

We can also return multiple properties:

person[{"AdvisedBy","Degrees","DegreeDates","DegreeSchoolEntities"}]

Another powerful feature of the Wolfram Language entity framework is the ability to create an implicitly defined Entity class:

EntityClass["MGPPerson","Surname"->"Nelson"]

Expanding this class, we obtain a list of people with the given surname:

SortBy[EntityList[EntityClass["MGPPerson","Surname"->"Nelson"]],CommonName]

To obtain an overview of data for a given person, we can copy and paste from that list and query for the "Dataset" property using a curried property syntax:

Entity["MGPPerson", "174871"]["Dataset"]

As a first graph computation, we use the Wolfram Language function NestGraph to produce a ten-generation-deep mathematical advisor tree for mathematician Joanna “Jo” Nelson:

NestGraph[#["AdvisedBy"]&,Entity["MGPPerson", "174871"],10,VertexLabels->Placed["Name",After,Rotate[#,30 Degree,{-3.2,0}]&]]
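
Since the "MGPPerson" entities are not publicly available, here is a minimal sketch of how NestGraph builds such a tree, using a hand-made lookup table in place of the "AdvisedBy" property. The names toyAdvisors and toyTree and the single-letter labels are invented for illustration and are not MGP data:

```wolfram
(* Hypothetical advisor lookup: each label maps to a list of advisor labels *)
toyAdvisors = <|"D" -> {"C"}, "C" -> {"A", "B"}, "A" -> {}, "B" -> {}|>;

(* NestGraph applies the function to each newly reached vertex, here up to 3 levels deep *)
toyTree = NestGraph[toyAdvisors[#] &, "D", 3, VertexLabels -> "Name"];

(* Every label reachable from "D" through the lookup becomes a vertex *)
Sort[VertexList[toyTree]]
```

The nesting depth only sets an upper bound: once a vertex has no further advisors, that branch simply stops growing.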

Using an implicitly defined EntityClass, let’s now look up people with the last name “Hardy”:

EntityList[EntityClass["MGPPerson","Surname"->"Hardy"]]

Having found the Hardy we had in mind, it is now easy to make a mathematical family tree for the descendants of G. H. Hardy, highlighting the root scholar:

With[{scholar=Entity["MGPPerson", "17806"]},
HighlightGraph[
NestGraph[#["Advised"]&,scholar,2,VertexLabels->Placed["Name",After,Rotate[#,30 Degree,{-3.2,0}]&],ImageSize->Large,GraphLayout->"RadialDrawing"],
scholar]
]

A fun example of the sort of computation that can easily be performed using the Wolfram Language is visualizing the distribution of mathematicians based on first and last initials:

Histogram3D[Select[Flatten[ToCharacterCode[#]]&/@Map[RemoveDiacritics@StringTake[#,1]&,DeleteMissing[EntityValue["MGPPerson",{"GivenName","Surname"}],1,2],{2}],(65<=#[[1]]<=90&&65<=#[[2]]<=90)&],AxesLabel->{"given name","surname"},Ticks->({#,#,Automatic}&[Table[{j,FromCharacterCode[j]},{j,65,90}]])]
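
The one-liner above packs several steps together. As a minimal sketch of its first stage, here it is applied to two hand-typed {given name, surname} pairs instead of MGP data (toyNames, toyInitials and toyCodes are our own illustrative names): take the first character of each name, strip diacritics, and convert to character codes, so that the test for codes between 65 and 90 keeps only initials from A to Z:

```wolfram
(* Two hand-typed {given name, surname} pairs, not taken from the MGP dataset *)
toyNames = {{"Paul", "Erdős"}, {"Émile", "Borel"}};

(* First character of each name, with accents removed *)
toyInitials = Map[RemoveDiacritics@StringTake[#, 1] &, toyNames, {2}];

(* One pair of character codes per person: {given-name initial, surname initial} *)
toyCodes = Flatten[ToCharacterCode[#]] & /@ toyInitials
```

Here toyCodes evaluates to {{80, 69}, {69, 66}}, i.e. the initial pairs P/E and E/B.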

As one might expect, mathematician initials (as well as those of all people in general) are not uniformly distributed with respect to the alphabet.

Explore Locations

The Wolfram Language contains a powerful set of functionality involving geographic computation and visualization. We shall make heavy use of such functionality in the following computations.

It is interesting to explore the movement of mathematicians from the institutions where they received their degrees to the institutions at which they did their subsequent advising. To do so, first select mathematicians who received a degree in the 1980s:

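(* The source line is truncated after "1980"; the predicate below is a reconstruction modeled on the later "DegreeDates" filter *)
p1980=Select[DeleteMissing[EntityValue["MGPPerson",{"Entity",EntityProperty["MGPPerson","DegreeDates"]}],1,2],1980<=Min[#["Year"]&/@#[[2]]]<=1989&][[All,1]];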

Find where their students received their degrees:

unitransition[person_]:=Module[{ds="DegreeSchoolEntities",advisoruni,adviseeunis},advisoruni=person[ds];
adviseeunis=#[ds]&/@DeleteMissing[Flatten[{person["Advised"]}]];
{advisoruni,adviseeunis}]

Assume the advisors were local to the advisees:

moves=Union[Flatten[DeleteMissing[Flatten[Outer[DirectedEdge,##]&@@@(unitransition/@Take[p1980,All]),2],2,1]]];

Now show the paths of the advisors:

GeoGraphics[{Thickness[0.001],Opacity[0.1],Red,Arrowheads[0.01],Arrow@GeoPath[List@@#]&/@moves},GeoRange->"World",GeoBackground->"StreetMapNoLabels"]//Quiet

Explore Degrees

We can also perform a number of computations involving mathematical degrees. As with the "MGPPerson" domain, we first briefly explore the contents of the "MGPDegree" domain and show how to access them.

To begin, show a count of the number of theses in the "MGPDegree" domain:

EntityValue["MGPDegree","EntityCount"]

List five random theses from the "MGPDegree" domain:

RandomEntity["MGPDegree",5]

Show available "MGPDegree" properties:

EntityValue["MGPDegree","Properties"]

Return a dataset of an "MGPDegree" entity:

Entity["MGPDegree", "120366"]["Dataset"]

Moving on, we now visualize the historical numbers of PhDs awarded worldwide:

DateListLogPlot[phddata={#[[1,1]],Length[#]}&/@GatherBy[Cases[EntityValue["MGPDegree",{"Date","DegreeType"}],{_DateObject,"Ph.D."}],First],
PlotRange->{DateObject[{#}]&/@{1800,2010},All},
GridLines->Automatic]

We can now make a fit to the number of new PhD mathematicians over the period 1875–1975:

fit=Fit[Select[{#1["Year"],1. Log[2,#2]}&@@@phddata,1875<#[[1]]<1975&],{1,y},y]

This gives a doubling time of about 1.5 decades:

Quantity[1/Coefficient[fit,y],"Years"]
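
To see why the reciprocal of the fitted slope is a doubling time, consider a minimal sketch on synthetic counts that double exactly every 15 years. The data and the names toyData, toyFit and toyDoublingTime are illustrative assumptions, not MGP values. Fitting a line to the base-2 logarithm of the counts gives a slope measured in doublings per year, so its reciprocal is the number of years per doubling:

```wolfram
(* Synthetic counts that double exactly every 15 years (illustrative, not MGP data) *)
toyData = Table[{year, 2.^((year - 1875)/15)}, {year, 1875, 1975, 5}];

(* Fit log2(count) = a + b*y; the slope b counts doublings per year *)
toyFit = Fit[{#1, Log[2, #2]} & @@@ toyData, {1, y}, y];

(* The reciprocal of the slope is the doubling time in years *)
toyDoublingTime = 1/Coefficient[toyFit, y]
```

For this synthetic input the result is approximately 15, matching the doubling period built into the data.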

Let’s write a utility function to visualize the number of degrees conferred by a specified university over time:

DegreeCountHistogram[school_,bin_,opts___]:=DateHistogram[DeleteMissing[EntityValue[EntityList[EntityClass["MGPDegree","SchoolEntity"->school]],"Date"]],
bin,opts]

Look up the University of Chicago entity of the "University" type in the Wolfram Knowledgebase:

Interpreter["University"]["university of chicago"]

Show the number of degrees awarded by the University of Chicago, binned by decade:

DegreeCountHistogram[Entity["University", "UniversityOfChicago::726rv"],"Decades"]

... and by year:

DegreeCountHistogram[Entity["University", "UniversityOfChicago::726rv"],"Years",DateTicksFormat->"Year"]

Now look at the national distribution of degrees awarded. Begin by again examining the structure of the data. In particular, there exist PhD theses with no institution specified in "SchoolEntity" but a country specified in "SchoolLocation":

TextGrid[Take[Cases[phds=EntityValue["MGPDegree",{"Entity","DegreeType","SchoolEntity","SchoolLocation"}],{_,"Ph.D.",_Missing,_List}],5],Dividers->All]

There also exist theses with more than a single country specified in "SchoolLocation":

TextGrid[Cases[phds,{_,"Ph.D.",_Missing,_List?(Length[#]!=1&)}],Dividers->All]

Tally the countries (excluding the pair of multiples):

TextGrid[Take[countrytallies=Reverse@SortBy[Tally[Cases[phds,{_,"Ph.D.",_,{c_Entity}}:>c]],Last],UpTo[10]],Alignment->{{Left,Decimal}},Dividers->All]

A total of 117 countries are represented:

Length[countrytallies]

Download flag images for these countries from the Wolfram Knowledgebase:

Take[flagdata=Transpose[{EntityValue[countrytallies[[All,1]],"Flag"],countrytallies[[All,2]]}],5]

Create an image collage of flags, with the flags sized according to the number of math PhDs:

ImageCollage[Take[flagdata,40],ImagePadding->3]

As another example, we can explore degrees awarded by a specific university. For example, extract mathematics degrees that have been awarded at the University of Miami since 2010:

Length[umiamidegrees=EntityList[
EntityClass["MGPDegree",{
"SchoolEntity"->Entity["University", "UniversityOfMiami::9c2k9"],
"Date"-> GreaterEqualThan[DateObject[{2010}]]}
]]]

Create a timeline visualization:

TimelinePlot[Association/@Rule@@@EntityValue[umiamidegrees,{"Advisee","Date"}],ImageSize->Large]

Now consider recent US mathematics degrees. Select the theses written at US institutions since 2000:

Length[USPhDs=Cases[Transpose[{
EntityList["MGPDegree"],
EntityValue["MGPDegree","SchoolLocation"],
EntityValue["MGPDegree","Date"]
}],
{
th_,
loc_?(ContainsExactly[{Entity["Country", "UnitedStates"]}]),DateObject[{y_?(GreaterEqualThan[2000])},___]
}:>th
]]

Make a table showing the top US schools by PhDs conferred:

TextGrid[Take[schools=Reverse[SortBy[Tally[Flatten[EntityValue[USPhDs,"SchoolEntity"]]],Last]],12],Alignment->{{Left,Decimal}},Dividers->All]

Map schools to their geographic positions:

geopositions=Rule@@@DeleteMissing[Transpose[{EntityValue[schools[[All,1]],"Position"],schools[[All,2]]}],1,2];

Visualize the geographic distribution of US PhDs:

GeoBubbleChart[geopositions,GeoRange->Entity["Country", "UnitedStates"]]

Show mathematician thesis production as a smooth kernel histogram over the US:

GeoSmoothHistogram[Flatten[Table[#1,{#2}]&@@@geopositions],"Oversmooth",GeoRange->GeoVariant[Entity["Country", "UnitedStates"],Automatic]]

Explore Thesis Titles

We now make some explorations of the titles of mathematical theses.

To begin, extract theses authored by people with the surname “Smith”:

Length[smiths=EntityList[EntityClass["MGPPerson","Surname"->"Smith"]]]

Create a WordCloud of words in the titles:

WordCloud[DeleteStopwords[StringRiffle[EntityValue[DeleteMissing[Flatten[EntityValue[smiths,"Degrees"]]],"ThesisTitle"]]]]

Now explore the titles of all theses (not just those written by Smiths) by extracting thesis titles and dates:

tt=DeleteMissing[EntityValue["MGPDegree",{"Date","ThesisTitle"}],1,2];

The average string length of a thesis title is remarkably constant over time:

DateListPlot[{#[[1,1]],Round[Mean[StringLength[#[[All,-1]]]]]}&/@SplitBy[Sort[tt],First],
PlotRange->{DateObject[{#}]&/@{1850,2010},All}]

The longest thesis title on record is this giant:

SortBy[tt,StringLength[#[[2]]]&]//Last

Motivated by this, extract explicit TeX fragments appearing in titles:

tex=Cases[ImportString[#,"TeX"]&/@Flatten[DeleteCases[StringCases[#2,Shortest["$"~~___~~"$"]]&@@@tt,{}]],Cell[_,"InlineFormula",___],∞]//Quiet;

... and display them in a word cloud:

WordCloud[DisplayForm/@tex]

Extract types of topological spaces mentioned in thesis titles and display them in a ranked table:

TextGrid[{StringTrim[#1],#2}&@@@Take[Select[Reverse[SortBy[Tally[Flatten[DeleteCases[StringCases[#2,Shortest[" "~~((LetterCharacter|"_")..)~~(" space"|"Space ")]]&@@@tt,{}]]],Last]],
Not[StringMatchQ[#[[1]],(" of " | " in " |" and "|" the " | " on ")~~__]]&],12],Dividers->All,Alignment->{{Left,Decimal}}]

Explore Mathematical Subjects

Get all available Mathematics Subject Classification (MSC) category descriptions for mathematics degrees conferred by the University of Oxford and construct a word cloud from them:

WordCloud[DeleteMissing[EntityValue[EntityList[EntityClass["MGPDegree","SchoolEntity"->Entity["University", "UniversityOfOxford::646mq"]]],"MSCDescription"]],ImageSize->Large]

Explore the MSC distribution of recent theses. To begin, iconize a list of MSC category names for use in subsequent examples:

mscnames=List;

Extract degrees awarded since 2010:

Length[degrees2010andlater=Cases[Transpose[{EntityList["MGPDegree"],EntityValue["MGPDegree","Date" ]}],{th_,DateObject[{y_?(GreaterEqualThan[2010])},___]}:>th]]

Extract the corresponding MSC numbers:

degreeMSCs=DeleteMissing[EntityValue[degrees2010andlater,"MSCNumber"]];

Make a pie chart showing the distribution of MSC category names and numbers:

With[{counts=Sort[Counts[degreeMSCs],Greater][[;;20]]},PieChart[Values[counts],ChartLegends->(Row[{#1,": ",#2," (",#3,")"}]&@@@(Flatten/@Partition[Riffle[Keys@counts,Partition[Riffle[(Keys@counts/.mscnames),ToString/@Values@counts],2]],2])),ChartLabels->Placed[Keys@counts,"RadialCallout"],ChartStyle->24,ImageSize->Large]]

Extract the MSC numbers for theses since 1990 and tally the combinations of {year, MSC}:

msctallies=Tally[Sort[Cases[DeleteMissing[EntityValue["MGPDegree",{"Date","MSCNumber"}],1,2],
{DateObject[{y_?(GreaterEqualThan[1990])},___],msc_}:>{y,msc}]]]

Plot the distribution of MSC numbers (mouse over the graph in the attached notebook to see MSC descriptions):

Graphics3D[With[{y=#[[1]],msc=ToExpression[#[[2]]],off=1/3},Tooltip[Cuboid[{msc-off,y-off,0},{msc+off,y+off,#2}],
#[[2]]/.mscnames]]&@@@msctallies,BoxRatios->{1,1,0.5},Axes->True,
AxesLabel->{"MSC","year","thesis count"},Ticks->{None,Automatic,Automatic}]

Most students do research in the same area as their advisors. Investigate systematic transitions from MSC classifications of advisors’ works to those of their students. First, write a utility function to create a list of MSC numbers for an advisor’s degrees and those of each advisee:

msctransition[person_]:=Module[{msc="MSCNumber",d="Degrees",advisormsc,adviseemscs,dm=DeleteMissing},
advisormsc=#[msc]&/@person[d];
adviseemscs=#[msc]&/@Flatten[#[d]&/@dm[Flatten[{person["Advised"]}]]];
dm[{advisormsc,{#}}&/@DeleteCases[adviseemscs,Alternatives@@advisormsc],1,2]]

For example, for Maurice Fréchet:

TextGrid[msctransition[Entity["MGPPerson", "17947"]]/.mscnames,Dividers->All]

Find MSC transitions for degree dates after 1988:

transitiondata=msctransition/@Select[DeleteMissing[
EntityValue["MGPPerson",{"Entity","DegreeDates"}],1,2],Min[#["Year"]&/@#[[2]]]>1988&][[All,1]];

transitiondataaccumulated=Tally[Flatten[Apply[Function[{a,b},Outer[DirectedEdge,a,b]],
Flatten[Take[transitiondata,All],1],{1}],2]]/.mscnames;

toptransitions=Select[transitiondataaccumulated,Last[#]>10&]/.mscnames;

Grid[Reverse[Take[SortBy[transitiondataaccumulated,Last],-10]],Dividers->Center,Alignment->Left]

msctransitiongraph=Graph[First/@toptransitions,EdgeLabels->Placed["Name",Tooltip],VertexLabels->Placed["Name",Tooltip],GraphLayout->"HighDimensionalEmbedding"];

With[{max=Max[Last/@toptransitions]},
HighlightGraph[msctransitiongraph,Style[#1,Directive[Arrowheads[0.05(#2/max)^.5],ColorData["DarkRainbow"][(#2/max)^6.],Opacity[(#2/max)^.5],Thickness[0.005(#2/max)^.5]]]&@@@transitiondataaccumulated]]

Explore Advisors

Construct a list of directed edges from advisors to their students:

Length[advisorPairs=Flatten[Function[{a,as},DirectedEdge[a,#]&/@as]@@@DeleteMissing[EntityValue["MGPPerson",{"Entity","Advised"}],1,2]]]

Some edges are duplicated because the same student-advisor relationship exists for more than one degree:

SelectFirst[Split[Sort[advisorPairs]],Length[#]>1&]

For example:

(EntityValue[Entity["MGPPerson", "110698"],{"AdvisedBy","Degrees"}]/.e:Entity["MGPDegree",_]:>{e,e["DegreeType"]})

So build an explicit advisor graph by uniting the {advisor, advisee} pairs:

advisorGraph=Graph[Union[advisorPairs],GraphLayout->None]

The advisor graph contains more than 3,500 weakly connected components:

Length[graphComponents=WeaklyConnectedGraphComponents[advisorGraph]]
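
A weakly connected component groups together everyone linked by advisor relationships regardless of edge direction. As a minimal sketch on a toy graph with two separate lineages (toyGraph, toyComponents and the vertex labels are hypothetical, not MGP data):

```wolfram
(* Toy advisor graph: edges point from advisor to student; labels are invented *)
toyGraph = Graph[{"a" -> "b", "a" -> "c", "c" -> "d", "x" -> "y"}];

(* Edge direction is ignored when grouping vertices into weakly connected components *)
toyComponents = WeaklyConnectedGraphComponents[toyGraph];

(* Number of people in each lineage *)
VertexCount /@ toyComponents
```

The four people descending from "a" form one component and the pair "x", "y" forms another, even though no directed path leads from "d" back to "a".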

Visualize component sizes on a log-log plot:

ListLogLogPlot[VertexCount/@graphComponents,Joined->True,Mesh->All,PlotRange->All]

Find the size of the giant component (about 190,000 people):

VertexCount[graphComponents[[1]]]

Find the graph center of the second-largest component:

GraphCenter[UndirectedGraph[graphComponents[[2]]]]

Visualize the entire second-largest component:

Graph[graphComponents[[2]],VertexLabels->"Name",ImageSize->Large]

Identify the component in which David Hilbert resides:

FirstPosition[VertexList/@graphComponents,Entity["MGPPerson", "7298"]][[1]]

Show Hilbert’s students:

With[{center=Entity["MGPPerson", "7298"]},HighlightGraph[Graph[Thread[center->AdjacencyList[graphComponents[[1]],center]],VertexLabels->"Name",ImageSize->Large],center]]

As it turns out, the mathematician Gaston Darboux plays an even more central role in the advisor graph. Here is some detailed information about Darboux, whose 1866 thesis was titled “Sur les surfaces orthogonales”:

Entity["MGPPerson", "34254"] ["PropertyAssociation"]

And here is a picture of Darboux:

Show[WikipediaData["Gaston Darboux","ImageList"]//Last,ImageSize->Small]

Many mathematical constructs are named after Darboux:

Select[EntityValue["MathWorld","Entities"],StringMatchQ[#[[2]],"*Darboux*"]&]

... and his name can even be used in adjectival form:

StringCases[Normal[WebSearch["Darbouxian *",Method -> "Google"][All,"Snippet"]], "Darbouxian"~~" " ~~(LetterCharacter ..)~~" " ~~(LetterCharacter ..)]//Flatten//DeleteDuplicates // Column

Many well-known mathematicians are in the subtree starting at Darboux. In particular, in the directed advisor graph we find a number of recent Fields Medal winners. Along the way, we also see many well-known mathematicians such as Laurent Schwartz, Alexander Grothendieck and Antoni Zygmund:

{path1,path2,path3,path4}=(DirectedEdge@@@Partition[FindShortestPath[graphComponents[[1]],Entity["MGPPerson", "34254"],#],2,1])&/@
{Entity["MGPPerson", "13140"],Entity["MGPPerson", "22738"],Entity["MGPPerson", "43967"],Entity["MGPPerson", "56307"]}

Using the data from the EntityStore, we build the complete subgraph starting at Darboux:

adviseeedges[pList_]:=Flatten[Function[p,DirectedEdge[Last[p],#]&/@
DeleteMissing[Flatten[{Last[p]["Advised"]}]]]/@pList]

advgenerations=Rest[NestList[adviseeedges,{Null->Entity["MGPPerson", "34254"]},7]];

alladv=Flatten[advgenerations];

It contains more than 14,500 mathematicians:

Length[Union[Cases[alladv,_Entity,∞]]]-1

Because it is a complicated graph, we display it in 3D to avoid overcrowded zones. Darboux sits approximately in the center:

gr3d=Graph3D[alladv,GraphLayout->"SpringElectricalEmbedding"]

We now look at the degree centrality of the nodes of this graph in a log-log plot:

ListLogLogPlot[Tally[DegreeCentrality[gr3d]]]

Let’s now highlight the paths from Darboux to the Fields Medal winners in that graph:

style[path_,color_]:=Style[#,color,Thickness[0.004]]&/@path

HighlightGraph[gr3d,
Join[{Style[Entity["MGPPerson", "34254"],Orange,PointSize[Large]]},
style[path1,Darker[Red]],style[path2,Darker[Yellow]],style[path3,Purple],
style[path4,Darker[Green]]]]

Geographically, Darboux’s descendants are distributed around the whole world:

makeGeoPath[DirectedEdge[e1_,e2_]] :=
With[{s1=e1["DegreeSchoolEntities"],s2=e2["DegreeSchoolEntities"],d1=e1["DegreeDates"],d2=e2["DegreeDates"],color=ColorData["DarkRainbow"][(Mean[{#1[[1,1,1]],#2[[1,1,1]]}]-1870)/150]&},
If[MemberQ[{s1,s2,d1,d2},_Missing,∞]||s1===s2,{},{Thickness[0.001],color[d1,d2],Arrowheads[0.012],Tooltip[Arrow[GeoPath[{s1[[1]],s2[[1]]}]],
Grid[{{"","advisor","advisee"},{"name",e1,e2},Column/@{{"school"},s1,s2},
Column/@{{"degree date"},d1,d2}},Dividers->Center]]}]]

Here are the paths from the advisors’ schools to the advisees’ schools after four and six generations:

GeoGraphics[makeGeoPath/@Flatten[Take[advgenerations,4]],GeoBackground->"StreetMapNoLabels",GeoRange->"World"]//Quiet

GeoGraphics[makeGeoPath /@ Flatten[Take[advgenerations, 6]],
  GeoBackground -> "StreetMapNoLabels", GeoRange -> "World"] // Quiet

Distribution of Intervals between the Date at Which an Advisor Received a PhD and the Date at Which That Advisor's First Student's PhD Was Awarded

Extract a list of advisors and the dates at which their advisees received their PhDs:

Take[AdvisorsAndStudentPhDDates=SplitBy[Sort[Flatten[Thread/@Cases[EntityValue["MGPDegree",{"Advisors","DegreeType","Date"}],{l_List,"Ph.D.",DateObject[{y_},___]}:>{l,y}],1]],First],5]

This list includes multiple student PhD dates for each advisor, so select the dates of the first students’ PhDs only:

Take[AdvisorsAndFirstStudentPhDDates=DeleteCases[{#[[1,1]],Min[DeleteMissing[#[[All,2]]]]}&/@AdvisorsAndStudentPhDDates,{_,Infinity}],10]

Now extract a list of PhD awardees and the dates of their PhDs:

Take[PhDAndDates=DeleteCases[Sort[Cases[EntityValue["MGPDegree",{"Advisee","DegreeType","Date"}],{p_,"Ph.D.",DateObject[{y_},___]}:>{p,y}]],{_Missing,_}],10]

Note that some advisors have more than one PhD:

Select[SplitBy[PhDAndDates,First],Length[#]>1&]//Take[#,5]&//Column

For example:

Entity["MGPPerson", "100896"]["Degrees"]

... who has these two PhDs:

EntityValue[%,{"Date","DegreeType","SchoolName"}]

While having two PhDs is not unheard of, having three is unique:

Tally[Length/@SplitBy[PhDAndDates,First]]

In particular:

Select[SplitBy[PhDAndDates,First],Length[#]===3&]

Select the first PhDs of advisees and make a set of replacement rules to their first PhD dates:

Take[FirstPhDDateRules=Association[Thread[Rule@@@SplitBy[PhDAndDates,First][[All,1]]]],5]

Now replace advisors by their first PhD years and subtract from the year of their first students’ PhDs:

Take[times=-Subtract@@@(AdvisorsAndFirstStudentPhDDates/.FirstPhDDateRules),10]
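
The replacement step can be illustrated with a minimal sketch on two made-up advisors (the labels "p1" and "p2", the years, and the names toyFirstStudent, toyOwnPhD and toyTimes are assumptions for illustration only). Applying the association as a set of replacement rules swaps each advisor for the year of that advisor's own first PhD, and the negated Subtract then yields student year minus advisor year:

```wolfram
(* Hypothetical {advisor, year of first student's PhD} pairs *)
toyFirstStudent = {{"p1", 1990}, {"p2", 2001}};

(* Hypothetical lookup of each advisor's own first PhD year *)
toyOwnPhD = <|"p1" -> 1980, "p2" -> 1995|>;

(* Replace each advisor by that year, then compute student year minus advisor year *)
toyTimes = -Subtract @@@ (toyFirstStudent /. toyOwnPhD)
```

This gives {10, 6}: the first advisor's first student graduated 10 years after the advisor did, the second's 6 years after.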

The data contains a small number of discrepancies where students allegedly received their PhDs prior to their advisors:

SortBy[Select[Transpose[{AdvisorsAndFirstStudentPhDDates[[All,1]],AdvisorsAndFirstStudentPhDDates/.FirstPhDDateRules}],GreaterEqual@@#[[2]]&],-Subtract@@#[[2]]&]//Take[#,10]&

Removing these problematic points and plotting a histogram reveals the distribution of years between advisors’ and first advisees’ PhDs:

Histogram[Cases[times,_?Positive]]

We hope you have found this computational exploration of mathematical genealogy of interest. We thank Mitch Keller and the Mathematics Genealogy Project for their work compiling and maintaining this fascinating and important dataset, as well as for allowing us the opportunity to explore it using the Wolfram Language. We hope to be able to freely expose a Wolfram Data Repository version of the MGP dataset in the near future so that others may do the same.

8 Comments


Diana

In[3]:= EntityValue["MGPPerson","EntityCount"]
Out[3]= Missing[UnknownType,MGPPerson]

Posted by Diana    August 3, 2018 at 4:39 pm
    Wolfram Blog

    Hi Diana. That doesn’t work because we haven’t made the data available in the Wolfram Data Repository yet. Once we do, one can use ResourceSearch to find the ResourceObject and import it into a Wolfram Language session.

    Posted by Wolfram Blog    August 7, 2018 at 8:40 am
Kurt Shatov

Great blog!
Two questions: a) Have you tried to do something similar for other disciplines (https://academictree.org) and compare various quantitative characteristics of the resulting trees (e.g. vertex degree distributions)?
b) What’s the distribution underlying the last histogram? Is it a lognormal distribution (as one might conjecture from arxiv 1607.02952)?

Posted by Kurt Shatov    August 5, 2018 at 10:05 am
    Wolfram Blog

    Thanks for your feedback, Kurt!

    As for your first question, we have not looked at other disciplines. https://academictree.org/ looks fascinating. We’ll look into getting that into the Wolfram Data Repository.

    And for your second question, indeed, the distribution is well-approximated by a log-normal distribution as follows:

    In[159]:= hist = Histogram[data]

    In[170]:= fit =
    FindDistributionParameters[data,
    LogNormalDistribution[\[Mu], \[Sigma]]]
    Out[170]= {\[Mu] -> 2.35869, \[Sigma] -> 0.605468}

    In[172]:= Show[{Histogram[data, {1}, "PDF"],
    Plot[PDF[LogNormalDistribution[\[Mu], \[Sigma]], t] /. fit, {t, 0, 40}]}]

    Posted by Wolfram Blog    August 7, 2018 at 8:39 am
Barrie Stokes

Another tour de force blog from the Wolfram staff!
A common question: is there, or will there be, a Notebook of this blog? It’s always instructive to take a great Notebook and play with it.

Thanks Aaron, thanks Eric.

Posted by Barrie Stokes    August 5, 2018 at 9:51 pm
    Wolfram Blog

    We hope to make a NB of this blog post once we are able to publish the MGP EntityStore in the Wolfram Data Repository, from which it will then be available to all.

    Posted by Wolfram Blog    August 6, 2018 at 2:33 pm
Mpsc Ganit

Why does the entity framework make the MGP data more accessible and easily computable?

Posted by Mpsc Ganit    August 6, 2018 at 5:12 am
    Wolfram Blog

    As an EntityStore, the MGP data becomes computable in the Wolfram Language, allowing us to do the analysis you see in the blog posting. We put the data into an EntityStore because a) the data seemed well suited for it and b) we wanted to show off what an EntityStore can do.

    Posted by Wolfram Blog    August 6, 2018 at 2:28 pm

