The Ultimate Univariate Probability Distribution Explorer
In this blog post, we want to report on some work in progress that might interest users of probability and statistics, and also those who wonder how we add new knowledge every day to Wolfram|Alpha.
Since its beginning in 1988, Mathematica has known not only elementary functions (sqrt, exp, log, etc.) but also many special functions of mathematical physics (such as the Bessel function K and the Riemann zeta function) and number-theoretical functions. Altogether, Mathematica now knows more than 300 such functions. The Wolfram Functions Site lists 300,000+ formulas and identities for these functions. And, based on Mathematica’s algorithmic computation capabilities and the Functions Site’s identities, most of this knowledge is now easily accessible in Wolfram|Alpha. Examples include the relation between sin(x) and cos(x), series representations of the Beta function, the relation between BesselJ(n, x) and AiryAi(x), the differential equation for ellipticF(phi, m), and complicated indefinite integrals containing erf.
But Wolfram|Alpha also knows about many special functions that are not in Mathematica because they are less common or less general. For instance, haversine(x), double factorial binomial(2n, n), Dickman rho(10/3), BesselPolynomialY[6, x], Conway’s base 13 function(4003/371293), and Goldbach function(1000).
Mathematica 7 knew 42 probability distributions; Mathematica 9 knows over 130 (parametric) probability distributions. Based on Mathematica, Wolfram|Alpha can answer a lot of queries about these distributions, such as characteristic function of the hyperbolic distribution or variance of the binomial distribution with p = 1/3, and give general overview pages for queries such as Student’s t distribution or Gumbel distribution.
Similar to special functions, there also exists a whole zoo of probability distributions, owing to historical reasons and to their use in specialized application areas. Some distributions are very specialized, and some are pretty general and have multiple parameters (multiple parameters make a distribution more flexible and versatile, but typically make calculations with it more difficult). In particular, the very specialized ones are not well suited for Mathematica because of their narrow scope. The goal of Wolfram|Alpha is to answer quantitative questions of any form, and this of course includes questions about probability distributions. So in Wolfram|Alpha, which has a broader scope than Mathematica, we do want to support as many probability distributions as possible. It turns out that there are quite a few univariate probability distributions in use. Many of these probability distributions are defined through their probability density function (PDF), which describes how probability is distributed over the possible outcomes. But in some application areas it is more natural to start with the inverse of the cumulative distribution function (CDF) or with a hazard function. Yet, depending on the task at hand, one often needs a PDF, a CDF, various moments, generating functions, entropies, hazard functions, and so on, for a given distribution. And it is often difficult, or even impossible, to calculate these properties easily or to locate them in the scholarly literature.
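To make the relationship between these properties concrete, here is a minimal Python sketch for the exponential distribution. The choice of distribution, the function names, and the default rate parameter are our own illustrative assumptions and are not taken from the Explorer. Starting from the PDF, it writes down the CDF, the hazard function, and the inverse CDF.

```python
import math

def pdf(x, rate=2.0):
    """PDF of the exponential distribution: rate * exp(-rate * x) for x >= 0."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def cdf(x, rate=2.0):
    """CDF, i.e. the integral of the PDF from 0 to x."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def hazard(x, rate=2.0):
    """Hazard function: the PDF divided by the survival function 1 - CDF."""
    return pdf(x, rate) / (1.0 - cdf(x, rate))

def quantile(p, rate=2.0):
    """Inverse CDF (quantile function), valid for 0 <= p < 1."""
    return -math.log(1.0 - p) / rate
```

For this particular distribution the hazard function is constant and equal to the rate, and quantile(cdf(x)) returns x again. For most distributions in the collection, the corresponding formulas are far less simple.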
So, some time ago, we set out to collect and calculate all that can be collected and calculated about these univariate distributions. We carried out a detailed literature search, and found more than 500 different parametric univariate probability distributions.
There are a few different types of univariate probability distributions. The main classification typically used is discrete versus continuous distributions. One can subdivide these further by their support. Some have finite support, some have half-infinite support, and some are defined over the whole real axis (or, for discrete distributions, over all integers). Here is the complete list of all probability distributions that we included in our collection after searching the literature:
Continuous distributions defined on the doubly infinite interval (-∞, ∞):
Continuous distributions defined on a half-infinite interval (a, ∞):
Continuous distributions defined on a finite interval (a, b):
Discrete distributions defined on the doubly infinite interval (-∞, ∞):
Discrete distributions defined on a half-infinite interval [a, ∞):
Discrete distributions defined on a finite interval [a, b]:
Several of these distributions occur with a varying number of parameters, adding another few dozen distributions to this list. Some of the distributions above, such as the Bernoulli and Poisson distributions, are well known. Others are less well known, but a quick web search points to some papers about them (e.g. the McCullagh distribution or the Wallenius distribution).
Mathematically, univariate probability distributions are just univariate functions with some supplementary conditions, such as having a monotonically increasing CDF and being normalizable. In this sense they are really quite similar to the special functions of mathematical physics. Although most univariate probability distributions have a PDF built from elementary functions, their CDF and other properties quite frequently contain higher special functions. To avoid propagating potential typos and mistakes, and to avoid depending on particular conventions, we extracted only the defining properties (mostly the PDF) of the distributions from the literature and derived all relevant properties from scratch, carefully taking into account convergence conditions for integrals and sums. Calculating properties such as moments, generating functions, and entropies often results in quite challenging integrals and sums. As much as possible, we expressed these through named mathematical functions.
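As a small numerical illustration of these conditions, the following Python sketch starts from the PDF of the standard normal distribution alone. It checks normalization, derives the mean and variance by integration, and compares a numerically integrated CDF with the closed form, which already needs a special function (erf). The Simpson integrator, the truncation of the real axis to [-12, 12], and all names are our own assumptions for this sketch. The formulas in the collection were derived symbolically, not in this way.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """PDF of the normal distribution, built from elementary functions only."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

# Assumption: the tails outside [-12, 12] are negligible for this PDF.
LOWER, UPPER = -12.0, 12.0

total_mass = simpson(normal_pdf, LOWER, UPPER)   # should be close to 1
mean = simpson(lambda x: x * normal_pdf(x), LOWER, UPPER)
variance = simpson(lambda x: (x - mean) ** 2 * normal_pdf(x), LOWER, UPPER)

def cdf_numeric(x):
    """CDF obtained by integrating the PDF from the lower cutoff up to x."""
    return simpson(normal_pdf, LOWER, x)

def cdf_closed_form(x):
    """Closed form of the same CDF, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

grid = [-3.0, -1.0, 0.0, 0.5, 2.0]
cdf_values = [cdf_numeric(x) for x in grid]
is_monotonic = all(a <= b for a, b in zip(cdf_values, cdf_values[1:]))
max_error = max(abs(cdf_numeric(x) - cdf_closed_form(x)) for x in grid)
```

Numerical checks of this kind are only a sanity test. They say nothing about the convergence conditions in the parameters, which have to be tracked in the symbolic derivation.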
Concretely, the properties we calculated (on the basis of the defining formulas) for each probability distribution were:
These are about 60 properties. Multiplied by 500 probability distributions, this resulted in more than 30,000 formulas; 31,591, to be precise. (Not every property applies to every distribution; for example, the Cauchy distribution has neither a mean nor higher moments because of its fat tails, and some properties cannot be expressed in closed form through named mathematical functions.) A typical mathematics handbook has at most 20 to 25 formulas per page (many formulas, such as integrals, are displayed as multiline objects). This means that putting all these formulas into a book would easily fill more than 1,200 pages (even without including any of the plots of the distributions). That is a good chunk of mathematical knowledge, and it is just one of the many ongoing additions to Wolfram|Alpha.
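The page estimate above can be checked with a few lines of arithmetic. The Python sketch below uses only the numbers quoted in the text; the variable names are ours.

```python
import math

properties_per_distribution = 60   # "about 60 properties"
distributions = 500                # round figure used in the text
formulas_counted = 31_591          # exact count quoted in the text

# Back-of-the-envelope product of the two round figures.
rough_estimate = properties_per_distribution * distributions

# Pages needed at 25 and at 20 formulas per page, rounded up.
pages_at_25 = math.ceil(formulas_counted / 25)
pages_at_20 = math.ceil(formulas_counted / 20)

print(rough_estimate, pages_at_25, pages_at_20)  # prints: 30000 1264 1580
```

So even at the denser layout of 25 formulas per page, the collection would need well over 1,200 pages.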
Incorporating these probability distributions and their properties into Wolfram|Alpha will happen in the near future. To give you an idea of the scope and depth of coverage, we thought it might be interesting for Wolfram|Alpha users to have a preview of these formulas and of our team’s effort toward comprehensive coverage. So, how does one present such a large number of formulas in a comprehensible way? We decided on an interactive “single-page” version of the complete collection, in the form of a result of Mathematica’s Manipulate function, similar to the format of the interactive documents at the Wolfram Demonstrations Project. This allows us to put all the 30,000+ formulas in a uniform, easily browsable format, and all in initially about 800×900 pixels (less than 1 megapixel). The file can be downloaded here as a Computable Document Format (CDF) file. A quite similar version of this interactive viewer was used in the development phase of the project. Because this interactive document allows you to explore the properties of univariate probability distributions, and everything is implemented in Mathematica, we call it “The Ultimate Univariate Probability Distribution Explorer.”
Here is what the .cdf document looks like after opening. (We chose the well-known normal distribution and its most important characteristics as the default.)
At the top are the menus to choose the type of distribution and the concrete distribution, and on the left are the checkboxes to specify which properties to display. The initial state shows the well-known properties of the normal distribution. The pulldown menu at the top also allows you to choose the distribution alphabetically.
Mousing over one of the distribution classes, for instance over the continuous distributions over a half-infinite interval, shows the list of all distributions from that class.
To get some idea of what is included, just press the “random” button. It will randomly select a distribution and some properties. As mentioned, properties are often integrals and sums containing the distribution, and some of these integrals and sums can be quite complicated. As much as possible, the Ultimate Univariate Probability Distribution Explorer returns a closed form for these integrals and sums. Here is a state reached by using the random button that shows this quite clearly (because of the size of the result, we show only part of it).
There is also an “all properties” button. Use it with care so you aren’t scared by how complicated certain properties of even simple distributions can be. Here are some entropies for the lognormal distribution.
One can also compare the PDFs (and CDFs etc.) for a whole class of distributions. For instance, here is part of the result that compares the PDFs for all discrete distributions defined on a finite interval.
Or, for quickly flipping through a lot of pages with the same information, you can use the “flip-through” mode and move the corresponding slider.
In addition to looking up formulas, the Explorer also allows you to view plots of the PDF and CDF (and their derivatives). The left-hand side of the property specification selects the relevant plots. Here are some plots of the (generalized) Student’s t distribution.
The Univariate Probability Explorer has a variety of other features, such as generating sample random numbers for a given distribution; the option to display all results in Mathematica StandardForm so you can easily copy and paste them and carry out computations with the formulas; a list of often-used names for a distribution, including definitions of special functions used in results (such as hypergeometric functions, Beta functions, etc.); and more. The next screenshot shows various generalized moments for the Poisson distribution together with some links for the special functions that occur.
Instead of selecting a concrete probability distribution, you can also select a general (abstract) distribution (using the two right-most “general” buttons) in the “distribution class” menu. In this case, the defining formulas for the selected properties are shown. Here are some selected definitions for a general discrete distribution.
Before coming to the end, we want to mention the “Relations between distribution properties” feature at the bottom of the Explorer. This section serves as a handy set of definitions and connection formulas. These identities follow from the definitions of the properties and hold for any distribution. For instance, for a given CDF of a continuous probability distribution, how do you get the PDF?
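For a continuous distribution, the answer to this question is that the PDF is the derivative of the CDF. The following Python sketch checks this numerically for the logistic distribution, whose CDF and PDF are both elementary. The choice of distribution, the central-difference step size, and the function names are our own assumptions for illustration; the Explorer itself states such relations symbolically.

```python
import math

def logistic_cdf(x):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-x))

def logistic_pdf(x):
    """PDF of the standard logistic distribution (exact derivative of the CDF)."""
    e = math.exp(-x)
    return e / (1.0 + e) ** 2

def pdf_from_cdf(cdf, x, h=1e-5):
    """Central-difference approximation of the derivative of a CDF at x."""
    return (cdf(x + h) - cdf(x - h)) / (2.0 * h)

# Largest deviation between the numerical derivative and the exact PDF.
points = [-2.0, 0.0, 1.5]
max_deviation = max(
    abs(pdf_from_cdf(logistic_cdf, x) - logistic_pdf(x)) for x in points
)
```

The two agree up to the discretization error of the finite difference. For discrete distributions, the derivative is replaced by a difference of neighboring CDF values.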
(This section adds another 2,500 formulas to the Explorer, which were not included in the above count.) And here is a more complicated example: Given the sparsity, how do you calculate the variance?
This ends our quick overview of the features and identities of the Ultimate Univariate Probability Distribution Explorer, a small but useful side effect of a formula-collecting initiative for Wolfram|Alpha. In addition to giving users some idea of how we assemble mathematical information in Wolfram|Alpha, we hope readers will find it a handy reference for their work with probability distributions, or a rich source of potential exercises for integration and probability theory homework.
Now that we have assembled the data for univariate probability distributions, we will start adding these distributions and their properties to Wolfram|Alpha to make them even easier to use than through the interactive CDF document described here. In Wolfram|Alpha, these distributions will become accessible through its natural language interface. In the coming months, Wolfram|Alpha will know about many of the distributions. Once again, you can download the Univariate Probability Explorer here.