In fact, `RandomPoint` can be used to uniformly sample from any bounded geometric region, in any dimension. In 2D:

In 3D:

In 10D:

Use the 10D points to estimate the region centroid:

Compare to the numerical value of the exact coordinates:

`RandomPoint` aims to enable sampling from all the geometric regions supported in the Wolfram Language, including basic geometric regions, mesh regions, and formula and derived regions.

For example, you can use `RandomPoint` to mark uniformly distributed locations on a map of Africa:

Random points can be used to approximate geometric quantities. For instance, to estimate the maximum distance between two points in a regular pentagon inscribed in the unit circle, find the maximum distance between 1,000 pairs of random points on the boundary of a pentagon:

Or estimate the area of the symmetric difference of regions, computed with `RegionSymmetricDifference` (or any other Boolean combination of regions):

Visualize the point cloud:

Build a `Nearest` function to quickly test whether a point is within a given distance *r* of the point cloud:

Use the Monte Carlo method to estimate the area of the underlying region ℛ_{*} from the set of sample points pts. This is done by sampling *n* points from the bounding rectangular region and counting the fraction of points that lie within distance *r* of the point cloud.
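The estimator itself is language agnostic. Since the notebook cells are not reproduced in this text, here is a Python sketch of the same idea, with the unit disk standing in for the region ℛ_{*}; the hit fraction times the bounding-box area approximates the area of the region thickened by *r*, so *r* should be small:

```python
import math
import random

random.seed(1)

# Stand-in setup: the unit disk (area pi) plays the role of the region R*,
# and the point cloud is sampled from the region itself.
cloud = []
while len(cloud) < 2000:
    x, y = random.uniform(-1, 1), random.uniform(-1, 1)
    if x * x + y * y <= 1:
        cloud.append((x, y))

def near_cloud(p, r):
    """True if p lies within distance r of some cloud point (a linear scan;
    Mathematica's Nearest uses an accelerated lookup structure instead)."""
    px, py = p
    return any((px - cx) ** 2 + (py - cy) ** 2 <= r * r for cx, cy in cloud)

# Sample n points from the bounding box [-1,1]^2 and count the fraction
# that lands within r of the cloud; scale by the box area.
n, r = 2000, 0.05
hits = sum(near_cloud((random.uniform(-1, 1), random.uniform(-1, 1)), r)
           for _ in range(n))
area_estimate = 4.0 * hits / n
print(area_estimate)   # close to pi for small r and a dense enough cloud
```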

Accumulate the estimate statistics:

Estimate the distribution density:

Compare the estimated value with the exact numerical value:

I will conclude with a one-liner: a `RandomPoint`-based, Styrofoam-style visualization of the 8₃ knot:

`RandomPoint` is part of both the geometric computation and the probability and statistics capabilities in the Wolfram Language. `RandomPoint` was first introduced in Version 10.2 of the Wolfram Language and has been extended to cover new methods with the release of Version 10.3.

Download this post as a Computable Document Format (CDF) file.

Jacob Bernoulli was the first mathematician in the Bernoulli family, which produced many notable mathematicians of the seventeenth and eighteenth centuries.

Jacob Bernoulli’s mathematical legacy is rich. He introduced Bernoulli numbers, solved the Bernoulli differential equation, studied the Bernoulli trials process, proved the Bernoulli inequality, discovered the number *e*, and demonstrated the weak law of large numbers (Bernoulli’s theorem).

Bernoulli’s treatise *Ars Conjectandi* (i.e. *The Art of Conjecturing*) was published posthumously in 1713, eight years after his death, and was written in Latin, science’s *lingua franca* of the time. It is considered a seminal work of mathematical probability. Its importance is attested, in part, by its translations into French by G. Le Roy in 1801 and, more recently, into English by E. D. Sylla in 2005.

*The Art of Conjecturing* comprises four parts. The first part reproduces Christiaan Huygens’ *De Ratiociniis in Ludo Aleae* (*On Reasoning in Games of Chance*), with extensive commentary from Bernoulli and detailed solutions of Huygens’ five problems, posed at the end of Huygens’ work with answers, but without derivations. In the first part, Bernoulli also derives the probability that at least *m* successes will occur in *n* independent trials with success probability of *p*:
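In modern notation this is the binomial tail probability Σ_{k=m}^{n} C(n,k) p^k (1−p)^{n−k}. A minimal sketch of the computation (in Python, since the post's own Mathematica cell is not reproduced here):

```python
from math import comb

def prob_at_least(m, n, p):
    """P(at least m successes in n independent trials with success prob p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

# e.g. the chance of at least 5 heads in 10 tosses of a fair coin
print(prob_at_least(5, 10, 0.5))  # 0.623046875
```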

The second part, “The Doctrine of Permutations and Combinations,” is devoted to combinatorics and to the study of figurate numbers, i.e. numbers that can be represented by a regular geometrical arrangement of equally spaced points:

It is here that Bernoulli introduces Bernoulli numbers. He starts by noting an identity among binomial coefficients.

Bernoulli knew that, for a fixed *m*, the binomial coefficient is a polynomial in *n*. This identity allows him to solve for the sum of powers. He gives a table of results for 0 ≤ *m* ≤ 10.

To reproduce Bernoulli’s table, define a function to construct equations for the sum of powers:

Solving for the sum of powers:
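Bernoulli's table can also be reproduced from the telescoping identity (*n*+1)^{m+1} − 1 == Σ_{j=0}^{m} C(*m*+1, *j*) S_{j}(*n*), solving for each sum of powers S_{m}(*n*) in turn. A sketch in Python with exact rational arithmetic (the function name is illustrative):

```python
from fractions import Fraction
from math import comb

def sum_of_powers(m):
    """Coefficients (constant term first, as Fractions) of the polynomial
    S_m(n) = 1^m + 2^m + ... + n^m, obtained from the telescoping identity
    (n+1)^(m+1) - 1 = sum_{j=0..m} C(m+1, j) * S_j(n)."""
    S = []  # S[j] holds the coefficient list of S_j(n)
    for j in range(m + 1):
        # expand (n+1)^(j+1) - 1 in powers of n
        rhs = [Fraction(comb(j + 1, k)) for k in range(j + 2)]
        rhs[0] -= 1
        # subtract the contributions of the lower sums S_0 .. S_{j-1}
        for i in range(j):
            c = comb(j + 1, i)
            for k, coef in enumerate(S[i]):
                rhs[k] -= c * coef
        # what remains is (j+1) * S_j(n)
        S.append([coef / (j + 1) for coef in rhs])
    return S[m]

print(sum_of_powers(2))  # n/6 + n^2/2 + n^3/3 = n(n+1)(2n+1)/6
```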

Bernoulli writes, “[*W*]*hoever has examined attentively the law of the progression herein can continue the Table further without these digressions for computations*” by making the following educated guess:

He notes that the coefficients *B*_{r+1} do not depend on the power *m*:

These coefficients are the celebrated Bernoulli numbers, which have found their way into many areas of mathematics [e.g. see mathoverflow.net].

In the second part of his book, Bernoulli counts the number of permutations, the number of permutations of sets with repetitions, the number of ways of choosing objects from a set, etc., which he later applies to compute probabilities as the ratio of the number of configurations of interest to the total number of configurations.

In part three, Bernoulli applies results from the first two parts to solve 24 problems related to games of chance. A recurring theme in these problems is a sequence of independent 0 or 1 outcomes, which bears the name Bernoulli trial, or Bernoulli process. I thought Jacob Bernoulli’s birthday anniversary an apt occasion to explore his problems in *Mathematica*.

For example, problem 9 asks for the expected payout in a three-player game. Players alternately draw cards without replacement from a pack of twenty cards, of which ten are face cards. When the cards are exhausted, the winnings are divided among all those who hold the highest number of face cards.

With c1, c2, and c3 denoting the number of face cards each player has, the payout of the first player is:

After the pack of twenty has been distributed, the first and the second players each receive seven cards, but the third one only receives six. The tally vector of face cards received by each player follows `MultivariateHypergeometricDistribution`:
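Concretely, with the 10 face cards spread over hands of sizes 7, 7, and 6, the tally vector (c1, c2, c3) has probability C(7, c1) C(7, c2) C(6, c3) / C(20, 10). A sketch of this distribution in Python (names are illustrative), checking that the probabilities sum to 1 and that each player expects face cards in proportion to hand size:

```python
from fractions import Fraction
from math import comb

def pmf(f1, f2, f3):
    """P(face-card tallies f1, f2, f3) when the 10 face cards land in hands
    of sizes 7, 7 and 6 -- a multivariate hypergeometric distribution."""
    if f1 + f2 + f3 != 10:
        return Fraction(0)
    return Fraction(comb(7, f1) * comb(7, f2) * comb(6, f3), comb(20, 10))

probs = {(a, b, 10 - a - b): pmf(a, b, 10 - a - b)
         for a in range(8) for b in range(8) if 0 <= 10 - a - b <= 6}

# mean face-card count of the first player: 10 * 7/20 = 7/2
mean1 = sum(a * p for (a, b, c), p in probs.items())
print(sum(probs.values()), mean1)  # 1 7/2
```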

This and other problems are stated and solved in the accompanying *Mathematica* notebook.

The concluding part four of *Ars Conjectandi* discusses uses of probability in civil, moral, and economic matters. Here Bernoulli argues that the probability reflects our incomplete knowledge of the state of the world, and unlike in a game of chance, where probabilities can be determined by finding the proportion that configurations of interest take in the whole set of possible configurations, the probabilities here cannot be a priori established. Bernoulli argues that these unknown probabilities can be inferred from past outcomes.

He proves the weak law of large numbers, asserting that the observed frequency of successes in *n* independent trials with success probability *p* converges to *p* as the number of trials grows. Thus, you can estimate *p* arbitrarily accurately by running a sufficient number of trials. Specifically, for any *δ* > 0 and *ε* > 0, there exists a large enough sample size *n* such that:

The demonstration “Simulated Coin Tossing Experiments and the Law of Large Numbers” by Ian McLeod, among others, explores this convergence.

Download this post as a Computable Document Format (CDF).

Download Bernoulli problems as a *Mathematica* notebook.

It’s short enough to reproduce in its entirety: “Find the mathematical expectation of the area of the projection of a cube with edge of length 1 onto a plane with an isotropically distributed random direction of projection.” In other words, what is the average area of a cube’s shadow over all possible orientations?

This blog post explores the use of *Mathematica* to understand and ultimately solve the problem. It recreates how I approached the problem.

Before tackling the case of the cuboid, I started with a unit square, randomly rotated around its center of mass, with the intent to find the average length of its projection on a horizontal axis.

For the sake of simplicity, I placed the center of the mass of the square at the origin.

I found the left and right boundaries of the projection as the smallest and largest *x* coordinates of the vertices of the square, rotated around the origin through an angle of *α* degrees:

I then combined these functions within the `Manipulate` to be able to dynamically change the angle of rotation:

Here’s the plot of the length of the projection as the function of the rotation angle:

Assuming the rotation angle *α* is uniformly distributed in the interval 0 ≤ *α* < 360, I readily found the expected length of the projection, that is, the average length of the square’s shadow:
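The same average is easy to cross-check outside *Mathematica*. The Python sketch below recomputes the projection length directly from the rotated vertices and averages over the angle; the value it converges to is 4/π ≈ 1.273:

```python
import math

def shadow_length(alpha_deg):
    """x-extent of the unit square rotated about its center by alpha degrees:
    max minus min of the rotated vertices' x coordinates."""
    a = math.radians(alpha_deg)
    verts = [(sx * 0.5, sy * 0.5) for sx in (-1, 1) for sy in (-1, 1)]
    xs = [x * math.cos(a) - y * math.sin(a) for x, y in verts]
    return max(xs) - min(xs)

# average over a uniformly distributed rotation angle
n = 20000
avg = sum(shadow_length(360 * k / n) for k in range(n)) / n
print(avg)   # tends to 4/pi = 1.2732...
```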

That was easy enough, but in order to take the victory to 3D, I needed to change the point of view. Instead of rotating the square, I rotated the line we project on.

Perspective of the wire frame:

I found this change of perspective illuminating, as it made me think that the length of the projection is the sum of the lengths of the projections of the two sides of the square visible from the plane.

The length of the projection of a segment of length ℓ with unit normal vector *n*_{1} onto a line with unit normal vector *n*_{2} equals ℓ Dot[*n*_{1},*n*_{2}].

The projection of each side of the square only contributes if Dot[*n*_{1},*n*_{2}] is positive (i.e. the side is visible); otherwise it is hidden behind other sides. The length of the shadow is thus the sum of the contributions of the east, west, top, and bottom sides of the square:

Thus the length of the projection is the sum of the absolute values of coordinate components of the normal vector *nvec*. I implemented this way of computing the length of the shadow in a function:

And, of course, it agrees with the earlier way of computing the projection length:

Naturally, the expectation is the same:

Guided by the insight gained by considering the square, I adopted the reference frame of the cuboid, whose center of mass is situated at the origin. The cuboid casts its shadow on a plane whose orientation is parametrized by its perpendicular (i.e. normal) vector *nvec*.

I started by straightforwardly building projections of each face of the cuboid and drawing them together. First I defined this helper function to project a vector *xvec* onto a plane with normal *nvec*:

The following function gives vertex coordinates of a face in the (*i*,*j*) plane with the other coordinate being *z*_{0}. Here *i* and *j* range over {1,2,3}, which designates the *x*-, *y*-, and *z*- directions, and *z*_{0} ranges over {-1/2,1/2}:

I then defined a function to project each face onto a plane with normal vector *nvec* and produce the corresponding 3D polygon:

The area of a polygon that happens to be a quadrangle projected onto a plane with normal *nvec* is computed as the sum of the areas of two triangles that the quadrangle is split into by a diagonal:

With these utilities in place, I was ready to visualize the projection of the cuboid with edge of length one, whose center of mass is situated at the origin and whose sides are aligned with the coordinate axes. I parametrized the unit normal vector to the projection plane using spherical coordinates *θ* and *φ*: {sin(*θ*) sin(*φ*), sin(*θ*) cos(*φ*), cos(*θ*)}.

Since at any one time only three of the cuboid’s faces are visible, and since the area contributed by the invisible faces is the same (just imagine a parallel plane on the other side of the cuboid), I sum over all six faces and divide the result by two. For a particular orientation *nvec* == {-1,2,3}/√14 of the plane, the area of the projection is therefore computed as follows:

Applying the insight learned from solving the 2D case, I checked if the area of the projection was again the sum of the absolute values of dot products of the normal vector *nvec* with axis vectors:

Bingo! This makes sense, I thought. Each term corresponds to the area of one of at most three visible faces. Indeed, consider a patch of area *S* on a plane with unit normal vector *n*_{1}. The area of the projection of the patch onto another plane with unit normal vector *n*_{2} equals *S* Abs[Dot[*n*_{1},*n*_{2}]].

Here is the spherical plot of the area of the shadow cast by the cuboid onto a plane with unit normal vector {sin(*θ*) sin(*φ*), sin(*θ*) cos(*φ*), cos(*θ*)} as a function of the angles *θ* and *φ*:

The minimum of the area function corresponds to the area of one face, which equals 1, and the maximum of √3 corresponds to the projection onto the plane whose normal vector aligns with the cuboid’s diagonal:

I was then almost ready to tackle the expected value of the area. For the normal unit vector {*n*_{x},*n*_{y},*n*_{z}}, the expected projection area is:

The surface area of an infinitesimal patch (*θ*,*θ*+ ⅆ*θ*)×(*φ*,*φ*+ⅆ*φ*) of the unit sphere is well known to be sin(*θ*) ⅆ*θ* ⅆ*φ*. Dividing the infinitesimal area by the total surface area of the unit sphere, I obtain the infinitesimal probability measure, corresponding to the uniform distribution on the unit sphere: ⅆ *S*(*θ*,*φ*)==1/(4 π)sin(*θ*) ⅆ*θ* ⅆ*φ*.

And, finally, the expected projection area equals:

Of course, I could simplify the computation, using the symmetry argument:

It is a well-known fact (also see this relevant question on math.stackexchange.com) that each individual component of a random vector, uniformly distributed on the unit sphere, follows a uniform distribution on the interval (-1,1). With this insight, the answer can be worked out in one’s head, explaining why this problem was deemed by Arnol’d a “trivium”:

The insight I gained allowed me to easily construct the answer for the case of the *d*-dimensional hypercube, projected on the randomly oriented hyperplane:

I simply needed to know the distribution of a component of the *d*-dimensional random unit vector.

The computations are simple, and are based on hyperspherical coordinates (e.g. see Wikipedia: *n*-sphere). The infinitesimal hyperspherical area factors as (Sin[*θ*_{1}]^{d-2} ⅆ*θ*_{1})(Sin[*θ*_{2}]^{d-3} ⅆ*θ*_{2})⋯(Sin[*θ*_{d-2}] ⅆ*θ*_{d-2})ⅆ*θ*_{d-1}. Since *n*_{d} == cos(*θ*_{1}), I needed to find the normalization constant for the quasi-density Sin[*θ*_{1}]^{d-2}:

Therefore the expected area of the projection of the *d*-hypercube is:

Alternatively, I could use the closed-form result for the probability density function (PDF) of the distribution from here:

I concluded this rewarding exploration by reproducing the results obtained earlier and tabulating results for higher dimensions:
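The resulting closed form, *d* Γ(*d*/2)/(√π Γ((*d*+1)/2)), is easy to tabulate in any language; a Python sketch (it reproduces 4/π for *d* = 2 and 3/2 for *d* = 3, and shows the √(2*d*/π) growth):

```python
import math

def expected_shadow(d):
    """Expected (d-1)-volume of the unit d-cube's projection onto a uniformly
    oriented hyperplane: d * E|n_1|, where E|n_1| = Gamma(d/2) /
    (sqrt(pi) * Gamma((d+1)/2)) for a component of a random unit vector."""
    return d * math.gamma(d / 2) / (math.sqrt(math.pi) * math.gamma((d + 1) / 2))

for d in range(2, 11):
    print(d, expected_shadow(d), math.sqrt(2 * d / math.pi))  # exact vs. asymptotic
```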

The average area of the shadow grows boundlessly with space dimension, as the number of faces that contribute to its area also increases:

In the high-dimensional space, the area scales as a square root of the dimension *d*:

This concluded my exploration of Arnol’d’s trivium problem with *Mathematica*. The use of *Mathematica* led to many “Aha!” moments, and enabled me to readily answer various “What if…” questions. It is my hope that I was able to convey the excitement of the discovery process largely accelerated with the use of *Mathematica*, and to inspire you to begin explorations of your own.

Download this post as a Computable Document Format (CDF) file.

At the time, the Russian Empire was using the Julian calendar. The 100th anniversary of the celebrated presentation actually falls on February 5, 2013, in the Gregorian calendar now in use.

To perform his analysis, Markov invented what are now known as “Markov chains,” which can be represented as probabilistic state diagrams where the transitions between states are labeled with the probabilities of their occurrences.

Here we repeat, on *Alice’s Adventures in Wonderland* by Lewis Carroll, the analysis that Markov applied to Pushkin’s text. To this end, let’s define a function that computes the frequencies of the elements of a list, returning the results as rules.
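As a sketch of such a helper outside *Mathematica* (Python, with a dictionary playing the role of a list of rules; the name `frequencies` is illustrative):

```python
from collections import Counter

def frequencies(items):
    """Relative frequencies of the elements of a list, as {element: frequency}
    pairs -- the analogue of the rule-returning helper described above."""
    counts = Counter(items)
    n = len(items)
    return {k: v / n for k, v in counts.items()}

print(frequencies(list("abracadabra")))
```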

First, extract words from the text, making them lowercase:

Split the text into a sequence of letters:

Then classify them as vowels or consonants:

And compute the frequencies of vowels and consonants in the text:

Therefore, if we treat the text as a random sequence of either a vowel or a consonant, the probability of a vowel turning up is *p* = 0.3866.

Following Markov, let’s look at the frequencies of pairs of consecutive symbols:

Then we find the probabilities of a vowel or a consonant given by what precedes it:

In his paper, Markov observed that the sequence of vowels and consonants agreed much better with the model where the probability of a vowel depended on the preceding characters than with the model where it did not. Moreover, he found that the model where the probability depended on the two preceding characters agreed yet better.

Empowered with *Mathematica*, we can continue this investigation. Markov found that pairs of consecutive vowel-consonants carry more information than the sequence of vowel-consonants itself. One measure of the information stored in the data is the `Entropy`. The greater the entropy, the more information the data contains. Let’s compute it for the sequences of *k*-tuples of vowel-consonants (known as *k*-grams) for different values of *k*.
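The *k*-gram entropy computation can be sketched as follows (in Python, on a toy vowel/consonant sequence standing in for the Alice text; note this sketch uses base-2 entropy, while *Mathematica*'s `Entropy` defaults to base *e*):

```python
import math
from collections import Counter

def entropy(seq):
    """Shannon entropy (in bits) of the empirical distribution of seq's elements."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def kgram_entropy(symbols, k):
    """Entropy of the sequence of k-tuples (k-grams) of symbols."""
    grams = [tuple(symbols[i:i + k]) for i in range(len(symbols) - k + 1)]
    return entropy(grams)

# toy vowel/consonant sequence standing in for the Alice text
text = "the quick brown fox jumps over the lazy dog" * 20
symbols = ['v' if ch in "aeiou" else 'c' for ch in text if ch.isalpha()]
for k in (1, 2, 3):
    print(k, kgram_entropy(symbols, k))  # entropy grows with k
```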

The plot confirms Markov’s findings. Curiously, it also shows that 25-grams carry little more information than 20-grams.

The probabilistic 2-gram model describing the sequence is now known as a Markov chain process.

The Markov chain process describes the evolution of a probability distribution *π*_{n} on a state space at step *n*: *π*_{n+1} == *π*_{n}.*P*, where the matrix *P* collects the transition probabilities.

In the case at hand, the transition matrix is:

Assuming the initial state is a vowel (encoded as 1), the 2-gram model is defined in *Mathematica* as follows:
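The essence of the two-state model fits in a few lines of Python. The transition probabilities below are illustrative stand-ins (the fitted values from the text are not reproduced here), chosen only so that the stationary vowel probability matches the observed 0.3866; the mean recurrence time of the vowel state is then 1/0.3866 ≈ 2.59:

```python
import random

random.seed(0)

# Hypothetical transition probabilities, tuned so the stationary vowel
# probability is 0.52 / (1 - 0.175 + 0.52) = 0.3866.
p_vowel_after_vowel = 0.175
p_vowel_after_consonant = 0.52

state = 'v'   # start from a vowel
steps = 200000
vowel_positions = []
for i in range(steps):
    if state == 'v':
        vowel_positions.append(i)
        state = 'v' if random.random() < p_vowel_after_vowel else 'c'
    else:
        state = 'v' if random.random() < p_vowel_after_consonant else 'c'

gaps = [b - a for a, b in zip(vowel_positions, vowel_positions[1:])]
mean_gap = sum(gaps) / len(gaps)
print(mean_gap)   # ~ 1/0.3866 = 2.59, the mean recurrence time of the vowel state
```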

With it, we can ask about the distribution of the distance between vowels in the text and compare the result with the data:

Thus the Markov model accurately predicts that the average distance between vowels in the text is about 2.586 characters. The distributions of distances predicted by the model also agree well with those actually found in the text:

The text of *Alice in Wonderland* only uses 1,484 unique words:

Repeating the same analysis with words, rather than vowel-consonants, we find that 4-grams carry essential information:

With the 4-gram model, the frequency of {*w*_{1}, *w*_{2}, *w*_{3}, *w*_{4}} encodes the probability of *w*_{4}, given the three most recent words *w*_{1} *w*_{2} *w*_{3}.

We encode transitions {*w*_{1}, *w*_{2}, *w*_{3} } → {*w*_{2}, *w*_{3}, *w*_{4}} in a directed graph, associating each edge with the “Probability” property, giving the conditional transition probability.

Assuming the initial probability vector is given by the consecutive word-pair frequencies, we define the discrete Markov chain process:

Now we define a function that assembles a sequence of *k*-grams, resulting from a walk of the graph, into a text.

We can now simulate the resulting Markov chain to generate a random 100-word text:

The graph associated with 4-grams has visibly long linear subgraphs, seen as long threads of vertexes. Words occurring along these threads will always appear in combination in the randomly generated text. It is interesting to examine the lengths of such unchanged sequences.

These long subgraphs can be singled out by removing from the original graph all the vertexes that have more than two incident edges.

We use `WeaklyConnectedComponents` to extract vertexes of these lines. After sorting them in the order in which they appeared in the text, we recreate the longest six such sequences. Each is a passage from the text (minus punctuation) in which every four-word sequence occurs uniquely in the text:

As you can see, the six sequences are actually quite long. In fact, they are exceptionally long: the median length of such fragments is eight words. One could continue this analysis on other pieces of literature and compare the results, but we’ll leave that for another time.

Finite Markov processes in *Mathematica* can be used to solve a wide range of applied and theoretical problems. There are many examples at the Wolfram Demonstrations Project to help you celebrate 100 years of Markov chains.

Download this post as a Computable Document Format (CDF) file.

I always analyze and explore these problems in *Mathematica*. Being a kernel developer, I check whether *Mathematica* can indeed find a solution. This last issue has challenging problems, and it was particularly gratifying to observe that *Mathematica* could solve them right out of the box. So here are my solutions to three of the paraphrased problems:

**Problem 11457, by M. L. Glasser:**

**Solution in Mathematica:** Relaxing assumptions on

**Problem 11456, by R. Mortini:**

**Solution in Mathematica:**

**Problem 11449, by M. Bataille:**

**Solution in Mathematica:** As the expression is left unchanged by the variable’s rescaling, and as

Hence the minimal value of 9/8 is attained for *a*==*b*==*c*.

Because the expression is left invariant by interchanging any of the variables, the maximum value of 2 is attained for *a*==*b*==*c*/2 or *a*==*c*==*b*/2 or *b*==*c*==*a*/2.

Last week I decided to test this on one particular example. The problem I chose happens to be a classic. In fact, the very first nontrivial computer program ever written—by Ada Lovelace in 1842—was solving the same problem.

The problem is to compute Bernoulli numbers.

Bernoulli numbers have a long history, dating back at least to Jakob Bernoulli’s 1713 *Ars Conjectandi*.

Bernoulli’s specific problem was to find formulas for sums like 1^*n* + 2^*n* + ⋯ + *m*^*n*.

Before Bernoulli, people had just made tables of results for specific *n* and *m*. But in a *Mathematica*-like way, Bernoulli pointed out that there was an algorithm that could automate this.

For any given *n*, the answer is a polynomial in *m*, and the coefficients are constants that Bernoulli showed could be computed by a simple recurrence formula.

It could have been that Bernoulli numbers would be useful only for solving this particular problem. But in fact over the past 300 years they have found their way into a remarkable range of areas of mathematics. They appear in the formula for the Riemann zeta function at even integers. They are in the coefficients in the series expansion for tan(*x*). They appear in the Euler-Maclaurin formula for approximating integrals by sums and in the Stirling series for the gamma function. They even relate to Fermat’s last theorem—which was first proved by Ernst Kummer for “regular primes” characterized by Bernoulli numbers.

In 1713 Bernoulli was rather proud of being able to compute the first ten Bernoulli numbers in “a quarter of an hour”.

Of course, in *Mathematica*, it’s now instantaneous:
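For comparison, the classic O(*n*²) recurrence that Lovelace described takes only a few lines in any language; a sketch in Python with exact rationals:

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(n):
    """B_0 .. B_n from the recurrence sum_{j=0..m} C(m+1, j) B_j = 0 (m >= 1)
    with B_0 = 1 -- the O(n^2) scheme Lovelace described for the Analytical Engine."""
    B = [Fraction(1)]
    for m in range(1, n + 1):
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B

print(bernoulli_numbers(10)[10])  # 5/66, matching Bernoulli's table
```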

But just how far can we go—295 years after Bernoulli, and now with *Mathematica*?

I decided to try scaling up Bernoulli’s computation—by a factor of a million—and computing the 10 millionth Bernoulli number.

Until recently, doing this would have been impractical, even in *Mathematica*. Because *Mathematica* used essentially the same classic recurrence relation for computing Bernoulli numbers that Jakob Bernoulli himself used, and that Ada Lovelace described programming on Charles Babbage’s Analytical Engine.

This algorithm has the feature (already recognized by Lovelace) that it takes about *n*^2 steps to compute the *n*th Bernoulli number. So even if one could compute the 10th Bernoulli number in a millisecond, it’d take several thousand years to compute the 10 millionth Bernoulli number.

But a few years ago I programmed a quite different algorithm into *Mathematica*. Instead of directly computing the Bernoulli numbers using a recurrence relation, I instead used a trick recently suggested by Bernd Kellner: computing Bernoulli numbers by computing the Riemann zeta function.

It’s the integrated nature of *Mathematica* that makes things like this practical. Without *Mathematica*, one has to use the simplest building blocks to make efficient algorithms. But with *Mathematica*, one can take for granted access to efficient very-high-level operations—like computing Riemann zeta functions.

Bernoulli numbers are related to the zeta function by:

The denominator of a Bernoulli number can be computed using a corollary to the von Staudt-Clausen theorem as:

To get the numerator, one then just has to use the relation to the zeta function. The right-hand side is then evaluated approximately and multiplied by the already-known denominator of the Bernoulli number. Provided the approximation is carried out with enough significant digits, the numerator of the Bernoulli number is simply the integer nearest to the product.
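At a toy scale, the whole scheme fits in a few lines. The following Python sketch (with machine precision standing in for the tens of millions of digits the real computation needs) recovers small Bernoulli numbers this way, using the relation B_{2n} == (−1)^{n+1} 2 (2n)! ζ(2n)/(2π)^{2n} and the von Staudt–Clausen denominator:

```python
import math
from fractions import Fraction

def is_prime(p):
    return p > 1 and all(p % q for q in range(2, math.isqrt(p) + 1))

def bernoulli_via_zeta(n):
    """B_n for even n: exact denominator from the von Staudt-Clausen theorem
    (product of primes p with (p-1) | n), numerator from a floating-point
    evaluation of (-1)^(n/2+1) * 2 * n! * zeta(n) / (2 pi)^n. Machine
    precision limits this sketch to modest n."""
    denom = 1
    for p in range(2, n + 2):
        if is_prime(p) and n % (p - 1) == 0:
            denom *= p
    zeta = sum(k ** -float(n) for k in range(1, 1000))
    b = (-1) ** (n // 2 + 1) * 2 * math.factorial(n) * zeta / (2 * math.pi) ** n
    return Fraction(round(b * denom), denom)

print(bernoulli_via_zeta(10))  # 5/66
print(bernoulli_via_zeta(20))  # -174611/330
```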

But there’s still a problem: to get all the digits in a Bernoulli number, one has to compute powers of pi to extreme precision.

Of course, *Mathematica* can do that—to billions of digits if necessary.

A week ago, I took our latest development version of *Mathematica*, and I typed `BernoulliB[10^7]`.

And then I waited.

Yesterday—5 days, 23 hours, 51 minutes, and 37 seconds later—I got the result! [Download 24MB bzip2 file]

The denominator is simple; it’s just 9601480183016524970884020224910.

But the numerator is not so simple; in fact it’s 57,675,292 digits long. It took less than 1 gigabyte of memory to compute, and required computing pi to about 66 million digits.

The numerator is negative; it begins with -47845869850733698144899338333210878162030638218660 and ends with 57164275665935124168181176013725629647185402960697.

Its digits seem *almost* random. I counted occurrences of all possible *k*-digit subsequences in the numerator of the 10 millionth Bernoulli number for *k* = 1, 2, 3, and 4. The ratio of the standard deviation to the mean stays low, though it grows with *k*.

So how can we tell if it’s correct?

Bernoulli numbers have lots of interesting properties. One that’s particularly revealing was discovered by Kummer in 1843, and goes by the name of *p*-adic continuity. In particular, take two integers *n* and *m*, a prime number *p* such that neither *n* nor *m* is divisible by *p*-1, and a natural number *r* such that *n* ≡ *m* (mod (*p*-1) *p*^{r-1}):

the *p*-adic continuity asserts that (1 - *p*^{n-1}) *B*_{n}/*n* and (1 - *p*^{m-1}) *B*_{m}/*m* are congruent modulo *p*^{r}:

This property is completely independent of the way we computed the Bernoulli number.

So let’s start checking. Obviously *n*=10^7. I checked the *p*-adic continuity for *p*=43, *r*=3, *m*=59776; *p*=59, *r*=2, *m*=916; *p*=7919, *r*=1, *m*=7484; and *p*=27449, *r*=1, *m*=8928. Every one of these differences reduces to 0, as it should.
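The same check can be run at a small, exactly computable scale. A Python sketch of Kummer's congruence, (1 − *p*^{n−1}) *B*_{n}/*n* ≡ (1 − *p*^{m−1}) *B*_{m}/*m* (mod *p*^{r}) whenever *n* ≡ *m* (mod (*p*−1)*p*^{r−1}), with the Bernoulli numbers taken from the classic recurrence:

```python
from fractions import Fraction
from math import comb

def bernoulli(n):
    """B_0 .. B_n via the classic recurrence -- exact, and fast enough here."""
    B = [Fraction(1)]
    for m in range(1, n + 1):
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B

def kummer_residue(n, p, r):
    """(1 - p^(n-1)) * B_n / n reduced modulo p^r (denominator inverted mod p^r)."""
    M = p ** r
    x = (1 - Fraction(p) ** (n - 1)) * bernoulli(n)[n] / n
    return x.numerator * pow(x.denominator, -1, M) % M

# p = 5, r = 2: n = 26 and m = 6 differ by (p-1)*p^(r-1) = 20, and neither
# is divisible by p - 1 = 4, so the residues must agree modulo 5^2 = 25.
print(kummer_residue(26, 5, 2), kummer_residue(6, 5, 2))  # equal residues
```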

We’ve successfully found the 10 millionth Bernoulli number.

We’ve done what Ada Lovelace believed should be possible: we’ve mechanized the computation of Bernoulli numbers—so well, in fact, that 295 years after Jakob Bernoulli, it’s taken us only 500 times longer to compute a million-times-as-large Bernoulli number.

And all with a single line of *Mathematica* input.

So how come inside *Mathematica* there are thousands of pages of code devoted to working out definite integrals–beyond just subtracting indefinite ones?

The answer, as is often the case, is that in the real world of mathematical computation, things are more complicated than one learns in basic mathematics courses. And to get the correct answer one needs to be considerably more sophisticated.

In a simple case, subtracting indefinite integrals works just fine.

Consider computing the area under a sine curve.

We work out the indefinite integral:

Then we can just subtract its value at each end point, and correctly find the definite integral:

But consider a more complicated case:

We can verify that this indefinite integral at least formally differentiates correctly:

Now let’s compute the definite integral by subtracting values of our indefinite one:

But this cannot be correct. After all, if we plot the integrand, we can see that it is positive throughout the range 0 to 2π:

*Mathematica*’s built-in definite integration of course gives exactly the correct answer:

So what went wrong with subtracting the end points? The issue is that the Fundamental Theorem of Calculus isn’t directly applicable here. Because, when you state it fully, the theorem requires that the antiderivative that is going to be subtracted be continuous throughout the interval.

But the antiderivative we have here looks like this:

It has a discontinuity right in the middle of the interval.

So how does *Mathematica* get its answer? It has to be more careful. Sometimes it works by detecting discontinuities in the antiderivative, and then breaking up the integration region into parts, and carefully taking directional limits at the discontinuity points:
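The post's own integrand is not reproduced in this text, but the phenomenon is easy to illustrate with the hypothetical integrand 1/(2+cos *x*), whose elementary antiderivative (2/√3) arctan(tan(*x*/2)/√3) jumps at *x* = π. A Python sketch of the naive subtraction versus the split-and-take-limits repair:

```python
import math

def f(x):
    """A hypothetical stand-in integrand: strictly positive on [0, 2*pi]."""
    return 1.0 / (2.0 + math.cos(x))

def F(x):
    """An elementary antiderivative of f; it jumps at x = pi, where tan(x/2) blows up."""
    return (2.0 / math.sqrt(3.0)) * math.atan(math.tan(x / 2.0) / math.sqrt(3.0))

naive = F(2 * math.pi) - F(0)        # ~0: clearly wrong for a positive integrand

# Split at the discontinuity and take one-sided limits there:
eps = 1e-9
corrected = (F(math.pi - eps) - F(0)) + (F(2 * math.pi) - F(math.pi + eps))
print(naive, corrected)              # corrected ~ 2*pi/sqrt(3) = 3.6275...
```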

Another thing it can do is to make use of ambiguity in the antiderivative. Every calculus student knows that antiderivatives can contain an arbitrary additive constant. But in fact, there’s more arbitrariness than that: one can add different constants on different parts of the interval.

Often it’s not obvious from the algebraic form that one has added a piecewise constant like this. For example, consider:

Differentiating this shows that it is indeed an antiderivative of the same function as above.

It doesn’t happen to be the antiderivative that *Mathematica* generates by default. But it is a perfectly valid one. And it turns out to be continuous over the region of integration:

So if one uses it, one can now directly apply the Fundamental Theorem of Calculus and get the correct result:

Even though their algebraic forms look different, one can verify that the antiderivatives differ by a piecewise constant:

So how come *Mathematica* doesn’t always generate the “better” antiderivative, so that the Fundamental Theorem of Calculus applies directly?

The problem is that there’s actually no way to produce an antiderivative that has this property for all definite integrals one might want to compute. Here’s the formal situation.

The Fundamental Theorem of Calculus states that an antiderivative continuous along a chosen path always exists. It is defined as the integral *F*(*x*) == ∫ *f*(*t*) ⅆ*t*, where the integration is performed along the path. Its existence is of theoretical importance, though in practice it cannot always be expressed in terms of any predetermined set of elementary and special functions.

Moreover, if a meromorphic integrand has simple poles in the complex plane, it is impossible to choose an antiderivative continuous along every imaginable path in the complex plane, because of the branch cuts of the antiderivative.

Our integrand has simple poles at the following points:

But now consider how our two antiderivatives behave in the complex plane. Here are the real parts of these functions:

Looking along the real axis, the default antiderivative is not continuous, so the Fundamental Theorem cannot directly be applied. But the alternative one is continuous, so the Fundamental Theorem will work.

However, look now at the line indicated in the picture in black. Along this line, the default antiderivative is continuous, so the Fundamental Theorem will work fine for it. But the alternative one is not continuous there, so the Fundamental Theorem will not work:

And indeed one can show that there is no single choice of antiderivative for which the Fundamental Theorem will always work.

So *Mathematica* has to go to more effort to get the correct answer for the definite integral.

This may seem subtle–but actually it is just the tip of the iceberg of the issues that crop up in doing definite integration correctly in *Mathematica*–and in mathematics. It’s the job of our group at Wolfram Research to understand all these issues and figure out good algorithms for handling them. It’s a fascinating exercise not only in algorithm development but also in mathematics itself.

The first-place winner was Vladimir Dudchenko, an undergraduate student at the Moscow Institute of Physics and Technology (MIPT). He correctly solved all seven competition problems and displayed remarkable ingenuity and skill in his use of *Mathematica*. In addition to a student copy of *Mathematica* 6, Vladimir won a new MacBook Pro, a top-of-the-line machine donated by Apple and DPI Computers (Apple’s partner in Russia).

The runner-up winners were Yulia Kalugina, Konstantin Kanishev, Il’ja Lysenkov, and Aleksey Valabuev. Each won a student copy of *Mathematica* and other prizes from Wolfram Research.

And special congratulations go to 14-year-old Askar Safin. Askar was the youngest participant, and he correctly solved two of the problems–a significant achievement for a person of his age.

One of the problems, which turned out to be the most challenging, asked competitors to compute an inertia tensor of the spikey polyhedron, a cumulated icosahedron. You can now visualize the spikey polyhedron (which Michael Trott discussed in a recent blog post) instantly using the new `PolyhedronData` function in *Mathematica*.

We thought the problem’s most elegant solution was the one submitted by Aleksey Valabuev, a student at MIPT. He remarked that the spikey’s inertia tensor with respect to its center of mass must be spherical, i.e. *I* == *I*_{1} == *I*_{2} == *I*_{3}, and hence manifestly invariant under rotations. He pointed out that it suffices to integrate over a tetrahedron with one vertex at the spikey’s center of mass and its base being one of the spikey’s faces, instead of integrating over the whole spikey. This observation follows because the spikey’s symmetry group acting on this tetrahedron generates the whole spikey!

In order to carry out the integration over a tetrahedron with base vertices of coordinates `{a, b, c}`, he represented the point inside the tetrahedron as

for

Of course, *Mathematica* now lets one look up the inertia tensor of a spikey immediately using `PolyhedronData`:

We were impressed by the variety of answers we received and congratulate everyone who participated. We’d like to hold other student competitions soon… but more on that later.
