# Big O and Friends: Tales of the Big, the Small and Every Scale in Between

July 26, 2018 — Itai Seggev, Senior Kernel Developer, Algorithms R&D

One of the many beautiful aspects of mathematics is that often, things that look radically different are in fact the same—or at least share a common core. On their faces, algorithm analysis, function approximation and number theory seem radically different. After all, the first is about computer programs, the second is about smooth functions and the third is about whole numbers. However, they share a common toolset: asymptotic relations and the important concept of asymptotic scale.

By comparing the “important parts” of two functions—a common trick in mathematics—asymptotic analysis classifies functions based on the relative size of their absolute values near a particular point. Depending on the application, this comparison provides quantitative answers to questions such as “Which of these algorithms is fastest?” or “Is function a good approximation to function *g*?”. Version 11.3 of the Wolfram Language introduces six of these relations, summarized in the following table.

The oldest (and probably the most familiar of the six relations) is `AsymptoticLessEqual`, which is commonly called big O or big Omicron. It was popularized by Paul Bachmann in the 1890s in his study of analytic number theory (though the concept had appeared earlier in the work of Paul du Bois-Reymond). At a point , is asymptotically less than or equal to , written , if for some constant for all near . This captures the notion that cannot become arbitrarily larger in magnitude than . Bachmann used this in his study of sums and the growth rate of number theoretic functions to show that he could split complicated sums into two parts: a leading part with an explicit form, and a subleading part without a concrete expression. The subleading part could, however, be shown to be asymptotically less than or equal to some other function that is, for some reason, unimportant compared with the leading part, and therefore only the leading part needed to be kept. Donald Knuth would popularize the notion of big O in computer science, using it to sort algorithms from fastest to slowest by whether the run time of one is asymptotically less than or equal to the next at infinity.

`AsymptoticLess`, also called little O or little Omicron, came next. Introduced by Edmund Landau approximately 15 years after Bachmann’s work (leading to the name “Bachmann–Landau symbols” for asymptotic relations in certain disciplines), it quantified the notion of the unimportance of the subleading part. In particular, is asymptotically less than , written , if for all constants and all near . The condition that the inequality holds for all positive , not just a single , means that can be made arbitrarily smaller in magnitude compared to *g*. Thus, the essentially equals . `AsymptoticLess` is also important in the analysis of algorithms, as it allows strengthening statements from “algorithm a is no slower than algorithm b” to “algorithm a is faster than algorithm b.”

After this point, the history becomes rather complicated, so for the time being we’ll skip to the 1970s, when Knuth popularized `AsymptoticEqual` (commonly called big Theta). This captures the notion that neither function is ignorable compared with the other near the point of interest. More formally, is asymptotically equal to , written , if for some constant and , for all near . After exploring these first three relations with examples, both the history and the other relations will be easily explained and understood.

Consider three simple polynomials: , and . The two linear polynomials are both asymptotically less than and asymptotically less than or equal to the quadratic one at infinity:

✕
AsymptoticLessEqual[x,x^2,x->∞] |

Even though for many values of , because eventually will become bigger and continue increasing in size:

✕
AsymptoticLess[10^5 x,x^2,x->∞] |

On the other hand, is not asymptotically less than . Even though is always smaller, the ratio is a constant instead of going to zero:

✕
AsymptoticLess[x,10^5 x,x->∞] |

Indeed, the two linear polynomials are asymptotically equal because their ratio stays in a fixed range away from zero:

✕
AsymptoticEqual[10^5 x,x,x->∞] |

The linear polynomials are not asymptotically equal to the quadratic one, however:

✕
AsymptoticEqual[10^6 x,x^2,x->∞] |

The following log-log plot illustrates the relationships among the three functions. The constant offset in the log-log scale between the two linear functions shows that they are asymptotically equal, while their smaller slopes with respect to show that the former are asymptotically less than the latter.

✕
LogLogPlot[{Abs[x],Abs[10^5 x],Abs[x^2]},{x,10,10^9},PlotLegends->"Expressions"] |

A typical example for the application of these examples concerns analyzing the running time of an algorithm. A classic example is the merge sort. This sort works by recursively splitting a list in two, sorting each half and then combining them in sorted order. The following diagram illustrates these steps:

The time to sort elements will be the sum of some constant time to compute the middle, to sort each half and some multiple of the number of elements to combine the two halves (where and are determined by the particular computer on which the algorithm is run):

✕
reqn=T[n]==2T[n/2]+ a n +b |

In this particular case, solving the recurrence equation to find the time to sort elements is straightforward:

✕
t=RSolveValue[reqn, T[n],n]//Expand |

Irrespective of the particular values of and and the constant of summation , , and thus the algorithm is said to have run time:

✕
AsymptoticEqual[t,n Log[n],n->∞,Assumptions->a>0] |

Any other algorithm that has run time takes roughly the same amount of time for large inputs. On the other hand, any algorithm with run time , such as radix sort, will be faster for large enough inputs, because :

✕
AsymptoticLess[n,n Log[n],n->∞] |

Conversely, any algorithm with run time , such as bubble sort, will be slower for large inputs, as :

✕
AsymptoticLess[n Log[n],n^2,n->∞] |

Another set of applications for `AsymptoticEqual` comes from convergence testing. Two functions that are asymptotically equal to each other will have the same summation or integration convergence—for example, at infinity:

✕
AsymptoticEqual[1/n,ArcCot[n], n->∞] |

It is well known that the sum of , known as the harmonic series, diverges:

✕
DiscreteLimit[Sum[1/n,{n,1,k}],k->∞] |

Thus, must also diverge:

✕
SumConvergence[ArcCot[n], n] |

Be careful: although the name `AsymptoticLessEqual` suggests a similarity to the familiar operator, the former is a partial order, and not all properties carry over. For example, it is the case that for any two real numbers and , either or , but it is not true that for any two functions either or :

✕
{AsymptoticLessEqual[Sin[1/x],x,x->0],AsymptoticLessEqual[x,Sin[1/x],x->0]} |

Similarly, if , then it is true that either or . But it is possible for to be true while both and are false:

✕
{AsymptoticLessEqual[Sin[x],1,x->∞],AsymptoticLess[Sin[x],1,x->∞],AsymptoticEqual[1,Sin[x],x->∞]} |

Because `AsymptoticLessEqual` is a partial order, there are two possibilities for what `AsymptoticGreaterEqual` (also called big Omega) could mean. One option is the logical negation of `AsymptoticLessEqual`, i.e. iff . In the previous example, then, 1 and are each asymptotically greater than or equal to each other. This captures the notion that is never less than some fixed multiple of even if the relative sizes of the two functions change infinitely many times close to . Another sensible definition for `AsymptoticGreaterEqual` would be simply the notational reverse of `AsymptoticLessEqual`, i.e. iff . This captures the notion that is eventually no greater than some fixed multiple of in magnitude. Similar considerations apply to `AsymptoticGreater`, also called little Omega.

Historically, Godfrey Harold Hardy and John Edensor Littlewood first used and popularized `AsymptoticGreaterEqual` in their seminal work on series of elliptic functions in 1910s, using the first definition. This definition is still presently used in analytic number theory. In the 1970s, Knuth proposed that the first definition is not widely used, and that the second definition would be more useful. This has become the standard in the analysis of algorithms and related fields. The Wolfram Language follows the second definition as well. Knuth also proposed using a similar definition for `AsymptoticGreater`, i.e. iff , which is used in computer science.

The last of the newly introduced relations, `AsymptoticEquivalent`, also comes from Hardy’s work in the early part of the 20th century. Roughly speaking, if their ratio approaches 1 at the limit point. More formally, is asymptotically equivalent to if for all constants and all near . Put another way, iff . Hence, asymptotic equivalence captures the notion of approximation with small relative error, also called asymptotic approximation. A well-known example of such an approximation is Stirling’s approximation for the factorial function:

✕
s[n_]:=Sqrt[2π n] (n/E)^n |

This function is asymptotically equivalent to the factorial function:

✕
AsymptoticEquivalent[n!,s[n],n->∞] |

This means the relative error, the size of the difference relative to the size of the factorial function, goes to zero at infinity:

✕
Underscript[, n->∞](n!-s[n])/n! |

Note that this is only a statement about relative error. The actual difference between and blows up at infinity:

✕
Underscript[, n->∞](n!-s[n]) |

Because asymptotic approximation only demands a small relative error, it can be used to approximate many more classes of functions than more familiar approximations, such as Taylor polynomials. However, by Taylor’s theorem, every differentiable function is asymptotically equivalent to each of its Taylor polynomials. For example, the following computation shows that is equivalent to each of its first three Maclaurin polynomials:

✕
{AsymptoticEquivalent[1,E^x,x->0],AsymptoticEquivalent[1+x,E^x,x->0],AsymptoticEquivalent[1+x+x^2/2,E^x,x->0]} |

Yet is also asymptotically equivalent to many other polynomials:

✕
AsymptoticEquivalent[1+2x,E^x,x->0] |

Plotting the relative errors for each of the four polynomials shows that it does go to zero for all of them:

✕
Plot[{Abs[(1-E^x)/E^x],Abs[(1+x-E^x)/E^x],Abs[(1+x+x^2/2-E^x)/E^x],Abs[(1+2x-E^x)/E^x]},{x,-.1,.1},PlotLegends->"Expressions",ImageSize->Medium] |

What, then, makes the first-order and second-order polynomials better than the zeroth? In the previous plot, they seem to be going to zero faster than the linear polynomials, but this needs to be made quantitative. For this, it is necessary to introduce an asymptotic scale, which is a family of functions for which . For Maclaurin series, that family is . Each monomial is, in fact, asymptotically greater than the one before:

✕
AsymptoticGreater[x^m,x^n,x->0,Assumptions->m |

Once an asymptotic scale has been defined, the error in the -order approximation can be compared not with the original function but with the member of the asymptotic scale. If that error is small, then the approximation is valid to order . Each of the three Maclaurin polynomials for has this property, again by Taylor’s theorem:

✕
{AsymptoticLess[1-Exp[x],1,x->0],AsymptoticLess[1+x-Exp[x],x,x->0],AsymptoticLess[1+x+x^2/2-Exp[x],x^2,x->0]} |

On the other hand, while is a valid zeroth-order approximation to at 0, it is not a valid first-order approximation:

✕
{AsymptoticLess[1+2x-Exp[x],1,x->0],AsymptoticLess[1+2x-Exp[x],x,x->0]} |

Indeed, is the only linear polynomial that is a first-order approximation to at 0 using the asymptotic scale . Visualizing the ratio of to and it is clear that the error is small with respect to but not with respect to . The ratio of the error to goes to 1, though any positive number would indicate is false:

✕
Plot[{Abs[1+2x-E^x],Abs[(1+2x-E^x)/x]},{x,-.1,.1},PlotLegends->"Expressions",ImageSize->Medium] |

The scale , often called the Taylor or power scale, is the simplest and most familiar of a huge number of different useful scales. For example, the Laurent scale is used to expand functions in the complex plane. In “Getting to the Point: Asymptotic Expansions in the Wolfram Language,” my colleague Devendra Kapadia showed how different scales arise when finding approximate solutions using the new functions `AsymptoticDSolveValue` and `AsymptoticIntegrate`. For example, the asymptotic scale (a type of Puiseux scale) comes up when solving Bessel’s equation, the asymptotic scale is needed to approximate the integraland Airy’s equation leads to the scale . We can verify that each of these indeed forms a scale by creating a small wrapper around `AsymptoticGreater`:

✕
AsymptoticScaleQ[list_,x_->x0_]:=And@@BlockMap[AsymptoticGreater[#1[[1]],#1[[2]],x->x0]&,list,2,1] |

The first few examples are asymptotic scales at 0:

✕
AsymptoticScaleQ[{1/x^2,1/x,1,x,x^2},x->0] |

✕
AsymptoticScaleQ[{x^(1/2),x^(5/2),x^(9/2),x^(13/2)},x->0] |

The last two, however, are asymptotic scales at ∞:

✕
AsymptoticScaleQ[{E^ω/ω^(1/2),E^ω/ω^(3/2),E^ω/ω^(5/2)},ω->∞] |

✕
AsymptoticScaleQ[{E^(-((2 x^(3/2))/3)) x^(-1/4),E^(-((2 x^(3/2))/3)) x^(-7/4),E^(-((2 x^(3/2))/3)) x^(-13/4)},x->∞] |

In computer science, algorithms are rated by whether they are linear, quadratic, exponential, etc. (in other words, whether their run times are asymptotically less than or equal to particular monomials, the exponential function, etc.). However, the preferred exponential scale is different— rather than . Thus, in addition to , they also consider . These are both asymptotic scales:

✕
AsymptoticScaleQ[{n^3,n^2,n,1},n->∞] |

✕
AsymptoticScaleQ[{2^n^4,2^n^3 ,2^n^2,2^n},n->∞] |

Problems are then classified by the run time scale of the fastest algorithm for solving them. Those that can be solved in polynomial time are said to be in , while problems that require an exponential time algorithm are in . The famous problem asks whether the class of problems that can be verified in polynomial time can also be solved in polynomial time. If then it is theoretically possible that , i.e. all problems solvable in exponential time are verifiable in polynomial time.

The power of asymptotic relations comes from the fact that they provide the means to define asymptotic scales, but the particular choice of scale and how it is used is determined by the application. In function approximation, the scales define asymptotic expansions—families of better and better asymptotic approximations using a given a scale. Depending on the function, different scales are possible. The examples in this blog illustrate power and exponential scales, but there are also logarithmic, polynomial and many other scales. In computer science, the scales are used for both theoretical and practical purposes to analyze and classify problems and programs. In number theory, scales are chosen to analyze the distribution of primes or other special numbers. But no matter what the application, the Wolfram Language gives you the tools to study them. Make sure you download your free trial of Wolfram|One in order to give Version 11.3 of the Wolfram Language a try!

*Download this post as a Wolfram Notebook.*

## 2 Comments

Nice to see that Mathematica knows that AsymptoticEquivalent[PrimePi[n]/n, 1/Log[n], n -> \[Infinity]] has value True!

Nice essay. Some clarifications: In “… f ≥ g iff g ≤ f. This captures the notion that f is eventually no greater than some fixed multiple of g”. Should that “no” be there? Also, ” If P ≠ NP then it is theoretically possible that NP=EXPTIME, i.e. all problems solvable in exponential time are verifiable in polynomial time.” is a bit of a non-sequitur. The P ≠ NP question has no bearing on the NP=EXPTIME equivalence so I’m not sure why this was included (also bearing in mind that the latter equivalence is generally considered to be highly unlikely).