In a sense, you can view neural network regression as a kind of intermediary solution between true regression (where you have a fixed probabilistic model with some underlying parameters you need to find) and interpolation (where your goal is mostly to draw an eye-pleasing line between your data points). Neural networks can get you something from both worlds: the flexibility of interpolation and the ability to produce predictions with error bars like when you do regression.
For those of you who already know about neural networks, here is a very brief hint of how this works: you build a randomized neural network with dropout layers and train it as you normally would, but after training you don’t deactivate the dropout layers; instead, you keep using them to sample the network several times while making predictions, which gives you a measure of the prediction errors. Don’t worry if that sentence doesn’t make sense to you yet, because I will explain all of this in more detail.
To start, let’s do some basic neural network regression on the following data, which I made by taking points on a bell curve and adding random noise to them:
exampleData = {
   {-1.8290606952826973`, 0.34220332868351117`}, {-0.6221091101205225`, 0.6029615713235724`},
   {-1.2928624443456638`, 0.14264805848673934`}, {1.7383127604822395`, -0.09676233458358859`},
   {2.701795903782372`, 0.1256597483577385`}, {1.7400006797156493`, 0.07503425036465608`},
   {-0.6367237544480613`, 0.8371547667282598`}, {-2.482802633037993`, 0.04691691595492773`},
   {0.9566109777301293`, 0.3860569423794188`}, {-2.551790012296368`, -0.037340684890464014`},
   {0.6626176509888584`, 0.7670620756823968`}, {2.865357628008809`, -0.1120949485036743`},
   {0.024445094773154707`, 1.3288343886644758`}, {-2.6538667331049197`, -0.005468132072381475`},
   {1.1353110951218213`, 0.15366247144719652`}, {3.209853579579198`, 0.20621896435600656`},
   {0.13992534568622972`, 0.8204487134187859`}, {2.4013110392840886`, -0.26232722849881523`},
   {-2.1199290467312526`, 0.09261482926621102`}, {-2.210336371360782`, 0.02664895740254644`},
   {0.33732886898809156`, 1.1701573388517288`}, {-2.2548343241910374`, -0.3576908508717164`},
   {-1.4077788877461703`, 0.269393680956761`}, {3.210242875591371`, 0.21099679051999695`},
   {0.7898064016052615`, 0.6198835029596128`}, {2.1835077887328893`, 0.08410415228550497`},
   {0.008631687647122632`, 1.0501425654209409`}, {2.1792531502694334`, -0.11606480328877161`},
   {-3.231947584552822`, -0.2359904673791076`}, {-0.7980615888830211`, 0.5151437742866803`}
  };
plot = ListPlot[exampleData, PlotStyle -> Red]
A regression neural network is basically a chain of alternating linear and nonlinear layers: the linear layers give your net a lot of free parameters to work with, while the nonlinear layers make sure that things don’t get boring. Common examples of nonlinear layers are the hyperbolic tangent, logistic sigmoid and the ramp function. For simplicity, I will stick with the Ramp nonlinearity, which simply puts kinks into straight lines (meaning that you get regressions that are piecewise linear):
netRamp = NetChain[
   {LinearLayer[100], Ramp, LinearLayer[100], Ramp, LinearLayer[]},
   "Input" -> "Real", "Output" -> "Real"
  ];
trainedRamp = NetTrain[
   netRamp,
   <|"Input" -> exampleData[[All, 1]], "Output" -> exampleData[[All, 2]]|>,
   Method -> "ADAM",
   LossFunction -> MeanSquaredLossLayer[],
   TimeGoal -> 120,
   TargetDevice -> "GPU"
  ];
Show[
 Plot[trainedRamp[x], {x, -3.5, 3.5}, PlotLabel -> "Overtrained network"],
 plot,
 ImageSize -> Full,
 PlotRange -> All
 ]
As you can see, the network more or less just follows the points, because it doesn’t understand the difference between the trend and the noise in the data. The longer you train the network and the larger your linear layers, the stronger this effect will be. Obviously this is not what you want, since you’re really interested in fitting the trend of the data. Besides: if you really wanted to fit the noise, you could just use interpolation instead. To prevent this overfitting of the data, you regularize (as explained in this tutorial) the network by using any or all of the following: a ValidationSet, L2 regularization or a DropoutLayer. I will focus on the L2 regularization coefficient λ2 and on dropout layers (in the next section you’ll see why), so let me briefly explain how they work:

- L2 regularization adds a penalty term proportional to λ2 times the sum of the squared network weights to the loss function, which discourages the network from using large weights.

- A DropoutLayer randomly sets a fraction p of the activations passing through it to zero during each training step (rescaling the rest to compensate), which prevents the network from relying too heavily on any individual unit.
To get a feeling for how these two methods regularize the regression, I made the following parameter sweeps of the regularization coefficient λ2 and the dropout probability p:
log\[Lambda]List = Range[-5, -1, 1];
regularizedNets = NetTrain[
     netRamp,
     <|"Input" -> exampleData[[All, 1]], "Output" -> exampleData[[All, 2]]|>,
     LossFunction -> MeanSquaredLossLayer[],
     Method -> {"ADAM", "L2Regularization" -> 10^#},
     TimeGoal -> 20
    ] & /@ log\[Lambda]List;
With[{xvals = Range[-3.5, 3.5, 0.1]},
 Show[
  ListPlot[
   TimeSeries[Transpose@Through[regularizedNets[xvals]], {xvals},
    ValueDimensions -> Length[regularizedNets]],
   PlotLabel -> "\!\(\*SubscriptBox[\(L\), \(2\)]\)-regularized networks",
   Joined -> True,
   PlotLegends -> Map[
     StringForm["`1` = `2`", Subscript[\[Lambda], 2], HoldForm[10^#]] &,
     log\[Lambda]List]
   ],
  plot,
  ImageSize -> 450,
  PlotRange -> All
  ]
 ]
pDropoutList = {0.0001, 0.001, 0.01, 0.05, 0.1, 0.5};
dropoutNets = NetChain[
     {LinearLayer[300], Ramp, DropoutLayer[#], LinearLayer[]},
     "Input" -> "Real", "Output" -> "Real"
    ] & /@ pDropoutList;
trainedDropoutNets = NetTrain[
     #,
     <|"Input" -> exampleData[[All, 1]], "Output" -> exampleData[[All, 2]]|>,
     LossFunction -> MeanSquaredLossLayer[],
     Method -> {"ADAM"},
     TimeGoal -> 20
    ] & /@ dropoutNets;
With[{xvals = Range[-3.5, 3.5, 0.1]},
 Show[
  ListPlot[
   TimeSeries[Transpose@Through[trainedDropoutNets[xvals]], {xvals},
    ValueDimensions -> Length[trainedDropoutNets]],
   PlotLabel -> "Dropout-regularized networks",
   Joined -> True,
   PlotLegends -> Map[
     StringForm["`1` = `2`", Subscript[p, drop], #] &,
     pDropoutList]
   ],
  plot,
  ImageSize -> 450,
  PlotRange -> All
  ]
 ]
To summarize: increasing either the L2 regularization coefficient λ2 or the dropout probability p smooths the regression curve. Too little regularization and the network fits the noise; too much and it flattens out the trend as well.
Both regularization methods mentioned previously were originally proposed as ad hoc solutions to the overfitting problem. However, recent work has shown that there are actually very good fundamental mathematical reasons why these methods work. Even more importantly, it has been shown that you can use them to do better than just produce a regression line! For those of you who are interested, I suggest reading this blog post by Yarin Gal. His thesis “Uncertainty in Deep Learning” is also well worth a look and is the main source for what follows in the rest of this post.
As it turns out, there is a link between stochastic regression neural networks and Gaussian processes, which are free-form regression methods that let you predict values and put error bands on those predictions. To do this, we need to consider neural network regression as a proper Bayesian inference procedure. Normally, Bayesian inference is quite computationally expensive, but as it conveniently turns out, you can do an approximate inference with minimal extra effort on top of what I already did above.
The basic idea is to use dropout layers to create a noisy neural network that is trained on the data as normal. However, I’m also going to use the dropout layers when doing predictions: for every value where I need a prediction, I will sample the network multiple times to get a sense of the errors in the predictions.
Furthermore, it’s good to keep in mind that you, as a newly converted Bayesian, are also dealing with priors. In particular, the network weights are now random variables with a prior distribution and a posterior distribution (i.e. the distributions before and after learning). This may sound rather difficult, so let me try to answer two questions you may have at this point:
Q1: Does that mean that I actually have to think hard about my prior now?
A1: No, not really, because it simply turns out that our old friend λ2, the regularization coefficient, plays the role of the inverse standard deviation of the prior on the network weights: choosing a larger λ2 means you’re only allowing small network weights a priori.
Q2: So what about the posterior distribution of the weights? Don’t I have to integrate the predictions over the posterior weight distribution to get a posterior predictive distribution?
A2: Yes, you do, and that’s exactly what you do (at least approximately) when you sample the trained network with the dropout layers active. The sampling of the network is just a form of Monte Carlo integration over the posterior distribution.
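To make the Monte Carlo integration step concrete, here is a minimal, framework-agnostic sketch in Python/NumPy (not the Wolfram Language code used in this post). The function `noisy_predict` is a hypothetical stand-in for evaluating a trained network with its dropout layers still active; sampling it repeatedly and summarizing the samples is the Monte Carlo approximation to integrating over the posterior weight distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_predict(x):
    # Hypothetical stand-in for net[x, NetEvaluationMode -> "Train"]:
    # a deterministic trend plus dropout-induced randomness.
    return np.exp(-x**2 / 2) + 0.1 * rng.standard_normal(np.shape(x))

def mc_dropout_predict(x, n_samples=500):
    """Monte Carlo integration over the (approximate) posterior:
    sample the stochastic network repeatedly and summarize."""
    samples = np.stack([noisy_predict(x) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)

mean, std = mc_dropout_predict(np.array([0.0, 1.0, 2.0]))
```

With enough samples, the sample mean recovers the underlying trend and the sample spread reflects the stochasticity of the network, which is exactly the role the dropout sampling plays in the sections below.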
So as you can see, being a Bayesian here really just means giving things a different name without having to change your way of doing things very much.
Let’s start with the simplest type of regression in which the noise level of the data is assumed constant across the x axis. This is also called homoscedastic regression (as opposed to heteroscedastic regression, where the noise is a function of x). It does not, however, mean that the prediction error will also be constant: the prediction error depends on the noise level but also on the uncertainty in the network weights.
So let’s get to it and see how this works out, shall we? First I will define my network with a dropout layer. Normally you’d put a dropout layer before every linear layer, but since the input is just a number, I’m omitting the first dropout layer:
\[Lambda]2 = 0.01;
pdrop = 0.1;
nUnits = 300;
activation = Ramp;
net = NetChain[
  {LinearLayer[nUnits], ElementwiseLayer[activation], DropoutLayer[pdrop], LinearLayer[]},
  "Input" -> "Real", "Output" -> "Real"
  ]
trainedNet = NetTrain[
   net,
   <|"Input" -> exampleData[[All, 1]], "Output" -> exampleData[[All, 2]]|>,
   LossFunction -> MeanSquaredLossLayer[],
   Method -> {"ADAM", "L2Regularization" -> \[Lambda]2},
   TimeGoal -> 10
  ];
Next, we need to produce predictions from this model. To calibrate the model, you need to provide a prior length scale l that expresses your belief in how correlated the data is over a distance (just like in Gaussian process regression). Together with the regularization coefficient λ2, the dropout probability p and the number of training data points N, you have to add the following variance to the sample variance of the network: σ² = 2 λ2 N / (l² (1 − p)).
The following function takes a trained net and samples it multiple times with the dropout layers active (using NetEvaluationMode → "Train"). It then constructs a time series object of the –1, 0 and +1σ bands of the predictions:
sampleNet[net : (_NetChain | _NetGraph), xvalues_List,
  sampleNumber_Integer?Positive, {lengthScale_, l2reg_, prob_, nExample_}] :=
 TimeSeries[
  Map[
   With[{
      mean = Mean[#],
      stdv = Sqrt[Variance[#] + (2 l2reg nExample)/(lengthScale^2 (1 - prob))]
      },
     mean + stdv*{-1, 0, 1}
     ] &,
   Transpose@Select[
     Table[net[xvalues, NetEvaluationMode -> "Train"], {i, sampleNumber}],
     ListQ]
   ],
  {xvalues},
  ValueDimensions -> 3
  ];
Now we can go ahead and plot the predictions with 1σ error bands. The prior seems to work reasonably well, though in real applications you’d need to calibrate it with a validation set (just like you would with λ2 and p).
l = 2;
samples = sampleNet[trainedNet, Range[-5, 5, 0.05], 200,
   {l, \[Lambda]2, pdrop, Length[exampleData]}];
Show[
 ListPlot[
  samples,
  Joined -> True,
  Filling -> {1 -> {2}, 3 -> {2}},
  PlotStyle -> {Lighter[Blue], Blue, Lighter[Blue]}
  ],
 ListPlot[exampleData, PlotStyle -> Red],
 ImageSize -> 600,
 PlotRange -> All
 ]
As you can see, the network has a tendency to do linear extrapolation due to my choice of the ramp nonlinearity. Picking different nonlinearities will lead to different extrapolation behaviors. In terms of Gaussian process regression, the choice of your network design influences the effective covariance kernel you’re using.
If you’re curious to see how the different network parameters influence the look of the regression, skip down a few paragraphs and try the manipulates, where you can interactively train your own network on data you can edit on the fly.
In heteroscedastic regression, you let the neural net try and find the noise level for itself. This means that the regression network outputs two numbers instead of one: a mean and a standard deviation. However, since the outputs of the network are unconstrained real numbers, it’s easier to work with the log-precision log τ = log(1/σ²) instead of the standard deviation:
\[Lambda]2 = 0.01;
pdrop = 0.1;
nUnits = 300;
activation = Ramp;
regressionNet = NetGraph[
  {LinearLayer[nUnits], ElementwiseLayer[activation], DropoutLayer[pdrop],
   LinearLayer[], LinearLayer[]},
  {
   NetPort["Input"] -> 1 -> 2 -> 3,
   3 -> 4 -> NetPort["Mean"],
   3 -> 5 -> NetPort["LogPrecision"]
   },
  "Input" -> "Real", "Mean" -> "Real", "LogPrecision" -> "Real"
  ]
Next, instead of using a MeanSquaredLossLayer to train the network, you minimize the negative log-likelihood of the observed data. Again, you replace σ with the log of the precision and multiply everything by 2 to be in agreement with the convention of MeanSquaredLossLayer.
FullSimplify[
 -2*LogLikelihood[NormalDistribution[\[Mu], \[Sigma]], {yobs}] /.
   \[Sigma] -> 1/Sqrt[Exp[log\[Tau]]],
 Assumptions -> log\[Tau] \[Element] Reals
 ]
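As a numerical sanity check (a small Python sketch, independent of the symbolic derivation above), −2 times the Gaussian log-likelihood with σ = exp(−log τ / 2) should equal (y − μ)² · exp(log τ) − log τ plus the constant log 2π:

```python
import math

def neg2_loglik(y, mu, log_tau):
    """-2 * log-likelihood of y under Normal(mu, sigma), with sigma = exp(-log_tau/2)."""
    sigma = math.exp(-log_tau / 2)
    log_pdf = -0.5 * math.log(2 * math.pi * sigma**2) - (y - mu)**2 / (2 * sigma**2)
    return -2 * log_pdf

def loss_form(y, mu, log_tau):
    # The form used as the training loss, plus the constant log(2 pi) term
    # that gets discarded below.
    return (y - mu)**2 * math.exp(log_tau) - log_tau + math.log(2 * math.pi)
```

The two expressions agree for any inputs, which justifies dropping the constant term in the loss that follows.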
Discarding the constant term gives us the following loss:
loss = Function[{y, mean, logPrecision},
   (y - mean)^2*Exp[logPrecision] - logPrecision
   ];
netHetero = NetGraph[
  <|
   "reg" -> regressionNet,
   "negLoglikelihood" -> ThreadingLayer[loss]
   |>,
  {
   NetPort["x"] -> "reg",
   {NetPort["y"], NetPort[{"reg", "Mean"}], NetPort[{"reg", "LogPrecision"}]} ->
    "negLoglikelihood" -> NetPort["Loss"]
   },
  "y" -> "Real", "Loss" -> "Real"
  ]
trainedNetHetero = NetTrain[
   netHetero,
   <|"x" -> exampleData[[All, 1]], "y" -> exampleData[[All, 2]]|>,
   LossFunction -> "Loss",
   Method -> {"ADAM", "L2Regularization" -> \[Lambda]2}
  ];
Again, the predictions are sampled multiple times. The predictive variance is now the sum of the variance of the predicted means and the mean of the predicted variances exp(−log τ). The priors no longer influence the variance directly, but only indirectly through the network training:
sampleNetHetero[net : (_NetChain | _NetGraph), xvalues_List,
  sampleNumber_Integer?Positive] :=
 With[{regressionNet = NetExtract[net, "reg"]},
  TimeSeries[
   With[{
      samples = Select[
        Table[regressionNet[xvalues, NetEvaluationMode -> "Train"],
         {i, sampleNumber}],
        AssociationQ]
      },
    With[{
       mean = Mean[samples[[All, "Mean"]]],
       stdv = Sqrt[Variance[samples[[All, "Mean"]]] +
          Mean[Exp[-samples[[All, "LogPrecision"]]]]]
       },
      Transpose[{mean - stdv, mean, mean + stdv}]
      ]
    ],
   {xvalues},
   ValueDimensions -> 3
   ]
  ];
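The variance combination can be cross-checked with a small Python/NumPy sketch (illustrative only, with made-up sample values standing in for the dropout-active network draws): each sample is a (mean, logPrecision) pair, and the predictive variance is the variance of the sampled means plus the mean of the sampled variances exp(−logPrecision):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical draws from the dropout-active regression net at one x value:
# means scattered around 0.5 (weight uncertainty), and a fixed predicted
# noise level of sigma = 0.2, i.e. logPrecision = log(1/0.2^2).
means = 0.5 + 0.05 * rng.standard_normal(1000)
log_precisions = np.full(1000, np.log(1 / 0.2**2))

predictive_mean = means.mean()
predictive_var = means.var() + np.exp(-log_precisions).mean()
predictive_std = np.sqrt(predictive_var)
```

Here the weight-uncertainty contribution (std ≈ 0.05) and the noise contribution (std = 0.2) combine in quadrature, giving a predictive standard deviation a bit above 0.2.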
Now you can plot the predictions with 1σ error bands:
samples = sampleNetHetero[trainedNetHetero, Range[-5, 5, 0.05], 200];
Show[
 ListPlot[
  samples,
  Joined -> True,
  Filling -> {1 -> {2}, 3 -> {2}},
  PlotStyle -> {Lighter[Blue], Blue, Lighter[Blue]}
  ],
 ListPlot[exampleData, PlotStyle -> Red],
 ImageSize -> 600,
 PlotRange -> All
 ]
Of course, you still need to validate this network: one network architecture may be much better suited to the data at hand than another, so you still need validation sets to decide which model to use and with which parameters. Attached to the end of this blog post, you’ll find a notebook with an interactive demo of the regression method I just showed. With this code, you can find out for yourself how the different model parameters influence the predictions of the network.
The code in this section shows how to implement the loss function described in the paper “Dropout Inference in Bayesian Neural Networks with Alpha-Divergences” by Li and Gal. For an interpretation of the α parameter used in this work, see e.g. figure 2 in “Black-Box α-Divergence Minimization” by Hernández-Lobato et al (2016).
In the paper by Li and Gal, the authors propose a modified loss function ℒ for a stochastic neural network to solve a weakness of the standard loss function I used above: it tends to underfit the posterior and give overly optimistic predictions. Optimistic predictions are a problem: when you fit your data to try and get a sense of what the real world might give you, you don’t want to be thrown a curveball afterwards.
During training, the training inputs xᵢ (with i indexing the training examples) are fed through the network K times to sample the outputs ŷᵢ,ₖ, which are compared to the training outputs yᵢ. Given a particular standard loss function l (e.g. mean squared error, negative log-likelihood, cross-entropy) and a regularization function reg(θ) for the weights θ, the modified loss function ℒ is given as: ℒ(θ) = −(1/α) Σᵢ log[(1/K) Σₖ exp(−α l(yᵢ, ŷᵢ,ₖ))] + reg(θ).
The parameter α is the divergence parameter, which is typically tuned to α = 0.5 (though you can pick other values as well, if you want). It can be thought of as a “pessimism” parameter: the higher it is, the more the network will tend to err on the side of caution and the larger its error estimates will be. Practically speaking, a higher α makes the loss function more lenient toward the presence of large losses among the K samples, meaning that after training the network will produce a larger spread of predictions when sampled. The literature suggests that α = 0.5 is a pretty good value to start with. In the limit α → 0, the LogSumExp simply becomes the sample average over the K losses.
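The α → 0 limit and the “softmin” behavior for larger α can be illustrated with a small Python sketch (an illustration, not the network code below) of the per-example α-weighted LogSumExp aggregation of K sampled losses, using the same max-subtraction trick for numerical stability:

```python
import numpy as np

def alpha_loss(losses, alpha):
    """-1/alpha * log( (1/K) * sum_k exp(-alpha * losses_k) ),
    computed stably by factoring out the largest term of the exponent."""
    losses = np.asarray(losses, dtype=float)
    k = losses.size
    z = -alpha * losses
    m = z.max()
    return -(np.log(np.exp(z - m).sum()) + m - np.log(k)) / alpha

losses = [0.1, 0.5, 2.0]
# For small alpha the aggregate approaches the plain sample average;
# for larger alpha the small losses dominate, so the aggregate is more
# forgiving of a few large losses among the K samples.
```

For these three losses, `alpha_loss` sits between the minimum and the mean, moving toward the minimum as α grows, which is exactly the leniency toward large sampled losses described above.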
As can be seen, we need to sample the network several times during training. We can accomplish this with NetMapOperator. As a simple example, suppose we want to apply a dropout layer 10 times to the same input. To do this, we duplicate the input with ReplicateLayer and then wrap a NetMapOperator around the dropout layer and map it over the duplicated input:
input = Range[5];
NetChain[{
   ReplicateLayer[10],
   NetMapOperator[DropoutLayer[0.5]]
   }][input, NetEvaluationMode -> "Train"]
Next, define a net that will try to fit the data points with a normal distribution, as in the previous heteroscedastic example. The output of the net is now a length-2 vector with the mean and the log-precision (we can’t have two output ports, because we’re going to have to wrap the whole thing in NetMapOperator):
alpha = 0.5;
pdrop = 0.1;
units = 300;
activation = Ramp;
\[Lambda]2 = 0.01; (* L2 regularization coefficient *)
k = 25; (* number of network samples used to compute the loss *)
regnet = NetInitialize@NetChain[{
    LinearLayer[units],
    ElementwiseLayer[activation],
    DropoutLayer[pdrop],
    LinearLayer[]
    },
   "Input" -> "Real", "Output" -> {2}
   ];
You will also need a network element that calculates the LogSumExp operation aggregating the losses of the different samples of the regression network. I implemented the α-weighted LogSumExp by factoring out the largest term before feeding the vector into the exponential, to make it numerically stable. Note that I’m ignoring the log K term, since it’s a constant for the purpose of training the network.
logsumexp\[Alpha][alpha_] := NetGraph[
   <|
    "timesAlpha" -> ElementwiseLayer[Function[-alpha #]],
    "max" -> AggregationLayer[Max, 1],
    "rep" -> ReplicateLayer[k],
    "sub" -> ThreadingLayer[Subtract],
    "expAlph" -> ElementwiseLayer[Exp],
    "sum" -> SummationLayer[],
    "logplusmax" -> ThreadingLayer[Function[{sum, max}, Log[sum] + max]],
    "invalpha" -> ElementwiseLayer[Function[-(#/alpha)]]
    |>,
   {
    NetPort["Input"] -> "timesAlpha",
    "timesAlpha" -> "max" -> "rep",
    {"timesAlpha", "rep"} -> "sub" -> "expAlph" -> "sum",
    {"sum", "max"} -> "logplusmax" -> "invalpha"
    },
   "Input" -> {k}
   ];
logsumexp\[Alpha][alpha]
Define the network that will be used for training:
net\[Alpha][alpha_] := NetGraph[
   <|
    "rep1" -> ReplicateLayer[k], (* replicate the inputs and outputs of the network *)
    "rep2" -> ReplicateLayer[k],
    "map" -> NetMapOperator[regnet],
    "mean" -> PartLayer[{All, 1}],
    "logprecision" -> PartLayer[{All, 2}],
    "loss" -> ThreadingLayer[
      Function[{mean, logprecision, y},
       (mean - y)^2*Exp[logprecision] - logprecision]],
    "logsumexp" -> logsumexp\[Alpha][alpha]
    |>,
   {
    NetPort["x"] -> "rep1" -> "map",
    "map" -> "mean",
    "map" -> "logprecision",
    NetPort["y"] -> "rep2",
    {"mean", "logprecision", "rep2"} -> "loss" -> "logsumexp" -> NetPort["Loss"]
    },
   "x" -> "Real", "y" -> "Real"
   ];
net\[Alpha][alpha]
… and train it:
trainedNet\[Alpha] = NetTrain[
   net\[Alpha][alpha],
   <|"x" -> exampleData[[All, 1]], "y" -> exampleData[[All, 2]]|>,
   LossFunction -> "Loss",
   Method -> {"ADAM", "L2Regularization" -> \[Lambda]2},
   TargetDevice -> "CPU",
   TimeGoal -> 60
  ];
sampleNet\[Alpha][net : (_NetChain | _NetGraph), xvalues_List,
  nSamples_Integer?Positive] :=
 With[{regnet = NetExtract[net, {"map", "Net"}]},
  TimeSeries[
   Map[
    With[{
       mean = Mean[#[[All, 1]]],
       stdv = Sqrt[Variance[#[[All, 1]]] + Mean[Exp[-#[[All, 2]]]]]
       },
      mean + stdv*{-1, 0, 1}
      ] &,
    Transpose@Select[
      Table[regnet[xvalues, NetEvaluationMode -> "Train"], {i, nSamples}],
      ListQ]
    ],
   {xvalues},
   ValueDimensions -> 3
   ]
  ];
samples = sampleNet\[Alpha][trainedNet\[Alpha], Range[-5, 5, 0.05], 200];
Show[
 ListPlot[
  samples,
  Joined -> True,
  Filling -> {1 -> {2}, 3 -> {2}},
  PlotStyle -> {Lighter[Blue], Blue, Lighter[Blue]}
  ],
 ListPlot[exampleData, PlotStyle -> Red],
 ImageSize -> 600,
 PlotRange -> All
 ]
I’ve discussed how dropout layers and the L2 regularization coefficient in neural network training can actually be seen as components of a Bayesian inference procedure that approximates Gaussian process regression. By simply training a network with dropout layers as normal and then running the network several times with NetEvaluationMode → "Train", you can get an estimate of the predictive posterior distribution, which includes not only the noise inherent in the data but also the uncertainty in the trained network weights.
If you’d like to learn more about this material or have any questions you’d like to ask, please feel free to visit my discussion on Wolfram Community.
We sat down with Daniel to learn more about his research and how the Wolfram Language plays a part in it.
This was actually a perfect choice in my research area, and the timing was perfect, since within one week after I joined the group, there was the first gravitational wave detection by LIGO, and things got very exciting from there.
I was very fortunate to work in the most exciting fields of astronomy as well as computer science. At the [NCSA] Gravity Group, I had complete freedom to work on any project that I wanted, and funding to avoid any teaching duties, and a lot of support and guidance from my advisors and mentors who are experts in astrophysics and supercomputing. Also, NCSA was an ideal environment for interdisciplinary research.
Initially, my research was focused on developing gravitational waveform models using post-Newtonian methods, calibrated with massively parallel numerical relativity simulations using the Einstein Toolkit on the Blue Waters petascale supercomputer.
These waveform models are used to generate templates that are required for the existing matched-filtering method (a template-matching method) to detect signals in the data from LIGO and estimate their properties.
However, these template-matching methods are slow and extremely computationally expensive, and not scalable to all types of signals. Furthermore, they are not optimal for the complex non-Gaussian noise background in the LIGO detectors. This meant a new approach was necessary to solve these issues.
My article was featured in the special issue commemorating the Nobel Prize in 2017.
Even though peer review is done for free by referees in the scientific community and the expenses to host online articles are negligible, most high-profile journals today are behind expensive paywalls and charge thousands of dollars for publication. However, Physics Letters B is completely open access to everyone in the world for free and has no publication charges for the authors. I believe all journals should follow this example to maximize scientific progress by promoting open science.
This was the main reason why we chose Physics Letters B as the very first journal where we submitted this article.
I think the attendees and judges found this very impressive, since it was connecting high-performance parallel numerical simulations with artificial intelligence methods based on deep learning to enable real-time analysis of big data from LIGO for gravitational wave and multimessenger astrophysics. Basically, this research is at the interface of all these exciting topics receiving a lot of hype recently.
I was always interested in artificial intelligence since my childhood, but I had no background in deep learning or even machine learning until November 2016, when I attended the Supercomputing Conference (SC16).
There was a lot of hype about deep learning at this conference, especially a lot of demos and workshops by NVIDIA, which got me excited to try out these techniques for my research. This was also right after the new neural network functionality was released in Version 11 of the Wolfram Language. I already had the training data of gravitational wave signals from my research with the NCSA Gravity Group, as mentioned before. So all these came together, and this was a perfect time to try out applying deep learning to tackle the problem of gravitational wave analysis.
Since I had no background in this field, I started out by taking an online course by Geoffrey Hinton on Coursera and CS231 at Stanford, and quickly read through the Deep Learning book by Goodfellow, Bengio and Courville, all in about a week.
Then it took only a couple of days to get used to the neural net framework in the Wolfram Language by reading the documentation. I decided to give time series inputs directly into 1D convolutional neural networks instead of images (spectrograms). Amazingly, the very first convolutional network I tried performed better than expected for gravitational wave analysis, which was very encouraging.
Here are some advantages of using deep learning over matched filtering:
1) Speed: The analysis can be carried out within milliseconds using deep learning (with minimal computational resources), which will help in finding the electromagnetic counterpart using telescopes faster. Enabling rapid follow-up observations can lead to new physical insights.
2) Covering more parameters: Only a small subset of the full parameter space of signals can be searched for using matched filtering (template matching), since the computational cost explodes exponentially with the number of parameters. Deep learning is highly scalable and requires only a one-time training process, so the high-dimensional parameter space can be covered.
3) Generalization to new sources: The article shows that signals from new classes of sources beyond the training data, such as spin precessing or eccentric compact binaries, can be automatically detected with this method with the same sensitivity. This is because, unlike template-matching techniques, deep learning can interpolate to points within the training data and generalize beyond it to some extent.
4) Resilience to non-Gaussian noise: The results show that this deep learning method can distinguish signals from transient non-Gaussian noises (glitches) and works even when a signal is contaminated by a glitch, unlike matched filtering. For instance, the occurrence of a glitch in coincidence with the recent detection of the neutron star merger delayed the analysis by several hours using existing methods and required manual inspection. The deep learning technique can automatically find these events and estimate their parameters.
5) Interpretability: Once the deep learning method detects a signal and predicts its parameters, this can be quickly cross-validated using matched filtering with a few templates around these predicted parameters. Therefore, this can be seen as a method to accelerate matched filtering by narrowing down the search space—so the interpretability of the results is not lost.
I have been using Mathematica since I was an undergraduate at IIT Bombay. I have used it for symbolic calculation as well as numerical computation.
The Wolfram Language is very coherent, unlike other languages such as Python, and includes all the functionality across different domains of science and engineering without relying on any external packages that have to be loaded. All the 6,000 or so functions have explicit names and are designed with a very similar syntax, which means that most of the time you can simply guess the name and usage without referring to any documentation. The documentation is excellent, and it is all in one place.
Overall, the Wolfram Language saves a researcher’s time by a factor of 2–3x compared to other programming languages. This means you can do twice as much research. If everyone used Mathematica, we could double the progress of science!
I also used it for all my coursework, and submitted Mathematica notebooks exported into PDFs, while everyone else in my class was still writing things down with pen and paper.
The Wolfram Language neural network framework was extremely helpful for me. It is a very high-level framework and doesn’t require you to worry about what is happening under the hood. Even someone with zero background in deep learning can use it successfully for their projects by simply referring to just the documentation.
Using GPUs to do training with the Wolfram Language was as simple as including the string TargetDevice->"GPU" in the code. With this small change, everything ran on GPUs like magic on any of my machines on Windows, OSX or Linux, including my laptop, Blue Waters, the Campus Cluster, the Volta and Pascal NVIDIA DGX-1 deep learning supercomputers and the hybrid machine with four P100 GPUs at the NCSA Innovative Systems Lab.
I used about 12 GPUs in parallel to try out different neural network architectures as well.
I completed the whole project, including the research, writing the paper and posting on arXiv, within two weeks after I came up with the idea at SC16, even though I had never done any deep learning–related work before. This was only possible because I used the Wolfram Language.
I had drafted the initial version of the research paper as a Mathematica notebook. This allowed me to write paragraphs of text and typeset everything, even mathematical equations and figures, and organize into sections and subsections just like in a Word document. At the end, I could export everything into a LaTeX file and submit to the journal.
Everything, including the data preparation, preprocessing, training and inference with the deep convolutional neural nets, along with the preparation of figures and diagrams of the neural net architecture, was done with the Wolfram Language.
Apart from programming, I regularly use Mathematica notebooks as a word processor and to create slides for presentations. All this functionality is included with Mathematica.
Read the documentation, which is one of the greatest strengths of the language.
There are a lot of included examples about using deep learning for various types of problems, such as classification, regression in fields such as time series analysis, natural language processing, image processing, etc.
The Wolfram Neural Net Repository is a unique feature in the Wolfram Language that is super helpful. You can directly import state-of-the-art neural network models that are pre-trained for hundreds of different tasks and use them in your code. You can also perform “net surgery” on these models to customize them as you please for your research/applications.
The Mathematica Stack Exchange is a very helpful resource, as is the Fast Introduction for Programmers, along with Mathematica Programming—An Advanced Introduction by Leonid Shifrin.
Deep Learning for Real-Time Gravitational Wave Detection and Parameter Estimation: Results with Advanced LIGO Data (Physics Letters B)
Glitch Classification and Clustering for LIGO with Deep Transfer Learning (NIPS 2017, Deep Learning for Physical Science)
Deep Neural Networks to Enable Real-Time Multimessenger Astrophysics (Physical Review D)
The Wolfram Language is essential to many Bridges attendees’ work. It’s used to explore ideas, puzzle out technical details, design prototypes and produce output that controls production machines. It’s applied to sculpture, graphics, origami, painting, weaving, quilting—even baking.
In the many years I’ve attended the Bridges conferences, I’ve enjoyed hearing about these diverse applications of the Wolfram Language in the arts. Here is a selection of Bridges artists’ work.
George Hart is well known for his insanely tangled sculptures based on polyhedral symmetries. Two of his recent works, SNO-Ball and Clouds, were puzzled out with the help of the Wolfram Language:
This video includes a Wolfram Language animation that shows how the elements of the Clouds sculpture were transformed to yield the vertically compressed structure.
One of Hart’s earliest Wolfram Language designs was for the Millennium Bookball, a 1998 commission for the Northport Public Library. Sixty wooden books are arranged in icosahedral symmetry, joined by cast bronze rings. Here is the Wolfram Language design for the bookball and a photo of the finished sculpture:
One of my favorite Hart projects was the basis of a paper with Robert Hanson at the 2013 Bridges conference: “Custom 3D-Printed Rollers for Frieze Pattern Cookies.” With a paragraph of Wolfram Language code, George translates images to 3D-printed rollers that emboss the images on, for example, cookie dough:
It’s a brilliant application of the Wolfram Language. I’ve used it myself to make cookie-roller presents and rollers for patterning ceramics. You can download a notebook of Hart’s code. Since Hart wrote this code, we’ve added support for 3D printing to the Wolfram Language. You can now send roller designs directly to a printing service or a local 3D printer using Printout3D.
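Sending a roller design to a printer is essentially a one-liner. In this sketch, a plain cylinder stands in for an embossed roller design; the file name and service name are just examples:

```wolfram
(* A plain cylinder standing in for a patterned roller design;
   dimensions in millimeters *)
roller = Cylinder[{{0, 0, 0}, {0, 0, 50}}, 15];
Printout3D[roller, "roller.stl"]  (* write an STL file for a local printer *)
(* or send it straight to a printing service, e.g.
   Printout3D[roller, "Sculpteo"] *)
```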
Christopher Hanusa has made a business of selling 3D-printed objects created exclusively with the Wolfram Language. His designs take inspiration from mathematical concepts—unsurprising given his position as an associate professor of mathematics at Queens College, City University of New York.
Hanusa’s designs include earrings constructed with mesh and region operations:
… a pendant designed with transformed graphics primitives:
… ornaments designed with ParametricPlot3D:
… and a tea light made with ParametricPlot3D, using the RegionFunction option to punch an interesting pattern of perforations into the cylinder:
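Something in the spirit of that perforation technique can be sketched in a few lines; the pattern and dimensions here are invented, not Hanusa's actual design:

```wolfram
(* A cylinder plotted with ParametricPlot3D; RegionFunction cuts a
   periodic pattern of holes out of the surface *)
ParametricPlot3D[{Cos[u], Sin[u], v}, {u, 0, 2 Pi}, {v, 0, 3},
 RegionFunction ->
  Function[{x, y, z, u, v}, Cos[8 u] Cos[4 Pi v/3] < 0.7],
 Mesh -> None, PlotPoints -> 80]
```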
Hanusa has written about how he creates his designs with the Wolfram Language on his blog, The Mathematical Zorro. You can see all of Hanusa’s creations in his Shapeways shop.
William F. Duffy, an accomplished traditional sculptor, also explores forms derived from parametric equations and cast from large-scale resin 3D prints. Many of his forms result from Wolfram Language explorations.
Here, for example, are some of Duffy’s explorations of a fifth-degree polynomial that describes a Calabi–Yau space, important in string theory:
Duffy plotted one instance of that function in Mathematica, 3D-printed it in resin and made a mold from the print in which the bronze sculpture was cast. On the left is a gypsum cement test cast, and on the right the finished bronze sculpture, patinated with potassium sulfide:
On commission from the Simons Center for Geometry and Physics, Duffy created the object on the left as a bronze-infused, stainless steel 3D print. The object on the right was created from the same source file, but printed in nylon:
Duffy continues to explore functions on the complex plane as sources for sculptural structures:
You will be able to see more of Duffy’s work, both traditional and mathematical, on his forthcoming website.
Robert Fathauer uses the Wolfram Language to explore diverse phenomena, including fractal structures with negative curvature that are reminiscent of natural forms. This print of such a form was exhibited in the Bridges 2013 art gallery:
Fathauer realizes the ideas he explores in meticulously handcrafted ceramic forms reminiscent of corals and sponges:
One of Fathauer’s Mathematica-designed ceramic works consisted of 511 cubic elements (!). Here are shots of the Wolfram Language model and its realization, before firing, as a ceramic sculpture:
Unfortunately, in what Fathauer has confirmed was a painful experience, the sculpture exploded in the kiln during firing. But this structure, as well as several other fractal structures designed with the Wolfram Language, is available in Fathauer’s Shapeways shop.
Martin Levin makes consummately crafted models that reveal the structure of our world—the distance, angular and topological relationships that govern the possibilities and impossibilities of 3D space:
What you don’t—or barely—see is where the Wolfram Language has had the biggest impact in his work. The tiny connectors that join the tubular parts are 3D printed from models designed with the Wolfram Language:
Levin is currently designing 3D-printed modules that can be assembled to make a lost-plastic bronze casting of a compound of five tetrahedra:
The finished casting should look something like this (but mirror-reversed):
Henry Segerman explored some of the topics in his engaging book Visualizing Mathematics with 3D Printing with Wolfram Language code. While the forms in the book are explicitly mathematical, many have an undeniable aesthetic appeal. Here are snapshots from his initial explorations of surfaces with interesting topologies…
… which led to these 3D-printed forms in his Shapeways shop:
His beautiful Archimedean Spire…
… was similarly modeled first with Wolfram Language code:
In addition to mathematical models, Segerman collaborates with Robert Fathauer (above) to produce exotic dice, whose geometry begins as Wolfram Language code—much of it originating from the Wolfram MathWorld entry “Isohedron”:
In addition to constructing immersive virtual reality hyperbolic spaces, Elisabetta Matsumoto turns high-power mathematics into elegant jewelry using the Wolfram Language. This piece, which requires a full screen of mathematical code to describe, riffs on one of the earliest discovered minimal surfaces, Scherk’s second surface:
Continuing the theme of hyperbolic spaces, here’s one of Matsumoto’s Wolfram Language designs, this one in 2D rather than 3D:
You can see Matsumoto’s jewelry designs in her Shapeways shop.
Father and son Koos and Tom Verhoeff have long used the Wolfram Language to explore sculptural forms and understand the intricacies of miter joint geometries and torsion constraints that enable Koos to realize his sculptures. Their work is varied, from tangles to trees to lattices in wood, sheet metal and cast bronze. Here is a representative sample of their work together with the underlying Wolfram Language models, all topics of Bridges conference papers:
Three Families of Mitered Borromean Ring Sculptures
Mitered Fractal Trees: Constructions and Properties
Folded Strips of Rhombuses, and a Plea for the Square Root of 2 : 1 Rhombus
Tom Verhoeff’s YouTube channel has a number of Wolfram Language videos, including one showing how the last of the structures above is developed from a strip of rhombuses.
In 2015, three Verhoeff sculptures were installed in the courtyard of the Mathematikon of Heidelberg University. Each distills one or more mathematical concepts in sculptural form. All were designed with the Wolfram Language:
You can find detailed information about the mathematical concepts in the Mathematikon sculptures in the Bridges 2016 paper “Three Mathematical Sculptures for the Mathematikon.”
Edmund Harriss has published two best-selling thinking person’s coloring books, Patterns of the Universe and Visions of the Universe, in collaboration with Alex Bellos. They’re filled with gorgeous mathematical figures that feed the mind as well as the creative impulse. Edmund created his figures with Mathematica, a tribute to the diversity of phenomena that can be productively explored with the Wolfram Language:
Loe Feijs and Marina Toetters are applying new technology to traditional weaving patterns: puppytooth and houndstooth, or pied-de-poule. With Wolfram Language code, they’ve implemented cellular automata whose patterns tend toward and preserve houndstooth patterns:
By adding random elements to the automata, they generate woven fabric with semi-random patterns that allude to houndstooth:
This video describes their houndstooth work. You can read the details in their Bridges 2017 paper, “A Cellular Automaton for Pied-de-poule (Houndstooth).”
You can hardly find a more direct translation from mathematical function to artistic expression than Caroline Bowen’s layered Plexiglas works. And yet her craftsmanship and aesthetic choices yield compelling works that transcend mere mathematical models.
The two pieces she exhibited in the 2016 Bridges gallery were inspired by examples in the SliceContourPlot3D documentation (!). All of the pieces pictured here were created using contour-plotting functions in Mathematica:
In 2017, Bowen exhibited a similarly layered piece with colors that indicate the real and imaginary parts of the complex-valued function ArcCsch[z^4] + Sec[z^2], as well as the function’s poles and branch cuts:
Paper sculptor Jeannine Mosely designs some of her origami crease patterns with the Wolfram Language. In some cases, as with these tessellations whose crease patterns require the numerical solution of integrals, the Wolfram Language is essential:
Mosely created these “bud” variations with a parametric design encapsulated as a Wolfram Language function:
If you’d like to try folding your own bud, Mosely has provided a template and instructions.
The design and fabrication of Helaman Ferguson’s giant Umbilic Torus SC sculpture was the topic of a Bridges 2012 paper authored with his wife Claire, “Celebrating Mathematics in Stone and Bronze: Umbilic Torus NC vs. SC.”
The paper details the fabrication of the sculpture (below left), an epic project that required building a gantry robot and carving 144 one-ton blocks of sandstone. The surface of the sculpture is textured with a Hilbert curve, a single line that traverses the entire surface, shown here in a photo of an earlier, smaller version of the sculpture (right):
The Hilbert curve is not just surface decoration—it’s also the mark left by the ball-head cutting tool that carved the curved surfaces of the casting molds. The ridges in the surface texture are the peaks left between adjacent sweeps of the cutting tool.
Ferguson attacked the tasks of modeling the Hilbert curve tool path and generating the G-code that controlled the CNC milling machine that carved the molds with Mathematica:
I too participate in the Bridges conferences, and I use the Wolfram Language nearly every day to explore graphical and sculptural ideas. One of the more satisfying projects I undertook was the basis of a paper I presented at the 2015 Bridges conference, “Algorithmic Quilting,” written in collaboration with Theodore Gray and Nina Paley.
The paper describes an algorithmic method we used to generate a wide variety of single-line fills for quilts. Starting with a distribution of points, we make a graph on the points, extract a spanning tree from it and render a fill by tracing around the tree:
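A minimal sketch of that pipeline (with made-up parameters, and leaving out the final step of tracing *around* the tree to get a single closed fill line) could read:

```wolfram
(* Scatter points, connect near neighbors, extract a spanning tree,
   and draw its edges as the skeleton of the fill *)
pts = RandomPoint[Rectangle[{0, 0}, {4, 3}], 200];
g = NearestNeighborGraph[pts, 4];   (* graph on the points *)
tree = FindSpanningTree[g];         (* spanning tree of that graph *)
Graphics[Line[List @@@ EdgeList[tree]]]
```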
We tested the algorithm by generating a variety of backgrounds for a quilt based on frames of Eadweard Muybridge’s horse motion studies:
Here’s an animation of the frames in the quilt:
We’re fascinated by artificial intelligence and machine learning, and the second edition of Achim Zielesny’s From Curve Fitting to Machine Learning: An Illustrative Guide to Scientific Data Analysis and Computational Intelligence provides a great introduction to the increasingly important field of computational intelligence. It is an interactive, illustrative guide that outlines all concepts and ideas clearly, with graphically depicted plausibility arguments and a little elementary mathematics. It explores topics such as two-dimensional curve fitting, multidimensional clustering and machine learning with neural networks or support vector machines, and complements these subject-specific demonstrations with sections that address more fundamental questions, like the relation between machine learning and human intelligence. Zielesny makes extensive use of the Computational Intelligence Packages (CIP), a high-level function library built on top of Mathematica’s algorithms. Readers with programming skills can easily port or customize the provided code, so this book is particularly valuable to computer science students and to scientific practitioners in industry and academia.
The Art of Programming in the Mathematica Software, third edition
Another gem for programmers and scientists who need to fine-tune and otherwise customize their Wolfram Language applications is the third edition of The Art of Programming in the Mathematica Software, by Victor Aladjev, Valery Boiko and Michael Shishakov. This text concentrates on procedural and functional programming. Experienced Wolfram Language programmers know the value of creating user tools. They can extend the most frequently used standard tools of the system and/or eliminate its shortcomings, complement new features, and much more. Scientists and data analysts can then conduct even the most sophisticated work efficiently using the Wolfram Language. Likewise, professional programmers can use these techniques to develop more valuable products for their clients/employers. Included is the MathToolBox package with more than 930 tools; their freeware license is attached to the book.
Introduction to Mathematica with Applications
For a more basic introduction to Mathematica, readers may turn to Marian Mureşan’s Introduction to Mathematica with Applications. First exploring the numerous features within Mathematica, the book continues with more complex material. Chapters include topics such as sorting algorithms, functions—both planar and solid—with many interesting examples and ordinary differential equations. Mureşan explores the advantages of using the Wolfram Language when dealing with the number pi and describes the power of Mathematica when working with optimal control problems. The target audience for this text includes researchers, professors and students—really anyone who needs a state-of-the-art computational tool.
Geographical Models with Mathematica
The Wolfram Language’s powerful combination of extensive map data and computational agility is on display in André Dauphiné’s Geographical Models with Mathematica. This book gives a comprehensive overview of the types of models necessary for the development of new geographical knowledge, including stochastic models, models for data analysis, geostatistics, networks, dynamic systems, cellular automata and multi-agent systems, all discussed in their theoretical context. Dauphiné then provides over 65 programs that formalize these models, written in the Wolfram Language. He also includes case studies to help the reader apply these programs in their own work.
Our tour of new Wolfram Language books moves from terra firma to the stars in Geometric Optics: Theory and Design of Astronomical Optical Systems Using Mathematica. This book by Antonio Romano and Roberto Caveliere provides readers with the mathematical background needed to design many of the optical combinations that are used in astronomical telescopes and cameras. The results presented in the work were obtained through a different approach to third-order aberration theory as well as the extensive use of Mathematica. Replete with worked examples and exercises, Geometric Optics is an excellent reference for advanced graduate students, researchers and practitioners in applied mathematics, engineering, astronomy and astronomical optics. The work may be used as a supplementary textbook for graduate-level courses in astronomical optics, optical design, optical engineering, programming with Mathematica or geometric optics.
Don’t forget to check out Stephen Wolfram’s An Elementary Introduction to the Wolfram Language, now in its second edition. It is available in print, as an ebook and free on the web—as well as in Wolfram Programming Lab in the Wolfram Open Cloud. There’s also now a free online hands-on course based on the book. Read Stephen Wolfram’s recent blog post about machine learning for middle schoolers to learn more about the new edition.
In Mathematica 10, we introduced support for anatomical structures in EntityValue, which included, among many other things, a “Graphics3D” property that returns a 3D model of the anatomical structure in question. We also styled the models and aligned them with the concepts in the Unified Medical Language System (UMLS).
The output is a standard Graphics3D expression, but it contains metadata in the form of an Annotation that allows for additional exploration.
This means each model knows what lower-level structures it’s made of.
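In a quick sketch (assuming "Heart" is the canonical entity name):

```wolfram
(* Fetch the styled 3D model of an anatomical structure *)
heart = Entity["AnatomicalStructure", "Heart"];
model = EntityValue[heart, "Graphics3D"];
(* the embedded Annotation metadata identifies the lower-level
   structures the model is built from *)
Cases[model, Annotation[_, ann_] :> ann, Infinity]
```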
I should note that the models being used are not just eye candy. If that were the intent, we might explore low polygon count models and use textures for a more realistic appearance. But these models are not just good for looking at—you can also use them for computation. For meaningful results, you need accurate models, which may be large, so you may need to be patient when downloading and/or rendering them. Keep in mind that some entities, like the brain, have lots of internal structures. So the model may be larger than you expect, although you may not see this internal structure from the outside.
One example of using these models for computation would be calculating the eigenfrequencies of an air-filled transverse colon (let the jokes fly). Finite element mesh (FEM) calculations are common in medical research today. By retrieving the mesh from AnatomyData, we can perform computations on the model.
Now we can obtain the resonant frequencies of the transverse colon.
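A sketch of that computation follows; the mesh-to-solid conversion and the speed of sound are my assumptions, not necessarily the exact steps used for the original figures:

```wolfram
(* Pull the surface mesh, close it into a solid region, and solve a
   Helmholtz eigenproblem on it (default Neumann walls) *)
mesh = AnatomyData[
   Entity["AnatomicalStructure", "TransverseColon"], "MeshRegion"];
solid = BoundaryMeshRegion[MeshCoordinates[mesh], MeshCells[mesh, 2]];
vals = NDEigenvalues[-Laplacian[u[x, y, z], {x, y, z}], u[x, y, z],
   Element[{x, y, z}, solid], 6];
(* eigenvalue k^2 -> frequency f = c Sqrt[k^2]/(2 Pi),
   with c ~ 343,000 mm/s for air, since the model is in millimeters *)
freqs = Sqrt[vals] 343000./(2 Pi)
```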
You can use Sound to listen to the resonant frequencies of the transverse colon.
We can find the surface area of the model by directly operating on the MeshRegion. Units are in square millimeters.
To compute the volume, we need to convert the MeshRegion into a BoundaryMeshRegion first. Units are in cubic millimeters.
You can even compute the distance between the transverse colon and the region centroid of the large intestine. Units are in millimeters.
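Each of these measurements is a line or two; a sketch (entity names assumed to be canonical):

```wolfram
colon = AnatomyData[
   Entity["AnatomicalStructure", "TransverseColon"], "MeshRegion"];
Area[colon]   (* surface area, mm^2 *)
(* close the surface mesh into a boundary mesh to measure volume *)
solid = BoundaryMeshRegion[MeshCoordinates[colon], MeshCells[colon, 2]];
Volume[solid]   (* enclosed volume, mm^3 *)
large = AnatomyData[
   Entity["AnatomicalStructure", "LargeIntestine"], "MeshRegion"];
RegionDistance[colon, RegionCentroid[large]]   (* distance, mm *)
```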
Lower-resolution models will give lower-quality results.
To make it easy to render anatomical structures, we introduced AnatomyPlot3D.
AnatomyPlot3D allows directives to modify its “primitives,” similar in syntax to Graphics3D.
The output of AnatomyPlot3D is a Graphics3D object that doesn’t contain the original Entity objects: they are resolved at evaluation time into 3D graphics primitives, and the normal rules of the graphics language then apply to them.
Because AnatomyPlot3D can be thought of as an extension to the Graphics3D language, you can mix anatomical structures with normal Graphics3D primitives.
In AnatomyPlot3D, the Graphics3D language has been extended at multiple levels to make use of anatomical structures. Within AnatomyPlot3D, anatomical entities work just like graphics primitives do in Graphics3D. But they can also be used in place of coordinates in Graphics3D primitives like Point and Line. In that context, the entity represents the region centroid of the structure. This allows you, for example, to draw a line from one entity to another.
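For example, a sketch along these lines (the entity names are illustrative):

```wolfram
(* Entities act as graphics primitives; inside Line they stand for
   the structures' region centroids *)
AnatomyPlot3D[{
  Entity["AnatomicalStructure", "LeftHumerus"],
  Entity["AnatomicalStructure", "LeftRadius"],
  Red, Thick,
  Line[{Entity["AnatomicalStructure", "LeftHumerus"],
    Entity["AnatomicalStructure", "LeftRadius"]}]}]
```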
This concept can be applied to annotate a 3D model using arrows and labels.
You can refer to the subparts of structures and apply styles to them using AnatomyForm. It applies only to anatomical structures (not Graphics3D primitives) and supports a couple of different forms. The following example behaves similarly to a standalone directive, except that it applies only to the anatomical structure, not the Cuboid.
A more useful form can be used to style subparts.
AnatomyForm works by allowing you to associate specified directives with specified entities that may or may not exist in the structure you are visualizing. Any Directive supported by Graphics3D is supported, including Lighting, Opacity, EdgeForm, FaceForm, ClipPlanes and any combination thereof. In addition to supporting styles for specific entities, AnatomyForm also supports a default case via the use of an underscore. The following example shows the left humerus in red and everything else transparent and backlit, giving an X-ray-like appearance.
PlotRange can make use of entities to constrain what would otherwise include all of the referenced entities. The following example includes several bones of the left lower limb, but the PlotRange is centered on the left patella and padded out from there by a fixed amount.
SkinStyle is a convenient way to include any enclosing skin that can be found around the specified entities.
The default styling can be overridden.
You can use ClipPlanes to peel away layers of skin.
Use multiple clip planes to peel away successive anatomical layers.
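A single-plane sketch of the idea; the plane's position is a guess, since it depends on the model's millimeter coordinate ranges:

```wolfram
(* Clip everything above a horizontal plane at z = 1400 mm *)
AnatomyPlot3D[Entity["AnatomicalStructure", "Skin"],
 ClipPlanes -> {InfinitePlane[{0, 0, 1400}, {{1, 0, 0}, {0, 1, 0}}]}]
```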
Apply geometric transformations to anatomical structures to rotate them. The following example includes many bones in the skull, but applies a rotation to just the elements of the lower jaw around the temporomandibular joint.
A mix of styles can be useful for highlighting different tissue types in the head.
A similar approach can be used in the torso for different organs.
Here is an advanced example showing the use of ClipPlanes to remove muscles below a specific cutting plane.
Inner structures can be differentiated using styles, in this case within the brain.
Here are links to some animations rendered using AnatomyPlot3D:
Scaling Transform Applied to the Bones of the Skull
As time goes on, we will continue to add models and tools that allow you to explore human anatomy more deeply.
Until now, it has been difficult for the average engineer to perform simple vibration analysis. The initial cost for simple equipment, including software, may be several thousand dollars—and it is not unusual for advanced equipment and software to cost ten times as much. Normally, a vibration specialist starts an investigation with a hammer impact test. In its simplest and most common form, an accelerometer is mounted on a structure, and a special impact hammer is used to excite the structure at several locations. The accelerometer and hammer-force signals are recorded, and modal analysis is then used to get a preliminary understanding of the behavior of the system. The minimum equipment for such a test is an accelerometer, an impact hammer, amplifiers, a signal recorder and analysis software.
I’ve figured out how to use the Wolfram Language on my smartphone to sample and analyze machine vibration and noise, and to perform surprisingly good vibration analysis. I’ll show you how, and give you some simple Wolfram Language code to get you started.
Throughout the history of the development of machines, vibration and sound measurements have been important issues. There are two reasons for this:
The many applications of vibration analysis have led to a huge number of engineers studying this subject in universities all over the world. The research area of “machine vibrations” has its own conferences and publications. Companies specialize in developing different kinds of equipment and services. Most large machine-building industries have departments specializing in vibrations.
So here we go: recording the sound of a vibrating machine with an iPhone is simple. I used the iPhone’s built-in digital voice recorder, Voice Memo. The recording can be converted to an MP3 file just by saving it in the MP3 format in iTunes. The default Apple M4A format cannot be used directly in the Wolfram Language. If iTunes is not available, there are a lot of free converters on the internet that will change your M4A files to MP3 files.
I’ll use an industrial gearbox as an example. The example itself is not that important, but it does suggest possible areas where the method can be used. Some data and dimensions have been modified from the original application.
A gearbox is making a lot of noise. What is the problem?
In the figure below, a motor drives the input shaft. The shaft speed is reduced by the gearbox. The power is used by the output shaft. Typically, the motor is a diesel or electric engine; in this example, it is a diesel engine. The numbers of teeth are z1 = 23, z2 = 65, z3 = 27 and z4 = 43, respectively. The largest wheels are about one meter in diameter.
We ran the engine at 1,200 rpm, and I recorded five seconds of sound with my iPhone. Converted to MP3, the sound file was named “measurement.mp3”. Then all I needed to do was import it into the Wolfram Language to plot the frequency spectrum.
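The import-and-plot step might look like the following sketch, assuming Import returns an Audio object (the file name is the one described above):

```wolfram
audio = Import["measurement.mp3"];
Spectrogram[audio]   (* time-frequency view of the recording *)
Periodogram[audio]   (* frequency spectrum *)
```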
The excitation frequency in hertz of the gear contact is z1*rpm/60 on the input shaft and (z1/z2)*z3*rpm/60 on the output shaft. Marking the input and output shafts’ gear mesh excitation frequencies on the spectrum—red and green, respectively—makes it clear that these frequencies correlate with peaks in the spectrogram.
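With the tooth counts and speed given above, these work out numerically as follows (a quick check, not taken from the original post):

```wolfram
{z1, z2, z3} = {23, 65, 27};
rpm = 1200;
z1 rpm/60.           (* input-shaft gear mesh frequency: 460. Hz *)
(z1/z2) z3 rpm/60.   (* output-shaft gear mesh frequency: ~191.1 Hz *)
```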
Let’s make the spectrogram interactive: often we don’t want to use the whole sound file, so we add an option to select a start and end time within the file. Let’s also make it possible to change the rpm.
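One way to sketch such an interactive tool (the control ranges are mine, and it assumes audio, z1, z2 and z3 are defined as earlier; the start time should be kept below the end time):

```wolfram
(* Select a window of the recording and an rpm; the grid lines mark
   the input (red) and output (green) gear mesh frequencies *)
Manipulate[
 Periodogram[AudioTrim[audio, {t1, t2}],
  GridLines -> {{{z1 rpm/60., Red}, {(z1/z2) z3 rpm/60., Green}},
    None}],
 {{t1, 0, "start (s)"}, 0, 4}, {{t2, 5, "end (s)"}, 1, 5},
 {{rpm, 1200}, 600, 1800, 50}]
```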
The analysis toolkit is ready. The options PlotRange, MaxValue and Manipulate in the plot above are set manually. Of course, this can be developed further. But we stop here to keep it simple.
So what happened with the real application investigation? Well, the same analysis as above was performed over the whole rpm region. The maxima of the peaks at each rpm are plotted below.
The input shaft’s maximum occurs at 747.7 rpm, and the output shaft’s at 1,800 rpm. Both excite the gearbox at about 287 Hz, its fundamental resonance frequency. Note that 747.7/60*23 = 286.6 Hz and 1800*23/65*27/60 = 286.6 Hz.
We concluded that the gear mesh was not optimized for smooth running and the gearbox had a bad resonance frequency. We opened the gearbox and were able to confirm wear on the teeth, which suggested possibilities for improving the contact pattern. We improved contact by selecting an optimum helical angle, as well as tip relief and correct crowning. Tip relief is a surface modification of a tooth profile; a small amount of material is removed near the tip of the gear tooth. Crowning is a surface modification in the lengthwise direction to prevent contact at the teeth ends, where a small amount of material is removed near the end of the gear tooth.
I have used this method, utilizing my smartphone and the Wolfram Language, several times for real-world and often complex investigations and applications. Often, measurement specialists have already gotten involved before I arrive. But they may have missed the basics because they are using comprehensive measurement programs.
The method I describe here may sometimes yield a similar—or even better—understanding of the problem in just a few minutes at no cost. Well worth trying.
For the past couple of years, I’ve been playing with, collecting and analyzing data from used car auctions in my free time with an automotive journalist named Steve Lang, trying to get an idea of what the used car market looks like in terms of long-term vehicle reliability. I figured it was about time that I showed off some of the ways that the Wolfram Language has allowed us to parse through information on over one million vehicles (and counting).
I’ll start off by saying that there isn’t anything terribly elaborate about the process we’re using to collect and analyze the information on these vehicles; it’s mostly a process of reading in reports from our data provider (and cleaning up the data), and then cross-referencing that data with various automotive APIs to get additional information. This data then gets dumped into a database that we use for our analysis, but having all of the tools we need built into the Wolfram Language makes the entire operation something that can be scripted—which greatly streamlines the process. I’ll have to skip over some of the details or this will be a very long post, but I’ll try to cover most of the key elements.
The data we get comes in from a third-party provider that manages used car auctions around the country (unfortunately, our licensing agreement doesn’t allow me to share the data right now), but it’s not very computable at first (the data comes in as a text file report once a week):
Fortunately, parsing this sort of log-like data into individual records is easy in the Wolfram Language using basic string patterns:
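A generic sketch of that parsing step (the file name, record separator and field format are invented here, since the real feed format can't be shared):

```wolfram
(* Split the weekly text report into records and pull out fields
   with string patterns *)
report = Import["auction_report.txt", "Text"];
records = StringSplit[report, "\n\n"];  (* blank-line-separated records *)
StringCases[First[records],
 "VIN: " ~~ vin : Repeated[WordCharacter, {17}] :> vin]
```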
Then it’s mostly a matter of cleaning up the individual records into something more standardized (I’ll spare you some of the hacky details due to artifacts in the data feed). You’ll end up with something like the following:
From there, we use the handy Edmunds vehicle API to get more information on an individual vehicle using their VIN decoder:
We then insert the records into an HSQL database (conveniently included with Mathematica), resulting in an easy way to search for the records we want:
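The database round trip takes only a few lines with DatabaseLink; the connection path, table, columns and sample row below are all made up:

```wolfram
Needs["DatabaseLink`"];
(* open (or create) a standalone HSQL database *)
conn = OpenSQLConnection[JDBC["HSQL(Standalone)", "auctions/db"]];
SQLInsert[conn, "Vehicles", {"VIN", "Year", "Make", "Miles"},
  {"XXXXXXXXXXXXXXXXX", 2003, "Honda", 154000}];
(* pull back just the records we want *)
SQLSelect[conn, "Vehicles", {"Year", "Miles"},
 SQLColumn["Make"] == "Honda"]
```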
From there, we can take a quick look at metrics using larger datasets, such as the number of transmission issues for a given set of vehicles for different model years:
Or a histogram of those issues broken down by vehicle mileage:
It also lets us look at industry-wide trends, so we can develop a baseline for the expected rate of defects for an average vehicle (or a vehicle of a certain class):
We can then compare a given vehicle to that model:
We then use that model, as well as other information, to generate a statistical index. We use that index to give vehicles an overall quality rating based on their historical reliability, which ranges from a score of 0 (chronic reliability issues) to 100 (exceptional reliability), with the industry average hovering right around 50:
We also use various gauges to put together informative visualizations of defect rates and the overall quality:
There is a lot more we do to pull all of this together (like the Wolfram Language templating we use to generate the HTML pages and reports), and honestly, there is a whole lot more we could do. My background in statistics is limited, so most of this analysis is rudimentary, and I’m sure others here already have ideas for improving how some of this data is presented. If you’d like to take a look at the site, it’s freely available (Steve has a nice introduction to the site here, and he also writes articles for the page related to practical uses for our findings).
Our original site was called the Long-Term Quality Index, which is still live but showed off my lack of experience in HTML development, so we recently rolled out our newer, WordPress-based venture Dashboard Light, which also includes insights from our auto journalist on his experiences running an independent, used car dealership.
This is essentially a two-man project that Steve and I handle in our (limited) free time, and we’re still getting a handle on presenting the data in a useful way, so if anyone has any suggestions or questions about our methodology, feel free to reach out to us.
Cheers!
This is where Wolfram comes in. Our UK-based Technical Services Team worked with the British NHS to help solve a specific problem facing the NHS—one many organizations will recognize: data sitting in siloed databases, with limited analysis algorithms on offer. They wanted to see if it was possible to pull together multiple data sources, combining off-the-shelf clinical databases with the hospital trusts’ bespoke offerings and mine them for signals. We set out to help them answer questions like “Can the number of slips, trips and falls in hospitals be reduced?”
I was assigned by Wolfram to lead the analysis. The databases I was given consisted of about six years’ worth of anonymized data, just over 120 million patient records. It contained a mixture of aggregate averages and patient-level daily observations, drawn from four different databases. While Mathematica is not a database, it has the ability to interface with them easily. I was able to plug into the SQL databases and pull in data from Excel, CSV and text files as needed, allowing us to inspect and streamline the data.
Working closely with a steering committee comprising healthcare professionals, academics and patients, we identified a range of parameters to investigate, including the level of nurse staffing and training, average patient heart rate and the rate of patients suffering from slips and falls. Altogether, the team identified around 1,000 parameter pairings to investigate, far too many to work through by hand in the limited time available.
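A scan over that many pairings is straightforward to automate. As a hedged illustration (the parameter names and numbers are invented, and this is not the team’s actual code), one way to rank every pairing by correlation strength:

```wolfram
(* Hypothetical per-ward measurements, one list per parameter *)
params = <|
   "falls" -> {3, 5, 2, 4, 6, 1},
   "registeredNurses" -> {8, 6, 9, 7, 5, 10},
   "supportWorkers" -> {2, 4, 1, 3, 5, 0},
   "meanHeartRate" -> {72., 75., 70., 74., 77., 69.}|>;

(* Every unordered pair of parameters *)
pairs = Subsets[Keys[params], {2}];

(* Correlation for each pair, ranked by absolute strength *)
ranked = ReverseSortBy[
   AssociationMap[Correlation @@ Lookup[params, #] &, pairs], Abs]
```

With roughly 1,000 pairings, the scan itself takes seconds; the time-consuming part, as described below, is judging which of the resulting signals are real and which are confounded.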
Some of the tools in the Wolfram Language that made this achievable include:
These tools enabled us to rapidly scale up the analysis across this complex dataset, allowing more time to consider the validity of the relationships and signals that emerged. Some of these seemed obvious—wards where patients were more likely to be bed-bound for medical reasons had fewer falls. But not all the signals were this easy to explain. For example, an increase in the number of nurses appeared to be linked to an increase in falls.
This observation seemed surprising. Given that there is little variation in ward size, it seemed unlikely that more nurses would lead to a decrease in patient safety. But not all nurses are equivalent. When we considered the ratio of registered nurses to healthcare support workers, we saw a strong relationship between the increase in highly trained registered nurses and the increase in patient safety.
So we see an increase in falls in some wards that rely more heavily on healthcare support workers. Could those wards be forced to rely on less qualified, lower-paid nurses when in truth fully licensed, registered nurses are needed? I can only speculate, and the data at this stage is insufficient to answer this question. But following this analysis, the hospital trust in question has changed its staffing policy to increase the level of registered-nurse employment. Whether it leads to an increase in patient safety, or a new issue raises its head, we will have to wait and see.
For the full findings, see the paper published this week in BMJ Open.
This project has only scratched the surface of the complexities hidden inside this rich dataset. In a mere 10 days, relying on the flexibility designed into the Wolfram Language, we were able to deliver some insight into this complex problem.
Contact the Wolfram Technical Services group to discuss your data science or coding projects.
Participants in the competition submit 128 or fewer tweetable characters of Wolfram Language code to perform the most impressive computation they can dream up. We had a bumper crop of entries this year that showed the surprising power of the Wolfram Language. You might think that after decades of experience creating and developing with the Wolfram Language, we at Wolfram Research would have seen and thought of it all. But every year our conference attendees surprise us. Read on to see the amazing effects you can achieve with a tweet of Wolfram Language code.
Amy calls this homage to the 2016 Nobel Laureate in Literature her contribution to “the nascent field of Bob Dylan analytics.” She writes further, “I started teaching myself how to code in the Wolfram Language yesterday after breakfast, with the full encouragement of my son and aided solely by Stephen Wolfram’s Elementary Introduction to the Wolfram Language.”
Amy’s helpful son, Jesse, is the youngest-ever prize winner in our One-Liner Competition. In 2014, at the age of 13, he took second place.
(faster than actual speed)
Order proceeds from chaos in this hypnotic simulation that appealed to the judges’ inner physicists. Points evenly distributed in a spherical volume slowly evolve thread-like structures as they migrate toward target points.
This impressively compact implementation of a smooth transition between map projections gave the judges an “Aha!” moment as they perceived the relationship between orthographic and Mercator projections. Stephan’s key insight in producing a submission that is graphically engaging as well as instructive is that the structure of a map’s geometric data is the same, regardless of the projection.
Manuel writes that he generated this graphic as a quilt pattern for his girlfriend. The judges were impressed by its combination of repetition and variety. No word yet on whether Manuel’s girlfriend has succeeded in assembling the 6,000 quilt squares cut from 645 different colors of fabric.
Achieving this graphically appealing image required some clever coding tricks from George, including factoring out the function slot c and naming the range a so that it could be compactly reused. Binning the points generated by an iterated function and plotting the log of the bin counts yields the refined graphical treatment in the result.
Starting with three million digits of the transcendental number E, David’s deft application of a series of image processing functions yields this visual representation of the randomness of the digits.
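You can get a similar effect with just a couple of functions. This is a hedged sketch of the general idea, not David’s actual submission (his used three million digits and a longer chain of image processing functions):

```wolfram
(* First 250,000 decimal digits of E, rescaled to gray levels in [0, 1]
   and laid out as a 500-pixel-wide image *)
digits = First[RealDigits[E, 10, 250000]];
Image[Partition[digits/9., 500]]
```

If the digits behave like random noise, as they appear to, the result looks like television static with no visible structure.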
A timely entry, given that Halloween—celebrated in the United States with pumpkins—was little more than a week after the conference. Abby takes skillful advantage of the Wolfram Language’s default plotting colors. In a plot of multiple functions, pumpkin orange is the first default color. The second is blue, which isn’t appropriate for a pumpkin’s stem. But by bumping the stem function to third place with Nothing, Abby achieved a green stem and squeaked in just under the 128-character limit.
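The trick works because Plot assigns its default styles by position in the (held) list of functions, with Nothing then dropping out of the result. A minimal illustration, to the best of my understanding of the behavior (the functions here are stand-ins, not Abby’s pumpkin code):

```wolfram
(* Cos gets the third default color (green) rather than the second
   (blue), because Nothing occupies the second slot in the list *)
Plot[{Sin[x], Nothing, Cos[x]}, {x, 0, 2 Pi}]
```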
Thanks to Philip, you no longer have to travel to Disneyland to get your mouse ears. All you need is the Wolfram Language!
It was slightly embarrassing to have to award a (Dis)Honorable Mention to one of our distinguished Innovation Award winners. But Richard’s helpful two-minute timer drove the judges nuts with its incessant counting and prompted them to warn each other not to evaluate that one.
I must point out its ingenious construction, though, which Richard helpfully illustrated with this image:
The third-place prize went to two of Abby Brown’s high-school students, whom she brought to the conference to present work they had done in her Advanced Topics in Mathematics class (taught with the Wolfram Language). Shishir and Alex made an amusing video transformation that, in real time, pastes the face of the person on the left onto the person on the right, making them virtual twins.
Snapchat, watch out. Here comes Mathematica!
Michael Sollami took second place with this unusual and visually stunning application of the neural net functionality that debuted in Version 11.
After viewing the animation for a short time, the judges were glassy-eyed and chanting in unison, “Second place! Second place! …”. Dunno, Michael. Bug in your code somewhere?
(5x actual speed)
Philip Maymin’s winning entry packs an impressive load of functionality into 128 characters of code. Not only does it implement a complete and thoroughly playable game of solitaire Pong (“Shorter, rounder and more fun than the original.”), it encourages you to play dangerously by rewarding you with bonus points if you almost let the “ball” escape before swooping in to deflect it.
A brilliant and creative combination of features implemented concisely with complex arithmetic, the game nearly derailed the One-Liner judges, who had to be reminded to stop playing Pong and get back to work.
There were many more impressive contributions than we had time to recognize in the awards ceremony. You can see all of the submissions in this signed CDF. (New to CDF? Get your copy for free with this one-time download.) There’s a wealth of good ideas to take away for anyone willing to invest a little time understanding the code.
Thanks to all who participated and impressed us with their coding chops and creativity. Come again next year!
As a first-timer from the Wolfram Blog Team attending the Technology Conference, I wanted to share with you some of the highlights for me—making new friends, watching Wolfram Language experts code and seeing what the Wolfram family has been up to around the world this past year.
I was only able to attend one talk at a time, and with over a hundred talks going on over three days, there was no way I could see everything—but what I saw, I loved. Tuesday evening, Stephen Wolfram kicked off the event with his fantastic keynote presentation, giving an overview of the present and future of Wolfram Research, demoing live the new features of the Wolfram Language and setting the stage for the rest of the conference.
The nice thing about the Technology Conference is that if you’ve had a burning question about how something in the Wolfram Language works, you won’t get a better opportunity to ask the developers face to face. When someone in the audience asked about storing chemical data, the panel asked, “Is Michael Trott in the room?” And sure enough, Michael Trott was sitting a few seats down from me, and he stood up and addressed the question. Now that’s convenient.
Probably my favorite speaker was Igor Bakshee, a senior research associate here at Wolfram. He described our new publish-subscribe service, the Channel Framework, which allows asynchronous communication between Wolfram systems without dealing with the details of specific senders and receivers. I especially appreciated Igor’s humor and patience as messages came in from someone in the audience: he raised his hands and insisted it was indeed someone else sending them.
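In its simplest form, the framework takes only a few lines. This sketch assumes a Wolfram Cloud login and shows the basic publish-subscribe round trip:

```wolfram
(* Create a channel and start listening on it *)
ch = CreateChannel[];
listener = ChannelListen[ch];

(* Any Wolfram session with permission on the channel can now
   publish to it -- including someone else in the audience *)
ChannelSend[ch, <|"greeting" -> "Hello from the audience"|>];

(* Stop listening when done *)
RemoveChannelListener[listener]
```

The point of the design is that the sender and receiver never address each other directly; both only know the channel, which is what makes the communication asynchronous and anonymous in the way Igor described.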
This talk was the one I was most looking forward to, and it was exactly what I wanted. Jakub Kabala talked about how he used Mathematica to compare 12th-century Latin texts in his search to determine if the monk of Lido and Gallus Anonymus were actually the same author. Jakub’s talk will also be in our upcoming virtual conference, so be sure to check that out!
It would be downright silly of me to not mention the extremely memorable duo Thomas Carpenter and Daniel “Scantron” Reynolds. The team used Wolfram Language code and JLink to infuse traditional disc jockey and video jockey art with abstract mathematics and visualizations. The experience was made complete when Daniel passed special glasses throughout the audience.
We had the best Wolfram Language programmers all in one place, so of course there had to be competitions! This included both our annual One-Liner Competition and our first after-hours live coding competition on Wednesday night. Phil Maymin won both. In between, he also gave an energetic presentation, “Sports and eSports Analytics with the Wolfram Language.” Thanks to everyone who participated. Be sure to check out our upcoming blog post on the One-Liner Competition.
Thursday night at Stephen’s Keynote Dinner, six Wolfram Innovator Awards were given out. The Wolfram Innovator Award is our opportunity to recognize people and organizations that have helped bring Wolfram technologies into use around the world. Congratulations again to this year’s recipients, Bryan Minor, Richard Scott, Brian Kanze, Samer Adeeb, Maik Meusel and Ruth Dover!
Like many Wolfram employees around the world, I usually work remotely, so a big reason I was eager to go to the Wolfram Technology Conference was to meet people! I got to meet coworkers whom I normally only email or talk with on the phone, and I got to speak with people who actually use our technologies and hear what they’ve been up to. After almost every talk, I’d see people shaking hands, trading business cards and exchanging ideas. It was easy to be social at the Technology Conference—everyone there shared an interest in and passion for Wolfram technologies, and the fun was figuring out what that passion was. And Wolfram gave everyone plenty of opportunities for networking and socializing, with lunches, dinners and meet-ups throughout the conference.
Attending the Wolfram Technology Conference has been the highlight of my year. The speakers were great across the board, and a special thanks goes to the technical support team that dealt with network and display issues in stride. I strongly encourage everyone interested in Wolfram technologies to register for next year’s conference, and if you bump into me, please feel free to say hi!