Wolfram Computation Meets Knowledge

Finding the Most Unhygienic Food in the UK

The UK, like many other countries, runs a food hygiene inspection system that tries to ensure that establishments with poor hygiene standards improve or are shut down. As is often the case, the data collected for operational reasons can provide a rich source of insight when viewed as a whole.

Questions like “Where in the UK has the poorest food hygiene?”, “What kinds of places are the most unhygienic?”, and “What kinds of food are the most unhygienic?” spring to mind. I thought I would apply Mathematica and a little basic data science and provide the answers.

The collected data, over half a million records, is updated daily and is openly available from an API, but this API seems to be targeted at performing individual lookups, so I found it more efficient to import the 414 files from this site instead.

All data in this blog post was extracted on July 15, 2016. You can find the Wolfram Language code for importing and other utilities in the CDF at the foot of this post.

Oxford

As a warmup, I started with somewhere I knew, so here is the Dataset representing my local city of Oxford.

Importing Oxford food rating data
Oxford food rating dataset
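
The actual import code lives in the CDF, so what follows is only a minimal sketch of what working with such a Dataset might look like. The three records are invented for illustration, and every key name other than "PostCode" and "BusinessType" (both mentioned later in this post) is an assumption.

```wolfram
(* Invented sample records; the real Oxford data has 1,285 rows. *)
(* Key names other than "PostCode" and "BusinessType" are assumed. *)
oxford = Dataset[{
    <|"BusinessName" -> "Example Cafe", "BusinessType" -> "Restaurant/Cafe/Canteen",
      "PostCode" -> "OX1 1AA", "RatingValue" -> 5|>,
    <|"BusinessName" -> "Example Takeaway", "BusinessType" -> "Takeaway/sandwich shop",
      "PostCode" -> "OX4 1AB", "RatingValue" -> 1|>,
    <|"BusinessName" -> "Example School", "BusinessType" -> "School/college/university",
      "PostCode" -> "OX2 6AA", "RatingValue" -> 5|>}];

(* Number of establishments *)
Length[oxford]

(* Number of establishments holding each rating, as an ordinary association *)
Normal[oxford[Counts, "RatingValue"]]
```

Keeping the records as associations inside a Dataset means later steps can refer to fields by name rather than by column position.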

There are 1,285 places to buy food in Oxford, and the rating scheme grades them from 0 (“Urgent improvement necessary”) through 5 (“Very good”).

We can throw the ratings onto a map of Oxford and see, as I would expect, concentrations of establishments around the tourist center and along major arterial roads.

Mapping ratings for places to buy food on Oxford map

Map of places to buy food in Oxford

We can see that the vast majority are rated 4 or 5 (in green). We should only be concerned about the 0, 1, and 2 ratings (“Urgent improvement necessary”, “Major improvement necessary”, and “Improvement necessary”), so let’s look at just those.

Mapping places to buy food rated with 0, 1, and 2
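
As a rough sketch of this filtering step, assuming each record is an association with "RatingValue", "Latitude", and "Longitude" keys (assumed names) and using invented coordinates rather than real businesses:

```wolfram
(* Invented sample: rough positions in Oxford, not real establishments *)
places = {
    <|"RatingValue" -> 5, "Latitude" -> 51.752, "Longitude" -> -1.258|>,
    <|"RatingValue" -> 2, "Latitude" -> 51.746, "Longitude" -> -1.230|>,
    <|"RatingValue" -> 0, "Latitude" -> 51.754, "Longitude" -> -1.270|>};

(* Keep only ratings 0, 1, and 2 *)
poor = Select[places, #RatingValue <= 2 &];

(* Convert the survivors to geo positions, ready for plotting *)
poorPositions = GeoPosition[{#Latitude, #Longitude}] & /@ poor;

(* Drawing the background map needs access to Wolfram's geo servers *)
GeoListPlot[poorPositions]
```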

There are obvious clusters in the center (where all the tourists go) and along Cowley Road (leading to Temple Cowley), which is where a lot of students live. But these also have lots of good establishments. So to normalize for that, we must find the average rating for a location. Since no two establishments are in exactly the same place, I need to create a function that collects all the data within a certain distance of a geo position and finds the average rating.

Creating a function to collect all data within a certain distance of a geo position
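
One way such a function might look is sketched below. It is not necessarily the code from the CDF: the names `milesBetween` and `localAverage` are hypothetical, the record keys are assumed, and the sample points are invented.

```wolfram
(* Distance in miles between two {latitude, longitude} pairs *)
milesBetween[p_, q_] :=
  QuantityMagnitude[UnitConvert[GeoDistance[GeoPosition[p], GeoPosition[q]], "Miles"]]

(* Mean rating of all records within radiusMiles of center; Missing if none qualify *)
localAverage[records_List, center_, radiusMiles_] :=
  With[{near = Select[records,
      milesBetween[center, {#Latitude, #Longitude}] <= radiusMiles &]},
    If[near === {}, Missing["NoData"], N[Mean[Lookup[near, "RatingValue"]]]]]

(* Invented sample: two points close together in Oxford, one in London *)
places = {
    <|"RatingValue" -> 5, "Latitude" -> 51.7520, "Longitude" -> -1.2577|>,
    <|"RatingValue" -> 3, "Latitude" -> 51.7530, "Longitude" -> -1.2590|>,
    <|"RatingValue" -> 0, "Latitude" -> 51.5074, "Longitude" -> -0.1278|>};

(* Averages only the two nearby records *)
localAverage[places, {51.7520, -1.2577}, 0.4]
```

Returning `Missing["NoData"]` for an empty disk keeps grid points with no nearby establishments from being mistaken for a rating of zero.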

We can now run that function over the entire map grid to create a moving average value of hygiene. I have used 0.4 miles for the averaging disk, which is large enough to collect quite a few establishments at a time but small enough to avoid blurring the whole city together.

Running previous function over the map grid to create a moving average value of hygiene
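
The grid evaluation can be sketched independently of the averaging function itself. In this illustration `gridValues` is a hypothetical helper, the bounding box is only roughly Oxford, and `toyAverage` is a made-up smooth function standing in for the real local average.

```wolfram
(* Evaluate f on an (n+1) x (n+1) grid, returning {lon, lat, value} triples *)
gridValues[f_, {latMin_, latMax_}, {lonMin_, lonMax_}, n_Integer] :=
  Flatten[
    Table[{lon, lat, f[{lat, lon}]},
      {lat, Subdivide[latMin, latMax, n]},
      {lon, Subdivide[lonMin, lonMax, n]}], 1]

(* Stand-in for the local average: an invented function of position *)
toyAverage[{lat_, lon_}] := 4 + Sin[40 lat] Cos[40 lon]

values = gridValues[toyAverage, {51.72, 51.78}, {-1.30, -1.20}, 20];

(* Contour plot of the gridded values; x is longitude, y is latitude *)
ListContourPlot[values, ColorFunction -> "TemperatureMap"]
```

With a real averaging function, grid points that return `Missing` would need to be dropped before plotting.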

My initial intuition proved right. Cowley Road and the area between the city center and the station are areas of poor average hygiene, but there is also a hotspot in the southwest that I can’t explain. The best average hygiene is in the north, from Walton Manor to Summertown (the expensive parts of Oxford), and in the Headington area.

Which councils are failing to protect us?

I am happy that the data is plausible and I have understood it, but there is another issue we must consider before going for our answers: data quality. While the Food Hygiene Rating Scheme is controlled by the national Food Standards Agency, it is operated by over 400 different local authorities. Are they all doing a consistent job? One of the promised benefits of open data is that we can hold our governments accountable—so let’s do that. This is the kind of analysis that I hope central government is doing too.

We can easily look at who is on top of the workload by counting the fraction of businesses that are not yet rated.

Unrated establishments
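
A minimal sketch of this count, assuming that rated records carry an integer "RatingValue" and unrated ones carry a string such as "AwaitingInspection" (both the key names and that string are assumptions, and the sample is invented):

```wolfram
(* A record counts as unrated when its rating is not an integer *)
unratedQ[record_] := ! IntegerQ[record["RatingValue"]]

(* Fraction of unrated records for each authority, largest first *)
unratedFraction[records_List] :=
  Reverse[Sort[GroupBy[records, #["LocalAuthorityName"] &,
      N[Mean[Boole[unratedQ /@ #]]] &]]]

(* Invented sample *)
sample = {
    <|"LocalAuthorityName" -> "Authority A", "RatingValue" -> 5|>,
    <|"LocalAuthorityName" -> "Authority A", "RatingValue" -> "AwaitingInspection"|>,
    <|"LocalAuthorityName" -> "Authority B", "RatingValue" -> 4|>,
    <|"LocalAuthorityName" -> "Authority B", "RatingValue" -> 3|>};

unratedFraction[sample]
```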

So if you eat out in North Norfolk, you might be nervous to discover that nearly 25% of establishments have never been inspected.

Suspicious in another way is that around a third of the authorities have inspected every business. That would be great if it were true, but since new businesses must open regularly, you would expect to find a few that are awaiting inspection, so this may just indicate that these authorities don’t record (or perhaps even know about) new establishments until they are inspected.

Eateries awaiting inspection

Time since rating

We can also see how often the average establishment is inspected. The best authorities inspect establishments at least once per year.

How often the average establishment is inspected
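
The calculation might be sketched as follows, measuring from the extraction date of July 15, 2016. The "RatingDate" and "LocalAuthorityName" keys are assumed names, the helper names are hypothetical, and the dates are invented.

```wolfram
(* Years between a rating date and a reference date *)
yearsSince[date_, asOf_] := QuantityMagnitude[DateDifference[date, asOf, "Year"]]

(* Mean years since the last rating for each authority, longest first *)
meanYearsSinceRating[records_List, asOf_] :=
  Reverse[Sort[GroupBy[records, #["LocalAuthorityName"] &,
      Function[group, N[Mean[yearsSince[#["RatingDate"], asOf] & /@ group]]]]]]

(* Invented sample *)
sample = {
    <|"LocalAuthorityName" -> "Authority A", "RatingDate" -> {2015, 7, 15}|>,
    <|"LocalAuthorityName" -> "Authority A", "RatingDate" -> {2013, 7, 15}|>,
    <|"LocalAuthorityName" -> "Authority B", "RatingDate" -> {2016, 1, 15}|>};

meanYearsSinceRating[sample, {2016, 7, 15}]
```

Here Authority A comes out at about two years and Authority B at about half a year.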

But alarmingly, Croydon has an average time since inspection of over 3.5 years. A lot can change in that time.

Average time since last rating from longest to shortest

I can’t see an easy way to measure if the different authorities are applying the rules in a consistent way when they do inspect, so I am just going to have to trust that the values are equivalent.

Regional differences

So back to our original questions. First I am going to throw out all data that does not have a numerical rating. Unfortunately, this excludes Scotland, which runs a different scheme that provides only a pass-or-fail-type conclusion.

Removing data with no numerical rating
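
As an illustration of this filtering step, the sketch below assumes that numerical ratings are stored as integers and everything else as strings. The strings "Pass" and "AwaitingInspection" and the sample records are placeholders, not values confirmed by the post.

```wolfram
(* Invented sample mixing numerical and non-numerical ratings *)
sample = {
    <|"BusinessName" -> "A", "RatingValue" -> 5|>,
    <|"BusinessName" -> "B", "RatingValue" -> "Pass"|>,
    <|"BusinessName" -> "C", "RatingValue" -> 4|>,
    <|"BusinessName" -> "D", "RatingValue" -> "AwaitingInspection"|>,
    <|"BusinessName" -> "E", "RatingValue" -> 5|>};

(* Keep only records with an integer rating *)
rated = Select[sample, IntegerQ[#RatingValue] &];

Length[rated]                                   (* records that survive *)
KeySort[Counts[Lookup[rated, "RatingValue"]]]   (* tally behind a histogram *)
N[Mean[Lookup[rated, "RatingValue"]]]           (* overall average rating *)
```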

We still have plenty of data to work with…

Amount of data that can be used

The good news is that most establishments are “Good” or “Very good.”

Histogram of rating value

The average rating value across the country is 4.37.

Average rating across the country

Here is a quick map of all the 0-rated establishments in the country.

Map of 0-rated establishments in the country

The easiest way to group the data is by the local authority that collected it, since that is stored in every record. By that measure, Newham in London is the worst, with an average rating of 3.4.

Grouping data by the local authority that collected it

And the best is Torridge in Devon at 4.86.

Grouping data by the local authority that collected it, starting with the best rating
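
Grouping of this kind can be sketched with a single helper. `averageBy` is a hypothetical name, "LocalAuthorityName" and "RatingValue" are assumed keys, and the sample is invented.

```wolfram
(* Mean rating for each distinct value of key, sorted from worst to best *)
averageBy[records_List, key_String] :=
  Sort[GroupBy[records, #[key] &, N[Mean[Lookup[#, "RatingValue"]]] &]]

(* Invented sample *)
sample = {
    <|"LocalAuthorityName" -> "Authority A", "RatingValue" -> 5|>,
    <|"LocalAuthorityName" -> "Authority A", "RatingValue" -> 4|>,
    <|"LocalAuthorityName" -> "Authority B", "RatingValue" -> 2|>,
    <|"LocalAuthorityName" -> "Authority B", "RatingValue" -> 3|>};

averageBy[sample, "LocalAuthorityName"]
```

Because the key is a parameter, the same helper would also group by any other field that every record carries, such as "BusinessType".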

But we can use the "PostCode" key to be much more precise. A full UK postcode is shared by around 15 properties. That is too fine grained, as we will find a lot of postcodes with only one restaurant. We need a collection to infer anything about a neighborhood, so I will use only the first part of the postcode, and throw out all postcodes that do not contain at least 10 establishments.

Using PostCode key
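
A sketch of the partial-postcode grouping, assuming every record has a non-empty "PostCode" in the usual two-part form separated by a space. The helper names and the sample postcodes are invented, and the demonstration uses a threshold of 2 rather than 10 so that the tiny sample produces a result.

```wolfram
(* First part of a postcode: "E13 0AA" -> "E13" *)
outwardCode[postcode_String] := First[StringSplit[postcode]]

(* Mean rating per partial postcode, dropping groups smaller than minCount *)
districtAverages[records_List, minCount_Integer] :=
  Sort[N[Mean[Lookup[#, "RatingValue"]]] & /@
    Select[GroupBy[records, outwardCode[#["PostCode"]] &], Length[#] >= minCount &]]

(* Invented sample *)
sample = {
    <|"PostCode" -> "E13 0AA", "RatingValue" -> 1|>,
    <|"PostCode" -> "E13 0AB", "RatingValue" -> 2|>,
    <|"PostCode" -> "E13 9ZZ", "RatingValue" -> 3|>,
    <|"PostCode" -> "OX1 1AA", "RatingValue" -> 5|>};

(* The single OX1 record is discarded as too small a group *)
districtAverages[sample, 2]
```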

Finally, I hooked up a postcode API to translate back from the partial postcode to a location name.

The result puts E13, in East London, at the bottom of the list, with adjacent postcodes E12, E7, E8, and E15 also on the list. Indeed, nearly all of the worst postcodes are parts of London, apart from a few Birmingham postcodes.

Translating partial postcode to a location name

Topping the best hygiene-rated postcodes is Craigavon in Northern Ireland, with a perfect score.

Best hygiene-rated postcodes

Regional trends

Can we infer some long-distance trends? For the whole country, we have lots of data and are not looking for very small features. There is a much faster method than the one I used on Oxford. Essentially, by aggregating over square regions rather than circular, I can round each geo position once, rather than having to test it repeatedly for membership of the region. I round all the locations to the nearest 20 miles and then aggregate all the points that now share the same location. I then repeat the process, shifting the box centers by 5 miles to create a moving average square. The Wolfram Language code is attached in a CDF at the bottom of the blog. Here is the result.

UK contour map
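
The rounding idea can be sketched as below. This is an illustration rather than the code from the CDF: it assumes the positions have already been converted to flat {x, y} coordinates in miles, the function names are hypothetical, and the points are invented.

```wolfram
(* Mean value per square of side `size`, with box centers shifted by `offset`. *)
(* Each point is {x, y, rating}, with x and y in miles on a flat grid (assumed). *)
squareAverages[points_List, size_, offset_] :=
  GroupBy[points, (Round[Most[#] - offset, size] + offset &) -> Last, N@*Mean]

(* Repeat with the centers shifted in steps of `shift` along both axes *)
movingSquareAverages[points_List, size_, shift_] :=
  Join @@ Flatten[
    Table[squareAverages[points, size, {dx, dy}],
      {dx, 0, size - shift, shift}, {dy, 0, size - shift, shift}], 1]

(* Invented sample *)
points = {{3, 4, 5}, {8, -2, 3}, {47, 41, 1}};

squareAverages[points, 20, {0, 0}]     (* one pass, boxes centered on multiples of 20 *)
movingSquareAverages[points, 20, 5]    (* box center -> mean rating, centers 5 miles apart *)
```

Each point is rounded once per pass, so the cost grows with the number of points times the number of shifts, rather than with the number of points times the number of grid positions.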

So there is an unhygienic center in London (as we already saw) that spreads toward Birmingham (going around north Oxfordshire) before turning east at Manchester until it reaches Hull. There is another notable low area in South Wales around, but not centered on, Cardiff. Generally, rural areas appear to be more hygienic, particularly North Devon, North Wales, and East Cumbria.

What kind of establishments are least hygienic?

Enough regional anthropology. Let’s consider what kind of food is safe. The analysis of the "BusinessType" key is reassuringly predictable. Fast food is the worst; schools and hospitals are the best.

Average rating by business type

We can drill deeper by inferring something about the food from the business name. Here is a function to measure the average hygiene rating for all establishments containing a particular word.

Function to measure the average hygiene rating for all establishments containing a particular word
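
A function along these lines might look like the following sketch. The names `nameWords` and `wordAverage` are hypothetical, "BusinessName" and "RatingValue" are assumed keys, and the business names are invented.

```wolfram
(* Distinct lowercased words in a business name *)
nameWords[name_String] :=
  DeleteDuplicates[StringSplit[ToLowerCase[name], Except[WordCharacter] ..]]

(* Mean rating of records whose name contains word; Missing if there are none *)
wordAverage[records_List, word_String] :=
  With[{hits = Select[records,
      MemberQ[nameWords[#["BusinessName"]], ToLowerCase[word]] &]},
    If[hits === {}, Missing["NoMatch"], N[Mean[Lookup[hits, "RatingValue"]]]]]

(* Invented sample *)
sample = {
    <|"BusinessName" -> "Lucky Star", "RatingValue" -> 2|>,
    <|"BusinessName" -> "Lucky House", "RatingValue" -> 3|>,
    <|"BusinessName" -> "Example Sushi", "RatingValue" -> 5|>};

wordAverage[sample, "lucky"]
```

Matching on whole words rather than substrings avoids counting, for example, "starters" as a match for "star".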

To reduce the search and ensure enough data for conclusions, I will pick out a list of all words that appear in at least 100 different business names.

Reducing to words that appear in at least 100 different business names
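
Picking out the frequent words can be sketched as a word tally. The helper names are hypothetical and the names invented, and the demonstration uses a threshold of 2 rather than 100 so that the small sample produces a result.

```wolfram
(* Distinct lowercased words in a business name *)
nameWords[name_String] :=
  DeleteDuplicates[StringSplit[ToLowerCase[name], Except[WordCharacter] ..]]

(* Words that occur in at least minCount different names *)
frequentWords[names_List, minCount_Integer] :=
  Keys[Select[Counts[Flatten[nameWords /@ names]], # >= minCount &]]

(* Invented sample *)
names = {"Lucky Star", "Lucky House", "Star Kebab", "Example Sushi"};

frequentWords[names, 2]
```

Removing duplicates within each name first means a word is counted once per business, which matches the "different business names" criterion.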

And now for each word, we calculate the average rating for businesses using that word in their names.

Average rating for businesses using that word in their names
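
One possible way to get all the per-word averages in a single pass, rather than searching the data once per word, is sketched below. This is a swapped-in technique for illustration, not necessarily how the post's code works. The names are hypothetical and the sample is invented.

```wolfram
(* Distinct lowercased words in a business name *)
nameWords[name_String] :=
  DeleteDuplicates[StringSplit[ToLowerCase[name], Except[WordCharacter] ..]]

(* Mean rating per word, for words in at least minCount names, worst first *)
wordAverages[records_List, minCount_Integer] :=
  Module[{pairs, grouped},
    (* One word -> rating rule for each distinct word of each name *)
    pairs = Flatten[
      Thread[nameWords[#["BusinessName"]] -> #["RatingValue"]] & /@ records];
    grouped = Select[GroupBy[pairs, First -> Last], Length[#] >= minCount &];
    Sort[N[Mean[#]] & /@ grouped]]

(* Invented sample *)
sample = {
    <|"BusinessName" -> "Lucky Star", "RatingValue" -> 2|>,
    <|"BusinessName" -> "Lucky House", "RatingValue" -> 3|>,
    <|"BusinessName" -> "Example Sushi", "RatingValue" -> 5|>,
    <|"BusinessName" -> "Sushi Star", "RatingValue" -> 4|>};

wordAverages[sample, 2]
```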

Amusingly, “lucky” appears on the list of the worst word associations. The worst is “halal.” With the exception of Dixy (which appears to mostly be linked to a chain), they are words associated with small, independent businesses.

Words and their average ratings

We can see it more easily, though less precisely, as a WordCloud of the 80 worst-rated words.

80 worst-rated words WordCloud
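
A word cloud like this might be produced as in the sketch below. The words and averages are invented, and weighting each word by how far it falls below the top rating of 5 is an assumption about how the sizes could be chosen, not the post's stated method.

```wolfram
(* Invented word -> average rating data *)
averages = <|"alpha" -> 3.1, "beta" -> 3.4, "gamma" -> 3.6, "delta" -> 4.0|>;

(* The (up to) 80 lowest-rated words, worst first *)
worst = Take[Sort[averages], UpTo[80]];

(* Larger weight for lower ratings, so the worst words are drawn largest *)
WordCloud[5 - worst]
```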

The words associated with the best ratings are mostly large chains, who presumably can put more effort into good management processes. At the top of the list is the Japanese-inspired restaurant chain Wagamama, followed by upmarket supermarket chain Waitrose. There are also some school- and hospital-related words.

Words with best ratings

WordCloud for 80 best-rated words

Of course, none of this necessarily has anything to do with how good the food tastes, and it is unproven whether there is any link between satisfying the food inspectors and making safe food.

If you really care about food hygiene, then the best advice is probably just to never be rude to the waiter until after you have gotten your food!

Download this post as a Computable Document Format (CDF) file. New to CDF? Get your copy for free with this one-time download.

Comments

9 comments

  1. Jon

    I could have saved you a lot of trouble in finding the Most Unhygienic Food in the UK: it was in the local deli just around the corner from the fellows cottages at Warwick university when I was living there in 1992. I am sure it is now part of the UK’s bio-weapons defense laboratory.

    Michael

  2. Why’d you cut Northern Ireland out in the regional trends section?

  3. It sounds like you double-counted the Welsh food outlets. There are 326 (England) + 32 (Scotland) + 11 (NI) + 22 (Wales) = 391 local authorities in the UK, plus River Tees which is treated separately for some reason. If you downloaded 414 files you probably downloaded both the Welsh language and the English language versions of the Wales files. This is backed up by the fact that some Welsh councils appear twice above (e.g. Sir y Fflint = Flintshire).

    This has probably skewed your stats somewhat. Other than that it’s a great piece of analysis. The three Welsh words in the final table are also school-related: ysgol=school; gynradd=primary; meithrinfa=nursery.

  4. I didn’t think of Nearest. Might be more efficient than my approach.
    Yes. The scroll bars (and other point and click features) are being added to Dataset shortly.

  5. On reflection, using Nearest is a different thing. I have done a moving average by spatial distance, not by number of data points. Using Nearest with a single point would not be a moving average, but more of a Voronoi diagram (order-zero interpolation), and using a larger number of points would give a variable-distance moving average. It would give higher spatial resolution where the data was dense, in cities, and lower resolution in the countryside. For the country map that isn’t useful, but if we wanted a map that showed detail in cities and covered the countryside, that would be great. From experience, 100m in a city can put you in a different neighbourhood with completely different socio-economic characteristics, but in the country you have to travel miles before you notice any difference. So Nearest is probably better, but not necessarily for performance reasons. But it is different.
