AI and the Wolfram Language Work toward Partial Automation in the Search for Cancer
As more technology is folded into medical environments all over the world, Wolfram’s European branch has taken on work with the United Kingdom’s National Health Service (NHS) in an effort to partially automate the process of cancer diagnosis. The task is to use machine learning to avoid checking thousands of similar-looking images of people’s insides by hand for signs of cancer.
In the modern age, we have computers to take a lot of intellectual drudgery off our hands, but not all of it. Everyone knows what it’s like to have to do something that’s really important to get right, but also really time-consuming. Sometimes the work can be split up among many people, but often it has to be consistently and thoroughly accomplished by one expert in particular. With image analysis for signs of cancer, if you also happen to be the most qualified person to do the job, you cannot ask someone else to take over—nor can you go on autopilot. Even if you’re very motivated by the importance of your task, the boredom will get you eventually, making it more and more difficult to maintain the level of quality that the job warrants. It’s just human nature.
For example, in 1873 the amateur mathematician William Shanks (1812–1882) famously calculated π to an unprecedented 707 decimal places; calculating mathematical constants was a hobby of his. Unfortunately, when a mechanical calculator was used to check his results 71 years later, it turned out that only the first 527 were correct. If even someone as highly motivated as Shanks could make mistakes in a repetitive task, anyone can.
Computing π is something we can safely let a computer handle because it will always outperform a human. However, some jobs can only be automated with machine learning algorithms, which cannot guarantee correct results. So we’re back to the dilemma we started with: what do we do with tasks that are both very important and very tedious?
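To see how far routine computation has come since Shanks's day, here is the modern one-liner (for illustration, computed fresh rather than checked by hand):

```wolfram
(* 707 decimal places of π, i.e. 708 significant digits counting the leading 3 *)
N[Pi, 708]

(* a million digits is hardly more effort; the semicolon suppresses the output *)
N[Pi, 10^6]; // AbsoluteTiming
```

What took Shanks decades of careful hand calculation now finishes in a fraction of a second, with no digit 528 to worry about.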
Hours of Looking at Insides
In a Wolfram Technical Services project with the NHS and their service provider CorporateHealth International—funded by Innovate UK—we are exploring a way to review videos of the insides of people to check them for signs of bowel cancer. These videos are made by a pill camera that travels through your digestive system and continuously sends images to a recorder you wear on your body. The procedure, as explained in this video about the HI-CAP Project and another video about data science in endoscopy, is significantly easier, cheaper and more comfortable than going to the hospital to have a surgeon poke around your insides with an endoscope. For this reason and others, it has the potential to save many lives by detecting tumors early when they can still be treated easily.
The ease of gathering the data does not directly translate to ease of analysis. Each video consists of thousands of frames, and some polyps or tumors will only appear on a single frame and may not even stand out from the background all that much. This means that a small army of nurses—employed by CorporateHealth International—is currently needed to analyze every single frame of each video, which is a laborious process, as you can imagine.
To alleviate this workload, we are working with the Computer Vision group at the University of Barcelona, where neural networks are being developed for exactly this task of polyp identification. The network is currently implemented in TensorFlow, but we plan to port it to the Wolfram neural network framework (via an intermediary format such as ONNX) to make it part of a larger data-processing pipeline for pill camera videos.
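In rough outline, such a port might look like the following sketch. The file names, input size and class labels here are hypothetical placeholders, not the actual project assets:

```wolfram
(* Hypothetical sketch: bring a trained polyp detector into the Wolfram Language via ONNX. *)
(* "polyp-detector.onnx" and "frame-0042.png" are placeholder file names. *)
net = Import["polyp-detector.onnx"]

(* Apply the net to a single video frame, resized to an assumed input dimension. *)
frame = ImageResize[Import["frame-0042.png"], {224, 224}];
net[frame]
```

Once the network lives inside the Wolfram Language, the surrounding pipeline steps, such as frame extraction, preprocessing and reporting, can all be expressed in the same system.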
Trusting AI Results
Simply training a network and testing it on a validation set is not enough to put it into practice. If the people who actually have to review the videos (and therefore bear responsibility for that analysis) are not convinced of the quality of the computer’s results, they will double-check everything by hand regardless, or even just return to the tools they are currently using. You can’t blame them for wanting to be thorough.
For this reason, we are experimenting with different ways to present computer results to nurses, allowing corrections where necessary. This means playing around with the order in which the frames are presented (e.g., chronological vs. ordering by classification); how the computer classification is presented (a number, a class, a heat map on the image, etc.); and what kind of actions the nurse can take to correct the result so it can then be fixed in the next training round of the AI.
The goal is to use the Wolfram dynamic interactivity language to build a tool that allows users to slowly build experience in such a way that they start trusting AI results more and more—in particular, the parts of the video where the computer indicates no risk factors. If a few frames are unjustly highlighted as polyps because the network is a little overcautious, it’s not much work to correct the result manually. On the other hand, if the AI tells the user that 99% of the video is free of polyps and the user doesn’t trust that verdict, they will still check the entire video, and the addition of an AI to the process will not have saved much time at all.
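A minimal sketch of such a review interface might look like the following, assuming the frames and their machine-assigned polyp probabilities are already in memory (both arguments, and the two-way verdict, are hypothetical simplifications of the real tool):

```wolfram
(* Hypothetical sketch: step through frames, show the net's score, let the nurse overrule it. *)
(* frames: a list of images; probs: numbers between 0 and 1 from the classifier. *)
reviewFrames[frames_List, probs_List] := Manipulate[
  Column[{
    frames[[i]],
    Row[{"Polyp probability: ", probs[[i]]}],
    (* the nurse's correction is stored per frame and can feed the next training round *)
    SetterBar[Dynamic[verdict[i]], {"polyp", "no polyp"}]
  }],
  {i, 1, Length[frames], 1}
 ]
```

Variations on this sketch, such as sorting frames by classification score instead of chronologically, or overlaying a heat map on each image, are exactly the presentation choices we are experimenting with.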
The Work Ahead
In complex tasks like polyp detection, computers cannot provide completely authoritative computations like the digits of π; their role is closer to that of a second opinion from another specialist. Unlike other specialists, though, we cannot directly ask a computer why it made a certain decision. The computer is a sort of “silent expert,” if you will. While the technology is promising, it is still a work in progress with questions yet to be explored. The best we can do is interrogate the internals of the neural network to try to understand how it works. That makes it all the more important to think carefully about how this silent expert is incorporated into a decision-making process that ultimately affects people’s lives.