One of the biggest misconceptions in AI today is how well it can actually predict things – especially things that are rare.
This applies most directly to Machine Learning (such models are, at heart, just statistical models), but the same principle applies to LLMs. The fundamental problem is the same, and AI is not magic.
In reality, AI’s predictive power is more complicated. One of the key challenges? False positives—incorrect detections that can significantly undermine the value of AI-driven decision-making. Let’s explore why this happens and how businesses can better understand AI’s limitations.
The Zombie Test: A Thought Experiment in AI Predictive Power
Let’s say there is a zombie disease going round. About one in one thousand people are infected and will turn into murderous zombies if not stopped.
Thankfully, we have a test that is 99% accurate. Given that, if the test finds that you are a zombie – what are the chances that you are actually a zombie? 99%? If so, we can probably put you down, just to be sure.
In reality, it is less than 10%… So, in nine cases out of ten, when the test says you are a zombie, you are not.
Wait, what?
Let’s work this through.
Firstly, when I say the test is 99% “accurate” then that is a simplification. For medical tests, we distinguish between Sensitivity (how good is the test at detecting it if you do have the disease – what we call True Positives) and Specificity (how good is the test at detecting you do not have the disease – what we call True Negatives).
In AI we use the terms Precision and Recall, which cover closely related concepts. Recall is the same thing as Sensitivity, just under a different name. Precision, however, is not quite the same as Specificity: it measures the proportion of positive results that are actually true (what medicine calls the Positive Predictive Value). I learned Sensitivity and Specificity first, so I tend to be more comfortable with those terms, but everything below can be read in terms of Precision and Recall as well.
Let’s say we are testing 100,000 people in our little scenario. We know that there are 100 zombies in that population, because the incidence rate is one in a thousand. The Sensitivity of our test is 99%, so we will detect 99 of those. In other words, we get 99 True Positives and 1 False Negative.
Of the other 99,900 people who are not zombies, we will correctly detect that 99% of them are not zombies – so we get 98,901 True Negatives. But we will flag the remaining 1% – 999 people – as zombies even though they are not. Those are our False Positives.
Putting all that together, we detected a total of 1,098 zombies. But only 99 of them are actual zombies. That means that if you get detected as being a zombie – there is only about a 9% chance of you actually being a zombie. And this is a very, very accurate test. 99% Sensitivity and Specificity is extremely good.
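The arithmetic above can be sketched in a few lines of Python, using the same assumed numbers (100,000 people, a one-in-a-thousand incidence rate, 99% Sensitivity and Specificity):

```python
# A minimal sketch of the zombie-test arithmetic. All numbers are the
# illustrative ones from the example, not real data.

population = 100_000
incidence = 1 / 1000      # one in a thousand people are zombies
sensitivity = 0.99        # chance the test flags an actual zombie
specificity = 0.99        # chance the test clears a non-zombie

zombies = round(population * incidence)        # 100
healthy = population - zombies                 # 99,900

true_positives = zombies * sensitivity         # 99 zombies correctly flagged
false_negatives = zombies - true_positives     # 1 zombie missed
true_negatives = healthy * specificity         # 98,901 correctly cleared
false_positives = healthy - true_negatives     # 999 wrongly flagged

flagged = true_positives + false_positives     # 1,098 "zombies" in total
precision = true_positives / flagged           # only ~9% of them are real

print(f"Flagged as zombies: {flagged:.0f}, actual zombies: {true_positives:.0f}")
print(f"Chance a flagged person is really a zombie: {precision:.1%}")
```

Run it and the headline numbers fall out directly: 1,098 people flagged, of whom only 99 are real zombies.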
The key reason for this is that “being a zombie” is rare in our example. If the incidence rate were 10% or 50%, the results would be much more reliable.
Note: In AI terms, the Precision here would be about 9%; it is measured as the likelihood that a positive result is actually a true positive.
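You can see the effect of the incidence rate by holding the test quality fixed and varying only how common the condition is. A small sketch, using Bayes’ theorem and the same assumed 99%/99% test:

```python
# How the incidence rate drives precision, with the test held fixed at
# 99% sensitivity and 99% specificity. The rates swept are hypothetical.

sensitivity = 0.99
specificity = 0.99

def precision_at(incidence: float) -> float:
    """Chance that a positive result is a true positive (Bayes' theorem)."""
    true_pos = incidence * sensitivity
    false_pos = (1 - incidence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

for rate in (0.001, 0.01, 0.1, 0.5):
    print(f"incidence {rate:>5.1%} -> precision {precision_at(rate):6.1%}")
```

At a one-in-a-thousand incidence rate, precision is about 9%; at 10% it is over 90%, and at 50% the test is as reliable as its headline accuracy suggests.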
Why This Matters for AI in Business
This example isn’t just about zombies. It applies to many real-world AI use cases where we try to predict rare events, such as:
- Identifying fraudulent transactions.
- Detecting at-risk employees for long-term sick leave.
- Predicting criminal reoffending risks.
The challenge? When you’re predicting something rare, false positives will dominate your results. In business, this means wasted resources chasing incorrect predictions.
For example, if an AI system flags employees at risk of burnout, but 98% of flagged employees are actually fine, your business might take unnecessary actions—wasting time, money, and potentially harming employee morale.
Three Key Variables to Consider in AI Predictions
To use AI effectively in predictive scenarios, businesses need to consider three critical factors:
- Incidence Rate (How common is the thing you’re predicting?)
  The rarer the event, the higher the proportion of false positives. If you’re trying to detect something that happens rarely, expect a lot of false alarms.
- Sensitivity / Recall (How good is the AI at detecting real cases?)
  A high sensitivity means the AI will catch more true cases, but it may also generate more false positives if not balanced with high specificity.
- Specificity / Precision (How good is the AI at avoiding false alarms?)
  This is often overlooked but is crucial in real-world applications. If the AI isn’t selective enough, it will create too many false positives, making its predictions unreliable. Precision is in some ways more informative than Specificity, because it takes the incidence rate into account (though that can also make it harder to measure in a test environment).
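All four of these quantities fall out of the same confusion-matrix counts. A minimal sketch, assuming you have the raw true/false positive and negative counts available (here, the zombie numbers from earlier):

```python
# Deriving incidence rate, sensitivity/recall, specificity and precision
# from raw confusion-matrix counts. The counts below are the illustrative
# zombie-test numbers, not real data.

def metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    total = tp + fp + tn + fn
    return {
        "sensitivity / recall": tp / (tp + fn),  # real cases caught
        "specificity": tn / (tn + fp),           # non-cases correctly cleared
        "precision": tp / (tp + fp),             # flagged cases that are real
        "incidence rate": (tp + fn) / total,     # how common the condition is
    }

for name, value in metrics(tp=99, fp=999, tn=98_901, fn=1).items():
    print(f"{name}: {value:.1%}")
```

Note how sensitivity and specificity both come out at 99% while precision sits at roughly 9% – the whole point of the zombie example in four lines of arithmetic.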
The Trade-off: Sensitivity vs. Specificity
There’s always a trade-off between detecting all possible cases (Sensitivity) and avoiding false alarms (Specificity):
- Fire Alarms: You want them to be highly sensitive, even if that means some false alarms (burnt toast triggering it). Better safe than sorry.
- Criminal Justice: We’d rather let a guilty person go free than imprison an innocent one. Here, Specificity is more important than Sensitivity.
The same logic applies to AI. Depending on the application, you need to decide whether avoiding false positives or catching all true cases is more important.
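In practice this trade-off often comes down to where you set a decision threshold on a model’s risk score. A toy sketch, with made-up scores purely for illustration:

```python
# A toy illustration of the sensitivity/specificity trade-off: a
# hypothetical model outputs a risk score per case, and the threshold
# we choose trades one measure against the other. Data is made up.

# (risk_score, actually_positive) pairs
cases = [(0.95, True), (0.80, True), (0.60, True), (0.40, True),
         (0.70, False), (0.50, False), (0.30, False), (0.10, False)]

def rates(threshold: float) -> tuple[float, float]:
    """Return (sensitivity, specificity) when flagging scores >= threshold."""
    tp = sum(1 for s, pos in cases if pos and s >= threshold)
    fn = sum(1 for s, pos in cases if pos and s < threshold)
    tn = sum(1 for s, pos in cases if not pos and s < threshold)
    fp = sum(1 for s, pos in cases if not pos and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

for t in (0.25, 0.55, 0.75):
    sens, spec = rates(t)
    print(f"threshold {t}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

A low threshold is the fire-alarm setting (catch everything, tolerate false alarms); a high threshold is the criminal-justice setting (only flag when very confident, accept that some real cases slip through).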
Conclusion: AI Isn’t Magic—It’s a Tool That Needs Careful Design
AI is a powerful tool, but its predictions need to be carefully evaluated, especially in business contexts. A 99% accuracy claim can still mean an AI system is effectively useless if it produces too many false positives.
At NewOrbit, we specialise in AI solutions that work in real-world conditions—helping businesses design AI systems that balance accuracy, false positives, and practical decision-making.
Want to learn more about how we approach AI?
Check out our AI Expertise page and see how we can help you build AI solutions that actually work.