TL;DR: An edge case is any situation outside the pattern an AI was trained on. Machine learning systems are structurally incapable of handling them, because the system has never seen them before and has no judgment about what to do when it doesn’t know. Edge cases cost a wrong answer at small scale, a lawsuit at medium scale, and a death at large scale. The only fix is a human in the loop wherever the cost of a single failure exceeds the convenience of automation. Map your edge cases before you deploy, not after a customer or a regulator finds them for you.
An edge case is the part of the job your AI doesn’t know how to do.
Not the part it does badly. The part it has never seen before, has no example of in its training data, and has no way to recognize that it’s in trouble. The part where the system gives an answer with full confidence and no idea that the answer is wrong.
Edge cases are not a bug in your AI rollout. They are the structural feature of every machine learning system that has ever shipped, and they are the reason most AI deployments either fail outright or quietly hurt the business that paid for them.
What an edge case actually is
Machine learning works by finding patterns in examples. You show the system a million pictures of cats labeled “cat” and a million pictures of dogs labeled “dog,” and it learns to tell them apart with high accuracy. The system isn’t reasoning about what makes a cat a cat. It’s matching patterns. When a new picture comes in, the system asks: does this look more like the cat pattern or the dog pattern?
That’s the entire mechanism. It works beautifully when the new picture is, in fact, a cat or a dog.
The edge case is the picture that’s a fox, a ferret, a stuffed animal, a cat-shaped cloud, or a dog in a cat costume. The system has never been shown those examples. It still has to give an answer. So it picks one, with full confidence, and gets it wrong.
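To make that concrete, here is a minimal sketch of a two-label classifier, assuming a softmax-style output. The raw scores for the fox picture are invented for illustration, and the function names are hypothetical. The point is structural: the probabilities always sum to 1 across the labels the system knows, so there is no slot for “neither.”

```python
import math

LABELS = ["cat", "dog"]  # the only answers the system is able to give


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Return the most likely known label and its probability."""
    probs = softmax(scores)
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], probs[best]


# Hypothetical raw scores for a picture of a fox (made-up numbers).
fox_scores = [3.1, 0.4]
label, confidence = classify(fox_scores)
# label is "cat" and confidence is roughly 0.94, even though "fox" was the truth.
```

Nothing in this sketch can say “I have never seen this before.” It can only rank the two answers it already has.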
Now scale that mechanism up to the actual jobs companies are deploying AI for. A customer service bot trained on thousands of routine billing questions hits an angry customer whose card was charged twice for a flight that got canceled, and the bot has no example of what to do. A legal AI trained on standard contracts hits a clause it has never seen. A medical AI trained on common presentations hits a rare condition that mimics a common one. Each system gives an answer, with full confidence, and gets it wrong.
Why no amount of training fixes this
The obvious response is: just train the system on more examples. If the AI is failing on angry customers, train it on more angry customers. If it’s failing on rare medical conditions, train it on more rare conditions.
This works, partially, on the cases you can predict. The problem is that the edge cases you can predict aren’t really the edge. The actual edge is the case nobody on the team thought to put in the training set, because nobody could imagine it.
And the world produces those constantly. Every new product, every new customer demographic, every new regulation, every new fraud pattern, every new emotional state a real human walks in with on a Tuesday morning creates new edges the system has never been shown. You cannot train your way out of this problem. You can only build the system to recognize when it’s outside what it knows and hand the job to a human.
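One way to sketch “recognize when it’s outside what it knows” is a novelty check: measure how far a new input sits from anything in the training examples, and escalate when it is too far. This is an illustrative assumption, not a complete solution. The feature vectors, the `max_distance` threshold, and the `route` function are all hypothetical, and a production system would need more than one signal.

```python
import math


def route(features, known_examples, max_distance):
    """Send the input to a human when it is far from every known example."""
    # Euclidean distance to the closest example the system was trained on.
    nearest = min(math.dist(features, example) for example in known_examples)
    return "human" if nearest > max_distance else "model"


# Hypothetical two-dimensional feature vectors standing in for training data.
known = [(0.0, 0.0), (1.0, 1.0)]

routine = route((0.1, 0.1), known, max_distance=0.5)  # close to a known example
novel = route((5.0, 5.0), known, max_distance=0.5)    # far from everything known
```

The design choice here is that the hand-off does not rely on the model’s own confidence, since that is exactly the number that stays high when the model is wrong.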
That’s the only fix. Everything else is the executive equivalent of hoping.
The cost at three scales
The reason this matters is that the cost of an unhandled edge case doesn’t stay constant. It scales with what you put the AI in charge of.
Small scale: a wrong answer
At the bottom of the stack, an edge case is a wrong answer. A chatbot tells a customer the wrong shipping date. A search assistant invents a citation that doesn’t exist. A spreadsheet AI mislabels a column. The customer gets frustrated, the analyst catches the mistake, life goes on. Annoying, recoverable, and the cost is measured in time.
This is the level most people think AI failures live at. They don’t.
Medium scale: a lawsuit
The next level up, an edge case is a legal exposure. Air Canada deployed a customer service chatbot that invented a refund policy that didn’t exist. A customer relied on the chatbot’s promise, the company refused to honor it, the customer took the airline to a British Columbia tribunal, and the tribunal ruled that the company was responsible for what its chatbot said. The “the chatbot was wrong” defense did not work.
That’s the medium-scale edge case. A hallucinated answer your AI gave to a customer, with full confidence, that becomes a binding promise you have to either honor or fight in court. Multiply that across the volume of interactions a customer service AI handles, and the math gets ugly fast.
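The multiplication is simple enough to write down. In the sketch below, every input is a hypothetical placeholder rather than a figure from any real deployment. Swap in your own volume, failure rate, and cost per failure.

```python
def expected_annual_exposure(interactions_per_year, failure_rate, cost_per_failure):
    """Return (expected failures per year, expected total cost of those failures)."""
    expected_failures = interactions_per_year * failure_rate
    return expected_failures, expected_failures * cost_per_failure


# Hypothetical inputs: one million conversations a year, one in a thousand
# producing an invented promise, each costing 800 to honor or fight.
failures, exposure = expected_annual_exposure(
    interactions_per_year=1_000_000,
    failure_rate=0.001,
    cost_per_failure=800.0,
)
# About 1,000 invented promises a year, about 800,000 in expected cost.
```

A failure rate that sounds negligible per conversation stops being negligible once it is multiplied by the volume the system was deployed to handle.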
I get into the specific failure mode of confident invention in AI Hallucination: A Survival Guide for People Who Publish Under Their Own Name. The mechanism is the same whether the output is a customer service answer or a chapter of your book.
Large scale: a death
At the top of the stack, an edge case is a person’s life.
Self-driving cars are the most public version of this. The system handles every routine driving scenario it was trained on. The edge case is the pedestrian whose silhouette the model has never seen, the bicycle wobbling in a way the model has never been shown, the traffic backup caused by something the model can’t categorize. The cost of the unhandled edge case at this scale isn’t a wrong answer. It’s a human being.
Medical AI sits in the same category. So does any system that decides who gets a loan, who gets a job interview, who gets parole, who gets removed from a no-fly list. The cost of the system being confidently wrong, at scale, on the cases it was never trained for, can be a life destroyed or ended.
What this means for your rollout
Before you deploy any AI system into a real workflow, you have to answer one question honestly. What does the cost of a single failure look like, and who pays it?
If the answer is “the user has to retry the search,” fine. Ship the system, let it fail occasionally, and move on. The cost is small, the cost lands on the user, and the user has another search engine they can use.
If the answer is “the company has to honor an invented refund policy,” you have a problem the law has already decided. Build a human in the loop for any answer that creates an obligation. The cost of that human’s time is less than the cost of the lawsuit you will eventually lose.
If the answer is “a person gets hurt,” you cannot deploy this system without a human in the loop, full stop. The cost of getting this wrong is something money cannot get back, and no amount of training data ever solves the next edge case the world will produce.
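The three answers above can be written as a lookup that a team fills in for each system before launch. This is a minimal sketch with hypothetical names (`FailureCost`, `review_policy`). The tiers come from the text, and the policy strings are just labels for the decisions described above.

```python
from enum import Enum


class FailureCost(Enum):
    """What a single failure costs, and who pays it."""
    RETRY = "the user has to retry"
    OBLIGATION = "the company is bound by the answer"
    HARM = "a person gets hurt"


def review_policy(cost):
    """Map the cost of one failure to the amount of human review required."""
    if cost is FailureCost.RETRY:
        return "ship without review"
    if cost is FailureCost.OBLIGATION:
        return "human approves any answer that creates an obligation"
    # Anything that can hurt a person gets a human on every decision.
    return "human in the loop on every decision"


search_policy = review_policy(FailureCost.RETRY)
refund_policy = review_policy(FailureCost.OBLIGATION)
triage_policy = review_policy(FailureCost.HARM)
```

Note that the function defaults to the strictest policy, so a tier nobody thought to handle falls on the side of more human review rather than less.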
The mistake every Klarna-style rollout makes is answering the cost question with what the AI handles 95 percent of the time. The question is what happens the other 5 percent.
The rule
Keep a human in the loop wherever the cost of one unhandled edge case exceeds the convenience of automating the routine ones.
That’s the whole rule. The work of adopting AI well is the work of mapping where that line falls for every system you deploy. Most rollouts fail because nobody did the mapping. They benchmarked the AI against the easy cases, took the savings, and waited for the hard ones to arrive.
The hard ones always arrive.
Frequently Asked Questions