When should you rely on human judgement instead of AI recommendations?

Quick Answer: Use the 5-Question Framework before acting on any AI recommendation affecting people. Ask: Could this cause irreversible harm? Is essential context missing? Would this break trust? Could this treat similar people differently? Does this set a precedent we would regret? If any answer is yes, use human deliberation instead.

Key Characteristics:
  • AI processes data, but humans process lives. Pattern recognition is not wisdom.
  • A 2025 MIT study found ChatGPT users showed reduced neural connectivity and prefrontal cortex activation.
  • BCG research: 74% of companies struggle to achieve scale and ROI on AI, with 70% of problems stemming from people and process issues.
  • 70% of customers become open to brand switching after a single poor AI interaction.
Real Example:

An agency leader's AI-powered performance management system flagged Sarah for review based on missed deadlines and decreased output. The manager noticed physical signs the data could not capture: dark circles, trembling hands. Through conversation, she learned Sarah's mother had recently been diagnosed with dementia. The manager overrode the AI, adjusted Sarah's workload, and three months later Sarah was back to full performance.


When NOT to Use AI Decision Making: 5-Question Framework

I accidentally leaked company data to AI.

Riley Coleman
November 04, 2025·13 min read

The AI Was Right. The Human Story Was Invisible.

A few weeks ago, I was speaking to a leader in an agency whose company had adopted an AI-powered performance management system. It had flagged Sarah for a performance review. The data was clear: missed deadlines, late arrivals, decreased output. By every quantifiable measure, the system’s recommendation was justified.

But as she sat across from Sarah, she realised the AI had missed the most important data of all.

Her dark circles. The tremor in her hands. The scattered post-it notes on a desk that was usually immaculate. The way her voice caught when she mentioned home.

Through conversation, the full story emerged: her mother’s recent dementia diagnosis. Six years of excellence, of mentoring juniors and taking on extra work without complaint, couldn’t be captured in a dataset.

In that moment, she understood something critical about the gap between AI and human decision-making.

“AI processes data. Humans process lives.”

My Own Wake-Up Call

I understand how these blind spots happen because I’ve lived them.

In early 2024, after being made redundant, I used my gardening leave to enrol in the Ethics of AI course at the London School of Economics. Most people would travel, or finally tackle that home renovation. I chose to confront something that had been nagging at me.

What I discovered in Week 3 has fundamentally changed how I think about AI decision-making.

As a self-confessed tech geek and early adopter, I’d eagerly embraced AI. For seven years, I’d specifically requested to work on AI projects whenever I heard of them at the companies I worked for. I’d worked on systems that determined health insurance claim approvals and tax refunds.

When ChatGPT launched, I thought, “This changes everything!” and dove in headfirst, encouraging my design and research teams to start experimenting with it.

Then came the moment of truth.

After that ethics course, I realised I had unwittingly enabled IT at my previous company to share what amounted to financial information and legal contacts with AI tools.

“What used to require deliberate intent to commit corporate espionage now only requires naive enthusiasm.”

That realisation sent me down a rabbit hole: If I, a design leader who’d spent years working on AI projects, could make this mistake with ChatGPT, what were we all missing about AI decision-making in general?

The answer became this framework. Five questions I wish I’d asked before I shared company data with AI tools.

The Wisdom Gap

We’re living through a quiet crisis of judgement.

AI is brilliant at pattern recognition. It sees correlations we’d never spot, processes information at speeds we can’t match, scales in ways that feel magical. It sees patterns everywhere.

But correlation is not causation. And brilliance at pattern recognition isn’t the same as wisdom about people.

Consider what AI cannot grasp: the weight of six years of trust built through shared struggles. The unspoken agreement that when someone has consistently shown up, you show up for them. The knowledge that disrupting one person’s life sends ripples through an entire team’s sense of psychological safety.

These aren’t soft concerns. They’re the substrate of how organisations actually function.

Yet the more we rely on AI, the less we exercise our capacity for this kind of judgement. A 2025 MIT study that scanned participants’ brains during essay-writing tasks found that people using ChatGPT showed significantly reduced neural connectivity and prefrontal cortex activation (the regions responsible for decision-making and critical thinking) compared to those using traditional tools or working independently.1

We’re outsourcing not just tasks, but the very cognitive muscles we need to remain wise.

The stakes are measurable. Research from Acquire.ai (2025) found that 70% of customers who have a single poor AI interaction become open to brand switching, with 53% reducing spending immediately.2 Meanwhile, Boston Consulting Group’s survey of 1,000+ executives found that 74% of companies struggle to achieve scale and ROI on AI initiatives, with 70% of problems stemming from people and process-related issues (adoption and trust), not technology.3

The question isn’t whether to use AI. It’s how to know when human judgement isn’t optional.

The Human Wisdom Check: Five Questions That Expose What AI Cannot See

Before acting on any AI recommendation, pause. Ask these five questions. Each one illuminates a different blind spot where algorithms fail and human judgement becomes essential.

Think of them not as a checklist, but as a lens: a way of seeing what data alone will never reveal.

Question 1: Could this decision cause direct harm?

The test: Would this affect someone’s livelihood, reputation, or wellbeing in ways that can’t easily be undone?

Why it matters: AI optimises for efficiency. Harm often hides in efficient outcomes.

What to watch for: Decisions that close doors. Denial of opportunities, public accountability that damages trust, financial consequences that ripple beyond the immediate.

When a U.S. court system deployed the COMPAS algorithm to assess recidivism risk, judges began treating its scores as definitive rather than advisory. The algorithm had been trained on historical arrest data, data that reflected decades of biased policing. The result: Black defendants were nearly twice as likely to be incorrectly flagged as high-risk compared to white defendants.4

The harm was direct. The harm was measurable. And the harm was invisible to the algorithm itself.

“The judges had the Harm Test available to them. They didn’t use it. They trusted the data over their own capacity for moral reasoning.”

If you answer YES: Stop. This decision requires human deliberation, not algorithmic delegation.

Question 2: Is there critical context the AI couldn’t access?

The test: Does essential information exist outside the dataset: recent personal crises, cultural nuances, unspoken team agreements?

Why it matters: AI mistakes silence for absence. What it cannot measure, it cannot consider.

Here’s what lives outside the algorithm’s view: the institutional knowledge carried in people’s heads. The collaboration that happens in hallway conversations. The cultural understanding of how things really work versus how the org chart says they work. The fact that Tom always covers for Maria on Fridays because she’s managing her son’s medical appointments, and Maria mentored Tom when he was new.

And it’s not just the invisible human infrastructure. It’s also the visible information that happens to live in another team’s system, another department’s spreadsheet, another part of the process that your AI simply isn’t connected to. The algorithm sees its own dataset as complete. It has no way of knowing what it’s missing.

What to watch for: The space between datapoints. The personal circumstances that explain statistical outliers. The informal social contracts that keep teams functioning. The documented reality that exists somewhere else entirely.

Sarah’s performance metrics were accurate. Her mother’s diagnosis wasn’t in the system. To the algorithm, these were unrelated facts. To any human who’d worked with Sarah, they were the same fact: a devoted employee facing an impossible situation with as much grace as she could manage.

“Context isn’t noise. Context is often the signal.”

If you answer YES: Gather the missing context before deciding. What looks like underperformance might be resilience.

Question 3: Could this damage trust?

The test: Would this decision break promises, undermine psychological safety, or signal that data matters more than relationships?

Why it matters: Trust is built over time, through consistency, reliability, and empathy. And it can be broken in an instant. AI understands none of this.

What to watch for: Decisions that contradict previous commitments. Actions that would make people second-guess whether their humanity matters to the organisation. Moments where algorithmic efficiency clashes with human decency.

AI doesn’t understand that trust is a form of infrastructure. It can’t see that when you override a manager’s promise to accommodate flexible hours because the algorithm says it’s inefficient, you’re not just affecting one employee. You’re teaching everyone that the system’s logic supersedes human judgement.

“Once you teach that lesson, people stop bringing their full selves to work. They bring only what the algorithm can measure.”

If you answer YES: Protect the trust. The truth is that broken trust costs far more productivity than the time you’d ‘lose’ maintaining it.

Question 4: Would this treat similar people differently without justification?

The test: Does this decision create inequality through biased data, unequal access, or inherited historical disadvantages?

Why it matters: AI inherits the biases in its training data. Fairness requires active scrutiny, not passive acceptance.

What to watch for: Patterns where protected characteristics correlate with outcomes, even if those characteristics aren’t explicitly coded. Decisions that advantage people who “fit the pattern” of past success. Historical inequities being amplified rather than corrected.

The COMPAS algorithm didn’t explicitly use race as a variable. It didn’t need to. Postcode, employment history, and social network data served as proxies, encoding decades of structural inequality into something that looked neutral because it was mathematical.
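To see the mechanism, here is a deliberately toy Python simulation. It is not the COMPAS model: the data, base rates, and the postcode rule are all invented. The point is structural: the protected attribute never enters the model, yet the disparity appears anyway.

```python
# Toy simulation (synthetic data, not COMPAS itself) of proxy bias.
import random
random.seed(0)

people = []
for _ in range(10_000):
    group = random.choice(["A", "B"])    # protected attribute, hidden from the model
    if group == "A":
        postcode = "X" if random.random() < 0.9 else "Y"
    else:
        postcode = "Y" if random.random() < 0.9 else "X"
    reoffends = random.random() < 0.2    # identical true base rate in both groups
    people.append({"group": group, "postcode": postcode, "reoffends": reoffends})

# The "model": trained on biased historical arrest data, it has simply
# learned that postcode Y means high risk.
def flagged_high_risk(person):
    return person["postcode"] == "Y"

for g in ("A", "B"):
    innocent = [p for p in people if p["group"] == g and not p["reoffends"]]
    fpr = sum(flagged_high_risk(p) for p in innocent) / len(innocent)
    print(f"Group {g}: false-positive rate among non-reoffenders {fpr:.0%}")
# Roughly 10% for group A and 90% for group B: a stark disparity produced
# without race or group ever being a variable.
```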

“Bias doesn’t announce itself. It arrives dressed as optimisation.”

If you answer YES: Redesign the process. If your AI can’t make fair decisions, you need different AI, or no AI at all for this use case.

Question 5: Does this set a precedent we’d regret if applied universally?

The test: If this decision became standard practice, would it create a culture or system we’d be proud of?

Why it matters: AI doesn’t consider long-term consequences. Each recommendation is evaluated in isolation.

What to watch for: Decisions that feel justifiable in one case but disturbing as policy. Actions that sacrifice long-term relationship health for short-term metrics. Moments where you’d struggle to explain your reasoning to someone you respect.

Imagine this scenario: Your AI recommends always terminating the bottom 10% of performers, regardless of circumstances. In any single quarter, you might find justification. Zoom out, and you’ve created an organisation where people live in constant fear, where asking for help signals weakness, where collaboration becomes risk.
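The trap is that the cutoff is relative, not absolute. A minimal sketch with invented scores makes the difference visible:

```python
# Minimal sketch: a "bottom 10%" rule always flags someone, even in a
# quarter where every score is strong. Scores are invented for illustration.
scores = [91, 92, 93, 94, 95, 96, 97, 98, 99, 100]

n_cut = max(1, len(scores) // 10)           # "terminate the bottom 10%" as policy
flagged = sorted(scores)[:n_cut]
print(f"Relative cutoff flags: {flagged}")  # -> [91], despite strong performance

absolute_flags = [s for s in scores if s < 60]       # an absolute bar can flag no one
print(f"Absolute threshold flags: {absolute_flags}") # -> []
```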

If you answer YES: Step back. Ask whether this is the kind of organisation you’re trying to build.

What Happened with Sarah

Sarah’s manager applied her emotional intelligence to the situation and overrode the AI’s recommendation.

Proceeding with a performance review whilst Sarah was managing a family crisis would have damaged not just her wellbeing but her trust in the organisation. The Context Test revealed what the data couldn’t capture: six years of exemplary work provided the context for understanding recent struggles. The Trust Test made clear that how the organisation responded would signal to every employee whether it saw them as resources or as people.

The decision became obvious. Not easy, but obvious.

They adjusted Sarah’s workload. They connected her with their employee assistance programme. They documented the accommodation so it wouldn’t affect future evaluations. They set up check-ins, not to monitor performance, but to offer support.

Three months later, Sarah was back to her previous level of contribution. More importantly, she told a colleague, “I’d never leave because they saw me as a person, not a metric.”

That’s not sentiment. That’s retention. That’s the competitive advantage of organisations that know when to trust their humanity over their algorithms.

The Path Forward

The most sophisticated AI in the world cannot grasp the weight of a mother’s dementia diagnosis. That’s not a limitation of current technology: it’s a fundamental truth about what algorithms can and cannot do.

They can process data. But data is a thin facsimile of reality, and we must supply the rest from our lived experience.

This isn’t about being anti-AI. It’s about being pro-wisdom. It’s about building organisations where technology amplifies our judgement rather than replacing it. Where we become more human through our use of AI, not less.

The five questions aren’t comprehensive. They’re not meant to be. They’re meant to be memorable enough that you’ll actually use them, specific enough that they’ll reveal what you need to see, and simple enough that they won’t slow you down.

Start small. Next time an algorithm makes a recommendation about a person (hiring, performance, promotion, termination), pause. Ask the questions. Notice what becomes visible.

Then decide.

Not because the data told you to. But because you’ve weighed what the data reveals against what it cannot capture, and you’re willing to take responsibility for that judgement.

That’s the work. That’s what makes us irreplaceable.

How to Use This Framework in Your Organisation

For Individual Contributors

Before accepting an AI recommendation that affects people (a minimal sketch follows this list):

  1. Run through all 5 questions
  2. If ANY answer is “yes,” escalate to human review
  3. Document the human context the AI missed
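Here is a minimal sketch of that escalation rule in Python. The function shape and names are my own illustration, not an existing tool; only the five questions come from the framework.

```python
# Hypothetical sketch: the 5-Question Check as a pre-decision gate.
QUESTIONS = (
    "Could this decision cause direct harm?",
    "Is there critical context the AI couldn't access?",
    "Could this damage trust?",
    "Would this treat similar people differently without justification?",
    "Does this set a precedent we'd regret if applied universally?",
)

def human_wisdom_check(answers: dict) -> str:
    """Escalate to human review if ANY answer is yes; otherwise proceed."""
    flagged = [q for q in QUESTIONS if answers.get(q)]
    if flagged:
        # Step 3: document the human context the AI missed, alongside the flags.
        return "escalate to human review: " + "; ".join(flagged)
    return "proceed"

# One 'yes' is enough to route the decision to a person.
answers = {q: False for q in QUESTIONS}
answers["Is there critical context the AI couldn't access?"] = True
print(human_wisdom_check(answers))
```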

For Managers

Establish a policy:

  1. High-stakes decisions (hiring, firing, promotions, resource allocation) ALWAYS require the 5-Question Check
  2. Train your team to recognise when human judgement is non-negotiable
  3. Create safe channels for overriding AI when questions flag concerns

For Organisations

Build this into governance:

  1. Add the 5-Question Check to your AI deployment protocols
  2. Measure how often AI recommendations are overridden (this is a GOOD signal: it means humans are exercising judgement; see the sketch after this list)
  3. Celebrate override stories (like Sarah’s) that demonstrate wise human judgement
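For point 2, the override rate needs almost no machinery to track. A minimal sketch, with invented log fields rather than a real schema:

```python
# Illustrative override-rate tally: the governance signal from point 2.
decision_log = [
    {"ai_recommendation": "flag_for_review", "final_decision": "accommodate"},
    {"ai_recommendation": "approve",         "final_decision": "approve"},
    {"ai_recommendation": "terminate",       "final_decision": "retain"},
]

overrides = [d for d in decision_log if d["final_decision"] != d["ai_recommendation"]]
rate = len(overrides) / len(decision_log)

# A non-zero rate is the GOOD signal: humans are exercising judgement.
print(f"Override rate: {rate:.0%} ({len(overrides)} of {len(decision_log)})")
```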

Questions or experiences to share?

I’d genuinely like to hear about moments when you’ve had to choose between algorithmic efficiency and human wisdom. What did you learn?

Get in touch or connect with me on LinkedIn.

About Riley Coleman

I’m the founder of AI Flywheel, and I specialise in human-centred AI systems. After accidentally enabling my company to leak data to AI (driven by naive enthusiasm rather than malicious intent), I enrolled in the LSE’s Ethics of AI course. That wake-up call transformed how I think about AI design.

Since then, I’ve taught over 200 designers, researchers, and product leaders how to build AI systems that earn and deserve human trust. My frameworks, including this 5-Question Check, have been applied in organisations ranging from startups to enterprises.

I’m based in Sydney (though I founded AI Flywheel in Amsterdam), and I believe the most important question in AI isn’t “Can we?” but “Should we?”

Learn more about my work →


Want to Learn More About Building Trustworthy AI?

Join 200+ designers, researchers, and product leaders learning to build AI systems that earn and deserve human trust.

Subscribe to AI Flywheel Newsletter

Disclosure note

This article was drafted in partnership with Claude. The story, the thinking, and the question framework are all my own work, and the fact-checking of sources was my responsibility.

References

  1. Kosmyna, N., Hauptmann, E., Yuan, Y.T., Situ, J., Liao, X., Beresnitzky, A.V., Braunstein, I., & Maes, P. (2025). “Your brain on ChatGPT: Accumulation of cognitive debt when using an AI assistant for essay writing tasks.” arXiv preprint arXiv:2506.08872. MIT Media Lab.
  2. Acquire.ai (2025). “Current Market Realities: The Financial Stakes of AI Customer Interactions.” Research study on AI-driven customer experience and brand switching behaviors.
  3. Boston Consulting Group (BCG) (2025). “Where’s the Value in AI?” Survey of 1,000+ CxOs and senior executives across 59 countries and 20+ sectors.
  4. Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). “Machine Bias: Risk Assessments in Criminal Sentencing.” ProPublica.
  5. UCLA Law Review (2019). “Injustice Ex Machina: Predictive Algorithms in Criminal Sentencing.”

Quick Answer

Before acting on any AI recommendation that affects people, ask these 5 questions:

  1. Could this decision cause direct harm? (livelihood, reputation, wellbeing)
  2. Is there critical context the AI couldn’t access? (personal crises, cultural nuances, institutional knowledge)
  3. Could this damage trust? (break promises, compromise psychological safety)
  4. Would this treat similar people differently without justification? (bias, inequality)
  5. Does this set a precedent we’d regret if applied universally? (long-term culture impact)

If you answer YES to any question, stop and use human deliberation instead of algorithmic delegation.





Frequently Asked Questions

What is the 5-Question Framework for AI decision-making?

Before acting on any AI recommendation affecting people, ask: Could this cause harm? Is critical context missing? Could this damage trust? Would this treat similar people differently? Does this set a precedent we would regret?

What evidence shows AI can reduce human critical thinking?

A 2025 MIT study found ChatGPT users showed reduced neural connectivity and prefrontal cortex activation compared to those using traditional tools.

How did Riley Coleman's personal experience shape this framework?

After being made redundant in 2024, Riley enrolled in the LSE Ethics of AI course and realised they had accidentally enabled sharing of financial information and legal contacts with AI tools.

How should organisations implement the 5-Question Framework?

Individual contributors run all five questions before accepting AI recommendations. Managers mandate the check for high-stakes decisions. Organisations add it to AI deployment protocols.