Guide
Is AI Reliable Enough to Talk to My Customers?
Not unsupervised — and any vendor who tells you otherwise is selling you the risk. The honest answer is that an AI agent should draft every customer-facing action, but a human should approve it before it ships. That review gate is what makes the work reliable. You're buying an approved outcome, not a bot you cross your fingers on.
At TeamShift, our team comes from revenue-operations and business-brokerage backgrounds — we buy, sell, and run revenue and operations systems for small businesses. So when a contractor or shop owner asks us "is AI safe to talk to my customers," we don't hear a tech question. We hear a reputation question, and sometimes a resale-value question. Let us answer it straight.
The real question isn't "will the AI mess up." It's "will anyone catch it."
Here is the part most vendors won't say out loud: every unsupervised agent eventually misfires. This isn't a knock on any one model. It's the base rate. According to an industry roundup from Swept AI, "roughly 80% of organizations have encountered risky or unexpected behavior from their AI agents in production" (verified 2026-05-29, source: https://www.swept.ai/post/when-ai-customer-service-agents-fail-real-examples). Eight in ten. That is not the exception — that is the norm.
It tracks with how little trust the people deploying these tools actually have. A Harvard Business Review Analytic Services survey of 603 business and technology leaders (reported by Fortune in December 2025) found that only 6% of companies fully trust AI agents to autonomously run their core business processes (verified 2026-05-29, source: https://fortune.com/2025/12/09/harvard-business-review-survey-only-6-percent-companies-trust-ai-agents/). In the same body of research, 92% of respondents agreed agents need rules-based guardrails to operate safely — but fewer than half (48%) said their organization had actually defined those rules (verified 2026-05-29, source: https://www.prnewswire.com/news-releases/new-survey-from-harvard-business-review-analytic-services-finds-ai-adoption-remains-high-yet-value-may-lag-without-modernization-and-workflow-integration-302756865.html).
So the question was never "is the AI smart enough." Smart isn't the problem. Unsupervised is the problem. The danger is that small errors compound quietly across many calls until they become a reputation issue — and nobody notices until a customer is angry, a quote is wrong, or a policy got invented out of thin air.
What "invented out of thin air" actually costs: the Air Canada precedent
If you want a clean, citable example of an unsupervised agent creating a real obligation, look at Moffatt v. Air Canada (2024 BCCRT 149), decided by the British Columbia Civil Resolution Tribunal on February 14, 2024 (verified 2026-05-29, sources: https://www.canlii.org/en/bc/bccrt/doc/2024/2024bccrt149/2024bccrt149.html and https://www.mccarthy.ca/en/insights/blogs/techlex/moffatt-v-air-canada-misrepresentation-ai-chatbot).
A passenger, Jake Moffatt, was flying after his grandmother died. Air Canada's website chatbot told him he could book a full-price ticket and claim a bereavement discount retroactively within 90 days. No such retroactive policy existed; the chatbot fabricated it. When Air Canada refused the refund, Moffatt took it to the tribunal and won, with the airline ordered to pay roughly CA$812 in total, covering damages, interest, and tribunal fees.
The legal reasoning is what every business owner should tattoo on the wall. Air Canada argued the chatbot was "a separate legal entity" responsible for its own statements. The tribunal flatly rejected that:
"While a chatbot has an interactive component, it is still just a part of Air Canada's website... It should be obvious to Air Canada that it is responsible for all the information on its website."
Read that again. The business is liable for what its bot says. Not the vendor. Not the model. You. An unsupervised agent doesn't just risk an awkward conversation — it can manufacture a promise you're legally and reputationally on the hook for.
The quieter failure mode: automation that creates work instead of removing it
The lawsuit is the dramatic version. The everyday version is more boring and, honestly, more common.
A practitioner review of AI answering services from Smash VC describes the trap precisely: "you find an AI phone agent that takes calls, but it generates so much downstream cleanup that staff spend more time fixing transcripts than they would have spent answering the phone."
We've watched this happen in real shops. The bot "answers" — and then someone on the team has to read every garbled transcript, re-key the customer's actual phone number because it mis-heard it, figure out what the customer really wanted, and call them back anyway. You didn't remove labor. You added a second job: babysitting the robot. That's negative ROI dressed up as automation.
The same review sets a clean bar for what counts as actually working: "A real receptionist replacement fields the call, captures the data cleanly, and routes the next step without your team picking up the slack." Field it. Capture it cleanly. Route the next step. No slack picked up. That's the standard. Most raw bots fail at least one of those three.
The reframe: a review gate isn't a safety net, it's the product
Here's where we part ways with the rest of the market. Most vendors respond to all of this by promising a smarter model: "ours won't hallucinate." That's a promise about a probability, and you already saw how that plays out: roughly 80% of organizations hit trouble with their agents in production.
We do the opposite at TeamShift. We concede that any unsupervised agent will eventually misfire, and we engineer around it. Our AI agent teams handle the volume — fielding missed calls, drafting lead and quote follow-ups, covering the inbox, prepping the back-office work. But before anything reaches your customer, a human approves it.
That's the review gate, and we want to be clear about how to think about it: it is not an apology for AI being risky. It is a control surface. Reliability isn't a thing you hope the model has. It's a design choice you make on purpose.
- The agent drafts and queues the customer-facing action — the quote, the booking, the reply.
- A human reviews and approves before it goes out.
- Only then does the customer receive it.
The wrong answer never reaches the customer, so the wrong answer never becomes your reputation. The "risky or unexpected behavior" that roughly 80% of organizations report seeing in production is exactly the kind of thing the gate exists to catch. We let the AI do what it's great at (speed, coverage, never sleeping) and put a human exactly where humans are great (judgment on the last inch).
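For readers who want to see the gate as a mechanism rather than a promise, the three steps above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not a description of TeamShift's actual system: the names `Draft`, `ReviewGate`, `submit`, `review`, and `send` are hypothetical, and the `outbox` list simply stands in for whatever channel really delivers the message.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    QUEUED = "queued"
    APPROVED = "approved"
    REJECTED = "rejected"
    SENT = "sent"


@dataclass
class Draft:
    # Hypothetical record for one customer-facing action (quote, booking, reply).
    customer: str
    body: str
    status: Status = Status.QUEUED


class ReviewGate:
    """Holds agent drafts until a human approves them (illustrative only)."""

    def __init__(self):
        self.queue = []   # every draft the agent has submitted
        self.outbox = []  # stand-in for what the customer actually receives

    def submit(self, customer, body):
        # Step 1: the agent drafts and queues; nothing is sent yet.
        draft = Draft(customer, body)
        self.queue.append(draft)
        return draft

    def review(self, draft, approve):
        # Step 2: a human decision is recorded on the draft.
        if draft.status is not Status.QUEUED:
            raise ValueError("draft has already been reviewed")
        draft.status = Status.APPROVED if approve else Status.REJECTED

    def send(self, draft):
        # Step 3: only approved drafts may reach the customer.
        if draft.status is not Status.APPROVED:
            raise PermissionError("draft was not approved by a human")
        self.outbox.append((draft.customer, draft.body))
        draft.status = Status.SENT


gate = ReviewGate()
good = gate.submit("A. Customer", "Your quote for the repair is attached.")
bad = gate.submit("B. Customer", "You can claim a discount within 90 days.")

gate.review(good, approve=True)
gate.review(bad, approve=False)  # the reviewer catches the invented policy

gate.send(good)
try:
    gate.send(bad)
    blocked = False
except PermissionError:
    blocked = True
```

The design point is that `send` refuses anything a human has not approved, so a skipped review fails loudly instead of quietly reaching the customer.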
Three ways to handle the phone — side by side
| | Raw DIY AI bot | Reviewed-outcome model (TeamShift) | Hiring a person |
|---|---|---|---|
| Speed to respond | Instant | Fast (agent drafts immediately) | Slow; limited hours |
| Who approves customer-facing actions | Nobody | A human, every time | The person themselves |
| Reputation risk | High — unsupervised; ~80% hit production issues | Low — wrong answers caught before send | Variable; human error, turnover |
| Coverage | 24/7 | 24/7 intake, reviewed output | Gaps: nights, weekends, sick days |
| What you're buying | Software to manage and hope | A delivered, approved outcome | A hire to recruit, train, and retain |
| Downstream cleanup | Often high (fixing transcripts) | Built into the gate | Minimal if trained, but costly |
| Cost shape | Low sticker, hidden cleanup cost | Outcome-priced | Salary + benefits + ramp |
DIY bots optimize for "instant." Hiring optimizes for "judgment." The reviewed-outcome model is the only column that gives you both speed and a human on the last inch — without making you the one managing it.
What this means if you run a home-service business
You don't have time to read transcripts. You're on a roof or under a sink. The whole point of getting help with the phone is to get time back — not to trade ringing phones for a queue of robot mistakes to clean up.
So the test we'd apply to any AI phone or follow-up offer is simple:
- Does a human approve customer-facing actions before they send? If no, you've adopted the 80% risk and the Air Canada liability with it.
- Does it field, capture cleanly, and route — without your team picking up slack? If you're fixing its output, it failed.
- Are you buying an outcome or renting a tool? A tool is your problem to operate. An outcome is delivered to you.
That's the difference between renting a chatbot and hoping, versus buying a delivered outcome with a control surface on it. One protects your reputation — and, if you ever sell the business, the operational reliability buyers actually pay a premium for. The other quietly erodes both.
AI is more than reliable enough to do the work. It is not reliable enough to be unsupervised on your customers. Put a human gate on it and you get the best of the machine without betting your name on it.
FAQ
Is it safe to let an AI talk to my customers directly? Unsupervised, no — roughly 80% of organizations have hit risky or unexpected AI agent behavior in production (Swept AI). It becomes safe when a human reviews and approves every customer-facing action before it sends. The model handles volume; the human owns the last inch.
Can an AI chatbot actually create a legal or financial obligation for my business? Yes. In Moffatt v. Air Canada (2024 BCCRT 149), the British Columbia Civil Resolution Tribunal held the airline liable for a discount its chatbot invented, ruling the company "is responsible for all the information on its website." The business owns what the bot says.
Won't AI just save me time on the phone? Only if it's supervised. Practitioner testing (Smash VC) warns that some AI phone agents generate "so much downstream cleanup that staff spend more time fixing transcripts than they would have spent answering the phone." A good system fields the call, captures data cleanly, and routes the next step without your team picking up the slack.
What is a "review gate" exactly? It's a control surface: AI agents draft and queue customer-facing actions — quotes, bookings, replies — and a human approves them before they reach the customer. Reliability becomes a design choice, not a promise about the model.
How is this different from just hiring someone? Hiring gives you judgment but comes with cost, ramp time, turnover, and coverage gaps (nights, weekends, sick days). The reviewed-outcome model gives you 24/7 intake with a human approving the output — speed and judgment, without you having to recruit, train, and retain.
Written by the TeamShift team — our background is in revenue-operations and business-brokerage for small businesses. TeamShift delivers finished back-office and customer-response outcomes for home-service and SMB businesses, with a human review gate on every customer-facing action.