The engineering lead pulled up the spreadsheet. We'd been building this AI meeting prep tool for six months—it pulled context from emails, past meetings, company docs. Everything you needed to walk into any conversation prepared.
The product worked. Beta users loved it. The team was excited.
Then we saw the annual cost projection: $16 million.
For a product that saved people 15 minutes before meetings. Nice to have, not must-have. No clear signal it would drive adoption or revenue.
We killed it that afternoon.
Six months of work, shelved in a single meeting. And you know what? That was the right call. Because I've now watched enough AI products fail to recognize the warning signs early.
The numbers are brutal. RAND Corporation found that 80% of AI projects fail—twice the rate of regular IT projects. S&P Global's 2025 survey shows 42% of companies abandoned most of their AI initiatives this year, up from just 17% in 2024.
After working on three AI products that failed in completely different ways, I started seeing the patterns. The expensive patterns nobody talks about until after the project is dead.
Two Years to Build, One Month of Use
Early days of generative AI. Medical distribution company. Simple problem: help sales reps find alternative products when items are out of stock.
Our plan: Embed product data in Excel (because that's where it lived), use RAG to suggest alternatives, surface it through Teams.
First week: Excel embeddings were garbage. Products with similar names but completely different purposes got mixed up. Surgical gloves matched with industrial cleaning supplies.
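The failure mode is easy to demonstrate with a toy stand-in. Real embedding models are far more sophisticated than a bag-of-words similarity, but when product descriptions are short and formulaic, shared surface tokens can dominate the score in much the same way. The products and scores below are illustrative, not from our catalog:

```python
from collections import Counter
import math

def cosine(a: str, b: str) -> float:
    """Cosine similarity over bag-of-words counts (a toy stand-in
    for a real embedding model)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb)

# Shared surface tokens ("gloves", "heavy", "duty") drive the score up
# even though the products serve completely different purposes.
score = cosine("surgical gloves heavy duty sterile",
               "industrial cleaning gloves heavy duty")
print(f"similarity: {score:.2f}")  # high overlap, wrong match
```

Two unrelated products score 0.60 here purely on naming overlap. With terse, inconsistent catalog entries, a retrieval system has little else to go on.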
Week 8: Latency was killing us. Request goes from Teams → Copilot Studio → Power Automate → FastAPI → our RAG system → Azure OpenAI → back through the chain. Three seconds minimum. Eternity when a customer's on hold.
Month 3: Nobody wanted to commit a frontend engineer. We were stuck with Streamlit—fine for our team, not for 200 sales reps nationwide.
Month 6: Still figuring out Azure OpenAI. This was before Azure AI Foundry existed. Every API call felt like educated guessing.
Nearly two years later, we launched. Accuracy was acceptable. Latency wasn't great but workable. UI didn't crash.
Month 1 after launch: Decent usage.
Month 2: Dead silence.
I heard they restructured and tried again. But two years for one month of actual use? That's most AI products right there.
When 92% Accuracy Loses to Office Politics
Different project. Built a RAG system for the DMV to handle FAQs. Six months of work. 92% accuracy on test questions. Could answer everything from license renewals to title transfers.
Then someone in a meeting: "Why don't we just use Copilot? Microsoft's offering it at $30 per user per month."
Never mind that our system was custom-built for DMV regulations. Never mind that it was more accurate. Never mind that $30 per user, multiplied across every employee, adds up fast.
Office politics won. Our system got used for about a month, then shelved.
The technical work was solid. The business case made sense. But we didn't account for the "someone has a Microsoft relationship" factor.
When the Math Stops Working
Back to that meeting prep tool.
During design, the ROI made sense. Average meeting costs a company $X in employee time. Save 15 minutes per meeting, times Y meetings per day, times Z employees... profit!
Except we wanted to give users maximum context. "Maximum" meant large context windows. Large windows times thousands of meetings per day times GPT-4 pricing... wait, how much?
$16 million annually.
The business case evaporated in one cell of an Excel spreadsheet.
We'd been so focused on making it work technically—can we embed the data, retrieve it fast, summarize well—that we forgot to keep checking: does this actually make financial sense?
By the time someone asked that question, we'd built most of it.
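The check we skipped fits in ten lines. Every number below is an illustrative placeholder, not the project's actual figures, but with plausible inputs the total lands in the same eight-figure territory that killed the tool:

```python
# Back-of-envelope cost model. All numbers are illustrative assumptions.
TOKENS_PER_MEETING = 50_000    # "maximum context" prompt size (assumed)
PRICE_PER_1K_TOKENS = 0.03     # GPT-4-class input pricing (assumed)
MEETINGS_PER_DAY = 30_000      # company-wide, at full rollout (assumed)
WORKDAYS_PER_YEAR = 250

cost_per_meeting = TOKENS_PER_MEETING / 1_000 * PRICE_PER_1K_TOKENS
annual_cost = cost_per_meeting * MEETINGS_PER_DAY * WORKDAYS_PER_YEAR
print(f"${cost_per_meeting:.2f} per meeting -> ${annual_cost:,.0f} per year")
```

Ten minutes in week one, instead of one spreadsheet cell in month six.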
What Actually Kills These Projects
Here's what I learned watching these three die:
The AI model is almost never the problem.
RAND researchers interviewed 65 data scientists and engineers and found that misunderstandings about project purpose and domain context are the most common reasons for failure. Not bad models. Not weak compute. Miscommunication and unclear business cases.
The Architecture Creates Bottlenecks Before You Notice
That product support bot? The architecture was the killer.
We jerry-rigged five systems:
- Excel for data (wrong tool from day one)
- Power Automate for orchestration (added latency)
- Copilot Studio as middleware (more latency)
- Custom FastAPI backend (finally, something that made sense)
- Azure OpenAI (which we were learning on the job)
Each handoff added 200-500ms. Ask a question, get an answer... 3-4 seconds later. When someone's on hold with a customer, that's forever.
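A latency budget would have flagged this before the first line of integration code. The per-hop ranges below use the 200-500ms handoff figures we measured; the model inference range is an assumption added for completeness:

```python
# Rough latency budget for the five-system chain. Hop ranges are the
# measured 200-500ms handoffs; inference time is an assumed range.
hops_ms = {
    "Teams -> Copilot Studio": (200, 500),
    "Copilot Studio -> Power Automate": (200, 500),
    "Power Automate -> FastAPI": (200, 500),
    "FastAPI -> RAG retrieval": (200, 500),
    "RAG -> Azure OpenAI (inference)": (800, 2000),  # assumed
}
best = sum(lo for lo, hi in hops_ms.values())
worst = sum(hi for lo, hi in hops_ms.values())
print(f"End-to-end: {best/1000:.1f}s best case, {worst/1000:.1f}s worst case")
```

Even the best case blows a sub-second target before the model does any work. The architecture decided the latency; no amount of tuning inside any single hop could fix it.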
We built based on what tools we had access to and what people knew. Not based on what the product needed.
Many organizations simply don't have adequate infrastructure to manage their data and deploy finished AI models, and that gap alone raises the odds of failure.
Your Data Doesn't Match Your Use Case
Excel spreadsheets are great for humans. Terrible for embeddings.
Our product catalog had:
- Inconsistent naming ("Product A" vs "Product A (legacy)" vs "Prod-A")
- Missing fields (half the products had no category tags)
- Embedded business logic (pricing rules in formulas, not actual data)
- No hierarchy (which products substitute for each other? Nobody documented that)
We spent months trying to clean it up enough to embed. The real issue: data was structured for accounting, not semantic search.
The data you have is rarely the data you need. Fixing it takes longer than building the model.
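An audit pass like the sketch below would have quantified the mess before we committed to embeddings. The record layout and cleanup rules here are hypothetical stand-ins for our catalog's problems:

```python
import re

# Hypothetical catalog rows mirroring the issues we hit:
# naming variants and missing category tags.
catalog = [
    {"name": "Product A",          "category": "gloves"},
    {"name": "Product A (legacy)", "category": None},
    {"name": "Prod-A",             "category": None},
    {"name": "Product B",          "category": "syringes"},
]

def canonical(name: str) -> str:
    """Collapse naming variants into one key (rules are illustrative)."""
    name = name.lower()
    name = re.sub(r"\(legacy\)", "", name)       # strip status suffixes
    name = re.sub(r"\bprod\b", "product", name)  # expand abbreviations
    name = re.sub(r"[-_]", " ", name)            # unify separators
    return " ".join(name.split())

keys = {canonical(p["name"]) for p in catalog}
missing = sum(1 for p in catalog if not p["category"])
print(f"{len(catalog)} rows -> {len(keys)} distinct products")
print(f"{missing}/{len(catalog)} rows missing category tags")
```

Four rows collapse to two actual products, and half lack category tags. Seeing those numbers on day one changes the conversation from "embed the spreadsheet" to "fix the data first."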
Unit Economics Look Great Until You Scale
Small numbers lie.
In testing, that meeting prep tool cost maybe $200/month in API calls. Model it for the whole company: $16 million a year.
I've seen this repeatedly:
- "Only $0.02 per call!" (Times 10 million calls monthly = $200K/month)
- "So fast in testing!" (With one user. Try 1,000 concurrent.)
- "Storage is cheap!" (Until you're storing embeddings for every document ever created)
Between 70% and 85% of GenAI deployments fail to meet their ROI targets, and economics that looked good on paper but collapsed at scale is a major reason why.
Calculate real costs before building the thing. Not after.
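Spelling out the first bullet as arithmetic makes the trap obvious: the per-call price never changes, only the volume does. Volumes here are illustrative:

```python
# The "only $0.02 per call!" trap at pilot, mid, and full scale.
cost_per_call = 0.02  # looks harmless in isolation

for calls_per_month in (10_000, 1_000_000, 10_000_000):
    monthly = cost_per_call * calls_per_month
    print(f"{calls_per_month:>12,} calls/mo -> ${monthly:>10,.0f}/mo "
          f"(${monthly * 12:,.0f}/yr)")
```

The pilot bill of $200/month and the production bill of $200K/month come from the exact same unit price. If you only ever look at the unit price, both look fine.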
Nobody Actually Wants to Change
The DMV project failed because nobody wanted to change their workflow.
The system was better than what they had. More accurate than Copilot. Custom-built for their regulations.
But:
- Training staff on a new system takes time
- Someone had a Microsoft relationship
- "Let's just use Copilot" is easier than "let's deploy custom infrastructure"
- Change is hard, even when change is better
In 2023, 52% of employees were more concerned than excited about AI, up from 37% in 2021. You can't deploy AI into an organization that doesn't want it, even when the AI is objectively better.
Technical Debt From Day One
Nobody wants to admit this: most AI products ship with massive technical debt because we're figuring it out as we go.
Product support bot: we used Streamlit because that's what the team knew. We didn't have frontend resources. We needed something quick. We'd "fix it later."
Later never came. We were always in "just get it working" mode.
The embeddings were wrong from the start—no time to rebuild them. The API chain was too long—no time to refactor. The data model was broken—no time to fix the source.
You ship with debt thinking you'll pay it down. Then you're too busy dealing with the next crisis.
No Clear Reason for AI
The meeting prep tool perfectly demonstrates a solution looking for a problem.
Could we build it? Yes.
Was it technically impressive? Absolutely.
Did anyone need AI to solve it? Not really.
A good calendar integration, a shared notes system, and decent search would have solved 80% of the problem for 1% of the cost.
But AI was sexy. AI got executive buy-in. AI was what everyone wanted to talk about in demos.
Successful projects are laser-focused on the problem to be solved, not the technology used to solve it. We were laser-focused on showing off what AI could do.
Wrong focus.
Pilot Paralysis
MIT reports that 95% of GenAI pilots fail, while vendor-led, workflow-integrated projects succeed twice as often.
Every project I mentioned spent months in pilot mode:
- "Let's test with 10 users first"
- "We need more feedback"
- "One more feature before full rollout"
- "Accuracy needs to hit 95% before we scale"
Pilots are where projects die slowly.
The longer you stay in pilot, the more things can derail. Someone leaves the team. Priorities shift. A new tool launches. The executive sponsor moves to a different division.
Scale or kill. Don't linger.
What Might Have Worked
After watching enough failures, patterns emerge on what might have saved these projects:
Model Costs in Week 1, Not Month 5
That $16 million meeting tool? Should have run those numbers immediately.
Before writing code:
- What's cost per transaction at scale?
- What will users pay or business spend?
- Does the math work?
If economics don't work at scale, stop. You can't engineer your way out of bad math.
Build for Your Needs, Not Your Comfort
The product support bot needed sub-second response, simple retrieval, reliable accuracy.
What it got: 3+ second latency, five system handoffs, dependency on tools we didn't fully understand.
We built with tools we had instead of getting tools we needed.
Sometimes that means telling leadership: "We need a frontend engineer." Or "Excel won't work for this." Or "This needs different infrastructure."
Hard conversation early beats failed product later.
Start With 10 Users, Not 10,000
The DMV system was built for the entire organization from day one. Full scalability. Complete feature set. Production-grade everything.
It never got 10 real users.
Before beginning any AI project, leaders should be prepared to commit to solving a specific problem for at least a year. But start that year with 10 users.
Build something that works for 10 people. Watch them use it. Fix what breaks. Add what's missing.
Then think about 100. Then 1,000.
Most projects die before they get to 10.
Measure What Actually Matters
That 92% accurate DMV system lost to 70% accurate Copilot.
Accuracy doesn't matter if:
- Nobody trusts it
- It's hard to use
- Someone has a vendor relationship
- People don't want to change workflow
What actually matters:
- Do people use it without being forced?
- Does it save them real time?
- Do they ask for it when it's unavailable?
- Would they be upset if you took it away?
Those are harder to measure than accuracy. But they determine whether something succeeds.
Account for Politics From Day One
The Microsoft Copilot decision wasn't technical. It was political.
Every AI project succeeds or fails partly based on:
- Who sponsors it internally
- Which vendor relationships exist
- Who controls budget
- What previous initiatives failed
- Who's afraid of being replaced
You can have the best technical solution and lose to a worse one backed by the right executive.
This isn't cynical. It's realistic.
Know When to Kill It
That meeting prep tool should have died in week 2 when we realized the cost model didn't work. Instead, we spent six months thinking we could optimize to profitability.
We couldn't.
42% of companies are now abandoning most AI initiatives, and that's actually healthy. Better to kill bad projects fast than drain resources for months.
Set clear success criteria. If you're not hitting them, stop.
The Checklist That Could Have Saved These
Looking back, there were warning signs. We just didn't have a framework to catch them.
Before building anything, check:
Business Viability
- Can you state the problem in one sentence?
- Does this need AI, or would something simpler work?
- Do you know what users will pay or business will spend?
- Have you modeled costs at 10x and 100x scale?
- Would a skeptical CFO approve the ROI?
Technical Readiness
- Is your data structured for this use case?
- Do you know how to access/clean/transform it?
- Can you hit required latency with your proposed architecture?
- Do you have the skills in-house or clear plan to get them?
- Have you tested with real data (not clean test data)?
Organizational Fit
- Do people who'll use this want it?
- Have you talked to 10+ end users about their workflow?
- Does this require changing how they work?
- Who internally might block this (and why)?
- Do you have executive sponsorship lasting 12+ months?
Production Readiness
- Can you deploy without requiring five new systems?
- Do you have monitoring plans once it's live?
- What happens when it gives wrong answers?
- Can you roll back quickly if something breaks?
- Is there a human in the loop for high-risk decisions?
Scale Planning
- What's your plan for 10 users? 100? 1,000?
- At what scale do economics break?
- What's infrastructure cost at each tier?
- Do you have a clear path from pilot to production?
- What's your criteria for killing this if it doesn't work?
Can't check most of these? You're not ready to build. That's fine. Better to know now than after six months.
The Question That Would Have Saved Everything
"If this works perfectly, will anyone use it enough to justify the cost?"
Not: "Can we build it?"
Not: "Is the model accurate?"
Not: "Is the tech impressive?"
Just: "Will people use it?"
That meeting prep tool? Even perfect, it solved a small inconvenience, not a major pain.
That product support bot? Even with perfect accuracy, it required sales reps to change a 20-year workflow.
That DMV system? Even with 92% accuracy, it required organizational commitment when they had a "good enough" alternative.
Technical work was solid. Business cases were questionable from the start.
What I'd Do Differently
If I could restart each project:
Week 1: Spend it with end users. Watch them work. Find actual pain points. Make sure the problem is real.
Week 2: Model economics. If math doesn't work at scale, stop. Don't assume you'll optimize later.
Week 3: Build the ugliest possible prototype demonstrating value. No fancy UI. No perfect accuracy. Just: does this solve the core problem?
Week 4: Put it in front of 5 real users. Not a demo. Not a presentation. Let them use it for real work.
Week 5-8: Fix what breaks. Add only what blocks them from using it. Resist every non-critical feature request.
Week 9: Scale to 50 users, or kill it.
Most projects I've been on spent weeks 1-4 choosing tech stacks and arguing about architecture.
By the time we had something for users, we'd burned half the budget and locked into decisions that were hard to undo.
Get the Production-Ready AI Checklist
I built a comprehensive framework based on every failure I've seen.
It's not just questions. It's a complete evaluation system for whether your AI project is actually ready to ship—covering technical architecture, data readiness, business viability, organizational fit, and production operations.
Use it before you start. Use it before you scale. Use it before you commit serious budget.
Won't guarantee success. Might save you from six months building something nobody will use.
Download the Production-Ready AI Checklist → LLM Production Readiness Checklists
_I help companies ship AI products that people actually use. I've worked on everything from early-stage RAG systems to production ML pipelines. These days, I spend time helping teams figure out if they should build something before they waste months building it._
