Not in the Model, in the Enterprise Environment
By: Yan-David “Yanda” Erlich, General Partner, B Capital
The Model Isn’t the Moat Anymore
For two years, “AI progress” meant “the model got better.” That era is ending.
The evidence is stark: according to MIT’s State of AI in Business 2025 report, only 5% of enterprise GenAI pilots achieve measurable P&L impact.1 S&P Global found that 42% of companies abandoned most AI initiatives in 2025, up from 17% in 2024.2 The average organization scrapped 46% of its AI proofs of concept before they reached production.2
These aren’t bad models. They’re bad environments.
Model capability is still improving, but for most enterprises it is no longer the limiting constraint. What matters now is everything around the model: integration, governance, distribution, measurement and the ability to learn in production without breaking trust.
One constraint is consistently underweighted in almost every AI strategy deck I see: organizational fit.
If AI is going to deliver durable value, it must function less like a tool and more like a coworker. One that collaborates with humans, operates inside real team workflows and carries context over time.
The winners won’t be the teams with the “smartest” model. They’ll be the teams with the best environment to deploy, trust and continuously improve AI.
The Coworker vs. Tool Distinction
This isn’t semantics. The difference between AI-as-tool and AI-as-coworker determines whether value compounds or collapses.
A tool waits to be invoked. It processes inputs and returns outputs. It has no memory of your organization, no understanding of how your team actually works, no awareness of who should approve what. Every session starts from zero.
A coworker maintains context across interactions. It knows your domain, your team’s terminology and your approval workflows. It can be delegated to, supervised and held accountable. It gets better at its job over time because it learns from outcomes, not just prompts.
The MIT data validates this distinction. Their research found that vendor-built solutions succeed at a 67% rate, while internal builds fail at a 67% rate.1 Why? Vendors who win are the ones building coworker-like systems with deep workflow integration, not generic tools bolted onto existing processes.
Consider what a new human hire experiences: onboarding, permissions, a manager, feedback loops, access to institutional knowledge, clear escalation paths. We don’t hand them a keyboard and expect productivity on day one. Yet that’s exactly how most enterprises deploy AI.
The Shift: From Capability Race to Execution Advantage
In practice, this shift shows up when AI performs well in pilots but fails to survive first contact with real workflows. Buyers are no longer asking “Does it ace a benchmark?” They’re asking a different class of questions altogether:
- Integration: Can it plug into my workflows without rewriting my org chart?
- Governance: Can it touch sensitive data without creating security, privacy or compliance blowback?
- Accountability: Who is responsible when it’s wrong?
- Measurement: Can we evaluate it in production and improve it safely?
- Scale: Can we roll it out to thousands of users without adoption collapsing?
- Collaboration: Can it work with my team like a competent new hire, or does it just generate text?
Those are not model questions. Those are execution questions.
And execution compounds. Deployment creates feedback. Feedback enables improvement. Improvement drives adoption. Adoption earns deeper integration. That loop becomes the moat.
McKinsey’s 2025 AI survey confirms this pattern: organizations reporting “significant” financial returns are twice as likely to have redesigned end-to-end workflows before selecting models.3 The execution advantage emerges less from initial capability and more from the ability to learn safely in production over time.
5 Ingredients of AI Execution Advantage
By “environment,” I mean the structural conditions that let AI compound in production. These conditions determine whether improvement accumulates or stalls after deployment.
1. Integration Surface
How quickly AI can ship into real workflows.
Value compounds fastest when AI lives inside the system of record, removes steps, reduces cycle time and tightens feedback loops. The MIT research shows that ROI is lowest in sales and marketing pilots, where most GenAI budgets are concentrated, and highest in back-office automation where integration is deepest.1
The integration question isn’t “can we connect via API?” It’s “can we embed deeply enough to observe outcomes and improve?”
2. Data Rights and Governance
What the system can legally and operationally learn from in production.
If you can’t observe outcomes, you can’t improve. If you can’t improve, you don’t compound. Companies that can learn from production without violating governance will outperform those that can’t.
3. Distribution and Procurement
How deployments become default, not optional.
AI doesn’t win by demos. It wins by rollout. PwC’s 2025 survey found that 79% of organizations have adopted AI agents at some level, but only 35% report broad adoption, and 68% say half or fewer employees interact with agents in their daily work.4 The gap between “we have AI” and “AI is how we work” is primarily a distribution problem.
4. Production Learning Loop
Evaluation, monitoring and improvement without breaking trust.
Real-world evaluation tied to business KPIs. Monitoring for drift and failure modes. Human routing for uncertainty. Continuous improvement with governance guardrails. Gartner predicts that 30% of GenAI projects will be abandoned after proof-of-concept by the end of 2025.5 Not because the technology failed, but because organizations couldn’t build the infrastructure to improve safely in production.
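To make this concrete, here is a minimal sketch of what a production learning loop can look like in code. It is illustrative only: the class, thresholds and KPI fields are hypothetical assumptions for this post, not a description of any specific vendor’s system.

```python
# Minimal sketch of a production learning loop: route uncertain outputs to a
# human, record real business outcomes, and flag suspected drift.
# All names and thresholds are hypothetical, chosen for illustration.
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class Outcome:
    task_id: str
    ai_answer: str
    confidence: float                        # model-reported or calibrated confidence
    routed_to_human: bool
    resolved_correctly: bool | None = None   # filled in once the business outcome is known

@dataclass
class LearningLoop:
    confidence_floor: float = 0.8            # below this, a human reviews the output
    drift_alert_rate: float = 0.15           # alert if too many tasks need human rescue
    outcomes: list[Outcome] = field(default_factory=list)

    def handle(self, task_id: str, ai_answer: str, confidence: float) -> str:
        """Route each AI output: ship it directly, or escalate to a human."""
        routed = confidence < self.confidence_floor
        self.outcomes.append(Outcome(task_id, ai_answer, confidence, routed))
        return "human_review" if routed else "auto_approved"

    def record_result(self, task_id: str, resolved_correctly: bool) -> None:
        """Tie the output back to a real business outcome, not just an eval score."""
        for o in self.outcomes:
            if o.task_id == task_id:
                o.resolved_correctly = resolved_correctly

    def report(self) -> dict:
        """KPIs a buyer cares about: accuracy on outcomes and escalation rate."""
        scored = [o for o in self.outcomes if o.resolved_correctly is not None]
        escalation = mean(o.routed_to_human for o in self.outcomes) if self.outcomes else 0.0
        return {
            "outcome_accuracy": mean(o.resolved_correctly for o in scored) if scored else None,
            "escalation_rate": escalation,
            "drift_suspected": escalation > self.drift_alert_rate,
        }
```

The specific thresholds don’t matter. What matters is that every output is routed, every outcome is recorded against a business KPI and drift is detected from production behavior rather than offline benchmarks.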
5. Organizational Fit
AI must function as a coworker, not a tool.
This is the missing pillar that most AI strategies ignore entirely. Enterprises are networks of roles, permissions, incentives and handoffs. “Agentic” only works when AI behaves like a well-scoped teammate: collaboration mechanics inside existing workflows, identity and least-privilege permissions, durable memory and context, and on-the-job learning that operates without violating governance.
When I evaluate AI companies, I ask: “Would you hire this system as a junior employee?” If the answer requires caveats about supervision, permissions and trust boundaries, you’ve identified the product work that matters.
Where AI Value Compounds Fastest
Value concentrates where execution environments support compounding:
Instrumented digital workflows where shipping is fast and telemetry is rich. Software development, customer support, back-office operations. Anywhere outcomes can be observed quickly and iteration is cheap.
High-volume operational workflows with clear accountability and measurable outcomes. Claims processing, compliance review, financial operations. Environments where “better” is quantifiable and feedback is continuous.
Physical operations with telemetry and hard KPIs. Manufacturing, logistics, healthcare delivery. Domains where the system of record captures reality and improvement is directly measurable.
Generic assistants without durable data rights, distribution leverage and a compounding learning loop get competed down to commodity margins.
What This Means for Founders
Markets are not just industries. They are execution environments. This favors teams that optimize for compounding environments over early polish or surface-level performance.
Wedge into the system of record. Don’t build alongside the workflow. Become the workflow. The difference between “we integrate with Salesforce” and “we are where deals get done” is the difference between tool and coworker.
Secure data rights early. The legal and operational ability to learn from production is a moat. Companies that negotiate this upfront, while offering clear value exchange, will outperform those who treat it as a Phase 2 problem.
Design for procurement from day one. Audit logs, SSO, role-based access and compliance certifications. These aren’t features; they’re prerequisites for the environments where AI compounds.
Treat evaluation as product. If you can’t show measurable improvement on business KPIs, you can’t justify continued investment.
Build the AI coworker layer. Collaboration, identity, permissions, memory and handoffs. This is the unsexy work that separates pilots from production systems.
Environments that support compounding often look weaker early yet outperform over time. This allows founders to look wrong early and still be right in the long run.
What This Means for Enterprises
Buying AI like ordinary software and expecting it to behave like ordinary software does not work. AI systems improve only when they are treated as production systems with owners, feedback and failure modes.
Establish an AI operating model. Clear owners, defined accountability and incident response. Who is responsible when the AI makes a mistake? If you can’t answer this question, you’re not ready for production.
Tie AI performance to business KPIs. Not accuracy metrics, not user satisfaction scores. Actual business outcomes: revenue, cost, cycle time and error rates.
Reduce fragmentation where learning loops need consistency. Every team using a different AI tool means every team learning in isolation. Consolidation isn’t about cost savings; it’s about compounding.
Treat AI coworker fit as a first-class requirement. When evaluating vendors, ask: “How does this integrate with how my team actually works?” Not how it works in a demo. How it works in your environment, with your permissions, your approval flows and your existing tools.
What This Means for Investors
Model quality is no longer the primary diligence question. Instead, evaluate:
Ownership of the integration surface. Does the company control the system of record, or are they dependent on someone else’s platform?
Durable data rights and credible governance. Can they legally and operationally learn from production? Is their data strategy a moat or a liability?
A scalable distribution path. Can they reach thousands of users without a proportional increase in sales and support costs?
Evidence of a production learning loop. Are they improving from deployment, or shipping static models?
A credible path to AI coworker fit. Can they function inside real enterprise environments with real permissions and real accountability?
We believe the best AI investments right now are companies building execution infrastructure, not model capability alone. The model layer is commoditizing; the execution layer is where durable value will be built.
How This Shows Up in Our Portfolio
This framework has shaped our investing strategy for some time. A few examples:
Perplexity: Enterprise knowledge work is an execution environment problem. Perplexity’s enterprise offering is explicitly about deploying AI into organizational context: collaboration in Spaces, answers from organizational apps and files, enterprise permissioning, auditability and “no training on your data.” This is governance, distribution and coworker-fit working together in production.
Unblocked: A literal AI coworker for engineering teams. Unblocked plugs into the tools engineers already use, connects code, documentation and conversations, and supplies the shared team context that makes other AI coding tools more effective. Enterprise fit is table stakes: SSO, RBAC, audit logs and a security posture designed for production.
Goodfire: If you care about production reliability, you eventually care about controlling behavior, not just prompting it. Goodfire is building interpretability tooling that surfaces failure modes, enables behavior design and supports durable fixes. This maps directly to the production learning loop and governance required for AI systems to improve safely.
Axiom: In domains where correctness is existential, value shifts toward systems that can reason rigorously and be evaluated against hard truth. Axiom’s focus on an AI mathematician is a wedge into verifiability-first reasoning. It’s upstream capability in service of downstream production requirements.
Where Advantage Compounds
For the next decade, the biggest AI outcomes will not come from “the model got better.”
They will likely come from environments where AI can be deployed, trusted, measured and improved continuously inside real workflows. The environments we choose to build in will determine which AI systems endure.
The data is already pointing the way: 95% of pilots fail not because AI doesn’t work, but because organizations haven’t built the necessary working environment.1 The 5% that succeed share common characteristics: deep workflow integration, clear governance, production learning loops and organizational fit.1
Capability is table stakes. Execution advantage is the moat.
The question for founders, enterprises and investors isn’t “which model is best?” It’s “which environments support compounding?”
Build there, and if you’re already building there, I’d love to talk to you.
—
Yan-David “Yanda” Erlich is a General Partner at B Capital, where he focuses on AI infrastructure and AI coworker investments. Previously, he was COO & CRO at Weights & Biases and a GP at Coatue.
LEGAL DISCLAIMER
All information is as of 1.21.2026 and subject to change. This content is a high-level overview and for informational purposes only. Certain statements reflected herein reflect the subjective opinions and views of B Capital personnel. Such statements cannot be independently verified and are subject to change. The investments discussed herein are portfolio companies of B Capital; however, such investments do not represent all B Capital investments. It should not be assumed that any investments or companies identified and discussed herein were or will be profitable. Past performance is not indicative of future results. The information herein does not constitute or form part of an offer to issue or sell, or a solicitation of an offer to subscribe or buy, any securities or other financial instruments, nor does it constitute a financial promotion, investment advice or an inducement or incitement to participate in any product, offering or investment. Much of the relevant information is derived directly from various sources which B Capital believes to be reliable, but without independent verification. This information is provided for reference only and the companies described herein may not be representative of all relevant companies or B Capital investments. You should not rely upon this information to form the definitive basis for any decision, contract, commitment or action.
SOURCES
1. MIT Sloan Management Review and Boston Consulting Group, “The State of AI in Business 2025,” 2025. https://mlq.ai/media/quarterly_decks/v0.1_State_of_AI_in_Business_2025_Report.pdf
2. S&P Global Market Intelligence, “Generative AI Shows Rapid Growth but Yields Mixed Results,” October 2025. https://www.spglobal.com/market-intelligence/en/news-insights/research/2025/10/generative-ai-shows-rapid-growth-but-yields-mixed-results
3. McKinsey & Company, “The State of AI in 2025: Agents, Innovation, and Transformation,” 2025. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
4. PwC, “AI Agent Survey: AI Agents and Enterprise Adoption,” May 2025. https://www.pwc.com/us/en/tech-effect/ai-analytics/ai-agent-survey.html
5. Gartner, “Gartner Predicts 30% of Generative AI Projects Will Be Abandoned After Proof of Concept by End of 2025,” July 29, 2024. https://www.gartner.com/en/newsroom/press-releases/2024-07-29-gartner-predicts-30-percent-of-generative-ai-projects-will-be-abandoned-after-proof-of-concept-by-end-of-2025