Microsoft is Stress-Testing the Agentic AI Bubble in Its Own Gaming Division

A case study in corporate self-surgery by the world's most professionalized AI company

Chris Guo

Mar 05, 2026

Part 1: The Black Box and the Glass House

Last week the stock market processed two opposing theories of the AI bubble with the emotional regulation of a toddler.

Theory A, proposed by my college classmate Alap Shah [1], is that replacing knowledge workers with AI collapses the consumer economy and therefore corporate profits, housing markets, and the financial system. On Monday, this triggered a broad selloff in software stocks.

Theory B is that replacing knowledge workers with AI prints money, and the market’s greed reflex will aggressively reward anyone who actually does it. On Thursday, Jack Dorsey laid off almost half of Block’s knowledge workers, explicitly stating that new AI tools have fundamentally changed what it means to run a company. The market immediately forgot about Monday's apocalyptic macroeconomic panic and sent Block’s stock up 17 percent, basically begging every other tech CEO to do the same thing.

Then on Saturday, which is reserved for thought leadership, Howard Marks of Oaktree Capital Management stepped in to mediate. He asked two critical questions, which, if you think about it, are actually the same:

Will AI infrastructure investments produce an adequate return?
Are the AI business valuations rational?

Marks concluded his memo by essentially throwing his hands up in defeat.

“Since we don’t have full knowledge of AI’s business potential or its impact on profitability, this question can’t be answered... We’ll know in 10 years whether the resulting profits justified it.”

It is very funny to watch the smartest people in finance admit they are driving blind.

The Observability Problem with AI

Wall Street is desperately trying to validate the Agentic AI Thesis: the idea that AI will autonomously run daily operations and expand profit margins at scale [2]. The problem is that business-to-business (B2B) enterprise software is legally and economically engineered to be a black box:

Maximum Price Discrimination: In consumer businesses (B2C), the price of a Netflix subscription is public. In B2B, enterprise software pricing is entirely bespoke. If Salesforce or Palantir publicly reveals exactly what they charge a client, they lose all their negotiating leverage with the next buyer.
Workflows are Trade Secrets: If a logistics company figures out how to use Agentic AI to cut their routing costs by 15%, that workflow becomes their primary competitive moat. They are not going to detail how they did it in their quarterly 10-K filings; they are just going to report a blended margin expansion.
NDA Chokehold: Standard enterprise software contracts explicitly forbid clients from publishing performance benchmarks or integration metrics without the vendor’s permission. This means Wall Street only gets the cherry-picked, heavily sanitized PR case studies that the vendor allows to be published.

Wall Street analysts cannot actually see agentic AI projects working. They can only squint at lagging financial exhaust and guess whether a slight profit margin bump was caused by an AI agent, a change in tax policy, or a mild winter.

If investors want to know whether AI will pay off, they need to look for a glass house.

Microsoft recently signaled a total regime change in its Xbox gaming division. In a single stroke, the company cleared the deck of its leadership, announcing the retirement of the longtime chief, Phil Spencer, and the abrupt resignation of President Sarah Bond, long viewed as Phil’s heir apparent. In their place, the company installed its top operational AI executive, Asha Sharma. While the press is hyper-focused on her recent title as President of CoreAI, they are missing her true pedigree. Sharma is a marketplace and logistics specialist who cut her teeth scaling Messenger at Meta and running the complex operations of Instacart as COO.

When a company swaps a beloved creative-first figurehead for a platform-first operator, even if the initial mandate is just standard corporate cost-cutting, it’s reasonable to suspect the ultimate goal is operationalization (i.e., engineering a system that is measured, optimized, repeatable). I spent enough time inside Activision and Microsoft to recognize an experiment when I see one: can AI-driven efficiency turn a fickle, hit-driven consumer business into a predictable, high-margin platform business?

For the broader tech and finance markets, the stakes of this restructuring go far beyond video games. If Microsoft, the most formidable AI company on earth, cannot use AI to turn around a digitally native, data-rich business that it owns entirely, the broader Agentic AI Thesis is in trouble.

Xbox is the Laboratory for Agentic AI

Video games are historically the most ruthless, high-velocity laboratories for figuring out what keeps people engaged and what gets them to spend money. If a B2B enterprise software tool has a clunky interface, employees just suffer through it because Bill Lumbergh mandates it. Video games, however, are purely discretionary. If a video game has a clunky interface, the user uninstalls it in thirty seconds and the studio goes bankrupt. This forces gaming platforms to solve complex economic and behavioral problems years before the enterprise sector even realizes they exist. Eventually, enterprise software adopted the freemium model, retention psychology, and constant product updates that gaming pioneered.

If you want to know how fast agentic AI is going to operationalize corporate America, just look at how quickly the video game industry iterated on its own monetization model. In the mid-2010s, the dominant model was the Loot Box (charging players a few dollars for a randomized mystery box). The player might open it and receive a highly coveted, ultra-rare Gingerbread Exo Suit, or they might receive a common, practically worthless pair of digital knee pads. This randomized monetization was a highly efficient business model, right up until the player base started gathering outside with torches and pitchforks.

What followed was, by enterprise standards, an overnight revolution. Within roughly eighteen months, the entire industry abandoned the loot box, but getting to the replacement required a Cambrian explosion of frantic experimentation. Inside Activision, we were deploying a new business model for virtual goods with every game update.

To put that in perspective for the physical economy: imagine owning a retail store. Now imagine bulldozing the building, completely retraining the staff, redesigning the floor plan, repackaging every piece of inventory, rewriting the pricing strategy, and reopening the doors to millions of customers. Now imagine doing that every eight weeks, for a year straight.

The evolutionary result across the industry was the Battle Pass, a subscription chore-wheel where you pay upfront for a transparent list of digital goods you will receive, provided you treat playing the game like a part-time job. When we pivoted, I started referring to the new model internally as “deterministic,” which is just the standard economics term for systems that aren’t random, and the descriptor actually filtered all the way up to the CEO’s investor calls as a strategic pillar for monetization and long-term player trust. As it turns out, consumers vastly prefer the predictability of a second job to the opaque probabilities of a mystery box.

That is the speed at which the gaming industry experiments, and right now, Microsoft is using gaming as a laboratory to test how to restructure a massive workforce around centralized AI.

Part 2: The Racecar vs. The Railroad

In an era of relentless expansion, Microsoft went on a massive land grab, culminating in the $69 billion acquisition of Activision Blizzard. If you map the organizational structure from this period, it resembles a sprawling frontier territory dotted with fiercely competitive game studios. Video games, like art and movies, are inherently volatile assets with uncertain returns. You’re always digging for a hit. The strategy of the last CEO of gaming, Phil Spencer, was to embrace portfolio diversification, aggregating a large number of games into a single, recurring monthly subscription called Game Pass to create a predictable financial product.

The entertainment industry is realizing a brutal truth: scaling a subscription business out of disparate content is incredibly capital intensive. Across the gaming sector, budgets for AAA games are escalating, and the traditional hit-driven model is facing severe profit margin pressure. Microsoft cannot abandon content, but they will try to solve the runaway cost problem by completely reshaping the infrastructure underneath.

To explain this, I’m going to compare a racecar and a railroad.

For twenty years, a big-budget video game studio has operated like a Formula 1 team. A publisher (think of them as the billionaire team owner) writes a $300 million check to a game studio (the racing team) and says: build me a machine that wins. The game studio is a small team of elite designers, engineers, and artists obsessing over one finely tuned machine (the game) for years. Behind them is a giant pit crew of support staff, including data scientists tracking player analytics, marketers fanning the hype, and QA teams kicking the tires.

The entire economic model relies on winning races. The first race of the season is launch week, usually during the holiday season. If the machine performs, the game sells millions of copies, and expansion packs become the equivalent of additional races on the calendar, each one extending the season, compounding revenue, keeping the engine hot.

When it works, the margins are breathtaking. However, Formula 1 is unforgiving. If the car is slow, or unreliable, or mistimed, there’s no participation trophy. When reviews disappoint and sales underperform, the giant pit crew is idled. Planned expansions are canceled, and the car gets wheeled back into the garage and quietly dismantled. With a AAA video game, the fate of hundreds of millions of dollars is decided in a few critical weekends.

Activision was phenomenal at building race cars, and it had a great pit crew system localized to support each race car. There was a pit crew for Candy Crush, a pit crew for World of Warcraft, and a pit crew for Call of Duty, and these pit crews were hyper specialized, siloed, and bespoke to each car.

So now Microsoft buys Activision, and the awkward M&A reality is that Microsoft builds railroads, and you don’t need pit crews on a train. Microsoft is in the business of infrastructure, predictable returns, and utility-like scale. They want to own the tracks, the stations, and the ticketing system, and they want to force everyone to pay a monthly pass just to ride.

This brings us back to the recent leadership shakeup at Xbox and the appointment of Asha Sharma. If you read the mainstream tech press, the generic assumption is that an AI executive was brought in to replace coders and artists to cut costs (i.e., AI procedural tools will generate infinite levels and assets).

However, this misunderstands the economics of digital scarcity. In her first public statements, Sharma explicitly banned "soulless AI slop" from the ecosystem, declaring that games "are and always will be art, crafted by humans." Her statements serve both as a PR concession to angry gamers and a moat-building strategy. As generative AI commoditizes basic digital assets, infinite supply destroys the value of common content. If Microsoft, the most professionalized AI company in the world, won’t let AI touch the front-end product in their own digital sandbox, it implies they know generative AI cannot replicate human creativity. By strictly positioning flagship franchises like Call of Duty or Halo as premium, human-crafted art, Microsoft preserves its pricing power and protects the value of its biggest hits.

The GenAI hype cycle (using AI to create the racecar) is a distraction. The actual enterprise value is in Agentic AI: using bots to automate the pit crew, flatten the operational middle layer, and run the railroad.

Part 3: A Case Study — Industrializing Curiosity

In the mid-19th century, the American railway system wasn’t a system at all. It was a chaotic web of competing companies, and each one built its tracks to a different width, or gauge. A train traveling from New York to Chicago couldn’t make a seamless trip. Cargo had to be manually unloaded and reloaded onto different trains every time the track gauge changed. The network could only achieve national scale, and actual economic efficiency, when the industry finally agreed to tear up the old iron and standardize the tracks.

Likewise, you can’t automate across a company if every game studio runs on proprietary back-end systems (e.g. matchmaking, security, in-game marketplace, and achievement systems). You see this friction most clearly in mergers and acquisitions. When Activision merged with King, the creator of Candy Crush, it was widely known that King possessed one of the most advanced experimentation systems in the industry, but the racecar model fundamentally resists shared parts. Console games and mobile games are built on completely different technical chassis. You cannot just plug a mobile engine into a console frame. Overcoming this exact architectural friction became a core competency of mine (internal tech arbitrage). It requires dissecting the mechanics of hyper-optimized, acquired tech platforms, and then engineering an equivalent infrastructure for legacy racing teams entirely from scratch.

Before Microsoft Gaming can deploy AI at scale, it first has to standardize and centralize the core infrastructure across the portfolio. To see how workflow fundamentally changes, let’s go through an illustrative example: what happens to the unit economics of a SQL query.

In the racecar model, curiosity is a hand-crafted luxury good. A game director wanders over to the analytics bullpen and asks, “Do we have any data on how many people played our new extraction mode? I couldn’t find a dashboard for it.” The analyst says, “Good question, let me check if we even track that.” Two days later, the analyst knocks on the director’s door to clarify: “How do you want the graph to look? Percentages or absolute numbers? Only highly engaged players or all players?”

The game director, who is a creative visionary and not a statistician, furrows his brow and says, “Yes. Why don’t we just look at all the possibilities?”

Five days later, the director receives a magnificent Excel spreadsheet with fifty neatly labeled tabs. To be clear, this is a good outcome in the artisanal era. It’s a very pleasant, collaborative process, but it means the cost of answering one simple question was roughly a week of fully loaded salary for a highly trained analyst.

In the railroad model, a hyperscaler uses AI agents to industrialize that curiosity. The localized bullpen is replaced by a centralized AI support desk. Now, the game director logs into an internal portal and fills out a ticket, explicitly justifying why his question matters for the company’s core metrics. An AI agent triages the request, writes the SQL query, and generates an interactive chart. A human supervisor briefly clicks around to make sure the chart isn’t broken and sends it back. The game director, having completed a mandatory one-hour corporate training module, is now expected to just click a toggle switch on the dashboard to flip between percentages and absolute numbers himself.

It completely lacks the personal, collaborative feel of the racecar era, but the railroad era is about scale and self-checkout. More importantly, the questions being asked and the answers generated are visible across the entire global network, rather than locked within a single studio’s Slack channel. The company has successfully restructured the fixed cost of a localized pit crew into the scalable, variable cost of a cloud compute operating expense.

Part 4: AI Integration Costs

The transition from artisanal racecars to an industrial railroad sounds elegant in theory, but it often requires paying upfront AI integration costs.

You cannot just build a few smart AI agents, drop them into the middle of a legacy company, and expect them to magically start doing the work of a hundred analysts. You think you're buying a high-margin software business (Theory B), but you're actually funding a massive industrial infrastructure project (a Railroad).

To visualize this, imagine AI integration as an iceberg. Above the waterline, you see the AI agent interface that everyone is excited about. Unfortunately, building the agent itself is only a small fraction of the total cost. The vast majority of the work is submerged beneath the surface. The deeper you go, the more expensive it gets. The exact numbers vary wildly depending on the company, how ambitious the project is, and whether you read a McKinsey [3] or Gartner or IBM report, but as a very rough order-of-magnitude estimate, the costs tend to fall into three tiers.

Departmental Tooling (Costs: up to ~$2M): This is the smallest, most shallow part of the iceberg. It represents basic, low-friction AI tools like answering customer questions or writing emails. The data it needs is already reasonably organized, so the company can mostly plug a new app into the existing system. This is high-margin, Theory B software in its purest form, but it also offers the smallest competitive moat.
Workflow Automation (Costs: up to ~$10M): This deeper layer represents a significant increase in complexity and cost. Now the agents are taking on specific workloads, like predicting sales or optimizing a supply chain. Before that can happen, the company has to pay a small army of expensive consultants or data engineers to refactor the legacy databases that actually hold the company’s information. Refactoring sounds about as fun as telling a teenager to clean his room, except the room is twenty years of corporate data. Tables are reorganized. Columns with names like customer_final_final2 get renamed. Duplicate datasets get deleted. Dates that were stored as text get turned into real dates. Queries get optimized so they don’t accidentally scan the company’s entire data warehouse. Giant catch-all tables get broken into smaller ones that actually make sense. And perhaps most importantly, someone finally writes down what the fields are supposed to mean.
Enterprise Transformation (Costs: up to ~$200M): This massive base of the iceberg represents a railroad project, a total rebuild of the company’s data infrastructure so everything runs on a single system. In practice this means ripping out a lot of bespoke software, standardizing how data is stored, and persuading dozens of semi-independent teams to adopt the same platform. The goal is that once everything runs on the same “track gauge,” AI can automate workflows across the entire company. This is Theory A risk disguised as Theory B upside.

The AI agent is the cheap part. The expensive part is reorganizing the company’s data infrastructure so the AI has something intelligible to read.
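
To make the Workflow Automation chores a little less abstract, here is a minimal Python sketch of three of them: renaming a cryptic column, turning text dates into real dates, and deleting duplicates. Everything in it is hypothetical and invented for illustration: the rows, the signup_dt column, the two assumed date formats, and the data dictionary. Real refactoring happens inside a data warehouse across thousands of tables, not in a twenty-line script.

```python
from datetime import datetime

# Hypothetical legacy rows: a cryptic column name, dates stored as text
# in two different formats, and one exact duplicate.
legacy_rows = [
    {"customer_final_final2": "C-001", "signup_dt": "03/05/2019"},
    {"customer_final_final2": "C-002", "signup_dt": "2020-11-17"},
    {"customer_final_final2": "C-001", "signup_dt": "03/05/2019"},
]

# Assumed mapping from legacy column names to readable ones.
RENAMES = {"customer_final_final2": "customer_id", "signup_dt": "signup_date"}

# Someone finally writes down what the fields are supposed to mean.
DATA_DICTIONARY = {
    "customer_id": "Stable identifier for a paying customer.",
    "signup_date": "Calendar date the customer first registered.",
}

# Assumption: only these two text formats appear in the legacy data.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Turn a text date into a real date, trying each assumed format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def clean(rows):
    """Rename columns, parse dates, and drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        renamed = {RENAMES.get(key, key): value for key, value in row.items()}
        renamed["signup_date"] = parse_date(renamed["signup_date"])
        identity = (renamed["customer_id"], renamed["signup_date"])
        if identity in seen:
            continue  # duplicate record, skip it
        seen.add(identity)
        cleaned.append(renamed)
    return cleaned


cleaned_rows = clean(legacy_rows)
for row in cleaned_rows:
    print(row["customer_id"], row["signup_date"].isoformat())
```

The unglamorous part is the point: none of this involves an AI agent, and all of it has to happen before an agent can be trusted to read the table.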

The natural question, then, is whether this actually makes financial sense. Let’s run a very simplified, purely hypothetical example. Imagine a mid-size company trying to standardize its data and automate most of its analytics function.

Upfront AI Integration Costs: $15 million to clean up decades of messy data systems and make them usable for AI.
Gross Annual Savings: $5 million from eliminating roughly 50 human analysts.

On paper, this looks fine. The payback period is three years ($15M / $5M), which is perfectly acceptable for a traditional enterprise IT overhaul. The problem is that in the AI world, three years is basically forever. By the time you break even, the models, tools, and infrastructure you built the system around could already be obsolete.

In addition, replacing humans with AI introduces new, significant ongoing operating expenses (OpEx). If you replace 50 people with an AI system, you now have to pay for the computing power to run the models, hire specialized AI staff like engineers and data scientists, and maintain the system by constantly checking for errors, fixing security issues, and tuning the AI because it can drift or make mistakes.

Suppose all of that costs $2.5 million per year. Now your net savings are only $2.5 million annually, so the original $15 million investment takes six years to pay off ($15M / $2.5M). That is a non-starter.
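
For anyone who wants to rerun this with their own numbers, the arithmetic above fits in a few lines of Python. This is a sketch of a simple, undiscounted payback period using the hypothetical figures from this example; the function and variable names are mine, and a real capital budgeting model would also discount future savings.

```python
def payback_years(upfront_cost, gross_annual_savings, annual_opex=0.0):
    """Simple (undiscounted) payback period in years."""
    net_annual_savings = gross_annual_savings - annual_opex
    if net_annual_savings <= 0:
        return float("inf")  # the project never pays for itself
    return upfront_cost / net_annual_savings


# Hypothetical figures from the example above.
UPFRONT_INTEGRATION_COST = 15_000_000
GROSS_ANNUAL_SAVINGS = 5_000_000
ONGOING_AI_OPEX = 2_500_000

on_paper = payback_years(UPFRONT_INTEGRATION_COST, GROSS_ANNUAL_SAVINGS)
with_opex = payback_years(
    UPFRONT_INTEGRATION_COST, GROSS_ANNUAL_SAVINGS, ONGOING_AI_OPEX
)

print(on_paper)   # 3.0
print(with_opex)  # 6.0
```

The thing to notice is how sensitive the result is to the OpEx line: once ongoing costs reach the full $5 million in gross savings, the function returns infinity.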

This is the excruciating reality of the agentic AI thesis, and it is exactly what a skeptical investor should push back on. Before a massive conglomerate can build its AI railroad, it has to tear out the bespoke systems in each of its studios and replace them with centralized infrastructure, all while the trains are running at full speed.

The question isn’t whether AI agents are smart enough to do the job. We already know they can write production-grade SQL, the kind that filters early, joins on indexed keys, and doesn’t accidentally scan half the warehouse. The real test is whether the upfront integration costs and ongoing maintenance will swallow the theoretical savings from replacing humans.
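
For readers who have never seen what "filters early, joins on indexed keys" looks like, here is a toy version of the extraction-mode question from Part 3, run against an in-memory SQLite database from Python. The schema, table names, and rows are all invented for illustration and do not describe any real studio's telemetry. The query narrows the sessions table on indexed columns before joining, and joins on the players table's primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE players (
        player_id INTEGER PRIMARY KEY,
        region    TEXT NOT NULL
    );
    CREATE TABLE sessions (
        session_id INTEGER PRIMARY KEY,
        player_id  INTEGER NOT NULL REFERENCES players(player_id),
        mode       TEXT NOT NULL,
        played_on  TEXT NOT NULL  -- ISO 8601 date, so text comparison sorts correctly
    );
    -- Index on the columns the query filters by.
    CREATE INDEX idx_sessions_mode_date ON sessions (mode, played_on);
""")

conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [(1, "NA"), (2, "NA"), (3, "EU"), (4, "EU")],
)
conn.executemany(
    "INSERT INTO sessions (player_id, mode, played_on) VALUES (?, ?, ?)",
    [
        (1, "extraction", "2026-03-01"),
        (1, "extraction", "2026-03-02"),  # same player, second session
        (2, "deathmatch", "2026-03-01"),  # different mode
        (3, "extraction", "2026-02-10"),  # before the reporting window
        (4, "extraction", "2026-03-03"),
    ],
)

QUERY = """
    WITH recent AS (
        -- Filter early: narrow sessions by mode and date before joining.
        SELECT DISTINCT player_id
        FROM sessions
        WHERE mode = ? AND played_on >= ?
    )
    SELECT p.region, COUNT(*) AS players
    FROM recent AS r
    JOIN players AS p ON p.player_id = r.player_id  -- join on the primary key
    GROUP BY p.region
    ORDER BY p.region
"""
rows = conn.execute(QUERY, ("extraction", "2026-03-01")).fetchall()

# The director's toggle: absolute numbers or share of all players.
total_players = conn.execute("SELECT COUNT(*) FROM players").fetchone()[0]
absolute = dict(rows)
share = {region: count / total_players for region, count in absolute.items()}

print(absolute)  # {'EU': 1, 'NA': 1}
print(share)     # {'EU': 0.25, 'NA': 0.25}
```

Writing this is not the hard part, for a human or an agent. The hard part is that the query only works because the toy tables are already clean, consistently named, and indexed, which is exactly what the integration budget pays for.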

We are about to find out exactly how much it costs to lay the track.

Don’t worry about the carbon fiber. The increased profits from using a standardized shipping container will more than cover it.

[1] Alap Shah co-authored a viral Citrini Research essay. My freshman year at Harvard, I remember happily philosophizing with Alap over pizza at Noch’s at 2AM. It’s genuinely impressive to see him apply that same late-night intensity to sketching out the end of the world.

[2] With investment outpacing value delivery, the enterprise consulting world has started drawing a line between Generative AI (creating text/images) and Agentic AI (taking actions, automating workflows, running daily operations). Imagine an AI agent at Salesforce that constantly scans thousands of data points to predict which corporate client is about to cancel their million-dollar contract, automatically triggering a custom discount before a human even realizes there’s a problem. Imagine a global manufacturer of fluid control equipment using Google Cloud’s agentic workflow to automate email-based order processing, reducing average customer response from 42 hours to real time. Imagine an AI agent autonomously triaging low-level fraud alerts at a mid-tier bank. I’m trying really hard to make these sound like the dawn of a new technological epoch, but even writing this is causing my melatonin levels to spike.

[3] These numbers are intentionally rough. As one reference point, McKinsey & Company estimates that deploying a basic off-the-shelf generative AI system might cost around $2 million, customizing models with enterprise data can reach roughly $10 million, and building a full internal AI capability from scratch can run as high as $200 million. https://www.mckinsey.com/capabilities/tech-and-ai/our-insights/technologys-generational-moment-with-generative-ai-a-cio-and-cto-guide