Microsoft shipped seven new AI models, but that's not the part to focus on

“Build your stack so it does not matter who wins.”

Everybody is counting models right now: seven fresh ones from Microsoft AI at Build, covering reasoning, coding, image, voice and transcription, the whole family released in one go. Before we join in, let me move the question somewhere more useful, because the count is the wrong place to start.

Impressive, but also not the point. If you run your estate on Microsoft, the model sitting under your Copilot, under your Foundry workloads and under the agents your people are starting to lean on is part of your supply chain now, whether or not you ever decided to think of it that way.

So the moment Microsoft starts building its own models instead of renting them from someone else, that is not something to scroll past: it is your supply chain slowly changing shape. Here is the line that stood out, straight from Microsoft's own blog. Microsoft writes that its flagship reasoning model, MAI-Thinking-1, is preferred over Claude Sonnet 4.6 in its own blind side-by-side tests, and that it matches Claude Opus 4.6 on one of the tougher coding benchmarks.


Why now

So why build all this, and why now? Part of the answer is contractual: the renegotiated OpenAI deal removed the clause that used to stop Microsoft from building its own broadly capable models, and Mustafa Suleyman has said, in almost those words, that the team was set free around six months ago to go after exactly this.

Then there is the plain business risk of resting your whole product line on one outside supplier whose roadmap and pricing you do not control. And there is the margin: Microsoft co-designs these models with its own Maia silicon, so more of the cost of serving them stays under its own roof.

Think of it this way: every time Copilot answers a question, someone pays for that answer. Today a big part of that payment goes to an outside model partner. Own the model, and Microsoft is paying itself, at the scale of hundreds of millions of users. That is the model behind the model.


The Anthropic question

Now the Anthropic side, and I want to be careful here, because it would be easy to read this as a falling-out, and it is nothing of the sort. Microsoft has put real money into Anthropic and wired Claude deep into Microsoft 365 Copilot; in a good number of regions, Claude is even on by default.

At the same time, Microsoft is shipping models built to go toe to toe with Claude. That is not sidelining anyone; it is what a platform company does when it wants options on the table.

The question worth asking, as the one paying the bill, is this: what does it mean for you that your platform now offers you three frontier model families, OpenAI, Anthropic and its own, and owns one of them outright?


The real story is not the models

Here is where I think a lot of the coverage walked past the real story. The seven models are the headline, but the thing worth your attention is something Microsoft calls Frontier Tuning, where you tune a model on your own workflow data, inside your own environment, and it stays yours.

In Microsoft's own example, a model tuned for Excel work keeps pace with a much larger general model at a fraction of the cost. Read past the demo and you can see the shift underneath: the value is moving away from whichever model you happened to pick this week, and toward your data and the way your organization actually works.

Note: Microsoft leaned hard on clean, licensed, traceable data as a reason to trust these models, a strong pitch for anyone sitting in a regulated corner of the world. Yet the technical preprint also lists Common Crawl, billions of scraped web pages, in the training mix. The models look strong on their own merits, so this is not a gotcha; it is the kind of claim you want to read the fine print on before you repeat it to your own auditors.


Are we not getting tired of this

And then there is the question I suspect a fair few of us are carrying around: are we not getting tired of all this? The short answer is yes, and the numbers back the feeling. There was a stretch this year with a new model landing roughly every two days, while a large share of companies parked most of their AI projects before they ever reached production.

So the tiredness is real, and it is measurable. The way out is the part I find useful: the teams that are pulling ahead have stopped chasing the model of the month, because the model was never the moat in the first place. Your data is the moat, and so are your workflows and your review process, all the boring stuff that everyone keeps treating as plumbing.


What it means for us

In practice, for EUC and the learning work we do around it, a couple of moves follow. Build model-agnostic: put a thin layer between you and whichever model you are calling, so you can move from Claude to MAI to OpenAI without rebuilding the house every time. Resist standardizing too early, precisely because the ground is shifting this fast underneath us. And a few of these models are useful right now: the transcription model, with its forty-plus languages, and the voice model, which picks up a tone from a short sample, are real tools for course and video production today. For my own work, that matters a great deal.
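To make "a thin layer" a little more concrete, here is a minimal Python sketch. It assumes nothing about any vendor's real SDK: `ChatModel`, `EchoModel` and `ModelRouter` are hypothetical names, and `EchoModel` is a placeholder that simply echoes the prompt where a real adapter would call Claude, MAI or OpenAI. The point is the shape, not the plumbing.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


class ChatModel(Protocol):
    """Anything that can turn a prompt into a reply."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoModel:
    """Hypothetical placeholder adapter; a real one would wrap a vendor SDK."""

    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class ModelRouter:
    """The thin layer: application code calls this, never a vendor directly."""

    def __init__(self) -> None:
        self._models: Dict[str, ChatModel] = {}
        self._active: Optional[str] = None

    def register(self, key: str, model: ChatModel, make_active: bool = False) -> None:
        self._models[key] = model
        # The first model registered becomes active unless told otherwise.
        if make_active or self._active is None:
            self._active = key

    def use(self, key: str) -> None:
        # Switching providers is a one-line change at the call site.
        if key not in self._models:
            raise KeyError(f"unknown model: {key}")
        self._active = key

    def complete(self, prompt: str) -> str:
        if self._active is None:
            raise RuntimeError("no model registered")
        return self._models[self._active].complete(prompt)


# Usage: the keys "claude", "mai" and "openai" are illustrative labels only.
router = ModelRouter()
router.register("claude", EchoModel("claude"))
router.register("mai", EchoModel("mai"))
router.register("openai", EchoModel("openai"))

print(router.complete("Summarise this module"))  # [claude] Summarise this module
router.use("mai")
print(router.complete("Summarise this module"))  # [mai] Summarise this module
```

The design choice is deliberate: each vendor lives behind its own small adapter, so moving to a new model means writing one adapter rather than touching every place your course tooling or agents call a model.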

So I will leave you where I think the real question lives. It is not whether MAI beats Claude; that race will probably swing back and forth for years, and most of us will read about it rather than feel it.

The question is whether you have built your stack so that it does not much matter who wins. Get that right, and every new generation of models turns into a gift you get to choose from. Get it wrong, and you are back to migrating in six months, and then again after that.

I want to take Frontier Tuning apart properly in a follow-up; that is the piece I am most curious about. For now, I would love to hear what you are seeing in your own estate.

Thank you for reading!

BvK.
