Meet AI Agent Johan 🤖, the star of this year’s World Cup 2026 prediction series ⚽.
Some of you who followed my blog during Euro 2024 might remember the series where I used an AI Builder prediction model to predict how the Netherlands would perform during that tournament (you can find the 2024 post here). It was a great experiment combining two passions of mine: soccer ⚽ and AI 🤖. The feedback I received was really encouraging. Even though the model’s predictions were “ok-ish”, as I would put it, the experiment also showed the limitations of building a prediction model on limited data.
Fast forward two years and the FIFA World Cup 2026 is coming to North America this summer ⚽. The Netherlands qualified, and of course I am doing it again, but this time better. This is not just a quick copy of the 2024 setup. It is a genuine upgrade, with more meaningful data and newer, more capable technology. And that upgrade has a name: AI Agent Johan 🤖.
Named after the greatest Dutch footballer of all time, Johan Cruyff, Johan is my Copilot Studio prediction agent for the 2026 World Cup. Just like the man himself, Johan sees the game differently, reads patterns others miss, and always has an opinion worth hearing. Whether he is as reliable as the legend is another question, but we will find out together over the coming weeks.
In this intro post I want to walk you through what I built in 2024, what I think was wrong with it, and why AI Agent Johan 🤖 is meaningfully better.

What I built in 2024, and what was wrong with it
Before I go into the improvements, let me be honest about the 2024 model. It worked. It gave predictions. But looking back at it now, there were a couple of things that were clearly not right.
The data was too thin and I cheated to fix it.
The model needed a minimum of 50 training rows and at least 10 examples per outcome class. When you limit your training data to only the Netherlands versus their opponents, you run out of data fast. My solution at the time was to copy all historical rows four times to hit the threshold. It worked, but it was a workaround, not a real solution. Duplicating data adds no new information. Copies of the same match can also end up in both the training and the test data, which makes the model look better than it really is. It is not something you want to explain to a data scientist. Johan deserves better than that.
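To make the 2024 problem concrete, here is a minimal Python sketch of the two thresholds described above: 50 rows in total and 10 per outcome class. It runs them against a hypothetical history of 15 Netherlands matches. The function name `check_training_data` and the Win/Draw/Loss labels are my own illustration, not part of AI Builder. The sketch shows why copying rows four times “works”: the checks pass, but the number of distinct matches stays exactly the same.

```python
from collections import Counter

# Thresholds as described in this post for the 2024 AI Builder model
MIN_ROWS = 50
MIN_PER_CLASS = 10
CLASSES = ("Win", "Draw", "Loss")  # hypothetical outcome labels


def check_training_data(outcomes):
    """Return a list of threshold problems; an empty list means the data passes."""
    problems = []
    if len(outcomes) < MIN_ROWS:
        problems.append(f"{len(outcomes)} rows, need at least {MIN_ROWS}")
    counts = Counter(outcomes)
    for cls in CLASSES:
        if counts[cls] < MIN_PER_CLASS:
            problems.append(f"{cls}: {counts[cls]} rows, need at least {MIN_PER_CLASS}")
    return problems


# Hypothetical history: 15 matches as (match_id, outcome) pairs
matches = (
    [(i, "Win") for i in range(0, 7)]
    + [(i, "Draw") for i in range(7, 11)]
    + [(i, "Loss") for i in range(11, 15)]
)
print(check_training_data([o for _, o in matches]))  # four problems reported

# The 2024 workaround: repeat every row four times
duplicated = matches * 4
print(check_training_data([o for _, o in duplicated]))  # → []
print(len(duplicated), "rows,", len(set(duplicated)), "distinct matches")  # → 60 rows, 15 distinct matches
```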
The columns were too simple.
The 2024 model was trained on only three columns:
- Opponent
- Home/Away
- Friendly/Tournament
That is a very thin set of features. The model had no idea whether the Netherlands were on a five-game winning streak or had just been thrashed 3-0 in their last match. It had no idea whether the opponent was ranked 3rd or 53rd in the world. It was essentially predicting based on historical head-to-head results alone.
Real training data, no duplication
Instead of limiting the training data to Netherlands-only games, Johan is trained on all major international tournaments from 1990 onwards: the World Cup, UEFA Euro, UEFA Nations League, Copa América, AFCON and the AFC Asian Cup. Every match is included from both teams’ perspectives, so Johan learns patterns from all nations, not just the Netherlands.
The result is 7,666 training rows covering 182 nations and their official games. It is fair to say that AI Agent Johan 🤖 is trained on a much better dataset than the 2024 version.
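Including a match “from both teams’ perspectives” means each match becomes two rows. In the first row one side is the team and the other is the opponent, and the second row mirrors it. Below is a minimal Python sketch. It assumes a source table with home team, away team, score, and a neutral-venue flag. The field names are hypothetical, not taken from the Kaggle datasets. The example uses the 2010 World Cup final, which Spain won 1-0 against the Netherlands at a neutral venue.

```python
from dataclasses import dataclass


@dataclass
class Match:
    home: str
    away: str
    home_goals: int
    away_goals: int
    neutral: bool


def outcome(goals_for, goals_against):
    """Result from the point of view of the team that scored goals_for."""
    if goals_for > goals_against:
        return "Win"
    if goals_for < goals_against:
        return "Loss"
    return "Draw"


def to_team_rows(m):
    """Turn one match into two training rows, one per team's perspective."""
    venue_a, venue_b = ("Neutral", "Neutral") if m.neutral else ("Home", "Away")
    return [
        {"team": m.home, "opponent": m.away, "venue": venue_a,
         "result": outcome(m.home_goals, m.away_goals)},
        {"team": m.away, "opponent": m.home, "venue": venue_b,
         "result": outcome(m.away_goals, m.home_goals)},
    ]


final_2010 = Match("Netherlands", "Spain", 0, 1, neutral=True)
for row in to_team_rows(final_2010):
    print(row)
# {'team': 'Netherlands', 'opponent': 'Spain', 'venue': 'Neutral', 'result': 'Loss'}
# {'team': 'Spain', 'opponent': 'Netherlands', 'venue': 'Neutral', 'result': 'Win'}
```

This mirroring also fits the numbers. If every match contributes exactly two rows, 7,666 rows corresponds to 3,833 matches.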
In summary, these are the differences between the 2024 model and AI Agent Johan:
| Data component | 2024 model | AI Agent Johan 2026 |
|---|---|---|
| Opponent | ✅ | ✅ |
| Home / Away / Neutral | ✅ (2 options) | ✅ (3 options) |
| Match type | Friendly / Tournament | World Cup / Euro / Nations League / etc. |
| Tournament stage | ❌ | ✅ Group / R32 / R16 / QF / SF / Final |
| FIFA ranking difference | ❌ | ✅ |
| Goals scored last 5 games | ❌ | ✅ |
| Goals conceded last 5 games | ❌ | ✅ |
| Win streak | ❌ | ✅ |
These extra columns give Johan real context. The Netherlands, ranked 7th, playing a team ranked 45th in the group stage is a very different prediction from the Netherlands playing Argentina in a semi-final, and now Johan can actually tell the difference.
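The form columns in the table are rolling statistics. The detail that matters most is that they may only look at matches played before the one being predicted. Otherwise the model would be trained on the very result it is trying to predict.

Below is a minimal Python sketch under a few assumptions of my own:
- One team’s rows are already sorted by date and already carry both FIFA ranks.
- A draw or a loss resets the win streak.
- The ranking difference is the opponent’s rank minus the team’s own rank.

The field names and the match history are made up.

```python
from collections import deque

WINDOW = 5  # number of previous matches used for the form features


def add_form_features(team_rows):
    """Add form columns to one team's rows, which must be sorted oldest first.

    Every feature only uses matches played before the current row, so the
    model never sees the result it is being trained to predict.
    """
    recent = deque(maxlen=WINDOW)  # (goals_for, goals_against) of the last matches
    streak = 0                     # consecutive wins going into the current match
    enriched = []
    for row in team_rows:
        enriched.append({
            **row,
            "goals_scored_last5": sum(gf for gf, _ in recent),
            "goals_conceded_last5": sum(ga for _, ga in recent),
            "win_streak": streak,
            # Assumed sign convention: positive means this team is ranked higher
            "ranking_diff": row["opponent_rank"] - row["team_rank"],
        })
        # Only now add this match to the history used for the next row
        recent.append((row["goals_for"], row["goals_against"]))
        streak = streak + 1 if row["goals_for"] > row["goals_against"] else 0
    return enriched


# Made-up results for one team: (goals_for, goals_against), oldest first
results = [(0, 3), (1, 1), (2, 1), (3, 0), (2, 0), (1, 0)]
history = [
    {"goals_for": gf, "goals_against": ga, "team_rank": 7, "opponent_rank": 45}
    for gf, ga in results
]

last = add_form_features(history)[-1]
print(last["goals_scored_last5"], last["goals_conceded_last5"],
      last["win_streak"], last["ranking_diff"])  # → 8 5 3 38
```

A `deque` with `maxlen` drops the oldest match automatically. Because of that, the window never needs explicit slicing.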
What is coming in this series?
Here is what I am planning to publish over the coming weeks:
Post 2 - Building Johan’s brain: the data layer. How I combined three Kaggle datasets, joined them with FIFA historical rankings, and calculated rolling form statistics, all without duplicating a single row.
Post 3 - Training Johan: the AI Builder model. Importing the data into Dataverse, training the model, and reading the feature importance results to see which columns actually mattered most to Johan.
Post 4 - Bringing Johan to life: the Copilot Studio agent. Setting up the Copilot Studio topic, wiring up the Power Automate flow as an action that generates a prediction with a game summary, and testing the full chain end to end.
Pre-match posts - one per Dutch group stage game. The day before each match, I ask Johan the question and we see what he predicts. Same format as 2024, but with richer output, real probabilities, and a conversational interface.
Post-tournament verdict. How did Johan do? Which predictions were right, which were wrong, and what would I change for next time?
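Joining FIFA historical rankings onto matches, as planned for Post 2, is essentially an “as-of” lookup. For each match, you take the most recent ranking published before the match date. Below is a minimal sketch using Python’s `bisect` module. The snapshot dates and ranks for “Team A” are hypothetical, and returning `None` when no earlier ranking exists is my own choice. That case matters in a dataset that starts in 1990, because the FIFA ranking did not exist yet for the earliest matches.

```python
from bisect import bisect_left
from datetime import date


def rank_on(snapshots, match_date):
    """Return the latest rank published strictly before match_date, or None.

    snapshots is a list of (publication_date, rank) tuples sorted by date.
    """
    dates = [published for published, _ in snapshots]
    # bisect_left finds the first snapshot on or after match_date,
    # so everything before that index was published strictly earlier
    i = bisect_left(dates, match_date)
    return snapshots[i - 1][1] if i > 0 else None


# Hypothetical ranking snapshots for one team
team_a = [(date(2022, 1, 10), 12), (date(2022, 3, 1), 9), (date(2022, 6, 5), 10)]

print(rank_on(team_a, date(2022, 6, 5)))   # → 9 (the 5 June ranking is not used yet)
print(rank_on(team_a, date(2022, 7, 1)))   # → 10
print(rank_on(team_a, date(2021, 12, 1)))  # → None (no ranking published yet)
```

Once both teams’ ranks are looked up this way, the ranking difference column is simply one rank minus the other.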
Closing thoughts
Two years ago this was a fun side project using a relatively new feature in AI Builder. In 2026 it is still the same fun side project, but the toolset has genuinely matured, and AI Agent Johan 🤖 is the result.
Johan Cruyff famously said “football is simple, but it is difficult to play simply.” Building a prediction agent is a bit like that. The idea is simple. Getting the data, the model, and the agent right takes a bit more work. The coming posts will show every step of it.
Enjoy the tournament everyone ⚽

Check the full schedule on the FIFA website here.