AI Video and Immersive Training: What Gets Automated, and What Only Looks Like It

Prefer listening? Play the audio version:

This is the last piece in our series on the technologies actually changing elite sport in 2026. As ever, this is not a manual written from the touchline. It is a map of where the evidence is strong, where it is thin, and what a performance department should ask before letting any of it near a decision.

Two technologies usually get bundled together under the heading of the automated coach: generative AI applied to video and analysis, and immersive virtual reality applied to training. They are bundled because they share a promise, that the expensive, repetitive labour of coaching can be handed to a machine. But they sit at very different points on the evidence curve, and they fail in different ways. The AI-video half is largely real: computer vision increasingly automates the tagging and retrieval work that used to require paid human analysts, generative interfaces are beginning to compress the reporting layer on top of it, and the main question there is increasingly managerial and economic, provided the model clears validation on the club’s own use case. The immersive-training half is more contested: the technology clearly does something, but the something is smaller and more specific than the marketing implies, and the gap between what it improves in the lab and what it improves on the pitch is the whole story. This piece takes the two halves separately, because conflating them is exactly the mistake the marketing wants a buyer to make.

The video half: real automation, and the question is economic

Start with the half that is genuinely happening. The tagging and clipping of match footage, and increasingly the summarising of it, work that occupied dedicated analyst staff for decades, are being absorbed into software. Some platforms now ingest match footage and return automated event tags, clips and summary reports on timelines that materially compress what used to be overnight analyst work. Parts of the work that paid analytical services used to deliver overnight now run largely on computer vision for detection and, increasingly, language models for the summary.

This is not a claim about the future. It is a description of capabilities already present in commercial products, even if their reliability varies by sport and setup, and it is the most economically consequential thing in this article. The underlying capability is well documented: a 2025 systematic review and meta-analysis of AI in sports performance analysis reported a pooled classification accuracy of nearly 88 percent across computer-vision and deep-learning models, though with very high heterogeneity between studies and an explicit caveat about the difficulty of generalising from controlled conditions to live competition (Pietraszewski et al., 2025).

The important point for a performance department is what this does and does not replace. It replaces the lower-value layer of analysis, the mechanical labour of finding and labelling the moments. It does not replace the higher-value layer, the judgement that turns a labelled moment into a coaching intervention, an opposition model, or an in-match decision. The analyst role is being restructured, not eliminated: the hours once spent tagging are freed for interpretation, and the analysts who thrive will be the ones whose value was always in the reading rather than the labelling. A club that treats the automation as a headcount saving is missing the point. The gain is redeployment, moving skilled people up the value chain, not removing them from it.

There is a second-order effect worth naming, because it cuts against the whole premise of buying analysis for an edge. Analytical advantage has always come from seeing what rivals do not. When every club runs the same automated tagging and reads the same machine-generated summaries off the same platforms, the analysis converges, and the edge that used to come from a sharper analyst erodes into a commodity everyone owns. The automation lowers the floor, which is real value, but it also flattens the ceiling. The clubs that keep an advantage will be the ones whose humans ask questions the platform does not answer by default, which is another way of saying the value migrates, again, to interpretation. Buying the same tool as everyone else buys parity, not advantage, and it is worth being honest about which one you are actually paying for.

So the honest read on the AI-video half is that the technology works for many workflows and the decision is managerial. The questions worth asking a vendor are unglamorous and commercial: how accurate is the automated tagging on your sport and your camera setup, what is the error rate on the events you care about, who owns the resulting data, and what does the workflow cost against the analyst time it frees. These are procurement questions, not scientific ones, and that itself is the tell that this half of the category has matured.

There is one caveat that keeps the procurement decision from being purely mechanical. Automated tagging is accurate on average and wrong in specific, sometimes systematic, ways: an occlusion, an unusual camera angle, a congested set piece, a sport or league the model was undertrained on. When a human analyst tagged a match, the errors were random and visible. When a model tags it, the errors are patterned and invisible, buried inside an output that looks complete and authoritative. A club that builds its opposition analysis on automated tagging without spot-checking the model against ground truth is trusting a system whose failure modes it has not characterised. The mature posture is not to distrust the automation but to audit it: know its error rate on your sport, and keep a human in the loop precisely where the model is weakest.

Concretely, auditing means asking a vendor for precision and recall on the specific events that matter, not a single headline accuracy figure; for the false-positive and false-negative rates, because they carry different costs in scouting than in recruitment; for agreement against a human analyst’s ground truth; and for how performance breaks down by camera angle, by occlusion, by set piece, and across competitions and seasons where the model may drift. The threshold a club should accept depends entirely on the use: a tagging error that is trivial for post-match coaching review can be disqualifying for a recruitment decision worth millions. The tool does not know which use it is being trusted for. The buyer has to.

Where the video half overreaches: from description to prescription

There is a line inside the AI-video category that separates the mature from the speculative, and it is worth naming because vendors routinely blur it. The distinction that matters runs along a gradient of confidence. Tagging events and surfacing clips are increasingly robust for many workflows. Summarising patterns is improving but sits closer to interpretation and should be audited more carefully, because all of it remains dependent on the sport, the camera setup, how events are defined, and the data the model was trained on. And at the far end of that gradient sits prescription: the tactical AI assistant that tells a coach the opponent’s full-back overcommits after a switch of play and implies they should exploit it. The mechanical description layer, tagging events and surfacing clips, is increasingly mature; prescribing what to do about it is not. Those systems exist and are improving, but their output is a hypothesis generated from historical data, not a validated recommendation, and the best coaching staffs treat it as one input among many rather than an instruction. The failure mode is subtle: a fluent, confident, natural-language tactical summary reads as authoritative whether or not the pattern it describes is real or actionable. A category that can describe the past reliably is not thereby able to prescribe the future, and the fluency of the language model makes that distinction easier to lose, not harder.

Beyond both of these sits a genuinely new capability that is still more promise than practice: synthetic scenario generation, the use of generative models to build training situations that never actually occurred, whether as video or as an immersive environment. In principle a coaching staff could generate, from the previous weekend’s footage, an interactive simulation of an opponent’s tactical pattern and let players rehearse against it. It is one of the most ambitious threads in this space and, as of 2026, there is little public evidence that any of it is operational at elite level; it remains a roadmap capability rather than a deployed one. It is worth flagging not because it is ready but because it is where the AI-video and immersive halves would eventually converge, and because the same evidential caution applies in advance: a synthetic scenario is only as useful as its fidelity to the real one, and a generated defence that behaves in ways a real defence would not is a convincing way to train the wrong instinct.

The immersive half: real effects, and the lab-to-pitch gap is the whole story

The virtual-reality half is where the evidence gets genuinely interesting, because it is neither the miracle the vendors sell nor the gimmick the sceptics dismiss. The strongest and most honest way to read it is through the distinction between cognitive and motor applications. For perceptual-cognitive training, decision-making, anticipation, pattern recognition, reading a developing play, the evidence is real. A meta-analysis of 22 studies found that perceptual-cognitive training meaningfully improved athletes’ anticipation and decision-making (Zhu et al., 2024). That is the finding the marketing quotes, and it is true. But the same meta-analysis contains the finding the marketing omits, and it is the one that matters most.

The improvement measured in the laboratory, on the trained task itself, was large, an effect size of 1.51. The improvement that actually transferred to real-game performance was less than half of that, an effect size of 0.65 (Zhu et al., 2024). The training works; it is the transfer that shrinks. An athlete who gets dramatically better at the VR drill gets only modestly better at the thing the drill is supposed to improve. This is the single most important fact in the immersive-training category, and it is precisely the number a vendor demonstrating an impressive in-headset improvement curve is not showing you. The question is never whether the athlete improves at the task on the screen. It is how much of that improvement survives contact with the pitch.

That gap has a name in the wider literature, the problem of far transfer, and it is genuinely unresolved. Some researchers argue the current evidence is not sufficient to support the claim that perceptual or cognitive training transfers to athletic performance at all, as opposed to making athletes better at tests that resemble the training. A scoping review of extended-reality tools for perceptual-cognitive skills found the field growing quickly but still working out basic questions of study design and transfer measurement (Jia et al., 2024).

The defensible position for 2026 is that VR is a promising and increasingly credible tool for training and measuring parts of the cognitive layer of sport, with the size of the real-world benefit still uncertain and probably modest, and that this is a respectable place for a technology to be, as long as nobody pretends the lab number is the pitch number.

There is a subtler trap underneath the lab number itself, and it should make a buyer more sceptical still. A 2025 meta-analysis of 33 controlled trials found that a large part of the measured improvement in digital visual and decision-making training may be a learning effect: when the test used to measure progress closely resembles the training task, often the same software or platform, scores inflate because the athlete has learned the test, not the underlying skill (Guo et al., 2025). The effect was especially pronounced for decision-making accuracy. This is the methodological version of the vendor demo problem: an impressive in-headset improvement curve can reflect familiarity with the drill rather than a transferable gain, and unless the measurement is independent of the training, the number on the screen is measuring the wrong thing.

The motor half is weaker still. The claim that practising a physical skill in VR, a technique, a movement pattern, transfers to improved execution with a real ball or implement is far less supported than the cognitive claim, because the sensory and mechanical fidelity of current VR does not reproduce the forces, the weight, and the fine proprioceptive feedback that motor learning depends on. VR can rehearse the decision of when and where to strike. It cannot yet rehearse the strike itself with enough fidelity to be trusted as a substitute for physical practice. A performance director should treat VR motor-skill claims as the least evidenced part of an already uneven category.

This is not simply a matter of the technology being young. The limitation is partly physical, and partly a deeper question of representativeness that the skill-acquisition literature has argued over for years: skills transfer best when the practice environment reproduces the perceptual information and action possibilities of the real one, and a headset that removes the ball’s true flight, an opponent’s real body, and the forces of the ground removes exactly the information the skill is tuned to (Pinder et al., 2011). Motor learning in particular is built on the exact forces, timing and proprioceptive feedback of the real action: the weight of the implement, the resistance of the ground, the precise loading of a joint through a movement. Current VR reproduces the visual scene convincingly and the physical one hardly at all. That is why the cognitive applications, which mostly need the visual scene and the decision to be right, are on firmer ground than the motor ones, which need the physical scene to be right and do not get it. Even the cognitive case is not fully exempt: a decision rehearsed without the real body of an opponent or the real consequences of getting it wrong is a partial rehearsal, which is part of why the transfer shrinks rather than holds. Until haptics and physical fidelity improve by an order of magnitude, the honest framing is that VR trains what the athlete sees and decides, not what the body does, and a claim that it improves technique should be met with the question of what, mechanically, is supposed to have transferred.

Buy the automation, pilot the immersion, and always ask for the pitch number

The two halves resolve into two different postures. The AI-video half is a managerial decision about a working technology: it automates the mechanical layer of analysis, it frees skilled analysts for the interpretive work only humans do well, and it should be evaluated on accuracy, cost and data ownership like any other piece of infrastructure. The overreach to watch is the tactical assistant that crosses from describing the past to prescribing the future under cover of a fluent summary.

The immersive half is an experimental decision about a partially validated one. Cognitive VR is worth piloting for decision and anticipation training, with clear eyes about the lab-to-pitch gap and a measured endpoint that lives on the pitch, not in the headset. Motor-skill VR is worth watching, not buying. So the position we would hold is this. In the video half, ask the commercial and validation questions and move if the audit clears your use case, because the underlying workflow automation is now mature enough to buy. In the immersive half, ask one question above all others, and ask it of every demo, every validation slide, every vendor claim: not how much the athlete improved at the task, but how much of that improvement showed up in competition. The two halves carry different evidential burdens, and keeping them separate is the whole discipline. In video, the value claim is workflow compression: can the system produce the same clips and summaries faster and cheaper, without introducing unacceptable error? In VR, it is transfer: can the system produce a better player outside the system? Those are different questions with different evidence attached, and a vendor who leads with the easy one while hiding the hard one is telling you which of them they would rather you not examine.

How this series is made, and how to read it: this is editorial analysis, not a practitioner’s memoir and not a systematic review. PERFORM’s pieces are researched and drafted with the assistance of AI tools, then reviewed, edited and fact-checked by our editorial team against primary sources, peer-reviewed literature, clearly labelled preprints, industry reports, league and company announcements, and practitioners’ own published work. Where the evidence is strong we say so; where it is limited we treat it as limited; where a claim comes from a vendor or corporate announcement we treat it as a hypothesis, not proof. The views here are our editorial position, drawn from the published record rather than first-hand experience inside an elite performance department. Where practitioners are named or quoted, those words are their own. Where we couldn’t verify a claim, we left it out. And where you have the hands-on experience we’re writing about, we’d rather hear from you than pretend to it.

References

Guo, Y., Yuan, T., Yang, M., & Qiu, J. (2025). Does the “learning effect” caused by digital devices exaggerate sports visual training outcomes? A systematic review and meta-analysis. Frontiers in Physiology, 16, 1664572. https://doi.org/10.3389/fphys.2025.1664572

Jia, Y., Zhou, X., Yang, J., & Fu, Q. (2024). Animated VR and 360-degree VR to assess and train team sports decision-making: A scoping review. Frontiers in Psychology, 15, 1410132. https://doi.org/10.3389/fpsyg.2024.1410132

Pietraszewski, P., Terbalyan, A., Roczniok, R., Maszczyk, A., Ornowski, K., Manilewska, D., Kuliś, S., Zając, A., & Gołaś, A. (2025). The role of artificial intelligence in sports analytics: A systematic review and meta-analysis of performance trends. Applied Sciences, 15(13), 7254. https://doi.org/10.3390/app15137254

Pinder, R. A., Davids, K., Renshaw, I., & Araújo, D. (2011). Representative learning design and functionality of research and practice in sport. Journal of Sport and Exercise Psychology, 33(1), 146–155. https://doi.org/10.1123/jsep.33.1.146

Zhu, R., Zheng, M., Liu, S., Guo, J., & Cao, C. (2024). Effects of perceptual-cognitive training on anticipation and decision-making skills in team sports: A systematic review and meta-analysis. Behavioral Sciences, 14(10), 919. https://doi.org/10.3390/bs14100919