The Worst AI Dubbing We'll Ever See
- Yota Georgakopoulou

- May 19
Updated: May 20
This article accompanies the Think Tank AI Reals episode featuring Yota Georgakopoulou, “The Worst AI Dubbing We’ll Ever See.” You can watch the full episode or take a deeper dive in the written article.
After three decades of working in media localization, I have witnessed many shifts, but nothing quite like AI dubbing. The change is not incremental. Unlike automatic speech recognition or machine translation, which slotted into existing subtitling workflows as useful aids, AI dubbing does not merely supplement the studio process; in many cases, it replaces it entirely. Synthetic voices and video lip-sync technologies are creating entirely new job profiles, new quality benchmarks, and new legal obligations.
1. Adoption: A Market Still Finding Its Footing
The AI dubbing market exists, and it is growing, but it is still at an early stage. Slator reported that 2024 revenues for AI dubbing platforms reached approximately $360 million. However, around $30 million came from remote human dubbing platforms. The figure also excludes language service provider (LSP) revenues for AI dubbing services. A separate estimate from Justin Beaudin, CEO of Adapt (one of the pioneering LSPs in human-led AI dubbing), puts the pure AI dubbing services market at $10–15 million. Either way, this remains a small figure compared to the global dubbing market, which is counted in the billions.
You can picture adoption as a pyramid. At the base sit YouTube and TikTok creators. They usually cannot afford traditional dubbing, so self-serve platforms like ElevenLabs are their entry point.
In the middle are broadcasters running unscripted programming, documentary makers, FAST channel operators, e-learning companies, and sports rights holders. These clients often work with factual, single-narrator content where speed matters, quality standards are achievable with human-in-the-loop workflows, and the economics of expansion into new languages work. These are the most commercially active segments today. The drivers are clear: brands want to expand their catalogues into new language markets, and new distribution channels — AVOD, FAST, social media — simply cannot absorb traditional dubbing costs. AI changes that calculation.
Premium entertainment is at the top of the pyramid. Here, progress is much slower. There are many proof-of-concept projects; too many, in my view, if they are not translating into actual rollouts. The Amazon anime series Banana Fish has become something of a cautionary tale: the AI-dubbed release triggered significant backlash from the series’ fanbase. Adoption at a lower quality threshold can be risky for brands with devoted audiences, such as anime or gaming. As Scott McCarthy from DreamWorks Animation has argued, while “credibility” may be enough to gain audience acceptance for low-emotional content, audiences require “authenticity” when it comes to high-emotional, narrative episodic and feature film content. Many believe that current technology, even with skilled human oversight, is not reliably there yet.
The responsible approach, as Volker Steinbiss of AppTek describes it, is to start with the low-hanging fruit: low emotional content, markets without strong pre-existing dubbing traditions. Companies can use these projects to build expertise before moving up the pyramid to more complex content.
There is also a cultural dimension to audience acceptance, as Ludo Dufour, VP of Licensing at CuriosityStream, notes in a Slator interview. He points out that Asian audiences show roughly 80% positive sentiment toward AI-generated content. I expect the FIGS markets (France, Italy, Germany, Spain) to be less accepting of AI dubbing. FIGS audiences have been conditioned by decades of high-quality local dubbing, and their expectations are correspondingly high. Into-English dubbing, by contrast, has a less established convention; the audience is not conditioned in the same way, so there is potentially more room for experimentation.
Three Blockers, Not One
When people discuss what is holding AI dubbing back, the first answer is usually quality. It is a real constraint for premium content, but it is not the only one.
Cost is the second blocker. Even with automation, AI dubbing at scale remains expensive because it still requires significant human involvement, which has to be priced in. Both quality and cost will be addressed as the technology improves, and it is improving continuously. It is worth keeping in mind that the AI dubbing we see today is the worst AI dubbing we will ever see.
The third blocker may be the most immediately pressing: capacity. There are simply not enough trained AI speech editors in the market. I am aware of content owners who have attempted to place orders that major LSPs could not fulfil. So, we not only have a technology problem; we also have a workforce pipeline problem.
LSPs are beginning to address it: RWS, for instance, has announced plans to upskill existing linguists to become AI speech editors. Training courses are also available. I collaborated with Voiseed on an introductory course, designed specifically for linguists entering this space, available through OOONA EDU. I am working on a more in-depth course as we speak. The idea of professional certification has also been raised, though that conversation feels somewhat premature.
For linguists who wonder whether to develop these skills, I believe this represents a growing career path. The AI speech editor role asks translators to bring language skills, timing, script adaptation, and audio sensitivity to bear in “directing” a version of a film or series into another language. For those coming from subtitling, the transition will feel smoother, and it can be creative and fun work.
2. Evaluation: How Do We Measure Quality?
There is no universal quality standard for AI dubbing yet, but frameworks are emerging.
At The Global Creative and Security Community (GC-SC), I co-lead the AI Dubbing Working Group, which represents the closest thing the industry currently has to a coordinated effort on standards. We are working toward a white paper with consolidated findings. In the meantime, most companies are operating with proprietary rubrics, shaped by their particular use cases.
A comprehensive quality evaluation framework needs to address five distinct dimensions:
Voice suitability: Sometimes called ‘voice casting’ or, in Prof. Giselle Spiteri Miggiani’s framing, physique du rôle. I often think of it as a fourth type of sync, where the voice must fit the character. A related issue is maintaining voice consistency across the dub. This is not usually a concern in traditional dubbing. With AI dubbing, however, it cannot be taken for granted.
Script translation and adaptation: Natural phrasing, adherence to lip-sync and isochrony requirements, and appropriate register. This is the linguistic craft that translation professionals are trained for, applied, in my view, under some of the hardest constraints.
Voice synthesis quality: What would traditionally be called ‘performance’: emotional range, tempo, stress patterns, non-verbal sounds. Synthesis has improved significantly, but this remains a technical frontier.
Synchronicity: The alignment of dialogue with what appears on screen: visual sync (body movements, hand gestures), isochrony, and lip-sync. Video lip-sync technology is also advancing quickly.
Audio engineering: The technical quality of the final mix: levels, acoustics, integration with the original soundtrack.
The only published rubric addressing all five dimensions in detail is Prof. Spiteri Miggiani’s Script, Speech and Sound (SSS) framework. What we are attempting at GC-SC is to consolidate existing approaches into a single industry-level standard.
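To make the idea of a multi-dimensional rubric concrete, here is a minimal sketch of how a reviewer's ratings across the five dimensions listed above might be recorded and summarised. Only the five dimension names come from this article. The 1 to 5 scale, the equal default weights, and the names DubScore, weighted_mean, and weakest are hypothetical choices made for illustration; this is not the SSS framework and not the GC-SC standard, both of which define their own criteria.

```python
from dataclasses import dataclass

# The five dimension names follow the article. Everything else here
# (1-5 scale, weights, class and method names) is an illustrative assumption.
DIMENSIONS = (
    "voice_suitability",
    "script_adaptation",
    "voice_synthesis",
    "synchronicity",
    "audio_engineering",
)


@dataclass(frozen=True)
class DubScore:
    """One reviewer's ratings for one dubbed asset (hypothetical structure)."""

    scores: dict  # dimension name -> rating on an assumed 1-5 scale

    def __post_init__(self):
        # Require exactly the five dimensions, each rated within the scale.
        if set(self.scores) != set(DIMENSIONS):
            raise ValueError("scores must cover exactly the five dimensions")
        for name, value in self.scores.items():
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5")

    def weighted_mean(self, weights=None):
        """Average rating; equal weights unless the caller supplies others."""
        weights = weights or {d: 1.0 for d in DIMENSIONS}
        total = sum(weights[d] for d in DIMENSIONS)
        return sum(self.scores[d] * weights[d] for d in DIMENSIONS) / total

    def weakest(self):
        """Lowest-rated dimension (first in list order if there is a tie)."""
        return min(DIMENSIONS, key=lambda d: self.scores[d])


review = DubScore({
    "voice_suitability": 5,
    "script_adaptation": 4,
    "voice_synthesis": 3,
    "synchronicity": 4,
    "audio_engineering": 4,
})
print(review.weighted_mean(), review.weakest())
```

Reporting the weakest dimension alongside the average is a deliberate choice in this sketch: a dub can have an acceptable overall score and still fail on a single dimension, such as performance, that audiences notice immediately.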
Underpinning all of this, however, is a harder question: what does audience acceptance look like in practice? The ultimate quality measure is whether viewers like, tolerate, or reject a dub — and whether it generates additional revenue for content owners. There is a scarcity of reliable data on this. Aside from the known failures reported in the press, one success story shared in a GC-SC working group meeting was of a microdrama AI-dubbed into French (traditionally a demanding market), which accumulated 150,000 subscribers and over 10 million views in a single month. In the e-fitness domain, Apple Fitness+ and Peloton have both launched AI-dubbed fitness content in the past year, and the fact that both programs are ongoing suggests positive outcomes.
One research finding is worth flagging for its broader implications. A 2024 paper, Human Bias in the Face of AI, found that in blind tests, raters could not reliably distinguish between human and AI-generated content. But when content was labelled, raters consistently scored human-attributed content higher, even when the labels were deliberately swapped. The ‘human’ label itself added perceived quality. This has real implications for how content owners approach disclosure.
3. Compliance and Protection: What Content Owners Need to Know
The legal landscape around AI dubbing is still forming. That is precisely why engaging legal counsel before beginning even a proof-of-concept project is essential for managing risk.
Protecting Talent and Yourself
It does not suffice to be the rights holder for a piece of content. If the original actors’ performances are to be affected — their faces altered for lip-sync, or their voices cloned — explicit consent from the talent is required. Existing agreements with talent need to be reviewed for what they say, or fail to say, about synthetic reproduction.
Even where a synthetic library voice is used rather than a clone, the original talent’s voice will typically be submitted to an AI dubbing company as source material. Content owners need to understand how that biometric data will be used. Will it be used to train base models? Might it even be sold? Contracts must explicitly prohibit any use of talent voice data beyond the specific production in question.
A second layer of protection relates to the training data underlying the Text-To-Speech (TTS) systems one plans to use. There is a pressing question of whether those systems were trained ethically, on recordings properly licensed for that purpose. Reports of companies training on scraped internet content regardless of rights are not the exception. A lawyer at a recent GC-SC meeting cited over 50 active class-action suits against AI companies for alleged illegal use of scraped data.
Disclosure and Compliance
In Europe, disclosure of AI-generated content is a legal obligation under the EU AI Act. Similar requirements are being implemented across US states. The EU has also recommended a universally recognisable symbol for AI-generated content, though the specifics of implementation are still evolving.
In practice, the current approach involves including a slate at the start or end of video content acknowledging the use of AI-generated voices, and perhaps also publishing an FAQ explaining the process and the role of human oversight. The stakes are especially high for factual and instructional content, where audiences assume authenticity. Nevertheless, the legal obligation applies across content types.
One initiative worth following closely is C2PA (Coalition for Content Provenance and Authenticity), an industry effort involving major companies that focuses on embedding verifiable metadata into content to trace its origin, creation process, and any modifications. For AI dubbing, this could point toward a future where disclosure is not just a slate at the end of a video, but provenance information built into the content itself.
Early Days, Real Stakes
AI dubbing is here to stay.
What I find exciting is that right now, we have an opportunity to establish how this technology gets adopted, what standards govern it, and what obligations come with it, before the market matures and those questions are answered by default rather than by design. The GC-SC AI Dubbing Working Group is one part of that effort. The emergence of training pathways for AI speech editors is another. So are the industry conversations around consent, transparency, and provenance.
AI dubbing is the most significant change I have seen in thirty years of media localization. It is also, at this moment, an opportunity for us to shape this change.
Yota Georgakopoulou is a member of the AI Localization Think Tank and a media localization consultant with 30 years of industry experience. She has worked on AI dubbing market research, quality evaluation, proof-of-concept projects, and speech editor training, including collaborations with AppTek, Dubformer and Voiseed. She co-leads the AI Dubbing Working Group at The Global Creative and Security Community (GC-SC), where she consults as Lead Localisation Expert.




