
AI ThoughtCon – The Technology (Day 3)


By the time Day 3 started, most of the big questions had already been put on the table. What AI is doing to the industry, how roles are shifting, and where the risks sit. The last day moved away from that and into something more practical. What actually gets built, how it runs, and what happens once you move beyond controlled examples.

It also brought in a slightly different perspective. Less discussion from within localisation itself, more from people working closer to the systems that everyone else is reacting to. That changes the tone quite quickly, because some of the assumptions we tend to rely on don’t really hold up when you look at how priorities are set on the product side.

Across the sessions, the same tension kept coming up. It’s not difficult anymore to get something to work. The difficulty sits in everything that happens after that, and each speaker approached that from a slightly different angle.

Keynote: The Next 5 Years (From the Builder’s Perspective)

Speaker: Ben Hylak

Ben’s keynote was uncomfortable. Not in a dramatic way, just in the sense that it forced you to sit with things that don’t quite line up.

He spoke from the builder's side, and that immediately exposes a gap. What matters there and what we in localisation tend to focus on are not aligned. Language quality, as we define it, simply isn’t a priority in those environments. Not because people don’t care, but because nothing is forcing them to care yet. Systems are being optimised for output and scale, and unless quality becomes a problem that affects product performance or creates risk, it stays secondary. That “yet” sits in the background of everything.

There’s still an assumption in our space that if we just explain quality better, or push harder on nuance, things will shift. In reality, that’s not how decisions get made. Vague arguments don’t land. General concerns about quality don’t move anything. What does move things is failure. Something that costs money, damages trust, and creates legal exposure (money again). Until that happens often enough, priorities stay where they are.

At the same time, the gap isn’t one-sided. The people building these systems often don’t have a deep understanding of the linguistic and cultural complexity they’re working with. Not out of negligence, but because they’ve never had to operate at that level of detail. As systems scale across languages and markets, that becomes a risk, even if it doesn’t show up immediately.

There is also very little crossover between these worlds. Localisation spends a lot of time talking to itself, and Silicon Valley to itself. Without shared context, both sides end up making decisions based on partial understanding, and that only becomes visible later.

The uncomfortable part is that change here may not come from alignment. It may only come from pressure. Regulation, the kind of failure mentioned above, legal challenges, and, yes, even loss of life. Not ideal, but historically effective.

The reaction in the chat during the session made it clear that this landed. Not everything said was necessarily new, but we have rarely been confronted with it so plainly and honestly.

Talk 1: Scaling without SaaS Lock-in

Speaker: Emily Diamandopoulou

Emily’s talk felt familiar very quickly, which is probably why it worked. A new tool appears, promises to solve a long list of problems, you spend time setting it up, training people, moving data across, and for a while it feels like progress. Then the gaps start to show: something doesn’t quite work the way you need it to. Another tool appears that claims to fix that gap, and before long you’re repeating the same process again.

Author’s note: If you look at it over time, it explains why something like the Nimdzi technology radar looks the way it does. Layer upon layer of tools, each trying to address a piece of the problem, without the underlying structure really changing.

What she did was shift the focus away from the tools and onto where the friction actually sits. In smaller teams, it’s rarely the linguistic work; it’s everything around it. In short, the workflows. The moments where something needs to move, where someone needs to know that something has happened or is about to happen. For example: a file is delivered, a quote is waiting, an invoice has slipped through. That layer is often informal, held together by memory or habit. That’s where the hours go every day.

That’s also where automation starts to make sense, if it’s applied with some restraint, to reduce the constant coordination around it. Once you look at it like that, a lot of the noise around replacing translators starts to feel misplaced.
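To give a sense of how thin that automation layer can be, here is a rough sketch in Python. It is not a tool from the talk; the folder name and the notify() target are placeholders for whatever a team actually uses, and the point is only that announcing a delivery shouldn’t depend on someone remembering to check.

```python
# Minimal sketch of coordination automation, assuming a shared drop
# folder. The folder name and notify() channel are hypothetical.
import time
from pathlib import Path

WATCH_DIR = Path("deliveries")  # placeholder drop folder
SEEN: set[str] = set()

def notify(message: str) -> None:
    # Stand-in for the team's real channel (email, chat webhook, ticket).
    print(f"[notify] {message}")

def watch(poll_seconds: int = 30) -> None:
    """Poll the drop folder and announce new files, so delivery
    doesn't depend on someone remembering to look."""
    WATCH_DIR.mkdir(exist_ok=True)
    while True:
        for f in WATCH_DIR.iterdir():
            if f.name not in SEEN:
                SEEN.add(f.name)
                notify(f"New delivery: {f.name}")
        time.sleep(poll_seconds)

if __name__ == "__main__":
    watch()
```

Nothing here touches the linguistic work itself; it only removes one recurring check from someone’s day, which is exactly the kind of restraint the talk argued for.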

The lock-in discussion goes further than the usual technical angle. Once workflows and habits settle into a platform, leaving becomes difficult regardless of whether the tool still serves you properly. There’s the cost of moving, the effort of retraining, and the simple fact that people get used to how things work and fight change.

Her advice is fairly straightforward, but not always followed. Keep your data in formats you can move. If your data is not portable, you are effectively tied to the system that holds it. Document your workflows somewhere that isn’t tied to a specific platform. Read contracts with the assumption that you may need to leave. And every now and then, test that. If you had to rebuild your setup within a week, could you actually do it? If not, something is too tightly coupled.
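As a small example of what “test that” can look like: TMX is the open XML exchange format for translation memories, and a genuinely portable export should be readable with nothing vendor-specific at all. The file name below is hypothetical; the check itself uses only Python’s standard library.

```python
# Portability smoke test: can we read our own TMX export outside
# the platform that produced it? "export.tmx" is a placeholder.
import xml.etree.ElementTree as ET

def count_units(tmx_path: str) -> int:
    """Parse a TMX export and count translation units (<tu> elements),
    as a basic 'is this data actually ours to move' check."""
    tree = ET.parse(tmx_path)
    return len(tree.getroot().findall(".//tu"))

print(count_units("export.tmx"))
```

If a script this small can’t read your export, the data is not portable in any meaningful sense, and the rebuild-within-a-week test will fail before it starts.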

The alternative she described is lighter than most people expect. Not another platform, but a thin layer that connects processes and keeps things moving, while the underlying tools remain replaceable.

There’s also a human side to this. When you start automating coordination, roles shift. Some tasks disappear, others change. Not everyone adjusts at the same pace, and that needs to be managed properly. The idea that stays with you is simple enough: automate the coordination and leave the core work alone.

Talk 2: From Prototype to Production

Speaker: István Lengyel

István’s session brought things back to something very practical. You build something quickly with these tools, it works, it looks convincing, and for a moment, it feels like the problem is solved. Then you start thinking about what it would take to run that same thing properly, and it becomes clear that you’re not there yet.

Using Lovable as the example, he showed how quickly ideas can now be turned into working prototypes. That part is real. You can connect systems, test workflows, build interfaces, and get to something tangible far faster than before. For alignment and internal discussions, that has obvious value.

The complication comes from how easy it is to mistake that for a finished solution. A prototype only needs to demonstrate that something can work. It doesn’t need to handle edge cases, scale properly, or remain stable over time. Production does. It has to deal with real inputs, unexpected behaviour, and the expectation that it will keep running without constant intervention. The gap between those two states is still there, even if it feels smaller.

That’s where the usual discipline becomes relevant again. Clear thinking and defined processes, as well as an understanding of what happens when something fails. The speed at which things can be built now doesn’t remove that work. It just means the gaps show up later if they are ignored early.
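To make that gap concrete, here is a minimal sketch of the kind of guard rails a demo happily skips. The handle() function and the logging setup are invented for illustration, not anything shown in the session.

```python
# Sketch of production-style input handling. A prototype version
# would simply assume the input is always clean UTF-8 text.
import logging

logger = logging.getLogger("pipeline")

def handle(raw: bytes) -> str | None:
    if not raw:
        logger.warning("Empty input, skipping")  # real feeds send empty files
        return None
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        logger.error("Undecodable input, routing to manual review")
        return None
    return text.strip()  # stand-in for the actual processing step
```

None of this is clever, and that is the point: it is the unglamorous work that doesn’t show up in a prototype but decides whether the thing survives real inputs.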

One point that stands out is that service providers may, in some cases, be in a safer position when using these tools than software companies. A product team has to anticipate and handle every possible failure scenario upfront. In a service environment, there is usually still a human layer that reviews and adjusts outputs before they move further. That creates a buffer, which makes it easier to use AI-generated components in a controlled way.
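A rough sketch of what that buffer might look like, with an invented confidence score and threshold purely for illustration:

```python
# Hypothetical human-in-the-loop gate: low-confidence AI output is
# parked for review instead of being delivered directly.
def route(output: str, confidence: float, threshold: float = 0.9) -> tuple[str, str]:
    if confidence < threshold:
        return ("review_queue", output)  # a person checks it before it moves on
    return ("deliver", output)
```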

These tools don’t improve the underlying thinking. If the idea is not well defined, the system reflects that. You just see it faster, usually well before production.

The barrier to building has dropped. More people can now assemble working systems and test ideas without needing deep technical backgrounds. But once those systems move beyond internal use, the same questions remain. Who maintains them? How are they secured? How can they evolve without becoming fragile? And most importantly, what happens when something breaks?

Talk 3: What “Good” Looks Like

Speaker: Marta Nieto Cayuela

Marta took a question that sounds simple and showed fairly quickly that it isn’t. What does “good” actually mean when you’re dealing with GenAI output?

For a long time, quality in localisation had fairly stable reference points. Accuracy, consistency, terminology, and error counts. You could argue about thresholds, but the general idea of what “good” looked like didn’t move much. But GenAI might change that.

You can have output that reads well and flows naturally. It looks deceptively convincing, while still being wrong in ways that are not immediately obvious. Sometimes the problem is factual, sometimes contextual, and sometimes it only becomes visible when the content is used in a real situation. That creates a different kind of risk, because it’s easier to miss.

It also exposes the limits of some of the models and frameworks still in use. Approaches built around identifying errors in relatively stable outputs don’t always handle variability well. They rely on references that don’t map neatly to how these systems behave. At the same time, automated metrics can measure similarity, but that doesn’t tell you much about whether something is actually useful or appropriate.
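A toy example makes that limit obvious. This uses Python’s standard difflib, and the sentences are invented:

```python
# Surface similarity rewards fluency, not correctness.
from difflib import SequenceMatcher

def similarity(candidate: str, reference: str) -> float:
    return SequenceMatcher(None, candidate, reference).ratio()

reference = "The warranty covers two years from the date of purchase."
candidate = "The warranty covers ten years from the date of purchase."  # fluent, wrong

print(round(similarity(candidate, reference), 2))  # high score (~0.96) despite the factual error
```

A metric like this would wave the wrong sentence through, which is exactly the gap between measuring similarity and judging whether output is useful or appropriate.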

Human evaluation still matters for those reasons, but it doesn’t scale easily when volumes increase and content is generated continuously.

So the question shifts. Scoring output is about defining what “good” means in a specific context, and then making sure that definition is reflected in how systems are set up and evaluated. Problems don’t only appear at the output stage; they often originate in how prompts are structured, which models are used, and how workflows are designed.

That’s where her Define → Evaluate → Measure approach comes in. It’s a way to structure thinking: what matters in this context, which setup actually works best for that, and whether the results meet expectations.
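One way to imagine making the Define step concrete, purely as a sketch and not a description of her actual framework, is to treat the quality definition as data rather than as an assumption. All names and values below are invented.

```python
# Hypothetical quality spec: "good" written down per context,
# so evaluation and measurement have something explicit to check against.
from dataclasses import dataclass, field

@dataclass
class QualitySpec:
    context: str                     # Define: what "good" means here
    criteria: list[str]              # Evaluate: what reviewers check
    thresholds: dict[str, float] = field(default_factory=dict)  # Measure

legal_spec = QualitySpec(
    context="legal disclaimers, de-DE",
    criteria=["terminology matches glossary", "no added claims"],
    thresholds={"human_pass_rate": 0.98},
)
```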

Another important point is that quality is no longer static. What is acceptable in one context may not be acceptable in another, and expectations can change over time. That makes standardisation harder in practice: it calls for targeted human review where it matters, and ongoing monitoring rather than a single check at the end. “Good” is no longer something you assume. “Good” needs to be defined and revisited.

CLOSING

By the end of Day 3, it was quite clear that getting something to work is no longer the difficult part. You can build quickly now, connect tools, generate output, and get to something that looks convincing without too much effort. Where things start to fall apart is later.

Once that same setup is used with real data, across different languages, by people who were not part of the initial build, small issues start to compound. Sometimes it’s obvious, but more often it isn’t. Things look fine on the surface, but something is slightly off, or inconsistent, or just not quite right for the context, and that only becomes visible once it’s already in use.

Less time goes into producing content directly, more time into deciding what can be trusted, what needs checking, and where something carries risk, even if it reads well. That shift sounds efficient when you describe it, but in practice it puts more pressure on judgment.

There are still gaps that haven’t really closed. The way these systems are built, the way language is treated inside them, and the way quality is understood in our space don’t line up particularly well. At the same time, tools are being adopted quickly, often faster than teams fully understand how they behave once things are no longer controlled.

You see it in small ways. A workflow that works perfectly in a demo, but becomes fragile when volumes increase. Output that looks fine in isolation, but creates problems when used in sequence or at scale. None of that is really new, but the speed at which it’s happening makes it harder to ignore.

For localisation, the point isn’t just to keep up with tools. It’s to stay close enough to how these systems are actually used to see where things go wrong, and to be able to step in where needed.

The technology will keep moving. The difference sits in how well it’s understood, and how deliberately it’s used.

