
The Perils and Promises of AI in Translation: A Critical Examination of LLMs

January 17, 2024 (updated January 31, 2024)
AI and Machine Translation strengths and pitfalls

There are perils and promises when using AI in translation and localization. Translation and localization experts show steadily growing interest in integrating AI into their workflows. However, LSPs (Language Service Providers) must clearly understand the strengths and limitations of these AI systems before incorporating them into their operations. By objectively evaluating the capabilities of GenAI (Generative Artificial Intelligence), companies can better balance the need for technological innovation with their commitment to maintaining quality standards.

An LLM (Large Language Model) is a software system that assembles words and text fragments to reply to a user’s prompt. Using a statistical model based on its training data, it gathers relevant fragments to form a contextually appropriate and well-written response. However, the smoothness of an LLM’s responses can hide its weaknesses and trick users into believing they are interacting with a semantic model, or even that LLMs possess some level of consciousness. For example, in 2022, Blake Lemoine, a software engineer at Google, claimed that an AI chatbot was conscious; he was later fired by the company. He was wrong: LLMs have no semantic model and no consciousness. Given the statistical nature of LLMs, LSPs should carefully consider all of their risks and limitations. We identified 12 significant LLM limitations:


LLMs can provide illogical answers.

Despite their advanced ability to parse and generate human-like responses, Large Language Models can reach illogical conclusions. They may fail to follow an argument’s logic or to hold a solid line of reasoning across a lengthy conversation. Users who are unaware of this limitation risk erroneous conclusions and decision-making errors.


LLM responses are not factually reliable.

An LLM’s fixation on fluency may result in factual faults. The primary function of an LLM is prompt-based text generation, not factual accuracy. The citation of imaginary court cases or academic papers is a prime example of LLM fabrication. In 2023, a lawyer cited six cases in court, including Varghese v. China Southern Airlines and Shaboon v. Egypt Air, but the cases had been entirely fabricated by ChatGPT. The lawyer was “mortified” and “did not understand [ChatGPT] was not a search engine, but a generative language-processing tool.” LLMs should never be confused with search engines; checking their responses for real-world accuracy is essential. This behavior becomes more evident when a prompt pushes the LLM into a context about which it saw very little information during training. The technical word for this behavior is “hallucination.”


LLM responses are frequently lengthy and bloated.

LLMs tend to give more information than necessary, even when answering simple questions. Although it is possible to set a specific answer length in the prompt, verbose explanations can cause unwanted distractions in fast-paced business settings where conciseness is highly valued.


LLM responses tend to be generic.

LLMs often give generic responses because they are trained across a wide spectrum of domains. Unless we set specific context boundaries in the prompt, the generated content can miss the nuances of particular niches and reduce the effectiveness of personalized communication.


LLM responses are non-deterministic.

LLM responses are intentionally designed to be non-deterministic; in other words, an LLM can generate two different responses to the same prompt. When unanticipated, this unpredictability can disrupt operational processes that depend heavily on consistency. The non-deterministic nature of LLMs has been a pain point for developers seeking to integrate them into existing systems, which expect predictable, deterministic inputs such as valid JSON or XML. You can poke and prod the LLM to produce valid JSON or XML, but there is no guarantee that it will always work. Because the variability of the responses results from a randomly selected initial seed, when LLMs are accessed via API it is possible to make them largely deterministic by explicitly providing the same seed with each prompt.
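The role of the random seed can be illustrated with a minimal sketch. This toy sampler (hand-rolled softmax over made-up logits, not a real model API) mimics temperature-based token sampling: with no seed the draw varies, while fixing the seed reproduces the same choice every time, which is the mechanism behind the "same seed" option some LLM APIs expose.

```python
import math
import random

def sample_token(logits, temperature=1.0, seed=None):
    """Sample one token index from a softmax over logits.

    With temperature > 0 the choice is random; passing a fixed seed
    makes the draw reproducible.
    """
    rng = random.Random(seed)
    # Temperature rescales the logits before the softmax:
    # lower temperature sharpens the distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.5]  # toy scores for a 3-token vocabulary

# Same seed -> the same token on every call (deterministic).
a = sample_token(logits, temperature=0.8, seed=42)
b = sample_token(logits, temperature=0.8, seed=42)
print(a == b)  # True
```

Real APIs add further caveats (hardware and batching effects can still introduce drift), so a pinned seed should be treated as "mostly deterministic" rather than a hard guarantee.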


LLMs could be injudicious.

LLMs’ lack of ethical judgment is another critical limitation, posing severe risks when applied in domains requiring high ethical standards like healthcare, law, finance, or human resources. Indeed, LLMs cannot distinguish between credible and non-credible sources in their training data. The responsibility remains on us to double-check all the answers before integrating them into our documents or processes.


LLMs practice hedging with their responses.

LLMs use a communication technique called hedging: non-committal language that avoids direct answers, for example, the frequent use of “may,” “might,” and “could.” For businesses that need clear-cut answers, hedging muddles the message and impedes decision-making. Another example of hedging is when an LLM appends an obligatory disclaimer to the response. Ask an LLM about the benefits of translation memory: it will list some benefits, and at the end of the response you may see a disclaimer such as: “Translation memory is not a silver bullet. It’s important to carefully weigh the pros and cons of applying translation memory to your project.”


LLMs’ factual information is frozen in time.

Time-sensitive queries pose a challenge to LLMs because they are not trained on real-time data and cannot understand the context of current events. This limitation could lead to outdated business insights or sub-optimal advice in rapidly evolving markets. GPT-3.5’s cutoff date has advanced from September 2021 to January 2022, and GPT-4-Turbo’s cutoff date is April 2023. Unlike ChatGPT, Google Bard doesn’t have this limitation and is constantly connected to updated data on the web.


LLMs have limited visibility of the context.

LLMs may struggle to maintain context over extended conversations, leading to fragmented and confusing outputs. This poses severe limitations for applications requiring intricate understanding, such as customer support, negotiations, or comprehensive business analysis.
A generic LLM has little visibility into the context of your documents, your company’s knowledge base, or your style guides. Some context can be provided through few-shot prompting, fine-tuning, and vector embeddings. ChatGPT offers a solution that lets users build their own GPTs and train them with their documents and knowledge bases. While this can be a very promising evolution, at the moment it can pose critical privacy concerns.
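Of these techniques, few-shot prompting is the simplest: worked examples are placed directly in the prompt so the model can imitate them. A minimal sketch (the helper name and the sample style-guide pair are illustrative, not from any real project):

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction first, then worked
    examples, then the new query for the model to complete.

    `examples` is a list of (source, target) pairs, e.g. sentences
    already translated according to a company style guide.
    """
    parts = [instruction, ""]
    for source, target in examples:
        parts.append(f"Source: {source}")
        parts.append(f"Target: {target}")
        parts.append("")
    parts.append(f"Source: {query}")
    parts.append("Target:")  # the model continues from here
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    "Translate English to German. Keep product names in English.",
    [("Open the Settings panel.", "Öffnen Sie das Panel Settings.")],
    "Close the Settings panel.",
)
print(prompt)
```

The assembled string is what gets sent as the prompt; the in-context examples steer terminology and tone without any fine-tuning, at the cost of consuming part of the context window.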


LLMs can be ethnically myopic.

Models like ChatGPT are primarily trained on English text data from the Internet, limiting their capabilities in diverse ethnic contexts. LLM responses are frequently myopic, parroting Anglo-centric talking points. Companies leveraging LLMs in pursuit of a global strategy could lose opportunities due to miscommunications, cultural oversights, or unintended offenses.


LLM responses can be biased.

LLMs can unintentionally perpetuate biases present in their training data, posing severe ethical concerns. If LLM responses are unthinkingly reused, they can put your brand reputation at risk. MediaLocate recently tested the image-generation capabilities of DALL-E 3. Although the images were visually impressive, the AI defaulted to stereotypes when depicting people from various countries. Just as image-generation AI can produce biased content, LLMs may also stumble. Companies should proceed cautiously and stay vigilant for inherent bias in LLM responses.


LLM responses can be legally tenuous.

LLMs may touch on legally sensitive areas, such as data privacy or unintended copyright infringement, depending on how we use them. Companies must tread carefully when integrating LLMs into their systems to avoid legal complications. The New York Times recently sued OpenAI and Microsoft for using its journalists’ writing to train AI and for returning (“regurgitating” is the technical term) text blocks identical to those used in training. If you ask ChatGPT to write a paragraph in the style of the New York Times, to what extent are NYT journalists being infringed upon? The answers are far from clear.


LSP companies must comprehensively understand the perils and promises of AI in translation. They must be able to evaluate its inherent risks and limitations as LLMs constantly redefine the boundaries of technology and productivity. We must adjust our expectations of GenAI by remembering that these are statistical machines without a semantic model. One way to overcome some of the current limitations of LLMs is to create well-crafted, compelling prompts. Prompts cannot be improvised; effective prompts result from a good understanding of how an LLM works and from significant experience in the field. By fully understanding the potentials and pitfalls of AI technologies, we can foster better decision-making and enable our organizations to employ LLMs responsibly, ethically, and efficiently. Enterprises that know how to leverage the strengths of LLMs and navigate their limitations will be best positioned to benefit from this transformative AI technology and achieve a strategic edge.
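What "well-crafted" means in practice can be sketched as a structured template: a role, explicit context, the task, and hard constraints, each of which narrows the model's statistical search space and counters the generic, verbose, and hedged answers described above. The helper and the localization scenario below are illustrative examples, not a prescribed MediaLocate workflow:

```python
def craft_prompt(role, context, task, constraints):
    """Compose a structured prompt: role, context, task, and an
    explicit list of constraints the response must satisfy."""
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"You are {role}.\n"
        f"Context: {context}\n"
        f"Task: {task}\n"
        f"Constraints:\n{constraint_lines}"
    )

prompt = craft_prompt(
    role="a senior software localization specialist",
    context="UI strings for a medical dashboard; audience: nurses",
    task="Translate the strings below from English to Spanish.",
    constraints=[
        "Keep each translation under 40 characters.",
        "Do not translate product names.",
        "Answer with the translations only, no commentary.",
    ],
)
print(prompt)
```

Constraints such as "translations only, no commentary" directly target the verbosity and hedging limitations; the role and context lines address genericness and limited context visibility.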

MediaLocate‘s experts have developed a clear understanding of the strengths and limitations of AI. Contact us for level-headed help in confidently and judiciously applying AI to your translation and localization workflows.
