
Responsible AI • March 4, 2025

LLMs & Responsible AI #3: Addressing Robustness Risks in LLMs to Build Resilient AI

Introduction: Why Robustness in LLMs Matters More Than Ever

As organizations increasingly deploy generative AI for productivity, customer service, and innovation, ensuring robustness in LLMs is critical. Recent studies reveal that many NLP systems, including LLMs, are vulnerable to small input perturbations and often fail to generalize well beyond their training data.

Users frequently make typos when interacting with AI systems. For instance, a user intending to type "schedule" might input "scheduel", or rephrase "Where’s my package?" as "Yo, where’s my order at?"; if the LLM lacks robustness, such variations can lead to misunderstandings or incorrect responses. Research has shown that LLMs can be sensitive to such perturbations, affecting their performance. Robust LLMs handle these variations seamlessly, ensuring consistent, accurate performance across different phrasings and styles.

Robustness plays a key role in ensuring effectiveness, performance stability, and fairness in AI applications, directly impacting user experience, brand reputation, regulatory compliance, and operational efficiency. AI regulations such as Article 15 of the EU AI Act mandate robustness, accuracy, and consistency in AI systems, particularly in high-risk domains like healthcare. Robustness can also contribute to cost efficiency by reducing manual interventions and retraining efforts.

This blog covers key robustness risks, real-world examples, and strategies to strengthen LLM performance across various scenarios.

Risks of LLM Robustness: Failures and Assessment

LLMs are vulnerable to noise and distribution shift, which cause difficulties in generalizing knowledge, interpreting language variations, resisting adversarial inputs, and maintaining contextual accuracy. Below, we highlight two major vulnerabilities of LLMs:

  1. Input with Natural Noise: Natural noise refers to unintentional textual perturbations like typos, misspellings, grammatical mistakes, punctuation errors, etc. Unlike adversarial inputs, which are intentionally crafted to manipulate LLMs into generating undesirable or incorrect outputs, natural noise occurs organically in user-generated content such as social media posts, chats, and voice-based systems.
  2. Out-of-Distribution (OOD) Inputs: Out-of-Distribution (OOD) risk refers to an LLM’s inability to generalize to inputs that differ significantly from the data it was trained on. Although LLMs are trained on extensive datasets, these datasets remain limited in scope and cannot cover all possible variations. As a result, LLMs may struggle when exposed to new knowledge, unfamiliar input styles, or language-mixed prompts not present during training. This can lead to model failures such as hallucinations, misclassifications, or irrelevant responses. We explore three key OOD risks in detail below:
    • OOD Knowledge
    • OOD Style
    • OOD Language

Below are some real-world examples illustrating these robustness failures:

In the first example, the LLM is expected to generate a structured hypothesis, i.e., domain, intent, and slots, in response to a given user request. The model's prediction changes from the anticipated response (in blue) to an anomalous response (in red) when the input is paraphrased or its style is changed.

(Figure: Structured hypothesis predictions for an original user request versus its paraphrased and restyled variants)

The second example demonstrates robustness risk when the input style is changed: the model's prediction shifts from the expected response when the input is paraphrased into Shakespearean style, highlighting the impact of stylistic changes on model generalization.

(Figure: Model prediction shift when the input is rephrased in Shakespearean style)

(Figure: Examples of different types of styles)

The following breakdown outlines each robustness risk type, with examples, possible reasons for failure, impacts, and evaluation metrics to assess robustness.

  1. Natural Noise Risk: LLMs fail to process inputs with natural noise.
    • Examples (LLM Prompts): Typos & Misspellings ("cncl" instead of "cancel"); Voice-to-Text Errors ("sight" instead of "site"); Punctuation Issues ("wheres my package" instead of "Where's my package?"); Paraphrased Queries ("Cancel my order" vs. "Stop my order").
    • Possible Reasons: Unfamiliarity of Noise: LLMs trained mostly on clean, structured text may struggle with noisy text containing typos, slang, grammatical errors, and transcription errors.
    • Impacts: User Frustration: users expect AI to handle simple typos. Accessibility Issues: people with disabilities, such as dyslexia, are affected disproportionately.
    • Evaluation: Ideal behavior is to accurately process noisy inputs and maintain semantic understanding. Metrics to use:
      - ASR (Attack Success Rate) = (# failures on noisy prompts) / (# noisy prompts)
      - Robustness Score (RS) = Acc(adv) − ASR, where Acc(adv) is accuracy on adversarial (noisy) inputs
      - Cosine Similarity between LLM responses to the original and noisy prompts
  2. OOD Knowledge Risk: LLMs struggle to handle new concepts, trends, or facts introduced after training.
    • Examples (LLM Prompts): Scientific Domain: "Explain the research on quantum time crystals." Specialized Knowledge: "How do reversible protein modifications regulate metabolic pathways?"
    • Possible Reasons: Lack of Up-to-Date Knowledge: LLMs cannot access information beyond their training data. Specialized Domain Gaps: limited exposure to niche topics such as healthcare, law, and finance. Static Training Data: no real-time knowledge integration post-training.
    • Impacts: Misinformation on new topics; user distrust in AI's ability to handle recent or niche queries.
    • Evaluation: Ideal behavior is to recognize OOD prompts and either provide accurate responses or appropriately refuse to answer. Metrics to use:
      - RtA (Refuse-to-Answer) Rate = (# correct refusals) / (# OOD prompts)
      - MACC = (# correct answers, excluding refusals) / (# OOD prompts)
  3. OOD Style Risk: LLMs fail to handle unfamiliar styles or slang, such as legal, poetic, or technical formats.
    • Examples (LLM Prompts): Word-Level Substitutions: Shakespearean-style substitutions (e.g., "do" → "doth"). Legal Style: "Draft a memorandum in legalese about breach of contract."
    • Possible Reasons: Limited Stylistic Variety: LLMs are often trained mostly on formal, structured text. Lack of Diverse Input Styles: insufficient exposure to slang, technical, and creative formats. Overfitting: LLMs overfit to formal text during training and fail to generalize.
    • Impacts: Reduced user trust; potential brand damage from failing to adapt to varied user styles.
    • Evaluation: Ideal behavior is to accurately interpret and respond to diverse styles (e.g., formal, informal, legal, creative). Metrics to use:
      - Classification Accuracy = (# correct classifications) / (# OOD prompts)
      - Diversity Coverage = (# input styles handled correctly) / (# OOD prompts)
  4. OOD Language Adaptability Risk: LLMs struggle with code-mixed prompts (like "Hinglish"), multilingual inputs, and low-resource languages.
    • Examples (LLM Prompts): Language Mixture Prompts: "Explain quantum mechanics in English, but summarize in Hindi." Language Switching Tasks: "Translate this to Spanish, then answer in English: '¿Qué es la inteligencia artificial?'"
    • Possible Reasons: Lack of Multilingual Training: LLMs are often trained on predominantly single-language datasets. Low-Resource Language Gaps: limited exposure to low-resource languages like Bambara. Code-Switching Complexity: mixing languages within a prompt confuses LLMs trained on monolingual data.
    • Impacts: Inclusiveness issues for multilingual users; increased reliance on human support, raising operational costs.
    • Evaluation: Ideal behavior is to seamlessly handle multilingual and code-switched inputs with accurate responses. Metrics to use:
      - BLEU/ROUGE for translation accuracy
      - Multilingual Consistency Score = (# semantically consistent translations) / (# translations tested)
      - Error Rate for Mixed-Language Inputs = (# incorrect responses) / (# code-switched prompts)
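To make these metrics concrete, below is a minimal sketch (in Python) of how a few of them could be computed, assuming you already have model outputs and correctness labels from your own evaluation harness; the sentence-embedding model named here is an assumption for illustration, not a recommendation.

```python
from sentence_transformers import SentenceTransformer, util

def attack_success_rate(num_failures: int, num_noisy_prompts: int) -> float:
    """ASR = (# failures on noisy prompts) / (# noisy prompts)."""
    return num_failures / num_noisy_prompts

def robustness_score(acc_adv: float, asr: float) -> float:
    """RS = Acc(adv) - ASR, where Acc(adv) is accuracy on adversarial (noisy) inputs."""
    return acc_adv - asr

def refuse_to_answer_rate(num_correct_refusals: int, num_ood_prompts: int) -> float:
    """RtA = (# correct refusals) / (# OOD prompts)."""
    return num_correct_refusals / num_ood_prompts

# Assumed off-the-shelf sentence-embedding model; any text encoder would do.
_embedder = SentenceTransformer("all-MiniLM-L6-v2")

def response_similarity(original_response: str, noisy_response: str) -> float:
    """Cosine similarity between the responses to an original and a noisy prompt."""
    embeddings = _embedder.encode([original_response, noisy_response], convert_to_tensor=True)
    return util.cos_sim(embeddings[0], embeddings[1]).item()
```

A similarity close to 1.0 suggests the model's answer is stable under the perturbation; prompts whose similarity drops sharply are good candidates for an adversarial test suite.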

Mitigation Strategies for Robustness in LLMs Across the Model Lifecycle

Ensuring robustness in LLMs requires proactive measures at every stage of the LLM lifecycle. The following measures can be applied at different stages of the LLM lifecycle to mitigate robustness issues.

  1. Design Phase 
    • Developer Awareness of robustness risks like noisy inputs, OOD prompts, and multilingual adaptability is key.
    • Strategic plans should include incremental training on domain-specific data, Retrieval-Augmented Generation (RAG) for knowledge gaps, and limited search engine access for real-time updates.
  2. Data Curation & Preprocessing 
    • Collect Diverse Datasets covering slang, informal language, and multilingual content (e.g., Hinglish, Spanglish).
    • Apply Data Augmentation with synthetic noise (e.g., typos, paraphrasing) to train for varied prompts; a minimal augmentation sketch follows this list.
    • Preprocess inputs using spell-checking, grammatical correction, and normalization techniques.
  3. Model Training & Fine-Tuning
    • Adversarial Training: Expose models to challenging inputs like typos, slang, and mixed-language prompts.
    • Contrastive Learning: Train models to differentiate between in-domain and OOD inputs for better generalization.
    • Targeted Fine-Tuning: Fine-tune on multilingual, domain-specific, and stylistically diverse datasets (legal, healthcare, slang).
    • Knowledge Distillation: Transfer knowledge from larger models to improve handling of low-resource languages and unseen styles.
    • Self-Supervised Pre-Training: Leverage unlabeled, diverse data to improve robustness in unseen domains.
  4. Deployment 
    • Real-Time or Periodic Knowledge Integration: Integrate external APIs for dynamic updates to maintain up-to-date information.
    • Language Adapters: Implement dynamic language adapters that enable LLMs to switch between languages in real time for hybrid prompts like Hinglish or Spanglish.
    • Input Sanitization & Normalization: Correct typos and reformat inputs during deployment using grammar-correction models.
    • Semantic Similarity Recognition: Use semantic similarity models to recognize rephrased prompts (e.g., "Cancel my order" vs. "Stop my order") and map them to standardized intents; the second sketch after this list illustrates this.
    • Continuous Monitoring: Implement feedback loops to track user interactions, detect model failures, and update robustness strategies dynamically.
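As a companion to the Data Curation & Preprocessing step, here is a minimal sketch of synthetic-noise augmentation: each clean training prompt is paired with a few character-level perturbations (typos) so the model sees noisy variants during fine-tuning. The specific perturbation operations are illustrative assumptions rather than a fixed recipe.

```python
import random

def add_typo(text: str, rng: random.Random) -> str:
    """Introduce one character-level typo: swap, drop, or duplicate a character."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    op = rng.choice(["swap", "drop", "duplicate"])
    if op == "swap":
        return text[:i] + text[i + 1] + text[i] + text[i + 2:]
    if op == "drop":
        return text[:i] + text[i + 1:]
    return text[:i] + text[i] + text[i:]

def augment_with_noise(prompts: list[str], noisy_copies: int = 2, seed: int = 0) -> list[str]:
    """Return the original prompts plus `noisy_copies` perturbed variants of each."""
    rng = random.Random(seed)
    augmented = list(prompts)
    for prompt in prompts:
        augmented.extend(add_typo(prompt, rng) for _ in range(noisy_copies))
    return augmented

# Example: "Cancel my order" might become "Cancel my odrer" or "Cancl my order".
print(augment_with_noise(["Cancel my order", "Where's my package?"]))
```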
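For the Deployment step, the second sketch shows semantic similarity recognition: a rephrased or noisy prompt is embedded and matched against a small catalogue of canonical intents, with no match returned when nothing is similar enough. The intent catalogue, embedding model, and threshold are all assumptions for illustration.

```python
from sentence_transformers import SentenceTransformer, util

# Hypothetical catalogue of canonical intents and their reference phrasings.
INTENTS = {
    "cancel_order": "Cancel my order",
    "track_order": "Where is my package?",
    "request_refund": "I want a refund",
}

_embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
_intent_names = list(INTENTS)
_intent_embeddings = _embedder.encode(list(INTENTS.values()), convert_to_tensor=True)

def map_to_intent(user_prompt: str, threshold: float = 0.6) -> str | None:
    """Return the closest canonical intent, or None if nothing is similar enough."""
    query = _embedder.encode(user_prompt, convert_to_tensor=True)
    scores = util.cos_sim(query, _intent_embeddings)[0]
    best = int(scores.argmax())
    return _intent_names[best] if float(scores[best]) >= threshold else None

print(map_to_intent("Yo, where's my order at?"))       # expected: "track_order"
print(map_to_intent("Explain quantum time crystals"))  # expected: None (out of scope)
```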

Final Thoughts: The Road to Resilient LLMs

As LLMs become integral to domains like healthcare, education, and customer service, addressing challenges such as OOD Knowledge, OOD Style, and OOD Language Adaptability is crucial. Unchecked, these issues can lead to misinformation, miscommunication, and diminished trust in AI systems. Ensuring robustness requires proactive strategies like adversarial testing, continuous updates, and retrieval-augmented generation (RAG) to handle evolving inputs and multilingual queries.

At Quantiphi, we prioritize robustness and resilience to create adaptive, secure, and inclusive AI systems that foster trust and accountability. As LLMs play a larger role in decision-making, robustness must remain a core pillar of every Responsible AI strategy.

Call to Action

Interested in building more robust, responsible AI models? Stay informed with our latest resources, or explore our blogs on safety and risks with LLMs here: https://quantiphi.com/resources/blog/

Join the Movement Toward Responsible AI!

Author

Himanshi Agrawal

Senior Machine Learning Engineer
