Adversarial Machine Learning: What NIST’s Latest Report Means for AI Security

Shawn Elliott

President, Operations

AI systems now shape decisions from grid stability to national security—making them prime targets. Adversaries aim not only to breach systems, but also to manipulate the decision-making process at scale. 

In response, NIST’s latest report on adversarial machine learning (NIST AI 100-2e2025) provides a standardized framework for identifying AI vulnerabilities, classifying attack methods, and implementing countermeasures across system lifecycles.

For any organization deploying AI, understanding this threat landscape is now fundamental to operational resilience and compliance. 

What Is Adversarial Machine Learning? 

Adversarial machine learning refers to methods that mislead AI models into making faulty decisions by manipulating their inputs. These attacks don’t exploit traditional code vulnerabilities. Instead, they distort how AI systems interpret the world around them. 

Here’s how it works:  

First, attackers study how your AI detects threats, often by testing similar models or probing exposed systems. Then they craft inputs (such as malware) designed to appear safe to the model. By reordering code, adding harmless instructions, or altering metadata, the attacker fools the AI into misclassifying the threat, and the malicious input bypasses both automated and human review.
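
To make that concrete, here is a deliberately simplified sketch of the probe-and-pad loop in Python. The scorer below is a hypothetical stand-in for a real detector, and the “payload” is just a byte pattern, but it shows how an attacker can add harmless content until a score-based model waves the input through:

```python
# Toy illustration of the probe-and-modify loop described above.
# `suspicion_score` is a hypothetical stand-in for a real malware classifier;
# a real attacker would probe an exposed API or a locally trained surrogate.

NO_OP_PADDING = b"\x90"          # benign filler bytes appended to the sample
DETECTION_THRESHOLD = 0.5        # score above which the "model" flags a threat

def suspicion_score(sample: bytes) -> float:
    """Hypothetical detector: suspicion dilutes as harmless padding grows."""
    flagged = sample.count(b"\xde\xad")          # pretend this byte pattern is the "malicious" feature
    return flagged / (1 + len(sample) / 100)     # padding lowers the score without removing the payload

sample = b"\xde\xad" * 10                        # stand-in for a malicious payload
while suspicion_score(sample) >= DETECTION_THRESHOLD:
    sample += NO_OP_PADDING * 50                 # add harmless content until the detector is fooled

print(f"Final score: {suspicion_score(sample):.2f} -- payload unchanged, classified as safe")
```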

Why NIST Created This Report 

As AI adoption accelerates across government, critical infrastructure, and industry, there’s an urgent need for standardized security approaches. This report helps fill critical gaps in how we identify, classify, and manage AI security risks. 

Creating a Common Language for AI Security 

One agency calls it a “model poisoning attack,” another a “training data compromise,” even though both refer to the same threat. NIST closes this communication gap by standardizing how we talk about adversarial AI threats. With a shared vocabulary, teams across agencies and vendors can now coordinate defenses without losing critical insights in translation. 

Building a Roadmap to Protect AI Systems 

NIST moves security beyond patchwork fixes, calling for protection built into every phase of the AI lifecycle. Its framework helps teams identify weak points during data collection, model training, or deployment, before attackers can exploit them. In addition, a shared, standardized defense strategy ensures that as threats grow more complex, security stays aligned. 

Overview of the Different Types of Adversarial AI Attacks 

Adversarial attacks exploit how machine learning systems process and interpret data. NIST groups them into four main types, each targeting a different part of the AI system’s lifecycle. 

1. Evasion Attacks 

Evasion attacks change input data to avoid being caught by the AI model. Attackers slightly modify harmful inputs, such as malware, so the model no longer sees them as threats. These changes can include adding no-op code (instructions that do nothing) or reordering parts of the file. 

The malware still works the same, but the AI system misclassifies it as safe and lets it through. 

How to defend against evasion attacks: 

  • Use adversarial training during model development 
  • Employ model ensembles to catch discrepancies (see the sketch after this list) 
  • Deploy validation layers to flag unusual inputs 
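
As a concrete example of the ensemble idea, the sketch below runs one input through several independently trained models and flags disagreement for human review. The three stand-in “models” here are hypothetical placeholders for real classifiers:

```python
from collections import Counter

# Hypothetical stand-in models: each maps a feature vector to a label.
# In practice these would be independently trained classifiers
# (different architectures, seeds, or training subsets).
def model_a(x): return "malicious" if x[0] > 0.5 else "benign"
def model_b(x): return "malicious" if sum(x) > 1.0 else "benign"
def model_c(x): return "malicious" if x[1] > 0.7 else "benign"

ENSEMBLE = [model_a, model_b, model_c]

def classify_with_review(x, min_agreement=3):
    """Return the majority label, or flag the input when the models disagree."""
    votes = Counter(m(x) for m in ENSEMBLE)
    label, count = votes.most_common(1)[0]
    if count < min_agreement:
        return label, "flag_for_review"   # disagreement often indicates a crafted input
    return label, "auto_accept"

print(classify_with_review([0.9, 0.9]))   # all models agree -> auto_accept
print(classify_with_review([0.6, 0.1]))   # models disagree  -> flagged for review
```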

Why it matters: Evasion attacks are among the most common adversarial ML attacks in production environments. 

2. Poisoning Attacks 

“Trust but verify” takes on new weight when it comes to AI training data. Unlike evasion attacks, poisoning happens when adversaries strategically inject malicious examples into your training datasets. 

A poisoning attack looks something like this: An attacker quietly inserts malicious data into your training set. The poisoned model functions perfectly during all your testing and validation benchmarks. Then, months later, the malicious injection is triggered by a specific code phrase or image pattern, and your system suddenly makes catastrophic misclassifications or reveals sensitive information. 

How to defend against poisoning attacks: 

  • Rigorously vet all training data, especially third-party sources 
  • Maintain dataset chain-of-custody 
  • Use anomaly detection during training (a minimal sketch follows this list) 
  • Conduct adversarial testing regularly 
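
Anomaly detection during training doesn’t have to be elaborate to add value. The numpy sketch below plants one out-of-profile example in synthetic data and flags it with a simple distance check against the dataset centroid (per-class centroids in a real pipeline); the data and threshold are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "benign" training features clustered around 0, plus one planted
# poisoned example far outside the normal profile.
features = rng.normal(loc=0.0, scale=1.0, size=(200, 5))
features[42] = np.array([8.0, -7.5, 9.1, 8.8, -8.2])   # illustrative poisoned row

# Distance of each example from the dataset centroid; large distances are suspect.
centroid = features.mean(axis=0)
distances = np.linalg.norm(features - centroid, axis=1)
threshold = distances.mean() + 3 * distances.std()      # simple 3-sigma cutoff

suspects = np.where(distances > threshold)[0]
print("Rows to quarantine and review before training:", suspects)
```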

Why it matters: If you’re using pre-trained models or public datasets, your AI systems could have been poisoned long before you acquired them.  

3. Privacy Attacks 

Privacy attacks occur when attackers use an AI model’s responses to uncover information about the private data it was trained on. These attacks do not require direct access to the training dataset, but only the ability to query the model and observe its outputs. 

There are two common types: 

  • Membership inference: The attacker tries to determine whether a specific data record (such as a person’s medical history or financial information) was used in the model’s training set (see the sketch after this list).
  • Model inversion: The attacker uses patterns in the model’s outputs, such as prediction scores, to reconstruct features of the original training data. This technique can recover images, personal traits, or other sensitive inputs, even if the model never shares that information directly. 
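
To see the membership-inference intuition at work, consider the toy sketch below: overfit models are often noticeably more confident on records they were trained on, so an attacker compares the model’s confidence on a candidate record against a threshold calibrated on known non-members. The “model” and the numbers here are hypothetical:

```python
# Toy membership-inference intuition: overfit models tend to be much more
# confident on records they memorized during training. The "model" here is a
# hypothetical confidence oracle, not a real deployment.

def model_confidence(record: dict) -> float:
    """Hypothetical: returns the model's top-class probability for a record."""
    memorized = {("alice", 1984), ("bob", 1972)}         # pretend training members
    return 0.99 if (record["name"], record["birth_year"]) in memorized else 0.71

THRESHOLD = 0.95   # calibrated by the attacker on records known to be non-members

def likely_training_member(record: dict) -> bool:
    return model_confidence(record) >= THRESHOLD

print(likely_training_member({"name": "alice", "birth_year": 1984}))  # True  -> privacy leak
print(likely_training_member({"name": "carol", "birth_year": 1990}))  # False
```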

How to defend against privacy attacks: 

  • Implement differential privacy 
  • Limit model output detail 
  • Restrict and audit API access 

Privacy attacks are more likely when models are overfitted, lack differential privacy safeguards, or provide overly detailed outputs. 

4. Abuse Attacks: When AI Is Hijacked to Cause Harm 

In abuse attacks, adversaries use the model exactly as designed, but for harmful goals, for example, using a chatbot to spread disinformation or prompting a generative model to create phishing emails. 

How to defend against abuse attacks: 

  • Enforce strict content filters and usage policies (a minimal filter sketch follows this list) 
  • Monitor for unusual outputs and patterns 
  • Conduct regular audits post-deployment 
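
A content filter can start as simply as screening generated text against policy patterns and logging violations for later audit. Production systems typically use trained safety classifiers rather than regexes, but the enforcement flow looks roughly like this (the categories and patterns are illustrative):

```python
import re
import logging

# Illustrative policy patterns; real deployments usually rely on trained
# safety classifiers, but the enforcement flow is similar.
POLICY_PATTERNS = {
    "phishing": re.compile(r"verify your (password|account) (immediately|now)", re.I),
    "credential_harvest": re.compile(r"(send|confirm) your (ssn|credit card)", re.I),
}

def enforce_output_policy(generated_text: str) -> str:
    """Block disallowed generations and log the violation for post-deployment audit."""
    for category, pattern in POLICY_PATTERNS.items():
        if pattern.search(generated_text):
            logging.warning("Blocked output, policy category=%s", category)
            return "[response withheld by usage policy]"
    return generated_text

print(enforce_output_policy("Here is the quarterly report you asked for."))
print(enforce_output_policy("Please verify your password immediately at this link."))
```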

Common Vulnerabilities Across the AI Lifecycle 

Keeping AI systems safe means protecting every stage of their lifecycle, since each phase creates different opportunities for attackers to cause harm. As the NIST report makes clear, your AI is only as secure as its weakest part. And the risks often start right at the beginning: during data collection. 

1. Data Collection & Preparation  

Think of data collection like classified intelligence gathering: without proper vetting and chain of custody protocols, adversaries can compromise your entire operation before it begins.  

Here are just some of the ways adversaries can attack your data: 

  • Insert poisoned examples into public datasets that appear legitimate to standard vetting procedures 
  • Compromise trusted third-party data providers in your supply chain, turning allies into unwitting threats 
  • Identify and exploit gaps in your data diversity to create blind spots in your security perimeter 

As NIST notes in their report, because foundation models rely on massive datasets, it’s now common to scrape data from public sources, making them more vulnerable to data poisoning. 

To reduce this risk, use clear chain-of-custody tracking for all training data. Without it, spotting tampering is nearly impossible, and your AI could be built on compromised data. 
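
Chain-of-custody tracking doesn’t require heavyweight tooling to get started. Recording a cryptographic hash for every dataset file at ingestion, then re-verifying before each training run, catches silent tampering. Here’s a minimal Python sketch (the file paths and manifest name are placeholders):

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a dataset file in chunks so large files do not exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def record_manifest(data_dir: str, manifest_path: str = "data_manifest.json") -> None:
    """Capture hashes when data is first ingested (the chain-of-custody record)."""
    manifest = {str(p): sha256_of(p) for p in Path(data_dir).rglob("*") if p.is_file()}
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))

def verify_manifest(manifest_path: str = "data_manifest.json") -> list[str]:
    """Before training, return any files whose contents no longer match the record."""
    manifest = json.loads(Path(manifest_path).read_text())
    return [path for path, digest in manifest.items() if sha256_of(Path(path)) != digest]

# Usage (paths are placeholders): call record_manifest("training_data/") at ingestion,
# then fail the training job if verify_manifest() returns a non-empty list.
```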

2. Model Training 

Now that you’ve secured your data pipeline, the next step is locking down the environment where your model is trained on that data and learns how to make decisions. One risk here is the temptation to zero in on getting the model accurate and compliant while overlooking the many attack vectors open to bad actors.  

Here are the three main ways AI development stacks are open to attack: 

  • Training parameter manipulation: Attackers can tweak how the model is trained, causing weak spots or hidden behaviors 
  • Unsecured environments: If the system used to train the model isn’t isolated or monitored, it can be accessed and altered 
  • Lack of adversarial testing: Without simulating real-world attacks, you won’t know how your model reacts under pressure 

According to the NIST report: “Model poisoning attacks attempt to directly modify the trained ML model to inject malicious functionality into it. In centralized learning, TrojNN reverse engineers the trigger from a trained neural network and then retrains the model by embedding the trigger in external data to poison it.”  

NIST warns that “designing ML models that are robust in the face of supply-chain model poisoning vulnerabilities is a critical open problem.” 
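
While the research community works on that open problem, one pragmatic control is to diff a newly acquired model’s behavior against your current one on a trusted, internally curated evaluation set before promoting it. The sketch below assumes hypothetical model objects that expose a scikit-learn-style predict method:

```python
# A backdoored model can ace aggregate benchmarks yet behave differently on
# specific inputs, so per-example disagreements deserve review before promotion.

def behavioral_diff(current_model, candidate_model, trusted_inputs, max_disagreement=0.02):
    """Compare two models on a trusted evaluation set; large diffs block promotion."""
    disagreements = []
    for i, x in enumerate(trusted_inputs):
        current, candidate = current_model.predict(x), candidate_model.predict(x)
        if current != candidate:
            disagreements.append((i, current, candidate))
    ok = len(disagreements) / max(len(trusted_inputs), 1) <= max_disagreement
    return ok, disagreements

class _StubModel:
    """Hypothetical stand-in with a scikit-learn-style predict method."""
    def __init__(self, flip_on):
        self.flip_on = flip_on
    def predict(self, x):
        label = "approve" if x >= 0 else "deny"
        return "deny" if x in self.flip_on else label   # simulated targeted deviation

current = _StubModel(flip_on=set())
candidate = _StubModel(flip_on={7})       # behaves differently on exactly one input

ok, diffs = behavioral_diff(current, candidate, trusted_inputs=list(range(-10, 11)))
print(ok, diffs)   # 1/21 of inputs disagree (~4.8%), exceeding the 2% budget, so promotion is blocked
```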

3. Deployment 

Deployment exposes your AI to the real world and real adversaries. Even perfectly secure models face new risks when they: 

  • Interact with potentially malicious inputs 
  • Connect with existing systems and inherit their vulnerabilities 
  • Operate in environments where subtle manipulations can go undetected 

NIST highlights that “the evasion of AI models can be optimized for a set of data in a domain rather than per data point,” meaning attackers can develop “functional attacks” that systematically compromise your deployed models.  

Most organizations don’t have the right tools to spot the signs of an adversarial attack. Without them, attacks can go unnoticed for months and cause severe damage. 

Mitigation and Risk Management Strategies 

Building resilient AI systems requires implementing defense-in-depth strategies that address vulnerabilities throughout the entire AI lifecycle. NIST’s framework breaks down AI defense into practical steps any agency can implement: 

Train Your AI to Expect Attacks 

Think of it as immune system training for your AI. By exposing your models to simulated attacks during development, they learn to recognize and resist similar tricks in the real world.  

The NIST report highlights that “the stronger the adversarial attacks for generating adversarial examples are, the more resilient the trained model becomes.” 
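
Here’s a minimal numpy sketch of the idea, using an FGSM-style perturbation on a toy logistic-regression model. The data, learning rate, and perturbation budget are illustrative; real adversarial training uses stronger attacks and full ML frameworks:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy binary classification data; in practice this is your real training set.
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b, lr, eps = np.zeros(2), 0.0, 0.1, 0.2   # eps = illustrative perturbation budget

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(200):
    # FGSM-style step: perturb each example in the direction that most
    # increases its loss, then train on the perturbed ("adversarial") copies.
    p = sigmoid(X @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]       # d(loss)/d(input) for logistic loss
    X_adv = X + eps * np.sign(grad_x)

    # Standard gradient update, but computed on the adversarial examples.
    p_adv = sigmoid(X_adv @ w + b)
    w -= lr * (X_adv.T @ (p_adv - y)) / len(y)
    b -= lr * float(np.mean(p_adv - y))

clean_acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(f"Accuracy on clean data after adversarial training: {clean_acc:.2f}")
```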

Build Defense into Your AI’s Core 

Some AI models are naturally more resistant to attacks. NIST recommends using these “hardened” approaches that maintain accuracy when facing unexpected or manipulated inputs. You may trade a bit of performance in ideal conditions, but your systems stay reliable when under fire. 

Watch for Unusual Behavior 

Set up monitoring systems that flag suspicious inputs or unexpected outputs. When your AI suddenly classifies routine data differently or shows unusual confidence patterns, it could signal an attack in progress. Connect these alerts directly to your incident response team for quick action. 

NIST cautions that “detecting adversarial examples is as difficult as building a defense,” highlighting the challenge of identifying attacks in progress. 
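
Detection is hard, but basic telemetry is still a worthwhile starting point. The numpy sketch below compares the confidence profile of recent predictions against a baseline collected during normal operation and raises an alert on a sharp shift; the thresholds and window sizes would need tuning per system:

```python
import numpy as np

def confidence_drift_alert(baseline_conf, recent_conf, z_threshold=4.0):
    """Alert when recent prediction confidences drift far from the baseline profile."""
    baseline_mean = np.mean(baseline_conf)
    baseline_std = np.std(baseline_conf) + 1e-9
    z = abs(np.mean(recent_conf) - baseline_mean) / (baseline_std / np.sqrt(len(recent_conf)))
    return z > z_threshold

# Synthetic numbers for illustration: a healthy baseline and a sudden confidence dip.
rng = np.random.default_rng(2)
baseline = rng.normal(0.90, 0.05, size=5000).clip(0, 1)   # typical confidence profile
suspect = rng.normal(0.60, 0.05, size=200).clip(0, 1)     # unusual recent window

if confidence_drift_alert(baseline, suspect):
    print("ALERT: confidence profile shifted; route to incident response for review")
```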

Test Your Defenses Regularly 

NIST advises organizations to go beyond basic adversarial testing by conducting structured red team exercises: controlled simulations where experts attempt to exploit AI systems using real-world tactics.  

These pre-deployment tests help uncover hidden vulnerabilities and prevent failures in high-risk environments. 

To explore how these defensive strategies fit into broader risk planning, read our companion guide: NIST Risk Assessment Report. 

Partner with IPKeys for Comprehensive AI and Cybersecurity Protection 

As adversarial machine learning threats grow more complex, organizations need partners with deep expertise in both AI and cybersecurity. IPKeys provides end-to-end protection—from secure data prep to deployment and monitoring—aligned with NIST’s defense-in-depth guidance.  

Contact IPKeys today to schedule a comprehensive assessment of your machine learning security posture. We’re here to help! 
