AI for Diplomats
This primer is written for diplomatic professionals to give a basic introduction to the main approaches to artificial intelligence (AI), the ways such systems work, and how they are being applied in diplomacy and security. We set out to explain in practical terms:
Artificial intelligence is not new, but it is becoming increasingly central to our everyday lives. The term AI was first coined in the 1950s, and for many years, AI systems were mainly used behind the scenes for tasks like predicting trends or sorting information. Now, with improvements in computing power and data availability, tools like ChatGPT have made AI visible to a much wider audience, but these language-based querying tools only scratch the surface of AI capabilities. The field of AI covers a vast range of computational tools, with so-called large language models representing only part of it.
You already interact with AI systems multiple times each day:
None of these systems truly understands you or your day, but each performs a discrete task. Through complex algorithms, these systems spend the day recognizing, predicting, ranking, and generating information, all under the broad heading of AI.
Based on existing capabilities, a senior diplomat’s day could look like this:
None of these tools makes policy on its own, but together they change the breadth, depth, and speed of your work. In short, they augment you. With AI you can see more, sooner, and spend more time deciding what to do with relevant information rather than gathering and sifting through it.
In this primer, artificial intelligence means computer systems that can perform tasks that usually require human intelligence. Though this is a moving target, examples include:
These AI systems conduct two main types of tasks:
Today’s AI systems conduct these tasks in three main ways: pattern matching, rule-following, and independent learning. These approaches often overlap in practice, but the distinctions are useful for understanding their capabilities and risks.
1. Pattern Matching – classical machine learning
Classical machine learning uses statistical methods to find patterns in data. A typical setup looks like this:
Because humans choose the key input variables, these systems are often more transparent. It is easier to inspect what the model is using and to check whether it is behaving sensibly. But they are not fully automatic and still require expert design and oversight.
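The pattern-matching idea can be sketched in a few lines of code. This is a minimal nearest-centroid classifier: humans choose the input variables, the system averages the labelled examples, and new cases are assigned to the closest average. The features and labels below are hypothetical, not drawn from any real system.

```python
# Minimal sketch of classical machine learning: a nearest-centroid
# classifier over hand-picked features (hypothetical example data).

def train(examples):
    """Compute the average feature vector (centroid) for each label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def predict(centroids, features):
    """Assign the label whose centroid is closest to the new example."""
    def distance(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: distance(centroids[label]))

# Hypothetical training data: [message length, number of links] -> label
examples = [
    ([120, 0], "routine"), ([90, 1], "routine"),
    ([400, 6], "suspicious"), ([350, 8], "suspicious"),
]
centroids = train(examples)
print(predict(centroids, [380, 7]))  # closest to the "suspicious" centroid
```

Because the features were chosen by a person, an auditor can inspect the centroids directly and check whether the model's behavior is sensible, which is exactly the transparency advantage described above.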
2. Rule Following – symbolic reasoning
Symbolic systems are closer to traditional rulebooks or doctrine encoded into software. They work by applying explicit rules written by humans. For example:
These systems are good when the rules are clear and stable. They are easier to audit, because you can read the rules directly. However, they struggle with ambiguity, messy real-world data, and rapidly changing environments.
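A symbolic system can be sketched as an ordered list of explicit, human-written rules. Everything here is illustrative: the rule contents and destinations are hypothetical, but the structure shows why such systems are easy to audit (you can read every rule) and why they break down when inputs do not fit the rules.

```python
# Minimal sketch of symbolic reasoning: explicit human-written rules
# applied in order. Rule contents and destinations are hypothetical.

RULES = [
    (lambda req: "visa" in req["subject"].lower(), "consular-affairs"),
    (lambda req: req.get("classification") == "sensitive", "security-review"),
    (lambda req: True, "general-queue"),  # default catch-all rule
]

def route(request):
    """Apply the rules in order; the first matching rule decides."""
    for condition, destination in RULES:
        if condition(request):
            return destination

print(route({"subject": "Visa application question"}))  # consular-affairs
print(route({"subject": "weekly report"}))              # general-queue
```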
3. Independent Learning – deep learning
Deep learning is a more recent and powerful approach. Instead of relying on human-designed rules or hand-picked variables, deep learning systems function more independently. They:
Deep learning powers many of the AI breakthroughs of the last decade, including large language models. These systems often outperform classical methods on complex tasks. However, because they learn independently on large amounts of data, they come with significant challenges:
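The contrast with the previous two approaches can be made concrete with a toy neural network. The sketch below trains one hidden layer by gradient descent to learn XOR, a pattern no single linear rule captures; the network sizes and learning rate are illustrative choices, and real deep learning systems have billions of parameters rather than a dozen.

```python
# Minimal sketch of deep learning: a tiny one-hidden-layer network
# trained by gradient descent on XOR. All sizes are illustrative.
import math
import random

random.seed(0)
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [0, 1, 1, 0]  # XOR: not linearly separable

H = 4  # hidden units
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0
sigmoid = lambda z: 1 / (1 + math.exp(-z))

def forward(x):
    """Hidden layer (tanh) followed by a sigmoid output unit."""
    h = [math.tanh(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])
         for j in range(H)]
    return h, sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)

def loss():
    return sum((forward(x)[1] - y) ** 2 for x, y in zip(X, Y))

initial_loss = loss()
lr = 0.5
for _ in range(3000):
    for x, y in zip(X, Y):
        h, out = forward(x)
        d_out = 2 * (out - y) * out * (1 - out)
        for j in range(H):
            d_h = d_out * w2[j] * (1 - h[j] ** 2)
            for i in range(2):
                w1[j][i] -= lr * d_h * x[i]
            b1[j] -= lr * d_h
            w2[j] -= lr * d_out * h[j]
        b2 -= lr * d_out

print(f"loss before training: {initial_loss:.3f}, after: {loss():.3f}")
```

Note that nobody wrote a rule for XOR: the hidden units learned their own internal features from the examples. That independence is the source of both the power and the opacity discussed above.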
Most AI systems are trained using one or more of three broad learning approaches: supervised learning, unsupervised learning, and reinforcement learning.
1. Supervised learning: The system learns from labelled examples:
Uses include identifying threats or anomalies in operational data, labeling images (e.g., satellite images of damaged buildings), or classifying documents into categories (e.g., topic or sensitivity).
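Supervised learning can be sketched with a tiny document classifier: the system is given labelled examples and learns which words are associated with which category. The documents, words, and category names below are hypothetical.

```python
# Minimal sketch of supervised learning: learn word-category
# associations from labelled documents (hypothetical data).
from collections import Counter, defaultdict

labelled_docs = [
    ("trade tariff negotiation export", "economic"),
    ("tariff agreement trade quota", "economic"),
    ("ceasefire border troops conflict", "security"),
    ("troops border patrol conflict", "security"),
]

word_counts = defaultdict(Counter)
for text, label in labelled_docs:
    word_counts[label].update(text.split())

def classify(text):
    """Score each label by how often its training words appear here."""
    scores = {label: sum(counts[w] for w in text.split())
              for label, counts in word_counts.items()}
    return max(scores, key=scores.get)

print(classify("new tariff on export goods"))    # economic
print(classify("troops massing at the border"))  # security
```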
2. Unsupervised learning: Data is not labelled and the system interprets data on its own:
Uses include organizing large text collections (e.g., by theme), finding anomalies, or segmenting populations based on behavioral patterns (e.g., travel, communication, or spending).
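Unsupervised learning can be sketched with k-means clustering: no labels are provided, and the system groups the data on its own. The values below are hypothetical (imagine daily message volumes from different posts).

```python
# Minimal sketch of unsupervised learning: k-means clustering of
# unlabelled 1-D values (hypothetical data).
import random

def kmeans_1d(values, k, iterations=20):
    """Group values around k centers; no labels are provided."""
    centers = random.sample(values, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

random.seed(1)
values = [10, 12, 11, 13, 95, 98, 102, 99]  # two obvious groups
centers = kmeans_1d(values, k=2)
print(centers)  # centers near 11.5 and 98.5
```

The algorithm discovers the two groups without ever being told they exist, which is the same mechanism behind organizing text collections by theme or flagging anomalies far from any cluster.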
3. Reinforcement learning: Reinforcement learning is akin to trial and error guided by feedback and requires defining success, typically through scoring outcomes the system should prioritize.
Many applications train in simulation and then transfer the strategy to real settings with safeguards. This is used in robotics, simulating alternative courses of action, testing strategies in resource management, and supporting decision-making in complex, dynamic environments.
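The trial-and-error idea can be sketched with a two-option "bandit": success is defined by a numeric reward, and the system learns which option accrues more of it. The options and reward probabilities below are hypothetical.

```python
# Minimal sketch of reinforcement learning: epsilon-greedy trial and
# error on a two-option bandit (hypothetical reward probabilities).
import random

random.seed(0)
true_success_rate = {"option_a": 0.3, "option_b": 0.8}
estimates = {"option_a": 0.0, "option_b": 0.0}
counts = {"option_a": 0, "option_b": 0}

for step in range(1000):
    # Mostly exploit the best current estimate, sometimes explore.
    if random.random() < 0.1:
        choice = random.choice(list(estimates))
    else:
        choice = max(estimates, key=estimates.get)
    reward = 1 if random.random() < true_success_rate[choice] else 0
    counts[choice] += 1
    # Update the running average reward estimate for the chosen option.
    estimates[choice] += (reward - estimates[choice]) / counts[choice]

print(max(estimates, key=estimates.get))  # the higher-reward option
```

Notice that the designer never tells the system which option is better, only how success is scored. Defining that score well is the hard part, which is why such systems are usually trained in simulation first.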
Language models are a kind of deep learning system designed for text. At their most basic, they are prediction models that try to predict the most likely continuation of a sequence of “tokens.” A token is a unit that may be a word, part of a word, punctuation, or whitespace, depending on frequency and language. Predicting the next token is, therefore, like the sentence autocomplete function you might have on an email or messaging platform.1
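Next-token prediction can be illustrated with the simplest possible version: a bigram model that counts which word most often follows each word in a tiny corpus, then "autocompletes" with the most frequent continuation. Real language models do the same job with learned weights over subword tokens rather than raw counts; the corpus below is invented.

```python
# Minimal sketch of next-token prediction: a bigram counting model
# over a tiny invented corpus.
from collections import Counter, defaultdict

corpus = ("the delegation met the minister . "
          "the delegation signed the agreement . "
          "the minister signed the agreement .").split()

follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def predict_next(token):
    """Return the most common token observed after this one."""
    return follows[token].most_common(1)[0][0]

print(predict_next("signed"))  # "the"
print(predict_next("the"))     # most frequent word after "the"
```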
OpenAI released ChatGPT in November 2022 to demonstrate its general purpose “text in, text out” language model. The chatbot was released as a research preview rather than a finished product. Before this point, the model had been used to extract information from documents, conduct basic drafting and editing, summarize documents, and perform basic language analysis.
Language models are trained through three stages:
The model reads billions of words and sentences to learn how different parts of words (tokens) fit together and relate to one another mathematically. From this, a sufficiently powerful prediction model can “guess” what the next word should be. They predict tokens one at a time. In modern systems, that next token prediction may be based on the relationships of hundreds of thousands of prior, related tokens.
The model is given many examples of high-quality answers to the kinds of questions it will have to answer. This might be drafting, summarizing, or information extraction, making it more capable and reliable.
Here humans judge the quality of different responses. Answers deemed “good” by some criteria are “rewarded” through reinforcement learning, and the model is adjusted to produce responses more aligned with those criteria. In effect, the model learns how a human allocates rewards and tries to accrue as much reward as possible. This process is known as reinforcement learning from human feedback (RLHF).
A chat interface is a packaging of the same underlying mechanism: The model receives a block of text and generates a continuation. But a persistent “conversation” with a chatbot is an illusion: Each message is generated independently but made to feel like part of a continuous exchange.
Each time an answer is generated, the previous messages are included as an input alongside other relevant context and information. Each message is a separate prediction problem or “query” for the model, so the text you type is only one component of what the model sees.
When you send a query to a model like ChatGPT, the system provides a lot of scaffolding around what you send. This scaffolding is known as “context” to your message. Language models don’t “remember” things; the scaffolding is just included in the context to your message, or “prompt.” The other information includes things like:
This means the message sent to the actual language model has been processed, prepended, and appended. The model then tries to “predict” the correct response: A bit like roleplaying, it generates text based on the instructions and context to best match learned patterns.
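The scaffolding described above can be sketched as a simple prompt-assembly step: system instructions, prior turns, and the new user message are flattened into one block of text for the model to continue. The field names and bracket format below are hypothetical, not any vendor's actual wire format.

```python
# Minimal sketch of chat-interface scaffolding: flatten the
# "conversation" into a single prompt (hypothetical format).

def build_prompt(system_instructions, history, new_message):
    """Concatenate instructions, prior turns, and the new message."""
    lines = [f"[system] {system_instructions}"]
    for role, text in history:
        lines.append(f"[{role}] {text}")
    lines.append(f"[user] {new_message}")
    lines.append("[assistant]")  # the model continues from here
    return "\n".join(lines)

prompt = build_prompt(
    "You are a helpful assistant. Today's date is 2024-05-01.",
    [("user", "Summarize the treaty."),
     ("assistant", "The treaty covers...")],
    "Now list the signatories.",
)
print(prompt)
```

Each new turn rebuilds this whole block from scratch, which is why the model appears to “remember” the conversation without actually having memory.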
Separately, many systems apply guardrails outside the model (before or after generation), such as:
• filtering or blocking certain inputs;
• preventing certain categories of outputs;
• redacting sensitive strings;
• enforcing logging, audit, or retention rules.
This distinction is important operationally: The “model” is only one component of an end-to-end system.
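Guardrails of this kind can be sketched as wrappers around the model call: one check filters the input before generation, and another redacts the output afterward. The blocked pattern, the redaction rule, and the stand-in `generate()` function are all hypothetical illustrations.

```python
# Minimal sketch of guardrails outside the model: input filtering
# before generation, output redaction after (hypothetical rules).
import re

BLOCKED_INPUT = re.compile(r"\bclassified\b", re.IGNORECASE)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def generate(prompt):
    # Stand-in for the actual model call.
    return f"Draft reply to {prompt!r}. Contact: liaison@example.org"

def guarded_query(prompt):
    if BLOCKED_INPUT.search(prompt):        # input filter
        return "[request blocked by policy]"
    output = generate(prompt)
    return EMAIL.sub("[redacted]", output)  # output redaction

print(guarded_query("Share the classified annex"))  # blocked
print(guarded_query("Draft a meeting note"))        # email redacted
```

Because these checks sit outside the model, they can be audited, logged, and updated independently of it, which is the operational point made above.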
AI is already embedded in many analytical and operational systems across government. It supports, but does not replace, human judgment. Below are some examples organized by the underlying approach.
1. Classical machine learning in practice
This tool automatically matches text descriptions of products to tariff codes using classification. It speeds up and standardizes cargo classification, improving both efficiency and accuracy.
This model uses open-source political, economic, and social data to estimate the risk of mass killings of civilians in each country. It supports the Bureau of Conflict and Stabilization Operations with resource allocation in early warning and conflict prevention.
This system scans large volumes of network traffic and flags unusual patterns that might indicate cyber intrusions or attacks. It does not, by itself, confirm an attack, but it focuses human attention on suspicious activity.
2. Deep learning in practice
Machine learning tools draft translations. Because these models can make subtle errors and may reflect biases in their training data, human translators always review the outputs.
This program used high-resolution satellite imagery and deep learning to automatically assess damage in conflict zones, helping to inform response and accountability efforts.
This tool attempted to distinguish real human faces from AI-generated synthetic ones. It supported efforts to identify manipulated media and protect information integrity.
3. Symbolic reasoning in practice
This system uses rule-based matching to improve efficiency in Freedom of Information Act processes, such as reducing duplicate work and routing similar requests together.
Even as AI becomes more capable, several important limitations remain, particularly in diplomacy, where situations are highly complex, diverse, and ambiguous. AI should be understood as a support tool:
AI systems, including advanced models, currently:
There are also some subtleties in diplomatic contexts. AI models still:
Many of these shortcomings are inherent to the models themselves, the mathematical “black box” that produces a response, but many teams build scaffolding to support the models in their failure modes. For example, though a model cannot keep track of the date or of large amounts of information, we can give it access to databases to keep it from making up, or “hallucinating,” information.
Using AI in diplomatic and security settings raises specific governance questions. Three areas are especially important: testing, transparency, and security.
1. Testing: Before deployment, systems need robust testing to answer questions such as:
Ongoing AI tool evaluation is essential, because systems may drift over time as data and environments change.
2. Transparency: Diplomats, partners, and citizens need to understand:
Clarity on these points helps assign responsibility, build trust, and avoid overreliance on tools that were never designed to function beyond their defined scope.
3. Security: Governments must ensure AI systems:
As diplomacy integrates AI tools, maintaining public and international confidence will require:
Several trends are likely to shape the future of AI in diplomacy and security:
AI will likely become a standard part of the diplomatic toolkit, with powerful applications for analysis and coordination, but dependent on clear human leadership to ensure it is aligned with political objectives, legal obligations, and ethical commitments.