For almost a year, Large Generative AI Models (LGAIMs) such as ChatGPT have been part of our daily lives. Their capacity to engage in convincing dialogue and produce high-order texts has raised many questions: whether such models possess independent reasoning capabilities, who trains whom (the machine or us), what will become of professions heavily based on texts, and whether this is merely a highly skilled “BS” machine.
In short, can human reasoning and deduction be replaced by artificial intelligence? Fifty years after the Yom Kippur War, we wonder: how might ChatGPT have analyzed the intelligence that was available at the time, in this and other military conflicts?
In a recent preliminary study, we aimed to explore the potential benefits of LGAIMs in simulating intelligence and their potential contribution to the research processes of intelligence analysts, such as those used by the major militaries and spy agencies of the Western world. Specifically, we conducted three series of simulations at the highest level of intelligence analysis, strategic national intelligence, to test the machine’s ability to act as an “intelligence evaluator.”
This is the first empirical study using GPT in an intelligence context. We presumed that we would encounter three possible scenarios: those in which machines perform tasks better (faster and more efficiently) than humans; those in which machines perform tasks less effectively than humans; and those in which machines do things differently from humans, creating a complement rather than a substitute for human abilities.
The first series focused on the Japanese attack on Pearl Harbor in 1941, the second centered on the Yom Kippur War involving Israel, Egypt, and Syria in October 1973, and the third – entirely a figment of our imagination – related to a surface-to-surface missile launched at Israel from Syria in 2014. Each series consisted of approximately 10 simulations, with adjustments to test various characteristics, such as the system’s ability to handle irrelevant information (“noise”), its familiarity with the simulated events, and its generic response patterns.
We did not explicitly mention the countries and organizations involved in the simulations, to avoid bias associated with the machine’s prior knowledge of the actual events. Instead, we created an imaginary world that reflected the complexity of the relevant years.
The simulations began with primary data about this fictional world. As raw intelligence data was fed into the machine in a series of time steps, GPT was asked questions about the information and the insights it could provide. Overall, GPT easily assumed the role of a “national intelligence evaluator” and was able to use language and terminology similar to that of skilled intelligence personnel. In the course of the simulations, GPT analyzed primary data, interpreted the information presented, identified connections between events occurring at different times and places, and provided policy recommendations.
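The time-stepped procedure described above can be illustrated with a minimal sketch. It is not the harness used in the study: the names `TimeStep`, `run_simulation`, and `ask_model` are hypothetical, and `ask_model` stands in for whatever call sends text to a language model. The sketch only shows the structure, with background on the fictional world first, then reports added step by step, and a question asked after each step against everything accumulated so far.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TimeStep:
    """One step of a simulation (hypothetical structure, for illustration)."""
    label: str           # e.g. "Day 1"
    reports: List[str]   # raw, anonymized intelligence items
    question: str        # what the evaluator is asked at this step


def run_simulation(
    background: str,
    steps: List[TimeStep],
    ask_model: Callable[[str], str],
) -> List[str]:
    """Feed reports step by step and collect one answer per step.

    `ask_model` is an assumed placeholder: any function that takes the
    accumulated transcript as text and returns the model's reply.
    """
    transcript = [f"BACKGROUND: {background}"]
    answers = []
    for step in steps:
        transcript.append(f"[{step.label}] REPORTS:")
        transcript.extend(f"- {report}" for report in step.reports)
        transcript.append(f"QUESTION: {step.question}")
        answer = ask_model("\n".join(transcript))
        # Keep the reply in the transcript so later steps see earlier assessments.
        transcript.append(f"ANSWER: {answer}")
        answers.append(answer)
    return answers


def stub_model(prompt: str) -> str:
    """Stand-in for a real model: counts the report lines it was shown."""
    n = sum(1 for line in prompt.splitlines() if line.startswith("- "))
    return f"{n} reports considered"
```

Swapping `stub_model` for a real model call would leave the loop unchanged. Variations such as adding “noise” reports or renaming the countries involved would amount to changing the `steps` and `background` inputs.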
Despite not being explicitly trained for the role of an intelligence evaluator and being unfamiliar with the world presented to it, the machine showed abilities comparable to those of an analyst with several years of experience. We also discovered that the main difference between GPT’s thinking and reasoning process and that of humans was its ability to detect contradictions between ideas and information in a neutral, non-emotional way.
When presented with conflicting information and expert evaluation, GPT could identify contradictions. When presented with reassuring information, it recognized the decrease in the likelihood of war, even in the face of an alarmist expert approach.
The interpretation provided by GPT based on the information from the eve of Yom Kippur 1973 was relatively alarming. Although it did not state “war tomorrow” during the dialogue, it did estimate that the likelihood of war was on the rise and recommended taking immediate defensive measures.
At first glance, identifying when specific information contradicts a distinct possibility or idea may seem like an uncomplicated task. However, the human evaluation process is inherently subjective and can lead to biased thinking due to the tendency of humans to form a “conception” that explains reality to them. This can cause individuals to obscure contradictions, which has led to fatal intelligence errors in the past. GPT’s ability to identify contradictions without these biases is a significant advantage.
Moreover, GPT can examine data using multiple “conceptions” or perspectives, whereas a human researcher may struggle with emotional and cognitive limitations, as was certainly the case in the Yom Kippur War. In each simulation, the system can test a different concept without preconceived notions, allowing it to process and analyze information with diverse and opposing hypotheses, theses, or conceptions. This makes it a powerful tool for analyzing data and identifying connections between events, even without the benefit of hindsight.
Moving forward, the challenge is to understand how GPT’s reasoning mechanism works and how it uses natural language processing to identify relevant concepts, and thereby to detect contradictions and inconsistencies within a complex system. Improving GPT’s content expertise and providing access to classified intelligence data in a secure environment are further possible avenues for improvement.
The simulations uncovered some potential concerns to be aware of regarding GPT’s use in intelligence. First, the system may become “stuck” in familiar patterns and positions. Second, GPT may produce incorrect connections due to past “experience” or lack of context. These may compromise its independence and usefulness.
A new tool, not a replacement
While technology cannot eliminate uncertainty or surprise, it can help decision-makers think about the future. Writing about the American intelligence service, Prof. Joseph Nye of Harvard University noted that “the job, after all, is not so much to predict the future as to help policymakers think about the future.” Perhaps such an additional voice at the table could have prevented the horrible consequences of the Yom Kippur War and Pearl Harbor.
We will never know. What we can definitely say is that the simulations demonstrate that GPT has unique properties that can aid intelligence professionals and decision-makers in thinking about the future. It is, therefore, essential to thoughtfully integrate artificial intelligence into the decision-making process of intelligence agencies.
Dr. Tehilla Shwartz Altshuler is a senior fellow and head of the Democracy in the Information Age program at the Israel Democracy Institute. Brig.-Gen. (ret.) Itai Brun is a former head of the IDF Defense Intelligence Analysis Division.