Beyond Classified: How AI Can Unlock the Power of Unclassified Data for US Intelligence

The US intelligence community has long been reluctant to use unclassified information, but the rapid development of AI and LLMs presents an opportunity to break with this aversion and tap into the vast amount of publicly available data that could enhance classified sources.

AI: The Key to Unlocking Unclassified Data for US Intelligence

The rapid development of generative AI and large language models (LLMs) has created an opportunity for the US intelligence community to break with its long-standing reluctance to use unclassified information. That aversion has largely closed off an entire avenue of intelligence gathering, leaving the US and its allies more vulnerable to strategic or tactical surprise.

AI and LLMs: The Future of Intelligence Gathering

Government agencies have long monitored news reports and online activity, beginning with the Foreign Broadcast Information Service. However, they have largely overlooked the huge store of publicly available information from other reputable sources, which now forms a rich stream of data. For instance, just days before Hamas’ surprise attack on Israel last October, there was an upsurge in visits to Arabic-language web content about many of the locations Hamas subsequently targeted. A properly trained AI model could have detected these anomalies and provided critical early warning.
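
To make the idea concrete, here is a minimal sketch of the kind of signal involved: flagging an abnormal spike in daily page-visit counts with a trailing z-score. The visit counts, window, and threshold are invented for illustration; a real early-warning system would use far richer models and data.

```python
import statistics

def detect_upsurge(daily_visits, window=14, threshold=3.0):
    """Flag days whose visit count spikes far above the trailing window."""
    alerts = []
    for i in range(window, len(daily_visits)):
        history = daily_visits[i - window:i]
        mean = statistics.mean(history)
        stdev = statistics.stdev(history) or 1.0  # guard against a flat window
        z_score = (daily_visits[i] - mean) / stdev
        if z_score >= threshold:
            alerts.append((i, round(z_score, 1)))
    return alerts

# Hypothetical daily visits to a page about one later-targeted location:
visits = [40, 42, 38, 45, 41, 39, 44, 43, 40, 42, 41, 38, 44, 40, 310]
print(detect_upsurge(visits))  # flags the final day's anomalous spike
```

An analyst-facing system would run thousands of such series in parallel and surface only correlated spikes, but the core signal is this simple.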

The Intelligence Community has traditionally regarded anything that is secret as inherently better than anything that is not secret. This mindset has persisted even as good-quality online information has mushroomed. It is a cultural problem that many within the Intelligence Community believe should be addressed.

The Importance of Unclassified Information in Intelligence Gathering

To its credit, the Intelligence Community has long recognized the challenge. For over a decade, open-source missions and agencies have risen in prominence in Congressional language, agency reviews, and Presidential Executive Orders, and this year’s Intelligence Community strategy uses the strongest language yet to address the problem. Yet despite all the writing, the creation of several new agencies and offices, and multiple Executive Branch-backed initiatives, these efforts have still failed to break through long-held institutional biases.

Whether the IC gets the most out of AI will hinge on how analysts and officers train generative AI and LLM models to synthesize and analyze the vast amount of publicly available data that could enhance the classified sources they already employ. Critically, during this process, they must be careful not to transfer their inherent reservations about unclassified information to the models. Without such caution, the risk of human bias becoming machine bias is high, and it would limit the potential benefits of widening the scope of data sources.

“The Intelligence Community has traditionally regarded anything that is secret as inherently better than anything that is not secret. This mindset has persisted even as good-quality online information has mushroomed.”

One way such biases could enter model training is through grading: a trainer scores how well the model answers the questions it is asked. If an answer cites unclassified information where the trainer believes a better classified source was available, they may mark it down. Repeated time and again during training, this pattern effectively teaches the model to disregard unclassified data. When training generative AI models and LLMs, intelligence agencies therefore need to be expansive both in the data they train on and in the pool of graders who assess the quality of the content being sourced.
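
A deliberately toy sketch of this failure mode (the scores, the penalty, and the idea of source-blind grading are all illustrative assumptions, not a description of any agency’s pipeline): a grader who reflexively docks open-source citations feeds the training process a quality ranking that is simply wrong.

```python
def biased_grade(answer):
    """A grader who docks answers that lean on open sources."""
    score = answer["accuracy"]
    if answer["source"] == "unclassified":
        score -= 3  # penalty unrelated to the answer's actual quality
    return score

def source_blind_grade(answer):
    """A grader shown the answer's content but not its sourcing."""
    return answer["accuracy"]

answers = [
    {"source": "classified", "accuracy": 7},
    {"source": "unclassified", "accuracy": 9},  # objectively the better answer
]

# The biased rubric inverts the ranking the training signal should send.
print(max(answers, key=biased_grade)["source"])        # classified (9 - 3 = 6 loses to 7)
print(max(answers, key=source_blind_grade)["source"])  # unclassified
```

Multiply that inverted preference across millions of training examples and the model learns, durably, that open sources are second-class.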

The Risk of Human Bias in AI Models

The other principal avenue for bias is the Intelligence Community’s propensity to do things differently on high-side (classified) computer networks than on low-side (unclassified) ones. The community is no doubt already experimenting with LLMs and generative AI on both networks, but because of security concerns these are likely segregated efforts. The truly transformational opportunity is to let a model work across both networks, even if its results may be displayed only on the classified platform. If the models and their outputs are kept separate, the institutional bias will simply be perpetuated further.
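
As a purely conceptual sketch of that opportunity (real cross-domain systems involve accredited guards and one-way transfer devices, none of which is modeled here), the key design choice is a single retrieval pass over both corpora, with an output gate that renders fused results only on the high side:

```python
from dataclasses import dataclass

@dataclass
class Document:
    text: str
    network: str  # "high" (classified) or "low" (unclassified)

def retrieve(query, corpora):
    """One retrieval pass over BOTH networks' corpora.

    The conceptual point: a single analysis step sees both bodies of
    evidence, rather than two siloed models seeing one each.
    """
    query = query.lower()
    return [d for corpus in corpora for d in corpus if query in d.text.lower()]

def display(results, terminal_network):
    """Gate output: fused analysis may render only on the classified side."""
    if terminal_network != "high":
        raise PermissionError("fused results are restricted to the high side")
    return [f"[{d.network}] {d.text}" for d in results]

high_side = [Document("SIGINT report: unit movements near the border", "high")]
low_side = [Document("Open web: spike in searches for border crossings", "low")]

print(display(retrieve("border", [high_side, low_side]), terminal_network="high"))
```

The gate is the compromise described above: the model reasons over everything, but its fused product stays behind the classification boundary.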

This is a pivotal moment. There is now an opportunity to tackle a long-standing, counterproductive prejudice that the Intelligence Community itself recognizes. Yet if the analysts tasked with training models are not careful, they could end up setting the bias problem in “technological stone” and making it even more intractable.

The Future of Intelligence Gathering: A Pivotal Moment

In other words, there is a risk that analysts spend the next few decades teaching models to repeat the mistake they made in the past: excluding unclassified data from their assessments. With training whose only bias is toward the best-quality data, however, generative AI models and LLMs could transform intelligence gathering for the US and its allies.