It doesn’t take much to imagine a world where deepfakes convincingly imitate politicians’ voices and fabricate scandals that could sway elections. That world is already here. Fortunately, there are many reasons to be optimistic about society’s ability to identify fake media and maintain a common understanding of current events.
We have reason to believe that the future is safe, but we worry that the past is not.
History can be a powerful tool for manipulation and fraud. The same generative AI that can falsify current events can also falsify past events. New content may be protected by built-in watermarking, the practice of adding imperceptible information to a digital file so that its origin can be traced. But there is a vast body of unwatermarked content already in the world, and as provenance watermarks become more prevalent and people learn to distrust unwatermarked content, everything created before that point is likely to be called into question.
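To make the idea concrete, here is a toy sketch of one classic watermarking technique, least-significant-bit embedding, which hides provenance data in the lowest bit of each pixel of an image. This is an illustration only; real provenance systems use far more robust, tamper-resistant schemes, and the function names here are invented for the example.

```python
# Toy least-significant-bit (LSB) watermark: hide provenance bytes in the
# lowest bit of each 8-bit pixel value. Illustrative only; production
# watermarks are designed to survive compression and editing.

def embed_watermark(pixels: list[int], mark: bytes) -> list[int]:
    """Hide `mark` in the least-significant bit of each pixel."""
    bits = [(byte >> i) & 1 for byte in mark for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("image too small for watermark")
    out = pixels[:]
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # overwrite only the last bit
    return out

def extract_watermark(pixels: list[int], length: int) -> bytes:
    """Recover `length` bytes from the least-significant bits."""
    mark = bytearray()
    for b in range(length):
        byte = 0
        for i in range(8):
            byte |= (pixels[b * 8 + i] & 1) << i
        mark.append(byte)
    return bytes(mark)

pixels = list(range(200))  # stand-in for grayscale image data
marked = embed_watermark(pixels, b"org:2024")
assert extract_watermark(marked, 8) == b"org:2024"
# Each pixel changes by at most 1: imperceptible to the eye.
assert all(abs(a - b) <= 1 for a, b in zip(pixels, marked))
```

The key property is that the embedded information is invisible to a viewer but recoverable by anyone who knows where to look, which is what lets a file carry a traceable origin.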
And this opens up opportunities to use generated documents to support false claims: photos placing historical figures in compromising situations, altered articles in historical newspapers, changed names on title deeds. It is a treasure trove for fraudsters. All of these techniques have been used before, but they will be much harder to counter now that the cost of creating near-perfect fakes has collapsed.
This prediction is based on history. There are many examples of economic and political powers manipulating the historical record for their own purposes. Stalin executed his disloyal comrades, altered the photographic record to remove them, and banished them from history as if they had never existed. After Slovenia became an independent country, it removed more than 18,000 people, mainly members of the Roma minority and other non-Slovenes, from its civil registry in 1992. In many cases officials destroyed their physical records, and those erased lost access to housing, pensions, and other services, according to a 2003 report by the Council of Europe’s Human Rights Committee.
False documents are a key part of many efforts to rewrite the historical record. The infamous Protocols of the Elders of Zion, first published in a Russian newspaper in 1903, purported to be the minutes of a Jewish conspiracy to control the world. Though exposed in August 1921 as a fabrication plagiarized from multiple unrelated sources, the Protocols featured prominently in Nazi propaganda, were cited in Article 32 of the 1988 Hamas charter, and have long been used to justify antisemitic violence.
In 1924, the Zinoviev Letter, purportedly a secret communication from the head of the Communist International in Moscow directing the British Communist Party to rally support for normalizing relations with the Soviet Union, was published in the Daily Mail four days before the general election. The resulting scandal may have cost Labour the election. The letter’s origin has never been established, but its authenticity was questioned at the time, and an official investigation in the 1990s concluded that it was most likely the work of White Russians, émigré opponents of the communist government.
Decades later, Operation Infektion, a Soviet disinformation campaign, used forged documents to spread the claim that the United States had invented HIV, the virus that causes AIDS, as a biological weapon. And in 2004, CBS News became embroiled in controversy after failing to authenticate documents, later discredited as forgeries, that cast doubt on then-President George W. Bush’s early service in the Texas Air National Guard; the network retracted the story. As historical disinformation becomes easier to generate and the sheer volume of digital fakes explodes, opportunists will have the chance to rewrite history, or at least to call our current understanding of it into question.
The potential for political actors to leverage generative AI to effectively alter history is frightening, not to mention fraudsters creating fake legal documents and transaction records. Fortunately, a path forward was forged by the same companies that created the risks.
In indexing much of the world’s digital media to train their models, AI companies have effectively created systems and databases that will soon contain all of humanity’s digitally recorded content, or at least a meaningful approximation of it. They could begin today to record watermarked and timestamped versions of primary documents, including newspaper archives and a wide range of other sources, so that subsequent forgeries could be instantly detected.
There are several barriers to such work. Google’s effort to scan millions of library books around the world and make them available to anyone with an internet connection ran into intellectual property restrictions, and the resulting archive can no longer fulfill its original ambition of making historical texts broadly readable and searchable. Similar intellectual property concerns have creators and businesses worried both about their work being used as training data for generative AI and about the impact when that AI is used to generate content.
Given this checkered history, including Google’s costly library effort, who would step forward and pay for a similarly large-scale effort to create immutable versions of historical data? The AI industry has strong incentives to do so, and many of the intellectual property concerns raised by searchable online archives do not apply to creating watermarked, timestamped versions of documents. Those versions need not be made publicly available to serve their documentary purpose. A mathematical transformation of a document known as a “hash” can be used to compare claimed documents against the recorded archive. This is the same technique the Global Internet Forum to Counter Terrorism uses to help companies screen known terrorist content.
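The hash comparison described above can be sketched in a few lines. This is a minimal illustration assuming a simple in-memory registry; a real archive would use a database with trusted timestamps, and the document identifier shown is invented for the example.

```python
# Minimal sketch of hash-based document verification: record a
# cryptographic fingerprint at archive time, then check later copies
# against it. Any alteration to the document changes the hash.
import hashlib

def fingerprint(document: bytes) -> str:
    """Return the SHA-256 digest of the document as a hex string."""
    return hashlib.sha256(document).hexdigest()

archive: dict[str, str] = {}  # document id -> recorded hash

def record(doc_id: str, document: bytes) -> None:
    """Store the fingerprint of a primary document."""
    archive[doc_id] = fingerprint(document)

def verify(doc_id: str, claimed: bytes) -> bool:
    """Check a later copy against the hash recorded at archive time."""
    return archive.get(doc_id) == fingerprint(claimed)

record("herald-1924-10-25-p1", b"Original front page text ...")
assert verify("herald-1924-10-25-p1", b"Original front page text ...")
assert not verify("herald-1924-10-25-p1", b"Doctored front page text ...")
```

Note that only the short fingerprint needs to be published for verification to work; the archived document itself can stay private, which is why the intellectual property concerns are lighter here.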
Beyond creating an important public good and protecting the public from the dangers posed by manipulated historical narratives, a verified record of historical documents could be valuable to large AI companies themselves. Recent research suggests that when AI models are trained on AI-generated data, their performance degrades rapidly. It will therefore be important to distinguish what is actually part of the historical record from newly generated “facts.”
Preserving the past means preserving the training data, the tools that operate on it, and even the environment in which those tools were run. The early internet pioneer Vint Cerf has called this kind of record “digital vellum,” and it is needed to protect our information environment.
Such vellum would be a powerful tool. It would help companies analyze which data to include to produce the best models, and help regulators audit those models for bias and harmful content. Big tech companies are already making similar efforts to document the new content their models create, in part because models need to be trained on human-generated text, and data collected after the deployment of large language models can be contaminated with generated content.
It is time to extend this effort backward in time, before our politics, too, are severely distorted by generated history.
Jacob N. Shapiro is Professor of Politics and International Affairs and Managing Director of the Empirical Research Project on Conflict at Princeton University. Chris Mattmann is an adjunct research professor and director of the Information Retrieval and Data Science Group at the University of Southern California.