
AI and Archives: Possibilities and Limits, Presents and Futures

by Justin McHenry on June 5th, 2024

At the simplest, most basic level, the primary goals of an archives are to preserve the materials in its care and to make them available to researchers and the public. Nuance exists within those two goals, depending on the type of archives, the materials housed there, and so on, but they are at the heart of any archive's mission. So when we talk about using AI to enhance an archives' capabilities, it is to serve those two ends: preservation and accessibility.

Before going much further, I should define what artificial intelligence means, at least in the archival technology space. Most of what has taken place in archival AI falls under machine learning, a term often used interchangeably with artificial intelligence.

Machine learning is the process of developing a system that learns from data supplied to it, or that is taught how to make decisions using pre-tagged data. True AI, by contrast, is a digital system that automates or assists with activities associated with human thinking, such as problem-solving, decision-making, and creation.
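To make the machine learning half of that distinction concrete, here is a minimal sketch in Python of a classifier that learns from pre-tagged data. The document snippets, labels, and categories are all invented for illustration.

```python
# A minimal sketch of machine learning from pre-tagged data:
# a classifier that learns to sort document snippets into
# (hypothetical) archival categories from labeled examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Pre-tagged training data (invented for illustration).
texts = [
    "Dear Colonel, thank you for your letter of 3 May...",
    "Invoice No. 447: 12 reams of paper, $38.00",
    "Minutes of the board meeting, April session",
    "Receipt for payment of printing services, $12.50",
]
labels = ["correspondence", "financial", "meeting records", "financial"]

# Learn term weights from the tagged examples.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Predict a category for a new, unlabeled document.
print(model.predict(["Expense report for the fiscal quarter"])[0])
```

The system is never told what "correspondence" looks like; it infers that from the labeled examples, and producing those labels is exactly the pre-tagging work that falls to archivists.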

Archives tend to exist in a liminal space: not only do they have their own processes to worry about, but those processes center on collecting the processes of others. Much of what makes up an archive's collections documents the decision-making of the person or organization whose materials they hold. This is true of Department of Defense records, where you would go to figure out what happened and why during a particular battle of the Vietnam War, and it is just as true of Sylvia Plath's archive, where you would go to gain a better understanding of why and how she wrote what she did.

The most exciting prospect for AI in archives is its use on large-scale holdings, in particular collections of born-digital materials. It can truly be a difference-maker in processing millions (if not billions) of pages of records and terabytes of data, ensuring that all of that information is managed properly, made discoverable, and handled in a timely fashion, i.e., actually within our lifetimes.

The organization that appears to be at the forefront of implementing AI in archival workflows is the National Archives. The work it has started showcases what is to come next for AI in archives.

The National Archives has laid out a series of AI initiatives that only scratch the surface of what the technology can do, but they still represent ambitious and potentially groundbreaking changes.

These include pilot projects to train AI technology to automate the handling of its copious FOIA requests, which could then assist with the nearly 1,000,000 FOIA requests received by the various Federal agencies. This includes identifying, flagging, and redacting Personally Identifiable Information (PII). Along the same lines, another pilot project is geared toward classified records, checking whether documents are potentially ready to be declassified or whether information needs to be redacted.
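As an illustration of the kind of automation such a pilot involves, here is a minimal, rule-based sketch of PII flagging and redaction in Python. The patterns and the redact helper are hypothetical stand-ins, not NARA's actual system; a production pipeline would pair trained models with human review.

```python
import re

# Illustrative patterns for two common PII types. A real system
# would rely on trained models plus human review, not regex alone.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> tuple[str, list[str]]:
    """Replace matched PII with labeled placeholders and return
    the redacted text plus a list of flags for archivist review."""
    flags = []
    for label, pattern in PII_PATTERNS.items():
        flags += [f"{label}: {m}" for m in pattern.findall(text)]
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text, flags

doc = "Contact John at 555-867-5309. SSN on file: 123-45-6789."
redacted, flags = redact(doc)
print(redacted)  # PII replaced with placeholders
print(flags)     # what was caught, for human verification
```

Flagging alongside redacting matters: the point is not to remove the archivist from the loop but to put the likely problems in front of them first.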

Two of the proposed projects have revolutionary implications for archives and their patrons. The first is the "Auto-fill of Descriptive Metadata for Archival Descriptions," which would have AI review digital records and documents and then assign key metadata to them. Assigning metadata is a major time sink in the processing of digital records, and automating it would save years, if not decades, of work across the millions upon millions of digitized papers.
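As a toy sketch of what auto-filling descriptive metadata could look like, the Python below pulls a candidate title, date, and subject terms out of a document's raw text. The field choices and heuristics are invented for illustration and are far cruder than anything NARA would deploy.

```python
import re
from collections import Counter

STOPWORDS = {"the", "of", "and", "to", "a", "in", "for", "on", "was"}

def autofill_metadata(text: str) -> dict:
    """Guess a few descriptive metadata fields from raw text."""
    # Candidate date: first plausible 4-digit year in the text.
    year = re.search(r"\b(1[89]\d{2}|20\d{2})\b", text)
    # Candidate subjects: the most frequent substantive words.
    words = re.findall(r"[A-Za-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 3)
    return {
        "title": text.strip().splitlines()[0][:80],  # first line as title
        "date": year.group(0) if year else None,
        "subjects": [w for w, _ in counts.most_common(3)],
    }

sample = "Quarterly report on canal maintenance\nSubmitted 1924 ..."
print(autofill_metadata(sample))
```

Even heuristics this crude hint at the time savings: a first-pass record that an archivist corrects is far faster to produce than a record written from scratch.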

The other major project is using AI to build a natural language semantic search across the massive body of born-digital records that the agency already holds and adds to every day. Bolstering this mission is the recent opening of the National Archives Digitization Center, which will increase NARA's scanning capabilities tenfold.

This envisions a world in which the government has a ChatGPT-like interface you can use to search and locate information across the entirety of the Federal Government. Instead of conducting research with each agency separately on a particular topic, there would be one great well of AI-created and AI-curated information, pulling from the entire breadth of the American bureaucracy to supply you with information and resources.
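As a rough sketch of the underlying semantic-search idea, the Python below embeds a few invented record descriptions and retrieves the closest matches to a natural-language query. The model name is simply a commonly used open-source example; a system at NARA's scale would index millions of documents in a vector database rather than a Python list.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Invented record descriptions standing in for archival holdings.
records = [
    "After-action report, 1st Cavalry Division, Ia Drang Valley",
    "Correspondence regarding Panama Canal lock maintenance",
    "Census enumeration sheets, Cook County, Illinois, 1920",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(records, normalize_embeddings=True)

def search(query: str, k: int = 2) -> list[str]:
    """Return the k records semantically closest to the query."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity (vectors are unit length)
    return [records[i] for i in np.argsort(scores)[::-1][:k]]

# No keyword overlap with the stored description is required.
print(search("Vietnam War battle records"))
```

The appeal over keyword search is exactly that last line: a researcher's phrasing does not have to match the phrasing in the finding aid.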

This is all well and good for the biggest archive in America, which has the material to make a massive commitment to AI worthwhile. But for smaller archives, several issues and limitations still hamper their ability to implement AI. The biggest is the significant time and effort needed to 'train' any AI system, not to mention the cost of building one or purchasing one through a vendor. Right now there are no out-of-the-box automation options; all require a great deal of work to prepare the data and the system itself before it can even start to function.

In addition, there is a general distrust of the technology and of the results AI systems produce. Even more time and effort is needed on the back end to ensure that the results an AI system provides are accurate.

All of this raises considerations archives need to weigh before introducing AI technology into their processes, such as determining an acceptable level of risk and figuring out how to know what might be missing.

Another big factor in adding AI to any process is that the algorithms themselves become a whole new kind of primary source for archives to preserve. They become records in their own right that need to be preserved in some form, especially when they are used for government decision-making, where rules and regulations exist to hold agencies accountable and transparent. Institutions and government bodies are already writing rules for "algorithmic transparency" to ensure this accountability.

Archiving has been a labor-intensive job since monks copied whole books by hand. Weaning the profession off this mindset will take considerable time, and even more effort on the part of archivists, to properly navigate how best to use AI in the archives.


