UCSF Archives & Special Collections awarded $99,325 LSTA grant for textual data extraction from historical materials on AIDS/HIV

The Archives and Special Collections department of the University of California, San Francisco (UCSF) Library has been awarded a $99,325 “Pitch-An-Idea, Local” grant for the first year of a two-year project from the Institute of Museum and Library Services’ (IMLS) Library Services and Technology Act funding administered through the California State Library. The Archives will take the nearly 200,000 pages of textual AIDS/HIV historical materials that have been digitized as part of various digitization projects, including the National Historical Publications and Records Commission (NHPRC)-funded project “Evolution of San Francisco’s Response to a Public Health Crisis” and the National Endowment for the Humanities (NEH)-funded project “The San Francisco Bay Area’s Response to the AIDS Epidemic,” and will extract unstructured textual data from these materials using Optical Character Recognition (OCR) and related software. The project team will prepare the text as a research-ready, unstructured textual dataset to be used for digital humanities, computationally driven cultural heritage, and machine learning research inquiries into the history of the HIV/AIDS epidemic.

The 24-month project, entitled “No More Silence — Opening the Data of the HIV/AIDS Epidemic,” commenced on July 1, 2018. The digitized materials from which text will be extracted include handwritten correspondence, notebooks, typed reports, and agency records which represent a broad view of the lived experience of the epidemic, including documentation from People with AIDS, their friends and families, and the scientists and public health officials working to slow the epidemic. All historical materials represented in this dataset have been previously screened to address privacy concerns. The resulting unstructured textual dataset will be deposited in the UC Dash data-sharing repository for public access and use by any interested parties, and will also be deposited in other similar data repositories as appropriate. “During my tenure at UCSF,” says health sciences historian and professor in the Department of Anthropology, History, and Social Medicine at UCSF, Dr. Aimee Medeiros, “I have been inspired by the library’s enthusiasm and dedication to public access and the use of practices in the digital humanities to help maximize access to HIV/AIDS material.” This project will build on that legacy by bringing these valuable historical materials into the realm of digital humanities and scientific research and making them computationally actionable.

According to Dr. Paul Volberding, director of the AIDS Research Institute at UCSF, “Discovering the complexities of the virus and developing effective treatments will be studied of course, but the lives of those directly involved as patients as well as care providers is equally significant. The cultural aspects of the epidemic will most directly benefit from the work [of this project]. Combining the growing field of computational science with the already large and rapidly growing archive of materials from all aspects of the AIDS epidemic demand the creation of new tools and I look forward to the new insights we gain from their application. [UCSF Library has] been sharply focused on the AIDS archives and have amassed a rich collection that, in its digitized form, will be the database for [these] new efforts. Together, this database and new computational tools, will enable a sophisticated analysis that I am convinced will be used to shed more insight in our understanding of the impact of the epidemic and ways our response will have meaning in the inevitable future crises.”

Once the preparation of the textual dataset is completed, the project team (consisting of archivists and technical staff from both the Archives and the Library) will embark on several pilot research projects that apply machine learning, and especially natural language processing research methods, to the data. The pilot projects, which will be scoped in collaboration with various stakeholders, will explore what kinds of structured data can be pulled out of the unstructured text, and will define some simple critical inquiries that can be addressed using this data, these methods, and the results of these experimental endeavors. Additionally, the project team hopes to get a better sense of the functional requirements for systems supplying this type of data when tailored towards these kinds of medical humanities research questions. Through these efforts the project team will be able to better define the extent to which, as stated by Dr. Medeiros, “making 200,000 pages of primary-source archival documentation converted to unstructured textual data will… further meaningful research and our understanding of this epidemic.”
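As a rough illustration of what pulling structured data out of unstructured text can look like, the following is a minimal sketch in Python using only the standard library. It is not the project team's method: the sample page, the function names, and the choice of date extraction and term counting are all assumptions made for this example, and real OCR output from handwritten or typed archival pages would be far noisier.

```python
import re
from collections import Counter

# Invented stand-in for one page of OCR output; NOT taken from the dataset.
SAMPLE_PAGE = """\
Community clinic planning notes
Meeting held March 14, 1985
Discussed volunteer schedules and home care referrals.
Next meeting: April 2, 1985.
"""

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
# Matches dates written in the form "March 14, 1985" (an assumed format).
DATE_PATTERN = re.compile(rf"\b({MONTHS}) (\d{{1,2}}), (\d{{4}})\b")


def extract_dates(text):
    """Return (month, day, year) tuples for dates written like 'March 14, 1985'."""
    return [(month, int(day), int(year))
            for month, day, year in DATE_PATTERN.findall(text)]


def term_counts(text, terms):
    """Count case-insensitive whole-word occurrences of each term of interest."""
    words = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return {term: words[term.lower()] for term in terms}


if __name__ == "__main__":
    print(extract_dates(SAMPLE_PAGE))
    print(term_counts(SAMPLE_PAGE, ["referrals", "volunteer"]))
```

Even a sketch this small hints at the kind of functional requirements mentioned above: the date pattern only recognizes one spelling convention, and OCR errors in a month name or digit would cause a date to be missed silently.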

Finally, the project team will promote the existence of this dataset, and will lead workshops to help instruct potentially interested students, researchers, scholars, and members of the general public in its use. Again in the words of Dr. Medeiros, “the plans to provide workshops to help curious scholars learn how to best interface with this data is exciting as it will allow for those who are experts in the field but not necessarily in the digital medical humanities to conduct important research.”

This project will support innovation, creativity, and collaboration in and across the humanities, social sciences, and STEM fields (Science, Technology, Engineering, and Math) by opening up a new body of historical materials for research and discovery. The project will foster new creative research methods in the areas of the humanities, which are just beginning to experiment with computationally driven research, and it will encourage collaboration through the use of the newly created data resource, engaging the expertise of both humanists and scientists in making discoveries in the data. Not only does this collaborative work allow for innovation “at the edges” of each of these fields, it also allows for computational access to a previously inaccessible research object: the data of the lived experience and cultural history of the AIDS crisis in the Bay Area and beyond.

The following institutions and groups are serving as informal partners on this project:

About UCSF Archives & Special Collections (UCSF Library)
The mission of the UCSF Archives & Special Collections is to identify, collect, organize, interpret, and maintain rare and unique material to support research and teaching of the health sciences and medical humanities and to preserve institutional memory. The UCSF AIDS History Project (AHP) began in 1987 as a joint effort of historians, archivists, AIDS activists, health care providers, scientists, and others to secure historically significant resources documenting the response to the AIDS crisis. Its holdings currently include 46 collections and continue to grow.


About the Library Services and Technology Act
Library Services and Technology Act (LSTA) grants are federal funds from the Institute of Museum and Library Services that are awarded by the State Library to eligible California libraries. This project was supported in whole or in part by the U.S. Institute of Museum and Library Services under the provisions of the Library Services and Technology Act, administered in California by the State Librarian. www.library.ca.gov/grants/library-services-technology-act/



Intern Report: Crafting a Digital Forensics Lab

This is a guest post by Elizabeth Popiel, our Digital Archives intern for summer 2018. Elizabeth worked on implementing, testing, and piloting equipment for the Digital Forensics Lab to capture content from the decaying computer media in our collections.

This summer I worked with UCSF’s Digital Archivist Charles Macquarie on building up the UCSF Archives & Special Collections Digital Forensics Lab. It was such an honor to come and join the UCSF team because of the great people, and the unique and important collections that the Archives preserves and provides access to. I am grateful for the experience and the shared wisdom of the staff, and to be able to contribute to this growing piece of the work of the Archives.


Digital archives intern Elizabeth Popiel works on a drive housing which was 3D-printed in collaboration with the UCSF Makers Lab.

What exactly does it mean to build a Digital Forensics Lab? In the case of UCSF, many of the collections contain obsolete and legacy media – things like floppy disks and ZIP disks, and even personal digital assistants (PDAs, remember those pre-smartphone devices?) and SD cards. As more time passes, it becomes harder to read or use these formats effectively without access to the machines and software that were used to create and read them.

Without access, we risk losing important parts of collections when outdated digital document formats and research materials become unreadable. By creating a lab environment where we can rescue these items, we can give them life again. This is why I love this kind of work.

Building a lab like this is no small challenge! I’ve worked on configuring new software to power old hardware, testing lab equipment with “dummy disks” and files, troubleshooting the problems that arise with a new implementation, and even building a new housing for a few select drives using the 3D-printing equipment in the UCSF Makers Lab. It has been a busy summer.

Throughout my time here I was able to contribute documentation, workflows, and hardware (my favorite part) to the lab implementation process, and to create software troubleshooting steps that should make it easier for archivists and researchers to use the lab equipment to retrieve difficult-to-read media and file formats from our archival collections.

I also appreciate being able to learn more about archival processing for digital collections in the context of digital forensics and born-digital archival materials. I gained practical field experience with both digital forensic work and digital curation that I can take back with me into the final year of my Master’s degree at the University of Michigan School of Information. Though I’m sad to be leaving the UCSF Library, I am grateful for the experience I’ve gained working on this challenging project, and I hope to return to visit next year.

New Archives Intern: Elizabeth Popiel

Today’s post is an introduction from Elizabeth Popiel, our newest intern here in the Archives who will be working on piloting and testing some of the key pieces of our digital forensics lab and workstations.


Elizabeth Popiel

Hello out there, readers! My name is Elizabeth Popiel and I’ll be interning at the UCSF Archives & Special Collections this summer, working with some of the early born-digital collections here in the Library. I’m a second-year graduate student in the School of Information at the University of Michigan in Ann Arbor, with a concentration in Digital Curation, Archives and Human Computer Interaction. I’ve always loved the exploration and discovery part of any research project and I hope to do a little of that here this summer as well.

I’m enjoying being back in the Bay Area before heading back to the Midwest for my last year of school. I love road tripping along the coast and seeing everything out there, from the redwoods to the historic forts, museums, and other interesting locations. I was born in Canada and have traveled extensively, from Bern to Tasmania, Singapore to Beijing, and back again. It’s great to get to see and learn perspectives that differ from your own and to learn to appreciate them when you approach your work, especially when trying to figure out a puzzle or sort through a collection.

In my past I taught English overseas, worked in broadcasting, and I have experience working in both hardware and software in Silicon Valley. I’m an old-school gamer and I still love text-adventures, joystick-based and SCUMM Engine games. Figuring out how to make them work on newer machines is always a challenge!

I like the challenge of working in research and preservation for born-digital archival collections, and at UCSF I’m hoping to be able to gain practical experience in this area. I’ll assist in getting the Digital Forensics Lab up and running for collections capture, processing, and use, as well as test processing some of the collections. It’s my hope that I can better understand how to work with active collections and how digital archival models can be adapted to different and unique libraries and archives such as UCSF.

In Archives, my passion in work and learning lies in the archival challenges that lie ahead in digital curation, forensic work, and audiovisual materials. One of the reasons working with UCSF Special Collections interests me is that there are so many collection pieces that need attention in order for them to remain usable for future generations. Everything from floppy disks with key scientific notes, to spreadsheets containing experiment setup in ontological medicine, to email communications that represent negotiations and crucial strategies during the height of the San Francisco AIDS epidemic – these all represent important parts of the history of UCSF and its legacy, and I’m excited to contribute to preserving that legacy.