A (very) brief report back from the Society of American Archivists

The last couple of weeks have been a whirlwind for me as I bounced from conference to conference, but as I settle back in it’s been exciting to collect my thoughts on what I’ve learned. While it’s still fresh in my memory, this is a brief report back from the largest conference I attended: the annual meeting of the Society of American Archivists (SAA), which was held last week in impossibly quaint Portland, OR.

Being the digital archivist, I mostly spent my time in sessions focused on processing, preserving, and providing access to digital materials, in all the different forms that can take. One of the most fruitful of these was hosted by colleagues from UCLA, UCB, Stanford’s Hoover Institution, Cornell, and Emory, and was entitled “What we talk about when we talk about processing born-digital.” This session reported on an effort to establish shared definitions for what it means to process born-digital archival collections. Because this field is so new, what is considered “processing” a collection at one institution might be a totally different set of tasks from those performed at another. To address this, the group is attempting to identify which steps are essential or recommended, and to assign different processing levels based on these frameworks.

To attempt to break all these steps out in a clear way is an immense amount of work, so I’m incredibly excited that my colleagues have begun to take on this huge task. It will help us all out in a massive way.

UCSF was not without good representation, as our own Polina Ilieva moderated several events, including a meeting of the section on Science, Technology, and Health Care archives and a panel discussion on Collecting and Preserving contemporary science in institutional archives.

Two people in front of a power-point presentation at a meeting of the Science, Technology, and Health Care Section of the Society of American Archivists.

A very poor photo of Polina Ilieva taking over as Senior Co-Chair of the Science, Technology, and Health Care Section of the Society of American Archivists

Finally, some of my most interesting food for thought came from a panel on archival responses to climate change. The panel covered everything from Native Hawaiian community preservation of historic material endangered by sea-level rise, to projects acquiring better data to map which archival repositories are likely to be most affected by a changing climate. Especially pertinent for my work was a presentation urging us as digital archivists to think more explicitly about what kinds of energy use we are engaging in through our different preservation practices. Simply put: current digital preservation practices rely on cheap data storage, and cheap data storage relies upon energy from fossil fuels. So where can we start to change that?

More updates soon as we start to engage with all these thoughts more directly at UCSF.

Health Sciences Data Laboratory and digitized medical records

Today’s post is a brief update on the implementation of the Health Sciences Data Laboratory, a collaboration between the UCSF Archives & Special Collections and the Department of Anthropology, History, and Social Medicine (DAHSM). Last year DAHSM and the Archives were awarded a Resource Allocation Program (RAP) grant to purchase a high-throughput document scanner and begin the huge task of digitizing some of the more than 7 million historic patient files that track the development of care at Mt. Zion and UCSF Hospitals in the 20th century. These files contain a wealth of data – demographic, clinical, and public health – which has been mostly inaccessible on paper media for the life of the record. Electronic health records – data which was collected for clinical rather than research purposes – have already proven unexpectedly useful for epidemiological and public health research (Diez Roux, 2015). Similarly, this lab aims to make the valuable data contained in these records available for new computational access, and to bring a large body of historical records into the realm of big-data health science research.

But for right now, we’re figuring out how it all works! The scanner we were able to purchase is a powerful machine, and at max speed can scan almost 280 pages per minute. Because most of our documents are relatively fragile paper from the 20s, 30s, and 40s, we scan at a slower speed than this. This helps us to minimize the potential for damage to the records and to optimize image quality and file size. Even at a slower speed, however, this process is vastly improved by the new scanner, which can scan an entire stack of paper (700 pages when full) in one go. Formerly, each page had to be scanned one by one on a flatbed scanner, which created only one image at a time.

The new sheet-fed scanner in the Health Sciences Data Laboratory.
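
For a rough sense of what those scanner figures mean in practice, here is a minimal back-of-the-envelope sketch. It uses only the two numbers quoted above (almost 280 pages per minute at top speed, 700 pages in a full stack). The slower speed of 100 pages per minute is purely a hypothetical placeholder, since our actual working speed is not stated here, and the function name is made up for illustration.

```python
# Figures quoted in the post.
MAX_PAGES_PER_MINUTE = 280    # top speed ("almost 280 pages per minute")
FULL_STACK_PAGES = 700        # capacity of a full stack in the feeder

# Hypothetical slower speed for fragile paper; NOT a figure from the post.
ASSUMED_GENTLE_PAGES_PER_MINUTE = 100


def minutes_per_stack(pages_per_minute: float, pages: int = FULL_STACK_PAGES) -> float:
    """Return the minutes needed to feed `pages` sheets at the given speed."""
    if pages_per_minute <= 0:
        raise ValueError("pages_per_minute must be positive")
    return pages / pages_per_minute


if __name__ == "__main__":
    fast = minutes_per_stack(MAX_PAGES_PER_MINUTE)
    gentle = minutes_per_stack(ASSUMED_GENTLE_PAGES_PER_MINUTE)
    print(f"Full stack at top speed: {fast:.1f} minutes")              # 2.5 minutes
    print(f"Full stack at the assumed slower speed: {gentle:.1f} minutes")  # 7.0 minutes
```

Even under the assumed slower setting, a full stack goes through in minutes rather than the page-by-page pace of a flatbed. This counts feed time only and ignores preparation, reloading, and quality checks.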

Now that we’ve got the scanner working smoothly and a workflow in place, we’re hoping to begin ramping up production soon. Currently, our intern Maopeli is digitizing patient records in order to draw some small-scale research conclusions about the income levels of patients at that time and how these related to the specific health conditions they experienced. This research is being done as part of an internship with the CHORI program.

We hope not only to increase the rate of scanning (7 million records is a lot to get through!) but also to start exploring new ways to facilitate researcher access to this wealth of data. As evidenced by the image of a blank sample record, the data contained in these materials is both detailed and comprehensive, but it also requires a lot of labor, both human and computer, to make it computationally actionable. Much of it is handwritten and must either be transcribed or put through heavy-duty image-processing algorithms that are beyond what most researchers have access to. For now, though, we’re happy to be finally taking the first important steps as the first images and data from this vast trove make the transition from physical to digital.

Blank eye examination form from patient record.

An example of some of the types of data collected in patient records.

New Archives Intern: Maopeli Ali

We’re happy to welcome new intern Maopeli Ali to Archives & Special Collections. Born and raised in San Francisco, Maopeli is currently a sophomore at Kenyon College in Ohio, where he is pursuing a major in biology with a minor in Latin. At Kenyon, he also participates in club rugby and is a member of the Delta Tau Delta fraternity. Maopeli is a seasoned intern; he has previously worked at various institutions in the Bay Area, including an architecture firm, the Geology Department of the California Academy of Sciences, and the Children’s Hospital Oakland Research Institute (CHORI). Maopeli is very ambitious, and is proud to be a first-generation college student. He plans to attend graduate school after completing his undergraduate studies to pursue a Criminal Justice master’s degree in Forensic Science. His career goal is to become a forensics investigator for the Federal Bureau of Investigation (FBI).

Portrait of Maopeli Ali with San Francisco in the background.

New Archives intern Maopeli Ali

Maopeli comes to us as part of the Children’s Hospital Oakland Research Institute Summer Research Program. “This program is designed to provide an opportunity for High School and Undergraduate students to immerse themselves in the world of basic and/or clinical research for three months during the summer. The program pairs students with one or two CHORI principal investigators who serve as mentors, guiding the students through the design and testing of their own hypotheses and methodology development. At the end of the summer, students present their research to their peers just as any professional researcher would do.” As a CHORI intern, Maopeli is mentored by Dr. Aimee Medeiros from the UCSF Department of Anthropology, History, and Social Medicine and Polina Ilieva, Head of Archives & Special Collections.

Maopeli will be working on digitizing medical records in our newly implemented scanning lab, which was funded through UCSF’s RAP Shared Instrument program. He will then have the opportunity to work with some of this data to formulate a research question that can be addressed by the records.

The Archives are a new experience for Maopeli, whose previous work has mostly focused on biology. He is excited to work in this context and to explore ways in which this study can both help the archives and increase awareness within the health sciences fields about the wealth of historical medical data available in the archives and records of large health science universities like UCSF.