Health Sciences Data Laboratory and digitized medical records

Today’s post is a brief update on the implementation of the Health Sciences Data Laboratory, a collaboration between the UCSF Archives & Special Collections and the Department of Anthropology, History, and Social Medicine (DAHSM). Last year DAHSM and the Archives were awarded a Resource Allocation Program (RAP) grant to purchase a high-throughput document scanner and begin the huge task of digitizing some of the more than 7 million historic patient files that track the development of care at Mt. Zion and UCSF Hospitals in the 20th century. These files contain a wealth of data – demographic, clinical, and public health – which has been mostly inaccessible on paper media for the life of the record. Electronic health records – data which was collected for clinical rather than research purposes – have already proven unexpectedly useful for epidemiological and public health research (Diez Roux, 2015). Similarly, this lab aims to make the valuable data contained in these records available for new computational access, and to bring a large body of historical records into the realm of big-data health science research.

But for right now, we’re figuring out how it all works! The scanner we were able to purchase is a powerful machine, and at max speed can scan almost 280 pages per minute. Because most of our documents are relatively fragile paper from the 1920s, 1930s, and 1940s, we scan at a slower speed than this. This helps us minimize the potential for damage to the records and optimize image quality and file size. Even at a slower speed, however, the process is vastly improved by the new scanner, which can scan an entire stack of paper (700 pages when full) in one go. Formerly, each page had to be scanned individually on a flatbed scanner, which created only one image at a time.
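For a rough sense of what those numbers mean in practice, here is a back-of-the-envelope sketch in Python. The 700-page stack and the 280 pages-per-minute top speed come from the figures above; the slower speed of 100 pages per minute is purely a made-up value for illustration, not the setting we actually use.

```python
def minutes_per_stack(pages_in_stack: int, pages_per_minute: float) -> float:
    """Return how many minutes it takes to scan one stack at a given speed."""
    if pages_per_minute <= 0:
        raise ValueError("pages_per_minute must be positive")
    return pages_in_stack / pages_per_minute


FULL_STACK_PAGES = 700        # capacity quoted in the post
MAX_PAGES_PER_MINUTE = 280    # approximate top speed quoted in the post
ASSUMED_SLOW_SPEED = 100      # hypothetical gentler speed, for illustration only

print(minutes_per_stack(FULL_STACK_PAGES, MAX_PAGES_PER_MINUTE))  # 2.5
print(minutes_per_stack(FULL_STACK_PAGES, ASSUMED_SLOW_SPEED))    # 7.0
```

Even at the assumed slower speed, a full stack would take minutes rather than the hours it would need on a flatbed.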

The new sheet-fed scanner in the Health Sciences Data Laboratory.

Now that we’ve got the scanner working smoothly and a workflow in place, we’re hoping to begin ramping up production soon. Currently, our intern Maopeli is digitizing patient records as part of his internship with the CHORI program. He will use them to draw some small-scale research conclusions about the income levels of patients at the time and how these related to the specific health conditions they experienced.

We hope not only to increase the rate of scanning (7 million records is a lot to get through!) but also to start exploring new ways to facilitate researcher access to this wealth of data. As evidenced by the image of a blank sample record, the data contained in these materials is both detailed and comprehensive, but it also requires a lot of labor, both human and computer, to make it computationally actionable. Much of it is handwritten and must either be transcribed or put through heavy-duty image-processing algorithms that are beyond what most researchers have access to. For now, though, we’re happy to be finally taking the first important steps as the first images and data from this vast trove make the transition from physical to digital.

Blank eye examination form from patient record.

An example of some of the types of data collected in patient records.

New Archives Intern: Maopeli Ali

We’re happy to welcome new intern Maopeli Ali to Archives & Special Collections. Born and raised in San Francisco, Maopeli is currently a sophomore at Kenyon College in Ohio, where he is pursuing a major in biology with a minor in Latin. At Kenyon, he also participates in club rugby and is a member of the Delta Tau Delta fraternity. Maopeli is a seasoned intern; he has previously worked at various institutions in the Bay Area, including an architecture firm, the Geology Department of the California Academy of Sciences, and the Children’s Hospital Oakland Research Institute (CHORI). Maopeli is very ambitious and is proud to be a First Generation to College student. He plans to attend graduate school after completing his undergraduate studies to pursue a Criminal Justice Master’s Degree in Forensic Science. His career goal is to become a forensics investigator for the Federal Bureau of Investigation (FBI).

Portrait of Maopeli Ali with San Francisco in the background.

New Archives intern Maopeli Ali

Maopeli comes to us as part of the Children’s Hospital Oakland Research Institute Summer Research Program. “This program is designed to provide an opportunity for High School and Undergraduate students to immerse themselves in the world of basic and/or clinical research for three months during the summer. The program pairs students with one or two CHORI principal investigators who serve as mentors, guiding the students through the design and testing of their own hypotheses and methodology development. At the end of the summer, students present their research to their peers just as any professional researcher would do.” As a CHORI intern, Maopeli is mentored by Dr. Aimee Medeiros from the UCSF Department of Anthropology, History, and Social Medicine and Polina Ilieva, Head of Archives & Special Collections.

Maopeli will be working on digitizing medical records using our newly implemented scanning lab, purchased with funds from UCSF’s RAP Shared Instrument program. He will then have the opportunity to work with some of this data to formulate a research question which can be addressed by the records.

The Archives are a new experience for Maopeli, whose previous work has mostly focused on biology. He is excited to work in this context, and explore ways in which this study can both help the archives and increase awareness within the health sciences fields about the wealth of historical medical data which is available in the archives and records of large health science universities like UCSF.

Web-Archives at UCSF — Synapse student newspaper

This is part 1 of a series of blog posts we will put together talking about some of our web-archiving activities at UCSF, and examining some of the changes in University web presences over the years we’ve been collecting them. How many parts will there be? We don’t know yet! The possibilities are endless, so we’ll just have to see where it takes us.

By way of some introduction, we here at Archives & Special Collections have been collecting website captures since about 2009. Initially we used a service maintained by California Digital Library called WAS (Web-Archiving Service), but we now use the Internet Archive’s Archive-It service to capture sites. We plan to use a later post to go into more detail about how Archive-It works, but for this blog post it suffices to know that Archive-It contains the technology that crawls the websites and saves them on the Internet Archive’s servers just down the road at their Clement St. headquarters.

So why do we do this? Currently, much of the history of the University and its various buildings, people, and internal organizations is published on the web, so examining that history in the future will include looking at these websites to assess the way things have changed.

As an example, let’s look at the web page for Synapse, the UCSF Student Newspaper. Even since 2009, we can see significant changes in the look and feel of the web page, and can begin to tease out historical questions from the design and content of the pages.

Synapse home page in 2009. It looks pretty simple, and all the information is pretty static.

Synapse home page in 2011. It’s gone through a redesign, and perhaps looks a bit more like a newspaper now. It also contains some interesting pre-formatted search bars at the top right.

Already questions begin to emerge. Why did the staff institute search starting in 2011? And did the interesting “How Do I?” pre-formatted search bar get added to the page as a result of an identified need for such searches? (I had never seen one before this.)

Synapse home page in 2013. It now contains a slideshow on the cover page, and has gone to a darker look.

Synapse home page in 2015. It retains the same look, appears to have moved to a new slideshow technology, and just so happens to feature Dr. Atul Butte at the very beginning of his current position at the University.

Across the 2013 and 2015 captures, the paper has a new look for its home page and features slideshows for the first time. The paper also gained a “login” option in 2013, but it had been removed again by 2015, perhaps a brief memory of the everything-must-be-social-media phase of design. These captures are also the first in which advertisements appear directly on the front page of the paper.

In these documents we can also begin to see the rise of precision medicine and computational health sciences at UCSF, and it’s clear that by 2015 the University was ramping up investment, and that this translated directly to the content of the newspaper as well.

Synapse home page in 2017. It now reflects the design we are used to with most sites we visit.

And finally the page today is in line with much of the design we are used to seeing at our most commonly-visited sites. Synapse also happens to be going back to the archives themselves in this latest update, and pulling content which is just distant enough to have a historical feel.

The aesthetic changes in the page also mirror the aesthetic trajectories of the way we think health and the health sciences should “look”. It makes sense that they’ve gone back to a mostly white color scheme — you’d have a hard time finding a contemporary web page for a hospital or health provider with a dark color scheme now.

Sometimes it can be hard to consider web pages as historical artifacts because we are so close to them, but we are now reaching a point where the first web pages we have collected look foreign enough to us that they are beginning to seem more worthy of study. And we’ve barely even mentioned the wealth of data about the way we communicate which is contained in our web-archives and which can be accessed and assessed with new computational historical methods.
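As one small illustration of that kind of computational access, the Python sketch below builds a query for one capture per year of a site and turns the response into a year-by-year list of links. It rests on several assumptions that are not part of our workflow described above: it targets the Internet Archive’s public Wayback Machine CDX endpoint rather than our Archive-It collections, the address synapse.ucsf.edu is assumed for the Synapse site, and the sample response is made up so the example runs without a network connection.

```python
import json
from urllib.parse import urlencode

# Public Wayback Machine CDX endpoint (an assumption: not our Archive-It collections).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url: str, from_year: int, to_year: int) -> str:
    """Build a CDX query URL asking for roughly one capture per calendar year."""
    params = {
        "url": url,
        "output": "json",
        "from": str(from_year),
        "to": str(to_year),
        "collapse": "timestamp:4",   # collapse captures sharing the same year
        "fl": "timestamp,original",  # only return the fields we need
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def captures_by_year(cdx_json_text: str) -> dict[int, str]:
    """Map each year to a replay link for the first capture listed for it."""
    if not cdx_json_text.strip():
        return {}
    rows = json.loads(cdx_json_text)
    if not rows:
        return {}
    header, *records = rows          # the first row of JSON output names the fields
    ts_idx = header.index("timestamp")
    url_idx = header.index("original")
    result: dict[int, str] = {}
    for record in records:
        timestamp = record[ts_idx]
        year = int(timestamp[:4])
        link = f"https://web.archive.org/web/{timestamp}/{record[url_idx]}"
        result.setdefault(year, link)
    return result


# Assumed site address, for illustration only.
query = build_cdx_query("synapse.ucsf.edu", 2009, 2017)

# Made-up sample response in the shape of CDX JSON output; not real capture data.
SAMPLE_RESPONSE = json.dumps([
    ["timestamp", "original"],
    ["20090301000000", "http://synapse.ucsf.edu/"],
    ["20110301000000", "http://synapse.ucsf.edu/"],
])

for year, link in captures_by_year(SAMPLE_RESPONSE).items():
    print(year, link)
```

To try it against live data, the query string could be fetched with urllib.request.urlopen and the response text passed to captures_by_year.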

You can find all the UCSF web-archives here: https://archive-it.org/organizations/986

What research will you do with them?