Topics in Data Science: Python and JSON workshop

Thanks to everyone who attended the “Topics in Data Science: JSON and Python” workshop on November 14th, 2017!

The format for this workshop was a little different from previous Software Carpentry workshops. Unlike core SWC workshops, which assume no prior programming knowledge, the “Topics” courses are designed for people who have some programming background or who have taken a workshop or course in the past. Although we continue to emphasize hands-on programming, workshops in the “Topics” series cover more material than we can type by hand in the time allotted, so some code is presented through copy and paste and some through review.

This means that to get the most out of these workshops, participants will want to review and work with the code after the class.

In the JSON and Python workshop, we first covered dictionaries, an essential data structure for parsing JSON that isn’t generally covered in the Python section of a standard SWC workshop at UCSF. We covered keys, looking up values, and the practice of nesting sub-dictionaries and lists within a dictionary or list.
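As a quick refresher, here is a minimal sketch of that kind of nested structure (the keys and values are made up for illustration):

```python
# A nested dictionary, similar in shape to a parsed JSON document.
researcher = {
    "name": "Ada Lovelace",
    "department": "Analytical Engines",
    "publications": [
        {"title": "Notes on the Analytical Engine", "year": 1843},
        {"title": "Sketch of the Engine", "year": 1842},
    ],
}

# Look up a value by key.
print(researcher["name"])

# Walk a nested list of sub-dictionaries.
for pub in researcher["publications"]:
    print(pub["year"], pub["title"])
```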

We then used the UCSF Profiles application developed at the CTSI to demonstrate how to request a JSON document, send query parameters, and convert the result to a dictionary. As an example, we reviewed how to parse a JSON document to generate a list of publications for a researcher at UCSF.
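If you want to experiment with this on your own, the general pattern looks like the sketch below. The endpoint and parameter names here are placeholders, not the actual UCSF Profiles API; the exact URL and parameters we used are in the code on GitHub.

```python
import requests

# Placeholder endpoint and parameters, shown only to illustrate the pattern;
# see the workshop code on GitHub for the real UCSF Profiles request.
url = "https://example.org/profiles/api"
params = {"name": "Jane Researcher", "publications": "full"}

response = requests.get(url, params=params)  # query parameters are URL-encoded for you
profile = response.json()                    # parse the JSON body into a dictionary

# Once the result is a dictionary, building a publication list is just
# key lookups and loops.
for pub in profile.get("publications", []):
    print(pub.get("title"))
```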

After this, we briefly reviewed some of the JSON-based web APIs available through the National Library of Medicine, applying the same techniques to generate a list of interactions for a particular medication. I highly encourage everyone who took this class to take a more extensive look at this website, and to think about what kinds of services you’d like to see here at UCSF.
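As one example of the pattern, NLM’s RxNav service exposes a drug interaction endpoint that returns JSON. The sketch below follows the response format as documented at the time of the workshop; double-check the endpoint and field names against the current NLM documentation before relying on them.

```python
import requests

# RxNav drug interaction endpoint (verify against current NLM documentation).
url = "https://rxnav.nlm.nih.gov/REST/interaction/interaction.json"
params = {"rxcui": "341248"}  # an RxNorm concept ID for the medication of interest

data = requests.get(url, params=params).json()

# Walk the nested structure and print each interaction description.
for group in data.get("interactionTypeGroup", []):
    for interaction_type in group.get("interactionType", []):
        for pair in interaction_type.get("interactionPair", []):
            print(pair.get("description"))
```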

Lastly, we reviewed and ran code using the bottle module to create a JSON-based web service, running on localhost, to demonstrate how to write web services that do more than simply provide access to data.
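The full example is in the repository; as a reminder of the basic shape, here is a minimal Bottle sketch (the route and payload are invented for illustration). Returning a dictionary from a route handler is enough for Bottle to serialize the response as JSON.

```python
from bottle import route, run

# Returning a dictionary makes Bottle serialize it as JSON and set the
# Content-Type header for you.
@route("/square/<n:int>")
def square(n):
    return {"input": n, "result": n * n}

# Serve on localhost only; try http://localhost:8080/square/7 in a browser.
run(host="localhost", port=8080)
```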

As promised, all code, with comments and explanations, is available on GitHub.

One last note – I mentioned, briefly, that you will sometimes need to parse HTML from a website directly, since many useful data feeds still don’t have a nice JSON API available (or, if they do, it’s hard to find or poorly documented). Here’s a link to a short application that shows how to use Beautiful Soup to quickly parse XML.

Because HTML is also tag-based markup, the same approach works for HTML as well.
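Here is a minimal sketch of that approach (the tags are invented): Beautiful Soup parses the markup into a tree you can search by tag name.

```python
from bs4 import BeautifulSoup

# A small, made-up XML snippet; the same calls work on HTML if you
# pass "html.parser" as the second argument instead of "xml".
document = """
<catalog>
  <item><title>First item</title><year>2016</year></item>
  <item><title>Second item</title><year>2017</year></item>
</catalog>
"""

soup = BeautifulSoup(document, "xml")  # the "xml" parser requires the lxml package

# find_all returns every matching tag; .text gives a tag's contents.
for item in soup.find_all("item"):
    print(item.find("title").text, item.find("year").text)
```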

Thanks again for attending!