Topics in Data Science: Python and JSON workshop

Thanks to everyone who attending the “Topics in Data Science: JSON and Python” workshop on November 14th, 2017!

The format for this workshop was a little different from previous Software Carpentry workshops. Unlike core SWC workshops, which assume no prior programming knowledge, the “Topics” courses are designed for people who have some programming background or who have taken a workshop or course in the past. Although we continue to emphasize hands on programming, workshops in a “Topics” series will cover more material than we can manually type in the time allotted. Some code will be presented through copy and paste, some through review.

This means that to get the most out of these workshops, participants will want to review and work with the code after the class.

In the JSON and Python workshop, we first covered dictionaries, an essential data structure for parsing JSON that isn’t generally covered in the Python section of a standard SWC workshop at UCSF. We covered keys, look up values, and the practice of nesting sub dictionaries and lists within a dictionary or list.

We then used the UCSF Profiles application developed at the CTSI to demonstrate how to request a JSON document, send query parameters, and convert the result to a dictionary. As an example, we reviewed how to parse a JSON document to generate a list of publications for a researcher at UCSF.

After this, we briefly reviewed some of the JSON based web apis available through the National Library of Medicine, applying the same techniques to generate a list of interactions for a particular medication. I highly encourage everyone who took this class to take a more extensive look at this website, and think about what kind of services you’d like to see here at UCSF.

Lastly, we reviewed and ran code using the bottle module to create a JSON based web service, running on localhost, to demonstrate how to write web services that do more than simply provide access to data.

As promised, all code, with comments and explanations, is available on github.

On last note – I mentioned, briefly, that you will sometimes need to directly parse HTML from a website, since many useful data feeds still don’t have a nice JSON api available (or, if they do, it’s hard to find or poorly documented). Here’s a link to a short application that shows how to use beautifulsoup to quickly parse XML.

Because XML is tag based, this approach will work for HTML as well.

Thanks again for attending!

Job Shadowing: hosting high school students from Balboa High School

On Thursday October 26, 2017, the Library and the CKM hosted job shadowing for high school students for the first time!

Two students, Kelly and Jane, both Juniors from Balboa High School, visited us and spent the day learning about what it is like to work as a programmer.  Kelly and Jane are part of the Game Design Academy at Balboa High School.  The Game Design Academy is the path way for students who are interested in engineering and programming.  Kelly and Jane have not had any programming experience yet.  They will start programming classes next semester.

They started the day off with a tour of the library with Jim.  Then they attended the weekly meeting with the Industry Documents Library team.  During the meeting, they learned about about the IDL project, databases, and search index.

Then they attended Illios code jam with the Illios team where they got a front end programming primer from Jason and listened in as the Illios team discussed ways to improve their UI.

During lunch, Kelly and Jane asked us 10 questions from their teacher.  We had fun discovering that most programmers did not have a college degree in programming.  We also talked about the qualities that help us succeed at our jobs,  the things we expect from a good leader, and Sascha gave them the best advise on exploring everything that is not related to their chosen career because every extracurricular activity they participate in will eventually help them succeed in their career.


We finished off the day at the Maker’s Lab where Dylan gave the students a great overview of the various maker’s projects.  Kelly and Jane were introduced to two different types of 3D printers and saw the clavicle bones that a UCSF doctor printed.  They also talked to second year medical student Parth who mentioned his own experience job shadowing at hospitals while he was in high school.  Since Jane is also interested in a career in medicine, Parth gave her the information on how to job shadow at ZSFG.

We look forward to hosting another job shadowing day in the spring and working with one to two interns during the summer!



Deploying with Ember.js: a story

What is this all about?

This is the story of how we changed our Ember.js application deployment one night while no one was looking. This is not a strictly technical story, and the details may not matter; what is important is the journey we took and the steps you can follow to get your app deploying however you want.

Table of Contents

  1. What is this all about?
  2. Some Background (feel free to skip)
  3. Our Goals (the whole point of this)
  4. Our Journey
    1. Starts with a plan on a whiteboard
    2. Leads to Ember-cli-deploy
  5. The End
  6. Afterwords

Some Background (feel free to skip)

Our application – Ilios – is an installed solution and the code is in two parts. The API and management code is written in PHP and is installed by medical schools all over the world. Each installation is a little bit different, but they all require and consume our Ember.js frontend application. We don’t ship the frontend with the API codebase, it is provide separately and updated far more frequently. In the past we have shipped an index.html file with links to our AWS Cloudfront distribution containing the assets. This way the assets are stored in a CDN where they can be loaded quickly by users and easily kept up to date by our team without needing to understand each individual installation.

Our Goals (the whole point of this)

In order to take advantage of http/2 push and service workers we need to change the way the frontend is deployed. Our goal is to ship a single archive file containing the frontend assets including the app code, service workers, images, styles etc…

This archive must be tagged in such a way that we can have many versions in the wild at the same time with a default active version. Because the code for our API does change it must be possible to activate versions in such a way that v23 is the default frontend for API v1.1 and v34 is the default frontend for API v1.3.

Once the archive is downloaded and extracted by the API server it must be simple to parse and customize the files so that scripts and styles can be sent to the browser using http/2 push LINK headers and service workers can be placed at the root of our domain (where they must be do be effective).

Because we are constantly adding new features and fixing bugs we need to be able to release versions continuously.


We want to ship a magical box to our customers and we want to ship it all the time.

Our Journey

Starts with a plan on a whiteboard


  1. Build the app
  2. Archive the build artifacts
  3. Tag the archive with a version which combines both a unique ID for the code as well as the API version it works with
  4. Upload the archive to an S3 bucket
  5. Mark the most recent version so it is easy to find


  1. Download and extract the archive at
    • a) the most recent version
    • b) a specific version from the past
    • c) dev build we want to test
  2. Parse the the index file to extract configuration and assets
  3. Serve a modified index.html file to users

Leads to Ember-cli-deploy

This isn’t a surprise ember-cli-deploy has both the builtin tools and a plugin ecosystem to make deploying Ember.js apps very manageable. All we need to do is assemble a list of plugins that can meet our needs. As is the case with many ember addons, our journey really starts with Ember Observer where we find there is a whole category dedicated to plugins for ember-cli-deploy.

Some investigation and perusal of the deploy docs and a few minutes in the #ec-deploy channel in the Ember Slack Team will lead us to some standard choices and well supported solutions.

  • ember-cli-deploy is the foundation we can build on
  • ember-cli-deploy-build builds our app
  • ember-cli-deploy-revision-data can tag and activate versions
  • ember-cli-deploy-s3 uploads our assets and makes them public in an S3 bucket

A search for ‘archive’ in Ember Observer leads us to:

  • ember-cli-deploy-archive which takes our build assets and outputs a single tar archive

These combined knock out several of our needs.

Further investigation of ember-cli-deploy-s3 leads us to it’s companion ember-cli-deploy-s3-index which not only uploads to S3, it also manages versioning and activation.

And while looking around we also stumble upon ember-cli-deploy-json-config which conveniently parses our messy index.html file and outputs nice human and machine readable JSON that will be way easier for our API server to consume.

Looks like we don’t have to write any code at all! Just install and configure some plugins just the way we want and in the end just need to run a few instal commands:

ember install ember-cli-deploy
ember install ember-cli-deploy-archive
ember install ember-cli-deploy-build
ember install ember-cli-deploy-display-revisions
ember install ember-cli-deploy-json-config
ember install ember-cli-deploy-revision-data
ember install ember-cli-deploy-s3-index

Which helpfully creates a config/deploy.js file for us with some useful defaults:

module.exports = function(deployTarget) {
  var ENV = {
    build: {}
    // include other plugin configuration that applies to all deploy targets here

  if (deployTarget === 'development') { = 'development';
    // configure other plugins for development deploy target here

  if (deployTarget === 'staging') { = 'production';
    // configure other plugins for staging deploy target here

  if (deployTarget === 'production') { = 'production';
    // configure other plugins for production deploy target here

  // Note: if you need to build some configuration asynchronously, you can return
  // a promise that resolves with the ENV object instead of returning the
  // ENV object synchronously.
  return ENV;

We just need to configure our AWS info and we will something working. Note: I removed targets here as well as comments for brevity.

module.exports = function(deployTarget) {
  var ENV = {
    build: {}
    's3-index': {
      accessKeyId: '<our-access-key>',
      secretAccessKey: '<our-secret>',
      bucket: '<our-bucket-name>',
      region: '<our-bucket-region>'

  return ENV;

Next step is to try it out:

ember deploy development  --verbose
Pipeline complete

And when we got look in our S3 bucket low and behold there is an build.tar:HASH file in there! We’re really getting somewhere.

That version is nice, but remember we really need to know what API versions this build is compatible with. Oh, I see there is a prefix option we can use.

's3-index': {
  prefix: 'v1.22'

Deploy again and we now have a bucket with

build.tar:HASH (from our first deploy)
  build.tar:NEWHASH (from our second deploy)

That is exactly what the doctor ordered. Check that off our list and lets download that archive and see what it has. Hmmm… it has an index.html file, but no index.json file. Looking back at the --verbose output we can see why

+- didBuild
|  |
|  +- archive
- saving tarball of tmp/deploy-dist to tmp/deploy-archive/build.tar
- tarball ok
|  |
|  +- json-config
- generating `tmp/deploy-dist/index.json` from `tmp/deploy-dist/index.html`
- generated: `tmp/deploy-dist/index.json`
- added `index.json` to `context.distFiles`

The build.tar file is getting generated before the index.json file has been created. Since both of these plugins are doing their work inside the didBuild hook we just need to swap the order they run in. We can find out how to do that in the docs at

Looks like we just add a pipeline object to our deploy config like:

pipeline: {
  runOrder: {
    'archive': { after: 'json-config' },

and… done! running ember deploy development again gives us exactly the build.tar we’re looking for.

The End

Seriously. That’s how the story ends. No brave journey into the darkness of code comments to figure out just how this is supposed to work. It just works. TM. It’s so simple we can just add it as a command at the end of our CI process and never think about it again.


I’m 100% head over heals in love with ember-cli-deploy. I find it incredibly freeing to be able to tinker with our deployment process to find just the right setup without needing to design and script each pice every time. The plugin ecosystem gives me the building blocks I can use to assemble our pipeline in whatever way makes sense so we can test new ideas and strategies fast and with very little friction.

I hope this was a useful overview of one deployment journey. Want to share your own? Tell me about a mistake I made or shower me with praise? Leave a comment here or find me at @iam_jrjohnson.

Improving Summon/EZproxy/Shibboleth Login Experience With Custom JavaScript

If you use Summon’s custom auth banner/VPN banner with Shibboleth and EZproxy, you may have noticed an unfortunate side effect. If you do a search, and then click the banner to log in, you will be redirected to the start page rather than back to your search results.

Fixing this means tapping into their app with custom JavaScript. The front end of Summon is built on AngularJS 1.2.27 and ships with jQuery.  Here’s how we’re currently doing it:

(function () {
  var waitingForInitialLoad = true;

  var fixBannerLink = function () {
    var links = $('.vpnBanner.customAuthBanner div a');
    if (waitingForInitialLoad && links.length === 0) {
      setTimeout(fixBannerLink, 100);
    } else {
      waitingForInitialLoad = false;
      window.addEventListener('hashchange', fixBannerLink);
    links.each(function() {
      if (/^Off the UCSF network/.test(this.text)) {
        $(this).attr('ng-href', '' + encodeURIComponent(location.href));
        $(this).attr('href', '' + encodeURIComponent(location.href));


Most embarrassing hack: The /^Off the UCSF network/ regular expression is of course matching the text of the link. You’ll need to update that to match the text in your own link. (Or find a less embarrassing hack and submit it as a pull request in the GitHub repository.)

Also, our EZproxy URL prefix is hard-coded in there a few times. Change to your own, and make sure you use the qurl URL parameter rather than url.

If you have any questions, feel free to open an issue in the GitHub repository.

Beyond Basic SQL

Thanks to everyone who attended the Beyond Basic SQL class at Mission Bay on April 25th!

This class was inspired by the questions and problems I’ve encountered while trying to work with various datasets at the Programming and Pizza sessions.  SQL (the language) and sqlite (the database) can be excellent tools for data analysis, but getting the data into a form where you can use it can be a real challenge.

In the first part of this class, we reviewed the basics of querying a single database table through Selection, Filtering, and Aggregation.  We used the “gapminder” dataset from Software Carpentry to  write queries and do analysis on worldwide per capita gdp, life expectancy, and population by country and continent.

The second part of this class focused on dealing with data that isn’t in a convenient, normalized format.  To do this, we analyzed the kind of spreadsheet researchers often download from the internet.  As a simplified example, we looked at a sample enrollment spreadsheet with information about students, courses, and reading lists.  We reviewed the first two normal forms, discussed how these forms can reduce ambiguity and redundancy, and transformed the non-normalized spreadsheet into a database in second normal form.

It was a lot of material to cover in two hours, thanks for sticking with it.  Here are the promised slides for everyone who wanted to try it out.

If you’re interested in trying this out on a real world example, take a look at the blog post below on data munging and excel.  The examples in that post  (you’ll need to use Python as well as SQL) will present strategies for getting an unstructured spreadsheet into a form where you can use sql, pandas, data frames, and other tools to analyze a dataset.

Class files are available on the ucsf-ckm dsi GitHub repo here.



Data Munging Addendum: The Long Way To Handle Comma Delimited Lists

In an earlier post, we discussed the issue of comma delimited lists within an excel spreadsheet. This way of one-to-many relationships in data can make it more difficult to build look up tables, run queries, and do other types of analysis. Although there are some concise coding approaches, both in SQL and pandas, sometimes you just want to [give up on trying to be clever and] reconstruct your data frame line by line. Here’s a quick overview on how to do this.

You can follow along, cut and paste into your own notebook, or view/checkout the code from github.

Let’s go ahead and build a pandas dataframe with comma delimited information in some of the cells.

[code lang=”python”]
import pandas as pd
import numpy as np

First, we’ll create three lists

[code lang=”python”]
ar1 = [1,2,3,4,5]
ar2 = [2,4,6,8,10]
ar3 = [‘one’,’two,three,four’,’three,four,five’,’four,five’,’five’]

Next, add the lists as columns to a dataframe

[code lang=”python”]
df = pd.DataFrame({‘A’ : ar1, ‘B’ : ar2, ‘C’ : ar3})


And, of course, if you query it through pandasql, you’ll get a single row with a comma delimited list

[code lang=”python”]
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
pysqldf("SELECT * FROM df WHERE A = 3")


Like last time, we want to have each value for C on a separate row, with corresponding values for A and B.

Here’s how to do this the computationally long and expensive way (which is, in fact, sometimes the way you have to do things – sometimes because performance doesn’t matter and you’re tired of trying to be clever, sometimes because the logic is so intricate that you have to knit it all together line by line anyway).

We’ll create three new arrays (again, to hold the column values).

[code lang=”python”]
a0 = []
a1 = []
a2 = []

Next, we’ll loop through each row of our existing dataframe. We’ll split on the the third row (we start counting from 0 in a list, so that will be at index 2). Splitting on the comma will create a new array with three new strings. We’ll add a new value to each column for each string (word) in that row.

[code lang=”python”]
for index, row in df.iterrows():
for s in row[2].split(‘,’):

Now, let’s create a new data frame out of the three columns we just created

[code lang=”python”]
ndf = pd.DataFrame({‘A’ : a0, ‘B’ : a1, ‘C’ : a2})

Take a look

[code lang=”python”]


We can now query both dataframes using pandasql

[code lang=”python”]
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())

Querying our original dataframe, you’ll see everything on one comma delimited line

[code lang=”python”]
pysqldf("SELECT * FROM df WHERE A = 3")


Quering our new dataframe, you’ll set a separate line for each entry

[code lang=”python”]
pysqldf("SELECT * FROM ndf WHERE A = 3")


It’s the long way to do this, but sometimes the long way can save you a lot of time.

Data Munging with Python, SQL, and Excel

So much of data science is, in fact, data munging and wrangling, moving and transforming it from one format to another. Data, when you’re fortunate enough to know where to find it, is almost never in the nicely organized format you need for your analysis. This is one of the reasons python is so popular among data scientists – it is a good language and environment for collecting, formatting, parsing, combining and splitting data from different sources.

Data frequently arrives in comma delimited files or excel spreadsheets. You can do certainly some analysis with spreadsheet operations, but odds are you’ll eventually want to load it into a data frame and use python (or R) for more meaningful analysis. This post is a write up of a few tips I learned from my recent efforts to wrangle some genomic data available on the web as Excel downloads. These spreadsheets presented a few common issues that arise when dealing with non-normalized data sets in single table format.

The exact nature of the data isn’t really the focus here, but for some context: researchers at UCSF often need information about a gene, variants of that gene, and the effect this Gene has on responsiveness of different tumors or conditions to treatment. There are a number of different places to find this data, a number of different datasets, and (of course) varied ways to search, filter, or query those datasets. Searching and finding can be a long, error prone, irritating, manual process.

We’ll use Python, Pandas, and Jupyter Notebook to try to get a better handle on some of this data. I’m assuming you already know how to open a 1) Jupyter Notebook and issue basic Python commands, and 2) Use pandasql to run SQL queries against a dataframe. (If you don’t, consider signing up for a Software Carpentry class or attend one of our Python/R workshops at Mission Bay).

Otherwise, you can follow the jupyter and python installation documents (I used anaconda for both).

A full jupyter notebook for the code in this post is available on the ucsf-ckm github repository.

Create a DataFrame from an Excel Spreadsheet

We’ll use a spreadsheet from The Precision Medicine Knowledge Base. To follow along, click on the “Download All Interpretations (Excel)” link.

Before analyzing, let’s load the excel spreadsheet into a pandas DataFrame. Open up a jupyter notebook (or just the python interactive shell)  to start.

First off, remember to include the pandas module…

[code language=”python”]
import pandas as pd

Pandas has an easy method to load a spreadsheet (I’m assuming the spreadsheet is in your working directory, otherwise you’ll need to edit the path)

[code language=”python”]
ipm = pd.read_excel("IPM_Knowledgebase_Interpretations_Complete_20160913-2207.xlsx")

Now take at your newly populated dataframe

[code language=”python”]

You’ll see that your spreadsheet headers and data have become the column names and rows of a pandas dataframe. Let’s try using pandasql to run a query on it. First, load the module and globals (more information on this)

[code language=”python”]
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())

And try running a query. Let’s get the variants that match a particular Gene.

[code language=”python”]
pysqldf("SELECT Gene, [Tumor Type(s)], [Variant(s)] FROM ipm WHERE Gene = ‘PIK3CA’")

Note: you must use the brackets around Tumor Type(s) so the white space and parentheses around (s) won’t be interpreted as SQL.

Even without any additional data munging, you’re in a pretty good spot for analyzing your data. You have it in a dataframe, where you can run queries, python expressions, and pandas operations on it. However, there are a few issues, common to spreadsheets, that may make this data harder to work with and analyse.

Relational Databases frequently have “one to many” relationships. In this case, a Gene has a one to many relationship with Tumor Types, Variants, and Citations. Each Gene has an effect on multiple Tumor Types, each Gene can have multiple Variants, and the effect a Gene and Variant has on a Tumor Type can have multiple Citations.

This spreadsheet stores the data for one to many relationships in two different ways. For Genes to Tumor Types and Variants, the spreadsheet provides a comma delimited list in a single cell. For Citations, the spreadsheet adds tacks on a varying number of columns to the right side of the spreadsheet. Although this does provide the data, it can make the data harder to work with.

The next two sections will review techniques for converting comma delimited lists and multiple columns into a one-to-many lookup table.

Common Issue #1 – Comma Delimited Lists

You may notice that although the Genes are provided as single (atomic) values, other fields, such as Variants or Tumor Types are provided as a comma delimited list. This won’t be much of a problem if you want to find all Tumor Types associated with a Gene – the query is straightforward:

[code language=”python”]
pysqldf("SELECT Gene, [Tumor Type(s)] FROM ipm WHERE Gene = ‘CSF3R’")

You’ll get back a single row with a comma delimited list of Tumor Types, rather than a separate row for each Tumor Type, but you can parse that relatively easily.

Now, suppose you wanted to reverse this query, to find all genes that that match a particular Tumor Type. In this case, a query like the one above won’t work, as it will miss fields that have multiple Tumor Types separated by commas. Because SQL will look for an exact match, you’ll won’t get all the results for a query like this.

[code language=”python”] pysqldf("SELECT Gene, [Tumor Type(s)] FROM ipm WHERE [Tumor Type(s)] = ‘Diffuse Large B Cell Lymphoma’") [/code]

Note that you only received a single row from this query, even though there are multiple records that match this tumor type.  SQL does provide a way to find sub-patterns in a text field. You can get all records (sort of/kind of/hand waving) with a LIKE query

[code language=”python”]
pysqldf("SELECT Gene, [Tumor Type(s)] FROM ipm WHERE [Tumor Type(s)] LIKE ‘%Diffuse Large B Cell Lymphoma%’")

NOTE: you may not want the text truncated in your results. To handle this, set a property on your dataframe:

[code language=”python”]
pd.set_option(‘display.max_colwidth’, -1)

Although this works, you might want to split the comma delimited values into separate rows to create a Tumor_Type to Gene lookup table (perhaps to put it into first or higher normal forms As always, there are a number of different ways to do this. You can certainly do this through SQL and temporary tables, but since we’re in python and can access this table as a dataframe, let’s try a python solution.

First, let’s get the Gene and Tumor Type as a dataframe

[code language=”python”]
tumor_types = pysqldf("SELECT Gene, [Tumor Type(s)] as Tumor_Type FROM ipm")

Next, we’ll split the comma delimited tumor_types into separate rows.

[code language=”python”]
gene_tumor_types = pd.DataFrame(tumor_types["Tumor_Type"].str.split(‘,’).tolist(), index=tumor_types["Gene"]).stack()
gene_tumor_types = gene_tumor_types.reset_index()[[0, ‘Gene’]]
gene_tumor_types.columns = [‘Tumor_Type’, ‘Gene’]

See stack overflow for a nice discussion of this solution and other pandas dataframe based solutions.

Take a look at the resulting gene_tumor_types data frame.

[code language=”python”]

You now have an association from Gene to Tumor Type, with each tumor type as an individual row rather than as a comma delimited list. To get the Genes associated with a particular Tumor Type, we no longer need a LIKE query.

[code language=”python”]
pysqldf("SELECT Gene, Tumor_Type FROM gene_tumor_types WHERE Tumor_Type = ‘Diffuse Large B Cell Lymphoma’")

Wonderful! Except that… not so wonderful, it didn’t work – we’re missing data! There’s a big gotcha here. Compare the earlier LIKE query and this one. As an exercise, you might want to stop and try to figure out why (answer is in the next paragraph).

Common Issue # 2: Leading or Trailing White Space

This takes us to another common pitfall – white space! Don’t forget, an equals operator in SQL (and programming languages in general) is an exact match. “ Hello” and “Hello” do not match!

Take a look at the dataframe for gene_tumor_types  – you’ll notice many of the Tumor Types have leading whitespace. This prevents the exact match from occurring, though you will find them through like queries, which find it as a partial match. You can still them them through a LIKE query

[code language=”python”]
pysqldf("SELECT Gene, Tumor_Type FROM gene_tumor_types WHERE Tumor_Type LIKE ‘%Diffuse Large B Cell Lymphoma’")

But that’s a hack and kind of defeats the purpose of creating a new lookup table. We should be able to get this through an equality operator. Let’s trim the whitespace from this column.

[code language=”python”]
gene_tumor_types["Tumor_Type"] = gene_tumor_types["Tumor_Type"].str.strip()

And take a look at the list to see the whitespace has been removed

[code language=”python”]

Now retry the LIKE query and the exact match query – you’ll see that you are now retrieving all the rows.

[code language=”python”]
pysqldf("SELECT Gene, Tumor_Type FROM gene_tumor_types WHERE Tumor_Type = ‘Diffuse Large B Cell Lymphoma’")

Common Issue # 3: Repeated Columns

Another common spreadsheet practice is to tack on a variable number of columns to store one-to-many data relationships. Take a look toward the end (right hand side) of the spreadsheet (or ipm dataframe)

[code language=”python”]

Each row has one or more citations. This spreadsheet stores the one to many relationship by creating a new column for each set.

Unfortunately, this does make it more difficult to query, since we need to know in advance how many Citations to query. Furthermore, the column headers that hold Citations beyond the first one don’t have names, making the query less informative.

For instance, not all Citations have a multiple citations. To get the citations for Gene JAK1, we’d need to write:

[code language=”python”]
pysqldf("SELECT Gene, Citations, [Unnamed: 7], [Unnamed: 8], [Unnamed: 9], [Unnamed: 10], [Unnamed: 11], [Unnamed: 12], [Unnamed: 13], [Unnamed: 14] FROM ipm WHERE Gene = ‘JAK1’")

This query will return all the citations for Gene “JAK1”. However, if you run this query against Gene “MPL”, you’ll receive a value of “None” for several columns. By contrast, if you run this query against Gene “MYD88”, you’ll miss a number of citations that extend out to “Unnamed: 26”.

It would be more convenient to be able to write a query like this:

[code language=”python”]
pysqldf("SELECT Gene, Citation from ipm”)

And receive a separate row for each citation.

Let’s create a lookup table for Gene and Citations. There are, as always, a number of different ways to accomplish this, through SQL or pandas. In this case, we’ll use SQL with a python loop to create a “UNION ALL” query.

[code language=”python”]
query = "SELECT Gene, [Tumor Type(s)], [Variant(s)], Tier, Interpretations, Citations as Citation FROM ipm WHERE Citations != ‘None’"
for i in range(7, 27):
query += (" UNION ALL SELECT Gene, [Tumor Type(s)], [Variant(s)], Tier, Interpretations, [Unnamed: {val}] as Citation FROM ipm WHERE [Unnamed: {val}] != ‘None’ ".format(val=i))
query += ("ORDER BY Gene")

This approach uses a python loop to build a SQL statement. The UNION ALL statement combines the results of more than one query into a single output table. In this case, we are taking querying each Gene and Citation combination and outputting the results into separate rows. You can take a look at the full query (it’s long, so I’m not posting it here, just view the output of the “query” string in jupyter or interactive python).

Let’s look at the results of this table

[code language=”python”]
gene_citations = pysqldf(query)

Let’s re-run the query for JAK1 and MPL.

[code language=”python”]
pysqldf("SELECT Gene, Citation FROM gene_citations WHERE Gene = ‘JAK1’")

[code language=”python”]
pysqldf("SELECT Gene, Citation FROM gene_citations WHERE Gene = ‘MPL’")

You are now able to get clean, multiple row results from the Python dataframe through a one-to-many table relation.

We’ll tackle a few more data sources for responsiveness of various tumors and conditions to genes and variants in future posts. Stay tuned.

And, as always, if you’d like to learn more about Python or R, please consider signing up for a Software Carpentry Class or attend one of our workshops at Mission Bay!

Rails and Ember without rails-api

This is a follow up to a post I wrote a couple months ago, “Ember and Rails Scaffolding for Simple Crud”. In that post, I gave an overview for how to generate simple CRUD scaffolding for an Ember app using Rails purely as an api through the rails-api gem.

Here’s the thing… if you take the api-only approach, you by design give up the ability to write standard Rails views. This can be a good thing, of course, which is why the gem was integrated into Rails 5. If all you want is an api, you don’t want the extra weight of everything else that comes with Rails, and you always can add various gems back in as needed. But for now, you may want to preserve the ability to write a full MVC rails app while still providing an API for Ember or other single page javascript frameworks.

Fortunately, this isn’t especially difficult. The Rails side will get a little more verbose, at least the way I’m writing it, but all you need to do is ensure that your app responds to both html and json – and of course be particularly careful to make sure that you don’t mix view logic with backend logic.

So, here we go…

First, create a full rails app with basic CRUD for the User model in the previous tutorial. I’m not going to repeat the steps here since they won’t change much. The only difference here is that instead of doing this with the rails-api gem and command, you’ll now do this with traditional rails. You will still need to create serializers, add rack/cors, allow access to various HTTP actions in the Rails app, and so forth. This is all available through the previous tutorial, with one change – you don’t need to install the rails-api gem, and wherever it says “rails-api generate…”, instead just use “rails generate…”.

You should now have a fully functional rails app for CRUD operations on a User that also provides json formatting as an api. The main difference between the api for a traditional Rails app and the rails-api generated app is that the traditional rails app responds by default as html, whereas rails-api responds as json. To get a json response from the traditional rails app, you will need to append “.json” to the url – in other words, to get the list of users rendered as json rather than displayed as html, you’d need to request:


whereas the rails-api version doesn’t require this extension, as a rails-api app by default returns json (and wouldn’t normally respond as html at all).

On the Ember side, we need to instruct the adapter to specifically request json from the Rails app, as this is no longer the default Rails response.

To accomplish this, we will modify the Ember adapter in app/user/adapter.js

[code language=”ruby”]
import DS from ’ember-data’;

export default DS.RESTAdapter.extend({
host: ‘http://localhost:3000’,
buildURL: function(record, suffix) {
var s = this._super(record, suffix);
return s + ".json";

As you can see, this will append “.json” to all the requests send from Ember to Rails – even post, put, and delete requests , so you’ll need to explicitly handle the json format in any Rails controllers you intend to make available to Ember. As a result, we’ll need to modify the update and create methods in the Rails controller to specifically respond with json for Ember.

There is, inevitably, one more wrinkle – although Rails does respond by default to the “.json” extension, Ember expects a slightly different formatting, so you’ll need to make a few tweaks to get it working with Ember. Here’s the full controller code:

[code language=”ruby”]
class UsersController &lt; ApplicationController
before_action :set_user, only: [:show, :edit, :update, :destroy]

# GET /users
# GET /users.json
def index
@users = User.all

#render json: @users

respond_to do |format|
format.json { render json: @users }

# GET /users/1
# GET /users/1.json
def show
respond_to do |format|
format.json { render json: @user }

# GET /users/new
def new
@user =

# GET /users/1/edit
def edit

# POST /users
# POST /users.json
def create
@user =

respond_to do |format|
format.html { redirect_to @user, notice: 'User was successfully created.' }
format.json { render :show, status: :created, location: @user }
format.html { render :new }
format.json { render json: @user.errors, status: :unprocessable_entity }

# PATCH/PUT /users/1
# PATCH/PUT /users/1.json
def update
respond_to do |format|
if @user.update(user_params)
format.html { redirect_to @user, notice: 'User was successfully updated.' }
format.json { render :show, status: :ok, location: @user }
format.html { render :edit }
format.json { render json: @user.errors, status: :unprocessable_entity }

# DELETE /users/1
# DELETE /users/1.json
def destroy
respond_to do |format|
format.html { redirect_to users_url, notice: 'User was successfully destroyed.' }
format.json { head :no_content }

# Use callbacks to share common setup or constraints between actions.
def set_user
@user = User.find(params[:id])

# Never trust parameters from the scary internet, only allow the white list through.
def user_params

You may notice some additional code in create and update. This is because we need to respond as json for Ember, which we configured to submit all requests with the .json extension (even POST and PUT requests).

At this point, you can bring up both a Rails app on port 3000 and an Ember app on port 4200 and use both a standard Rails view and the Ember client for CRUD operations on your User model.

This does require some extra overhead, but it does keep open the possibility of writing a traditional Rails app while providing an API for not just Ember but any other app that might want to consume a Rails API.

Ember and Rails Scaffolding for Simple CRUD

Like many developers who use Rails, I haven’t thought about scaffolding for a while, but I found myself wanting something like it when I started looking into Ember as a front end for Rails. As with Rails, my guess is that I won’t use Ember scaffolding for long, but I wanted a way to get that initial Ember application up and running, with Rails as a backend.

It turns out that there are easy to use scaffolding generators for an Ember front end and a Rails API backend, with just a few minor gotchas you need to be aware of when integrating the two.

For this tutorial, we’re going to do the simplest thing possible for a crud app. We will create a single model (User), with one field (name), and we will create a web app with Create, Retrieve, Update, and Delete operations. Ember will supply the front end, Rails will handle the back end.

One quick note – while there are ways to integrate Rails with Ember in the same application, this tutorial will build the Ember frontend out as a completely separate, stand-alone app from the Rails backend. This way, the front end can be integrated with any backend that provides the API that Rails provides.

Here we go…

Part 1 – Create an Ember front end

We’ll create a client side, front end application that provides an interface for CRUD operations.

Step 0 – Install ember and ember-cli.

This will also require installing node, npm. I’m pretty sure you’ll need git as well. I’m not going to cover all this (or the process for installing Ruby or Rails). There are plenty of guides on the web to getting all this installed. Just make sure you can execute the following command

ember -v

I’m using ember 1.13.8, node 0.12.6, and npm 2.13.4 (on Mac OS X).

Step 1 – Create an ember app

ember new ember-crud

Step 2 – Install the ember-cli-scaffold package

cd into ember-crud and run

ember install ember-cli-scaffold

(for more information, see

Step 3 – Generate scaffolding for a Model (User) with one field (Name)

ember generate scaffold user name:string --pod

The “pod” option creates a different organizational structure for your files than the standard ember defaults. I prefer it and will be using it here, but the differences are very minimal.

Step 4 – Verify that your app is working

ember serve

and go to http://localhost:4200/users

You should see a very Rails-ish interface with full CRUD for a user with a single input field, name. Go ahead and create, edit, delete a few to verify that it is all working.

The ember CRUD app is using a local dev storage provided by mirage. In the next step, we’ll swap this out for a Rails backend.

Part 2 – Provide a Rails API for CRUD operations

Ember is already providing the view, so rather than creating a full blown rails app, we will limit the Rails app to providing an API for persistence.

Step 0: Install Ruby and Rails

As with Ember, there are lots of resources on the web for getting Ruby and Rails installed. Make sure that you can run

ruby -v

(I’m using 2.1.0)


rails -v

(I’m using Rails 4.2.0. You will need this version or later for Rails to use the rails-api gem, which I believe will be included in Rails 5).

Step 1 – Create a rails API only application

Install the rails-api gem

gem install rails-api

And generate a new Rails application

rails-api new rails-crud

Step 2 – Create an API scaffold for CRUD operations for a User

cd into rails-crud and type

rails-api g scaffold user name:string

While I promised not to go into a lot of detail here, you may notice that no view tier is created, and if you look at the user controller, you’ll see that it is set up for rendering json, not html.

Step 3 – Seed with a bit of data

Technically, you don’t need to do this step, since you’ll populate from your Ember app, but it can help to verify everything is working properly on the Rails side before integrating with your Ember app.

in db/seeds.rb, create something like this

[code language=”ruby”]
user1 = User.create(name: ‘user1’)
user2 = User.create(name: ‘user2’)

and run

rake db:migrate
rake db:seeds

Step 4 – Check out the Rails API


rails server

go to localhost:3000/users

and you should see a json representation of the two users you created in the seeds.rb file.

Part 3 – Use the Rails API service as a backend for the Ember application

This is relatively straightforward, though there are a few wrinkles.

Step 1 – Modify the way Rails is rendering JSON for a list of objects

Take another look at the json returned from http://localhost:300/users

[code language=”json”][{"id":1,"name":"user1","created_at":"2015-10-21T22:17:32.778Z","updated_at":"2015-10-21T22:17:32.778Z"},{"id":2,"name":"user2","created_at":"2015-10-21T22:17:32.783Z","updated_at":"2015-10-21T22:17:32.783Z"}]

You may notice that Rails has flattened this into a single array. Ember, by default, expects a slightly different formatting where the json array of User objects is stored in a hash with the model type as a key.

One approach to this problem is to use a serializer to establish the format for JSON output from the rails api.

Add the serializer to your Gemfile

gem 'active_model_serializers', '~>; 0.8.3'

and run

bundle update

and create a new serializer for the user model

rails g serializer user

This will create a user_serializer.rb file in app/serializers.

[code language=”ruby”]
class UserSerializer < ActiveModel::Serializer
embed :ids, embed_in_root: true
attributes :id, :name

This code will format users the way Ember expects it at the defaults, and will include only the id and name that are expected by the Ember model we created earlier (the various Rails additions like created_at or updated_at will not be serialized and sent to Ember as JSON).


I recently tried this with ember-cli 2.11.0, and it looks like the formatting for JSON may have changed since I wrote this. To get this working, I had to create a new file named json_api.rb in the initializers folder containing the following code (per this tutorial from EmberIgniter).

[code language=”ruby”]
ActiveSupport.on_load(:action_controller) do
require ‘active_model_serializers/register_jsonapi_renderer’

ActiveModelSerializers.config.adapter = :json_api

Once you’ve made these changes, reload http://localhost:3000/users Or go to one of the individual users http://localhost:3000/users/1 You should see the following change to the json representation


Step 2 – Tell Rails to allow Ember to use the API

For security reasons, Rails by default won’t allow an application running on a different port to access the API. To solve this, add the following to your Gemfile

gem 'rack-cors', :require => 'rack/cors'

And add the following configuration to your Rails config/application.rb file

[code language=”ruby”]
class Application < Rails::Application

config.middleware.use Rack::Cors do
allow do
origins ‘*’
resource ‘*’, headers: :any, methods: [:get, :post, :put, :delete, :options]

and run

bundle update

and restart the rails server

This is the minimal configuration to get this example working – for more information on how to allow cross-origin JSON properly check out the rack-cors documentation at

Step 3 – Point Ember at the Rails API

In your Ember application, open the application.js file in app/user/adapter.js (if you didn’t use the –pod structure, this will be in app/adapters instead). You should see a line

namespace: 'api'

change this to

host: ‘http://localhost:3000’

You will also need to disable mirage so that it won’t intercept Ember communications with the rails app. In the config directory of the ember-crud application, open the environments.js file and add

[code language=”ruby”]
if (environment === ‘development’) {
ENV[’ember-cli-mirage’] = {
enabled: false


Step 4 – Turn off Ember’s cli security policy

Like Rails, ember comes with configuration defaults to protect against cross domain security problems. To get this example running quickly, you can remove this line from package.json in your Ember app.

“ember-cli-content-security-policy”: “0.4.0”,

As with other config options in this tutorial, this is something you’ll want to read about and understand rather than just disabling.

Step 5 – Verify that the changes to the Ember UI are directed to and properly handled by the Rails application

Restart ember by typing

ember serve

(you may need to stop and restart the server if it is still running), and navigate to


You should see a list of the users you created in the Rails database seed.

Try adding, editing, or deleting a few users. You can verify the changes at the Rails back end by rendering the list of Users in the database as JSON by going to


Step 6 – Fix the duplicates problem

You may have noticed that when you create a new record, two new items are added to your list – but if you look at the rails service, only one record was persisted and one of the records in the ember list has a null id. If you refresh the page, the record with the null id will disappear.

I’ve been looking around for a real solution to this. If you just want to make the null id record to disappear, you can hack it in the index route:

[code language=”ruby”]
model: function() {
return‘user’, {id: true});


I wrote this as part of my notes on getting up and running on Ember and Rails. I’ve found that if I don’t do these write ups when I’m learning something (figuring I’ll do it later when I understand it all better), there’s a good chance I’ll never do it at all. However, I figured it would be a good idea to run it by a colleague here at the UCSF CKM, Jon Johnson, who has some Ember experience. He said no problem posting his reply:

There are a couple of things you might do a little bit differently, but they aren’t wrong. I’m not sure if its a drop in but looks like a better serializer to use in rails. Active record will work and continue to be supported, but if I were going at this from scratch I would start with that.

For Ember you might want to setup the adapter globally to talk to rails instead of just for the user model. You can do that with `ember g adapter application` It looks like that paragraph might be stuck between these two things as you reference application.js being in the user pod.

Instead of killing mirage in development you could also restart the server in production mode. Thats what we do to talk to the real API. Something like `ember s –env=production` will not go through mirage at all. I’m not sure if that is easier or harder than your way.

Always Be Collecting Feedback

As part of our ongoing efforts to get to know our users better, the UCSF Library web team decided we wanted some of that sweet, sweet microfeedback action, and so we deployed a very short satisfaction survey at back in July.

How Are We Doing tabAnytime a patron clicked the How Are We Doing button at the bottom of every page, they were asked a simple question: How would you rate your experience today? Patrons could let us know if they were satisfied or unsatisfied with their web experience.

Regardless of their answer, they’d be prompted to go into detail and provide some demographic information, but the only required question was whether they were satisfied/unsatisfied. Our hope was that keeping the survey short and to the point, and constantly available on every page, would encourage participation.

The not-so-secret goal of this survey structure was for the web team to learn directly from our patrons where they’re having problems, so that we make improvements to the website based on their needs and not our own assumptions. Our thinking was that a user frustrated enough to leave a comment was a user we’d want to hear from.

Enough Background Already, What Did We Learn?

help-btnThe stats below are from July 7 through August 23, 2015, the day before we introduced our new Help feature. We’re excluding overlapping dates from this analysis because the Help button began competing with the How Are We Doing button for user attention (more on this below), and we wanted to base our analysis on clean data.

Of the 201 responses received during that period, 65% had a satisfactory experience at our site. Hooray!

If we drill down to the 76% of respondents who shared their demographic information, the overwhelming number of responses came from UCSF affiliates (94%), with Staff leading the way, closely followed by Faculty, then Students. It’s likely the data was skewed by the summer months, and it’ll be interesting to see if the affiliation breakdown changes now that the fall semester is in full swing.


Patron satisfaction is awesome, but remember our not-so-secret goal was to learn why our users are unsatisfied. While only 20% of all respondents bothered to comment at all, our hypothesis about frustrated users being more likely to go into detail was correct: 87% of comments came from unsatisfied users. Hooray (really)!

Unsatisfied Users are More Likely to Comment

What’s Making our Users Unhappy?

Most of the frustration came from patrons who felt that navigating the site and finding what they needed was just too hard. 2nd prize for user frustration goes to remote access issues, with patrons expressing difficulties getting to articles using EZproxy, UC-eLinks, and VPN.

Connection errors and Library service issues (comments like you don’t have a book I want and my barcode number doesn’t work anymore) tied for 3rd place, and I was personally amused and humbled to know that 9% of the feedback was about how annoying they found the feedback survey popup window (removed after a few weeks).

Unsatisfied Users - Comments Breakdown
* If a patron gave feedback in more than one category, I picked the dominant issue.

So What’s Next?

We were fortunate to meet with some respondents, and we used their comments and other key takeaways from our satisfaction survey, the interviews we conducted with our customer-facing staff, and the LibQUAL 2013 survey to finalize the Library User Personas we’ll rely on as we make changes going forward.

We’ll keep our satisfaction survey going, but with the successful rollout of the Help feature, the time has come to modify how we ask for feedback. The How Are We Doing and Help buttons serve different purposes; unfortunately, the current design doesn’t make that distinction clear. Getting requests for help in the satisfaction survey gave us useful information before we rolled out the Help feature, but now it’s more appropriate that help me please comments get routed to the right person right away for faster customer service.

We’ll launch our more low-key request for feedback this week.

New Feedback Button
The new Feedback button will live at the bottom of every web page.