Beyond Basic SQL

Thanks to everyone who attended the Beyond Basic SQL class at Mission Bay on April 25th!

This class was inspired by the questions and problems I’ve encountered while trying to work with various datasets at the Programming and Pizza sessions. SQL (the language) and SQLite (the database) can be excellent tools for data analysis, but getting the data into a form where you can use it can be a real challenge.

In the first part of this class, we reviewed the basics of querying a single database table through Selection, Filtering, and Aggregation. We used the “gapminder” dataset from Software Carpentry to write queries and do analysis on worldwide per capita GDP, life expectancy, and population by country and continent.

The second part of this class focused on dealing with data that isn’t in a convenient, normalized format.  To do this, we analyzed the kind of spreadsheet researchers often download from the internet.  As a simplified example, we looked at a sample enrollment spreadsheet with information about students, courses, and reading lists.  We reviewed the first two normal forms, discussed how these forms can reduce ambiguity and redundancy, and transformed the non-normalized spreadsheet into a database in second normal form.
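As a toy illustration of the idea (these names and columns are made up for this post, not taken from the class spreadsheet): a flat enrollment table repeats each student’s name on every row, while second normal form splits the data so each fact is stored only once.

import pandas as pd

# Non-normalized: the student's name is repeated on every enrollment row
flat = pd.DataFrame({
    'student_id':   [1, 1, 2],
    'student_name': ['Ada', 'Ada', 'Grace'],
    'course':       ['SQL', 'Stats', 'SQL'],
})

# Second normal form: one table per entity, linked by student_id
students = flat[['student_id', 'student_name']].drop_duplicates()
enrollments = flat[['student_id', 'course']]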

It was a lot of material to cover in two hours; thanks for sticking with it. Here are the promised slides for everyone who wanted to try it out.

If you’re interested in trying this out on a real world example, take a look at the blog post below on data munging and Excel. The examples in that post (you’ll need to use Python as well as SQL) present strategies for getting an unstructured spreadsheet into a form where you can use SQL, pandas, data frames, and other tools to analyze a dataset.

Class files are available on the ucsf-ckm dsi GitHub repo here.


Data Munging Addendum: The Long Way To Handle Comma Delimited Lists

In an earlier post, we discussed the issue of comma delimited lists within an Excel spreadsheet. This way of representing one-to-many relationships in data can make it more difficult to build lookup tables, run queries, and do other types of analysis. Although there are some concise coding approaches, both in SQL and pandas, sometimes you just want to [give up on trying to be clever and] reconstruct your data frame line by line. Here’s a quick overview of how to do this.

You can follow along, cut and paste into your own notebook, or view/check out the code from GitHub.

Let’s go ahead and build a pandas dataframe with comma delimited information in some of the cells.

import pandas as pd

First, we’ll create three lists

ar1 = [1,2,3,4,5]
ar2 = [2,4,6,8,10]
ar3 = ['one','two,three,four','three,four,five','four,five','five']

Next, add the lists as columns to a dataframe

df = pd.DataFrame({'A' : ar1, 'B' : ar2, 'C' : ar3})

   A   B                C
0  1   2              one
1  2   4   two,three,four
2  3   6  three,four,five
3  4   8        four,five
4  5  10             five

And, of course, if you query it through pandasql, you’ll get a single row with a comma delimited list

from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
pysqldf("SELECT * FROM df WHERE A = 3")

   A  B                C
0  3  6  three,four,five

Like last time, we want to have each value for C on a separate row, with corresponding values for A and B.

Here’s how to do this the computationally long and expensive way (which is, in fact, sometimes the way you have to do things – sometimes because performance doesn’t matter and you’re tired of trying to be clever, sometimes because the logic is so intricate that you have to knit it all together line by line anyway).

We’ll create three new lists (again, to hold the column values).

a0 = []
a1 = []
a2 = []

Next, we’ll loop through each row of our existing dataframe and split the value in the third column (we start counting from 0 in a list, so that’s index 2) on the comma. The split turns the cell into a list of strings. For each string (word) in that list, we’ll append a value to each of the three column lists.

for index, row in df.iterrows():
    for s in row[2].split(','):  # split column C on commas
        a0.append(row[0])        # repeat the A value for each piece of C
        a1.append(row[1])        # repeat the B value for each piece of C
        a2.append(s)             # one piece of C per new row

Now, let’s create a new data frame out of the three columns we just created

ndf = pd.DataFrame({'A' : a0, 'B' : a1, 'C' : a2})

Take a look

ndf

   A   B      C
0  1   2    one
1  2   4    two
2  2   4  three
3  2   4   four
4  3   6  three
5  3   6   four
6  3   6   five
7  4   8   four
8  4   8   five
9  5  10   five

We can now query both dataframes using pandasql

from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())

Querying our original dataframe, you’ll see everything on one comma delimited line

pysqldf("SELECT * FROM df WHERE A = 3")

   A  B                C
0  3  6  three,four,five

Querying our new dataframe, you’ll see a separate line for each entry

pysqldf("SELECT * FROM ndf WHERE A = 3")

   A  B      C
0  3  6  three
1  3  6   four
2  3  6   five

It’s the long way to do this, but sometimes the long way can save you a lot of time.
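(For comparison, one of the concise pandas approaches mentioned at the top can be sketched in a single line. This assumes a newer pandas than this post was written against – DataFrame.explode arrived in pandas 0.25:)

# split C into lists, then give each list element its own row
ndf2 = df.assign(C=df['C'].str.split(',')).explode('C').reset_index(drop=True)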

Data Munging with Python, SQL, and Excel

So much of data science is, in fact, data munging and wrangling, moving and transforming data from one format to another. Data, when you’re fortunate enough to know where to find it, is almost never in the nicely organized format you need for your analysis. This is one of the reasons Python is so popular among data scientists – it is a good language and environment for collecting, formatting, parsing, combining and splitting data from different sources.

Data frequently arrives in comma delimited files or Excel spreadsheets. You can certainly do some analysis with spreadsheet operations, but odds are you’ll eventually want to load it into a data frame and use Python (or R) for more meaningful analysis. This post is a write up of a few tips I learned from my recent efforts to wrangle some genomic data available on the web as Excel downloads. These spreadsheets presented a few common issues that arise when dealing with non-normalized data sets in single table format.

The exact nature of the data isn’t really the focus here, but for some context: researchers at UCSF often need information about a gene, variants of that gene, and the effect that gene has on the responsiveness of different tumors or conditions to treatment. There are a number of different places to find this data, a number of different datasets, and (of course) varied ways to search, filter, or query those datasets. Searching and finding can be a long, error-prone, irritating, manual process.

We’ll use Python, Pandas, and Jupyter Notebook to try to get a better handle on some of this data. I’m assuming you already know how to 1) open a Jupyter Notebook and issue basic Python commands, and 2) use pandasql to run SQL queries against a dataframe. (If you don’t, consider signing up for a Software Carpentry class or attend one of our Python/R workshops at Mission Bay.)

Otherwise, you can follow the Jupyter and Python installation documents (I used Anaconda for both).

A full Jupyter notebook for the code in this post is available on the ucsf-ckm GitHub repository.

Create a DataFrame from an Excel Spreadsheet

We’ll use a spreadsheet from The Precision Medicine Knowledge Base. To follow along, click on the “Download All Interpretations (Excel)” link.

Before analyzing, let’s load the Excel spreadsheet into a pandas DataFrame. Open up a Jupyter notebook (or just the Python interactive shell) to start.

First off, remember to include the pandas module…

import pandas as pd

Pandas has an easy method to load a spreadsheet (I’m assuming the spreadsheet is in your working directory, otherwise you’ll need to edit the path)

ipm = pd.read_excel("IPM_Knowledgebase_Interpretations_Complete_20160913-2207.xlsx")

Now take a look at your newly populated dataframe

ipm

You’ll see that your spreadsheet headers and data have become the column names and rows of a pandas dataframe. Let’s try using pandasql to run a query on it. First, load the module and globals (more information on this)

from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())

And try running a query. Let’s get the variants that match a particular Gene.

pysqldf("SELECT Gene, [Tumor Type(s)], [Variant(s)] FROM ipm WHERE Gene = 'PIK3CA'")


Note: you must use the brackets around Tumor Type(s) so the white space and parentheses around (s) won’t be interpreted as SQL.

Even without any additional data munging, you’re in a pretty good spot for analyzing your data. You have it in a dataframe, where you can run queries, Python expressions, and pandas operations on it. However, there are a few issues, common to spreadsheets, that may make this data harder to work with and analyze.
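Before digging into those issues, a couple of quick pandas one-liners (illustrative only, not from the original analysis) show what’s already possible with the Gene column used in the query above:

ipm['Gene'].nunique()       # how many distinct genes appear in the spreadsheet
ipm['Gene'].value_counts()  # how many interpretation rows each gene has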

Relational Databases frequently have “one to many” relationships. In this case, a Gene has a one to many relationship with Tumor Types, Variants, and Citations. Each Gene has an effect on multiple Tumor Types, each Gene can have multiple Variants, and the effect a Gene and Variant has on a Tumor Type can have multiple Citations.

This spreadsheet stores the data for one to many relationships in two different ways. For Genes to Tumor Types and Variants, the spreadsheet provides a comma delimited list in a single cell. For Citations, the spreadsheet tacks on a varying number of columns to the right side of the spreadsheet. Although this does provide the data, it can make the data harder to work with.

The next two sections will review techniques for converting comma delimited lists and multiple columns into a one-to-many lookup table.

Common Issue #1 – Comma Delimited Lists

You may notice that although the Genes are provided as single (atomic) values, other fields, such as Variants or Tumor Types, are provided as comma delimited lists. This won’t be much of a problem if you want to find all Tumor Types associated with a Gene – the query is straightforward:

pysqldf("SELECT Gene, [Tumor Type(s)] FROM ipm WHERE Gene = 'CSF3R'")

You’ll get back a single row with a comma delimited list of Tumor Types, rather than a separate row for each Tumor Type, but you can parse that relatively easily.
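For example, here’s a quick sketch of that parsing step (the result variable and the Tumor_Type alias are mine, not part of the dataset):

result = pysqldf("SELECT Gene, [Tumor Type(s)] AS Tumor_Type FROM ipm WHERE Gene = 'CSF3R'")
# split the single comma delimited cell back into a Python list
tumor_list = [t.strip() for t in result["Tumor_Type"].iloc[0].split(',')]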

Now, suppose you wanted to reverse this query, to find all genes that match a particular Tumor Type. In this case, a query like the one above won’t work, as it will miss fields that have multiple Tumor Types separated by commas. Because SQL will look for an exact match, you won’t get all the results for a query like this.

pysqldf("SELECT Gene, [Tumor Type(s)] FROM ipm WHERE [Tumor Type(s)] = 'Diffuse Large B Cell Lymphoma'")

Note that you only received a single row from this query, even though there are multiple records that match this tumor type.  SQL does provide a way to find sub-patterns in a text field. You can get all records (sort of/kind of/hand waving) with a LIKE query

pysqldf("SELECT Gene, [Tumor Type(s)] FROM ipm WHERE [Tumor Type(s)] LIKE '%Diffuse Large B Cell Lymphoma%'")

NOTE: you may not want the text truncated in your results. To handle this, set a property on your dataframe (newer versions of pandas take None here rather than -1):

pd.set_option('display.max_colwidth', -1)

Although this works, you might want to split the comma delimited values into separate rows to create a Tumor_Type to Gene lookup table (perhaps to put it into first or higher normal forms https://en.wikipedia.org/wiki/First_normal_form). As always, there are a number of different ways to do this. You can certainly do this through SQL and temporary tables, but since we’re in python and can access this table as a dataframe, let’s try a python solution.

First, let’s get the Gene and Tumor Type as a dataframe

tumor_types = pysqldf("SELECT Gene, [Tumor Type(s)] as Tumor_Type FROM ipm")

Next, we’ll split the comma delimited tumor_types into separate rows.

# split each comma delimited cell into a list, expand the lists into columns,
# then stack those columns into rows indexed by Gene
gene_tumor_types = pd.DataFrame(tumor_types["Tumor_Type"].str.split(',').tolist(), index=tumor_types["Gene"]).stack()
# turn the Gene index back into a column and keep just the two columns we need
gene_tumor_types = gene_tumor_types.reset_index()[[0, 'Gene']]
gene_tumor_types.columns = ['Tumor_Type', 'Gene']

See Stack Overflow for a nice discussion of this solution and other pandas dataframe based solutions.

Take a look at the resulting gene_tumor_types data frame.

gene_tumor_types

You now have an association from Gene to Tumor Type, with each tumor type as an individual row rather than as a comma delimited list. To get the Genes associated with a particular Tumor Type, we no longer need a LIKE query.

pysqldf("SELECT Gene, Tumor_Type FROM gene_tumor_types WHERE Tumor_Type = 'Diffuse Large B Cell Lymphoma'")

Wonderful! Except that… not so wonderful, it didn’t work – we’re missing data! There’s a big gotcha here. Compare the earlier LIKE query and this one. As an exercise, you might want to stop and try to figure out why (answer is in the next paragraph).

Common Issue # 2: Leading or Trailing White Space

This takes us to another common pitfall – white space! Don’t forget, an equals operator in SQL (and programming languages in general) is an exact match. “ Hello” and “Hello” do not match!
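In Python terms, the same mismatch looks like this:

" Hello" == "Hello"          # False – the leading space breaks the exact match
" Hello".strip() == "Hello"  # True once the whitespace is trimmed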

Take a look at the dataframe for gene_tumor_types – you’ll notice many of the Tumor Types have leading whitespace. This prevents an exact match from occurring. You can still find these rows through a LIKE query, which matches them as partial strings:

pysqldf("SELECT Gene, Tumor_Type FROM gene_tumor_types WHERE Tumor_Type LIKE '%Diffuse Large B Cell Lymphoma'")

But that’s a hack and kind of defeats the purpose of creating a new lookup table. We should be able to get this through an equality operator. Let’s trim the whitespace from this column.

gene_tumor_types["Tumor_Type"] = gene_tumor_types["Tumor_Type"].str.strip()

And take a look at the list to see the whitespace has been removed

gene_tumor_types["Tumor_Type"]

Now retry the LIKE query and the exact match query – you’ll see that you are now retrieving all the rows.

pysqldf("SELECT Gene, Tumor_Type FROM gene_tumor_types WHERE Tumor_Type = 'Diffuse Large B Cell Lymphoma'")

Common Issue # 3: Repeated Columns

Another common spreadsheet practice is to tack on a variable number of columns to store one-to-many data relationships. Take a look toward the end (right hand side) of the spreadsheet (or ipm dataframe)

ipm

Each row has one or more citations. This spreadsheet stores the one to many relationship by creating a new column for each set.

Unfortunately, this does make it more difficult to query, since we need to know in advance how many Citations to query. Furthermore, the column headers that hold Citations beyond the first one don’t have names, making the query less informative.
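You can check how many of these anonymous columns pandas created with a quick list comprehension (a sketch; it relies on pandas naming blank headers “Unnamed: n”, which is what the queries below use):

citation_cols = [c for c in ipm.columns if c.startswith('Unnamed')]
len(citation_cols)  # the number of extra citation columns tacked onto the sheet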

For instance, not all rows have the same number of citations. To get the citations for Gene JAK1, we’d need to write:

pysqldf("SELECT Gene, Citations, [Unnamed: 7], [Unnamed: 8], [Unnamed: 9], [Unnamed: 10], [Unnamed: 11], [Unnamed: 12], [Unnamed: 13], [Unnamed: 14] FROM ipm WHERE Gene = 'JAK1'")

This query will return all the citations for Gene “JAK1”. However, if you run this query against Gene “MPL”, you’ll receive a value of “None” for several columns. By contrast, if you run this query against Gene “MYD88”, you’ll miss a number of citations that extend out to “Unnamed: 26”.

It would be more convenient to be able to write a query like this:

pysqldf("SELECT Gene, Citation from ipm”)

And receive a separate row for each citation.

Let’s create a lookup table for Gene and Citations. There are, as always, a number of different ways to accomplish this, through SQL or pandas. In this case, we’ll use SQL with a Python loop to create a “UNION ALL” query.

query = "SELECT Gene, [Tumor Type(s)], [Variant(s)], Tier, Interpretations, Citations as Citation FROM ipm WHERE Citations != 'None'"
for i in range(7, 27):
    query += (" UNION ALL SELECT Gene, [Tumor Type(s)], [Variant(s)], Tier, Interpretations, [Unnamed: {val}] as Citation FROM ipm WHERE [Unnamed: {val}] != 'None' ".format(val=i))
query += ("ORDER BY Gene")

This approach uses a Python loop to build a SQL statement. The UNION ALL statement combines the results of more than one query into a single output table. In this case, we are querying each Gene and Citation combination and outputting the results as separate rows. You can take a look at the full query (it’s long, so I’m not posting it here; just view the output of the “query” string in Jupyter or interactive Python).
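If you just want a quick peek at the query’s shape rather than the whole string, something like this works:

print(query[:250])  # shows the first clause and the start of the first UNION ALL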

Let’s run the query and store the results in a new dataframe

gene_citations = pysqldf(query)

Let’s re-run the query for JAK1 and MPL.

pysqldf("SELECT Gene, Citation FROM gene_citations WHERE Gene = 'JAK1'")
pysqldf("SELECT Gene, Citation FROM gene_citations WHERE Gene = 'MPL'")

You are now able to get clean, multiple row results from the Python dataframe through a one-to-many table relation.

We’ll tackle a few more data sources for responsiveness of various tumors and conditions to genes and variants in future posts. Stay tuned.

And, as always, if you’d like to learn more about Python or R, please consider signing up for a Software Carpentry Class or attend one of our workshops at Mission Bay!

Rails and Ember without rails-api

This is a follow up to a post I wrote a couple of months ago, “Ember and Rails Scaffolding for Simple CRUD”. In that post, I gave an overview of how to generate simple CRUD scaffolding for an Ember app using Rails purely as an api through the rails-api gem.

Here’s the thing… if you take the api-only approach, you by design give up the ability to write standard Rails views. This can be a good thing, of course, which is why the gem was integrated into Rails 5. If all you want is an api, you don’t want the extra weight of everything else that comes with Rails, and you can always add various gems back in as needed. But for now, you may want to preserve the ability to write a full MVC Rails app while still providing an API for Ember or other single-page JavaScript frameworks.

Fortunately, this isn’t especially difficult. The Rails side will get a little more verbose, at least the way I’m writing it, but all you need to do is ensure that your app responds to both HTML and JSON – and, of course, be particularly careful to make sure you don’t mix view logic with backend logic.

So, here we go…

First, create a full rails app with basic CRUD for the User model in the previous tutorial. I’m not going to repeat the steps here since they won’t change much. The only difference here is that instead of doing this with the rails-api gem and command, you’ll now do this with traditional rails. You will still need to create serializers, add rack/cors, allow access to various HTTP actions in the Rails app, and so forth. This is all available through the previous tutorial, with one change – you don’t need to install the rails-api gem, and wherever it says “rails-api generate…”, instead just use “rails generate…”.

You should now have a fully functional rails app for CRUD operations on a User that also provides json formatting as an api. The main difference between the api for a traditional Rails app and the rails-api generated app is that the traditional rails app responds by default as html, whereas rails-api responds as json. To get a json response from the traditional rails app, you will need to append “.json” to the url – in other words, to get the list of users rendered as json rather than displayed as html, you’d need to request:

http://localhost:3000/users.json

whereas the rails-api version doesn’t require this extension, as a rails-api app by default returns json (and wouldn’t normally respond as html at all).

On the Ember side, we need to instruct the adapter to specifically request json from the Rails app, as this is no longer the default Rails response.

To accomplish this, we will modify the Ember adapter in app/user/adapter.js

import DS from 'ember-data';

export default DS.RESTAdapter.extend({
  host: 'http://localhost:3000',
  buildURL: function(record, suffix) {
    var s = this._super(record, suffix);
    return s + ".json";
  }
});

As you can see, this will append “.json” to all the requests sent from Ember to Rails – even POST, PUT, and DELETE requests – so you’ll need to explicitly handle the json format in any Rails controllers you intend to make available to Ember. As a result, we’ll need to modify the update and create methods in the Rails controller to specifically respond with json for Ember.

There is, inevitably, one more wrinkle – although Rails does respond by default to the “.json” extension, Ember expects a slightly different formatting, so you’ll need to make a few tweaks to get it working with Ember. Here’s the full controller code:

class UsersController < ApplicationController
  before_action :set_user, only: [:show, :edit, :update, :destroy]

  # GET /users
  # GET /users.json
  def index
    @users = User.all
    
    #render json: @users
    
    respond_to do |format|
      format.html
      format.json { render json: @users }
    end
  end

  # GET /users/1
  # GET /users/1.json
  def show
    respond_to do |format|
      format.html
      format.json { render json: @user }
    end
  end

  # GET /users/new
  def new
    @user = User.new
  end

  # GET /users/1/edit
  def edit
  end

  # POST /users
  # POST /users.json
  def create
    @user = User.new(user_params)

    respond_to do |format|
      if @user.save
        format.html { redirect_to @user, notice: 'User was successfully created.' }
        format.json { render :show, status: :created, location: @user }
      else
        format.html { render :new }
        format.json { render json: @user.errors, status: :unprocessable_entity }
      end
    end
  end

  # PATCH/PUT /users/1
  # PATCH/PUT /users/1.json
  def update
    respond_to do |format|
      if @user.update(user_params)
        format.html { redirect_to @user, notice: 'User was successfully updated.' }
        format.json { render :show, status: :ok, location: @user }
      else
        format.html { render :edit }
        format.json { render json: @user.errors, status: :unprocessable_entity }
      end
    end
  end

  # DELETE /users/1
  # DELETE /users/1.json
  def destroy
    @user.destroy
    respond_to do |format|
      format.html { redirect_to users_url, notice: 'User was successfully destroyed.' }
      format.json { head :no_content }
    end
  end

  private
    # Use callbacks to share common setup or constraints between actions.
    def set_user
      @user = User.find(params[:id])
    end

    # Never trust parameters from the scary internet, only allow the white list through.
    def user_params
      params.require(:user).permit(:name)
    end
end

You may notice some additional code in create and update. This is because we need to respond as json for Ember, which we configured to submit all requests with the .json extension (even POST and PUT requests).

At this point, you can bring up both a Rails app on port 3000 and an Ember app on port 4200 and use both a standard Rails view and the Ember client for CRUD operations on your User model.

This does require some extra overhead, but it does keep open the possibility of writing a traditional Rails app while providing an API for not just Ember but any other app that might want to consume a Rails API.

Ember and Rails Scaffolding for Simple CRUD

Like many developers who use Rails, I haven’t thought about scaffolding for a while, but I found myself wanting something like it when I started looking into Ember as a front end for Rails. As with Rails, my guess is that I won’t use Ember scaffolding for long, but I wanted a way to get that initial Ember application up and running, with Rails as a backend.

It turns out that there are easy to use scaffolding generators for an Ember front end and a Rails API backend, with just a few minor gotchas you need to be aware of when integrating the two.

For this tutorial, we’re going to do the simplest thing possible for a CRUD app. We will create a single model (User), with one field (name), and we will create a web app with Create, Retrieve, Update, and Delete operations. Ember will supply the front end, Rails will handle the back end.

One quick note – while there are ways to integrate Rails with Ember in the same application, this tutorial will build the Ember frontend out as a completely separate, stand-alone app from the Rails backend. This way, the front end can be integrated with any backend that provides the API that Rails provides.

Here we go…

Part 1 – Create an Ember front end

We’ll create a client side, front end application that provides an interface for CRUD operations.

Step 0 – Install ember and ember-cli.

This will also require installing Node and npm. I’m pretty sure you’ll need git as well. I’m not going to cover all this (or the process for installing Ruby or Rails). There are plenty of guides on the web to getting all this installed. Just make sure you can execute the following command

ember -v

I’m using ember 1.13.8, node 0.12.6, and npm 2.13.4 (on Mac OS X).

Step 1 – Create an ember app

ember new ember-crud

Step 2 – Install the ember-cli-scaffold package

cd into ember-crud and run

ember install ember-cli-scaffold

(for more information, see https://www.npmjs.com/package/ember-cli-scaffold)

Step 3 – Generate scaffolding for a Model (User) with one field (Name)

ember generate scaffold user name:string --pod

The “pod” option creates a different organizational structure for your files than the standard ember defaults. I prefer it and will be using it here, but the differences are very minimal.

Step 4 – Verify that your app is working

ember serve

and go to http://localhost:4200/users

You should see a very Rails-ish interface with full CRUD for a user with a single input field, name. Go ahead and create, edit, delete a few to verify that it is all working.

The ember CRUD app is using a local dev storage provided by mirage. In the next step, we’ll swap this out for a Rails backend.

Part 2 – Provide a Rails API for CRUD operations

Ember is already providing the view, so rather than creating a full blown rails app, we will limit the Rails app to providing an API for persistence.

Step 0: Install Ruby and Rails

As with Ember, there are lots of resources on the web for getting Ruby and Rails installed. Make sure that you can run

ruby -v

(I’m using 2.1.0)

and

rails -v

(I’m using Rails 4.2.0. You will need this version or later for Rails to use the rails-api gem, which I believe will be included in Rails 5).

Step 1 – Create a rails API only application

Install the rails-api gem

gem install rails-api

And generate a new Rails application

rails-api new rails-crud

Step 2 – Create an API scaffold for CRUD operations for a User

cd into rails-crud and type

rails-api g scaffold user name:string

While I promised not to go into a lot of detail here, you may notice that no view tier is created, and if you look at the user controller, you’ll see that it is set up for rendering json, not html.

Step 3 – Seed with a bit of data

Technically, you don’t need to do this step, since you’ll populate from your Ember app, but it can help to verify everything is working properly on the Rails side before integrating with your Ember app.

In db/seeds.rb, create something like this

user1 = User.create(name: 'user1')
user2 = User.create(name: 'user2')

and run

rake db:migrate
rake db:seed

Step 4 – Check out the Rails API

run:

rails server

go to localhost:3000/users

and you should see a json representation of the two users you created in the seeds.rb file.

Part 3 – Use the Rails API service as a backend for the Ember application

This is relatively straightforward, though there are a few wrinkles.

Step 1 – Modify the way Rails is rendering JSON for a list of objects

Take another look at the json returned from http://localhost:3000/users

[{"id":1,"name":"user1","created_at":"2015-10-21T22:17:32.778Z","updated_at":"2015-10-21T22:17:32.778Z"},{"id":2,"name":"user2","created_at":"2015-10-21T22:17:32.783Z","updated_at":"2015-10-21T22:17:32.783Z"}]

You may notice that Rails has flattened this into a single array. Ember, by default, expects a slightly different formatting where the json array of User objects is stored in a hash with the model type as a key.

One approach to this problem is to use a serializer to establish the format for JSON output from the rails api.

Add the serializer to your Gemfile

gem 'active_model_serializers', '~> 0.8.3'

and run

bundle update

and create a new serializer for the user model

rails g serializer user

This will create a user_serializer.rb file in app/serializers.

class UserSerializer < ActiveModel::Serializer
  embed :ids, embed_in_root: true
  attributes :id, :name
end

This code will format users the way Ember expects by default, and will include only the id and name expected by the Ember model we created earlier (the various Rails additions like created_at or updated_at will not be serialized and sent to Ember as JSON).

NOTE:

I recently tried this with ember-cli 2.11.0, and it looks like the formatting for JSON may have changed since I wrote this. To get this working, I had to create a new file named json_api.rb in the initializers folder containing the following code (per this tutorial from EmberIgniter).

ActiveSupport.on_load(:action_controller) do
  require 'active_model_serializers/register_jsonapi_renderer'
end

ActiveModelSerializers.config.adapter = :json_api

Once you’ve made these changes, reload http://localhost:3000/users or go to one of the individual users at http://localhost:3000/users/1. You should see the following change to the json representation:

{"users":[{"id":1,"name":"user1"},{"id":2,"name":"user2"},{"id":3,"name":"user1"},{"id":4,"name":"user2"}]}

Step 2 – Tell Rails to allow Ember to use the API

For security reasons, Rails by default won’t allow an application running on a different port to access the API. To solve this, add the following to your Gemfile

gem 'rack-cors', :require => 'rack/cors'

And add the following configuration to your Rails config/application.rb file

class Application < Rails::Application

  config.middleware.use Rack::Cors do
    allow do
      origins '*'
      resource '*', headers: :any, methods: [:get, :post, :put, :delete, :options]
    end
  end

and run

bundle update

and restart the rails server

This is the minimal configuration to get this example working – for more information on how to allow cross-origin JSON properly, check out the rack-cors documentation at https://github.com/cyu/rack-cors

Step 3 – Point Ember at the Rails API

In your Ember application, open the adapter file at app/user/adapter.js (if you didn’t use the --pod structure, this will be in app/adapters instead). You should see a line

namespace: 'api'

change this to

host: 'http://localhost:3000'

You will also need to disable mirage so that it won’t intercept Ember communications with the rails app. In the config directory of the ember-crud application, open the environments.js file and add

if (environment === 'development') {
  ENV['ember-cli-mirage'] = {
    enabled: false
  };
  // …
}

Step 4 – Turn off Ember’s cli security policy

Like Rails, Ember comes with configuration defaults to protect against cross domain security problems. To get this example running quickly, you can remove this line from package.json in your Ember app.

"ember-cli-content-security-policy": "0.4.0",

As with other config options in this tutorial, this is something you’ll want to read about and understand rather than just disabling.

Step 5 – Verify that the changes to the Ember UI are directed to and properly handled by the Rails application

Restart ember by typing

ember serve

(you may need to stop and restart the server if it is still running), and navigate to

http://localhost:4200/users

You should see a list of the users you created in the Rails database seed.

Try adding, editing, or deleting a few users. You can verify the changes at the Rails back end by rendering the list of Users in the database as JSON by going to

localhost:3000/users

Step 6 – Fix the duplicates problem

You may have noticed that when you create a new record, two new items are added to your list – but if you look at the rails service, only one record was persisted and one of the records in the ember list has a null id. If you refresh the page, the record with the null id will disappear.

I’ve been looking around for a real solution to this. If you just want to make the record with the null id disappear, you can hack it in the index route:

model: function() {
   return this.store.find('user', {id: true});
}

Postscript

I wrote this as part of my notes on getting up and running on Ember and Rails. I’ve found that if I don’t do these write ups when I’m learning something (figuring I’ll do it later when I understand it all better), there’s a good chance I’ll never do it at all. However, I figured it would be a good idea to run it by a colleague here at the UCSF CKM, Jon Johnson, who has some Ember experience. He said no problem posting his reply:


There are a couple of things you might do a little bit differently, but they aren’t wrong. I’m not sure if it’s a drop-in, but https://github.com/fotinakis/jsonapi-serializers looks like a better serializer to use in Rails. Active record will work and continue to be supported, but if I were going at this from scratch I would start with that.

For Ember you might want to set up the adapter globally to talk to Rails instead of just for the user model. You can do that with `ember g adapter application`. It looks like that paragraph might be stuck between these two things, as you reference application.js being in the user pod.

Instead of killing mirage in development you could also restart the server in production mode. That’s what we do to talk to the real API. Something like `ember s --env=production` will not go through mirage at all. I’m not sure if that is easier or harder than your way.

Always Be Collecting Feedback

As part of our ongoing efforts to get to know our users better, the UCSF Library web team decided we wanted some of that sweet, sweet microfeedback action, and so we deployed a very short satisfaction survey at library.ucsf.edu back in July.

[Image: the How Are We Doing tab]

Anytime a patron clicked the How Are We Doing button at the bottom of every page, they were asked a simple question: How would you rate your experience today? Patrons could let us know if they were satisfied or unsatisfied with their web experience.

Regardless of their answer, they’d be prompted to go into detail and provide some demographic information, but the only required question was whether they were satisfied/unsatisfied. Our hope was that keeping the survey short and to the point, and constantly available on every page, would encourage participation.

The not-so-secret goal of this survey structure was for the web team to learn directly from our patrons where they’re having problems, so that we make improvements to the website based on their needs and not our own assumptions. Our thinking was that a user frustrated enough to leave a comment was a user we’d want to hear from.

Enough Background Already, What Did We Learn?

[Image: the Help button]

The stats below are from July 7 through August 23, 2015, the day before we introduced our new Help feature. We’re excluding overlapping dates from this analysis because the Help button began competing with the How Are We Doing button for user attention (more on this below), and we wanted to base our analysis on clean data.

Of the 201 responses received during that period, 65% had a satisfactory experience at our site. Hooray!

If we drill down to the 76% of respondents who shared their demographic information, the overwhelming number of responses came from UCSF affiliates (94%), with Staff leading the way, closely followed by Faculty, then Students. It’s likely the data was skewed by the summer months, and it’ll be interesting to see if the affiliation breakdown changes now that the fall semester is in full swing.

[Chart: satisfaction by affiliation]

Patron satisfaction is awesome, but remember our not-so-secret goal was to learn why our users are unsatisfied. While only 20% of all respondents bothered to comment at all, our hypothesis about frustrated users being more likely to go into detail was correct: 87% of comments came from unsatisfied users. Hooray (really)!

[Chart: unsatisfied users are more likely to comment]

What’s Making our Users Unhappy?

Most of the frustration came from patrons who felt that navigating the site and finding what they needed was just too hard. 2nd prize for user frustration goes to remote access issues, with patrons expressing difficulties getting to articles using EZproxy, UC-eLinks, and VPN.

Connection errors and Library service issues (comments like “you don’t have a book I want” and “my barcode number doesn’t work anymore”) tied for 3rd place, and I was personally amused and humbled to know that 9% of the feedback was about how annoying they found the feedback survey popup window (removed after a few weeks).

[Chart: unsatisfied users’ comments, broken down by category]
* If a patron gave feedback in more than one category, I picked the dominant issue.

So What’s Next?

We were fortunate to meet with some respondents, and we used their comments and other key takeaways from our satisfaction survey, the interviews we conducted with our customer-facing staff, and the LibQUAL 2013 survey to finalize the Library User Personas we’ll rely on as we make changes going forward.

We’ll keep our satisfaction survey going, but with the successful rollout of the Help feature, the time has come to modify how we ask for feedback. The How Are We Doing and Help buttons serve different purposes; unfortunately, the current design doesn’t make that distinction clear. Getting requests for help in the satisfaction survey gave us useful information before we rolled out the Help feature, but now it’s more appropriate that help me please comments get routed to the right person right away for faster customer service.

We’ll launch our more low-key request for feedback this week.

[Image: the new Feedback button]
The new Feedback button will live at the bottom of every web page.

Testing in Ember.js, Part 3: mock data with ember-cli-mirage

When Last We Left Our Heroes…

The goal of automated testing is to find problems before your users do. Good tests do this by preventing bad code from being merged. A great continuous integration (CI) setup can catch problems in beta browsers and libraries in time to report them to their authors or fix your code before a release happens. By the end of this three part series you will have a great CI setup. Tests will automatically run against any browser you support and any future version of your dependencies.

In Part One, we covered using Sauce Labs and Travis CI to create your test matrix.

In Part Two, we covered testing our application against many versions of Ember.

In Part Three, we will write some acceptance tests and use the awesome ember-cli-mirage addon to provide controlled mock data to our tests and development environment.

Getting Set Up

Part Three picks up right where we left off with a working ember-cli project and build configuration. This is not a tutorial on Test Driven Development so we’re going to start out with a working example and then test it.

First setup our ember-data models:

$ cd testing-sandbox
$ ember g model fruit title:string color:belongsTo
$ cat app/models/fruit.js
import DS from 'ember-data';
export default DS.Model.extend({
  title: DS.attr('string'),
  color: DS.belongsTo('color', {async: true})
});

$ ember g model color title:string fruits:hasMany
$ cat app/models/color.js 
import DS from 'ember-data';

export default DS.Model.extend({
  title: DS.attr('string'),
  fruits: DS.hasMany('fruit', {async: true})
});

Then a simple route:

$ ember g route colors --path='/colors'

Modify app/routes/colors.js to get all the colors:

import Ember from 'ember';

export default Ember.Route.extend({
    model(){
      return this.store.find('color');
    }
});

Setup a template to list the colors and their fruits app/templates/colors.hbs

<ul>
  {{#each model as |color|}}
    <li>{{color.title}}
      <ul>
        {{#each color.fruits as |fruit|}}
          <li>{{fruit.title}}</li>
        {{/each}}
      </ul>
    </li>
  {{/each}}
</ul>

Setup ember-cli-mirage

Let’s install the ember-cli-mirage addon.

$ ember install ember-cli-mirage

Now we need to configure our basic API routes. Mirage creates a basic configuration file for you at app/mirage/config.js. We just need to add a few lines for our new models:

export default function() {
  this.get('/colors');
  this.get('/colors/:id');
  this.get('/fruits');
  this.get('/fruits/:id');
};

For each of our models we need a factory so our tests can create new data. As ember-cli-mirage matures, generators for your factories will be added. For now, you have to make them on your own. We need one for fruit and one for color.

Create the file app/mirage/factories/fruit.js:

import Mirage from 'ember-cli-mirage';

export default Mirage.Factory.extend({
  title: (i) => `fruit ${i}`,
  color: null
});

…and the file app/mirage/factories/color.js:

import Mirage from 'ember-cli-mirage';

export default Mirage.Factory.extend({
  title: (i) => `color ${i}`,
  fruits: []
});

Wow. That was some serious setup; take heart that we’re done now and we can finally write a test.

Finally a test!

$ ember g acceptance-test colors

Add some fixture data and a test to your new file at tests/acceptance/colors-test.js:

test('visiting /colors', function(assert) {
  //turn on logging so we can see what mirage is doing
  server.logging = true;
  
  //make our first color, it gets an id of 1
  server.create('color', {
    //fill this color with some fruits (they don't exist yet, that's next)
    fruits: [1,2,3,4,5]
  });
  //now lets create a bunch of fruits and link them to our color
  server.createList('fruit', 5, {
    color: 1
  });
  //want another color? - just add it.
  server.create('color', {
    fruits: [6,7]
  });
  server.createList('fruit', 2, {
    color: 2
  });
  visit('/colors');

  andThen(function() {
    assert.equal(currentURL(), '/colors');
    //this is a stupid test, but hey it's a tutorial, what did you expect?
    assert.equal(find('li').length, 9);
  });
});

Yup, I wrote that test for you. This isn’t a lesson on Test Driven Development. If you want that, watch “Test Driven Development By Example”. The important part here is that we create fresh testing data with every test using server from ember-cli-mirage. You can be in complete control of what is passed to your application, so you can check for any condition.

Final Thoughts

We’re just about out of time and we covered a lot. You still have some test writing to do and I wish there was an addon to do that for you. Until then you can take solace in the knowledge that your testing infrastructure is a foundation you can build your reputation on.

Until next time, Internet friends: If you liked it or hated it let me know @iam_jrjohnson.

Testing in Ember.js, Part 2: ember-try and the Travis CI build matrix

Our Story So Far

The goal of automated testing is to find problems before your users do. Good tests do this by preventing bad code from being merged. A great continuous integration (CI) setup can catch problems in beta browsers and libraries in time to report them to their authors or fix your code before a release happens. By the end of this three part series you will have a great CI setup. Tests will automatically run against any browser you support and any future version of your dependencies.

In Part One, we covered using Sauce Labs and Travis CI to create your test matrix.

In Part Two, we will start testing our application against multiple versions of Ember using the excellent ember-try addon and create a Travis CI build matrix which will allow some options to fail without the entire test failing. The goal of these two improvements to our original setup is to see issues coming long before they become problems.

Getting Set Up

Part Two picks up right where we left off with a working ember-cli project and build configuration.

Let’s install the ember-try addon.

$ cd testing-sandbox
$ ember install ember-try

ember-try is a completely configurable way to test your code against upcoming versions of Ember. You may want to customize this later, but for now let’s add a file to testing-sandbox/config/ember-try.js with these contents:

/* jshint node: true */

module.exports = {
  scenarios: [
    {
      name: 'our-current',
      dependencies: {}
    },
    {
      name: 'ember-release',
      dependencies: {
        "ember": "ember#release"
      },
      resolutions: {
        "ember": "release"
      }
    },
    {
      name: 'ember-beta',
      dependencies: {
        "ember": "ember#beta"
      },
      resolutions: {
        "ember": "beta"
      }
    },
    {
      name: 'ember-canary',
      dependencies: {
        "ember": "ember#canary"
      },
      resolutions: {
        "ember": "canary"
      }
    }
  ]
};

That creates a few different build targets. The first one, our-current, is whatever version of Ember your app currently depends on; by leaving the dependencies blank, it will use your current setup. The others are dynamically linked to the Ember release process for the latest release, beta, and canary.

We can now run all of our tests against the beta version of Ember.js with a single command.

$ ember try ember-beta test

Go ahead, give that a spin – it’s pretty great, right?

You can also test everything in your config file with the command:

$ ember try:testall

Automating with Travis CI

In Part One, we learned how to test against any browser and now we know how to test against any version of Ember.js. The only thing missing is a way to automate the entire process. We’re going to take advantage of Travis CI’s build matrix to organize our tests into discrete units.

We need to modify our Travis configuration to:

  1. Make a variable for the browser we are testing with
  2. Make a variable for the Ember.js version we are testing against
  3. Not start Sauce Connect unless we need it
  4. Combine all of this into a build matrix
  5. Allow some of these test combinations to fail

The new .travis.yml file looks like this:

---
language: node_js
node_js:
  - "0.12"

sudo: false

env:
  global:
    # Setup SauceLabs Credentials
    - SAUCE_USERNAME="YOUR_USER_NAME"
    - SAUCE_ACCESS_KEY="YOUR_ACCESS_KEY"
    # Some default values for our build matrix
    - START_SAUCE_CONNECT=false
    - EMBER_VERSION='our-current'
    - TESTEM_LAUNCHER='PhantomJS'

matrix:
  fast_finish: true
  allow_failures:
    - env: EMBER_VERSION='ember-beta'
    - env: EMBER_VERSION='ember-canary'
    - env: "TESTEM_LAUNCHER='SL_internet_explorer_11_Windows_8_1' START_SAUCE_CONNECT=true"
  include:
    - env: "TESTEM_LAUNCHER='SL_firefox_Windows_7' START_SAUCE_CONNECT=true"
    - env: "TESTEM_LAUNCHER='SL_internet_explorer_11_Windows_8_1' START_SAUCE_CONNECT=true"
    - env: "EMBER_VERSION='ember-beta'"
    - env: "EMBER_VERSION='ember-canary'"

cache:
  directories:
    - node_modules

before_install:
  - "npm config set spin false"
  - "npm install -g npm@^2"
 
install:
  - npm install -g bower
  - npm install
  - bower install

before_script:
  # Create a sauce tunnel only if we need it
  - if [ "$START_SAUCE_CONNECT" = true ]; then ember start-sauce-connect; fi

script:
  # run our tests against the Ember version and browser of our choice
  - ember try ${EMBER_VERSION} test --port=8080 --launch=${TESTEM_LAUNCHER} --skip-cleanup

after_script:
  # Destroy the sauce tunnel if we needed it
  - if [ "$START_SAUCE_CONNECT" = true ]; then ember stop-sauce-connect; fi

That’s it for Part Two! Tune in next week for a look at writing acceptance tests to take advantage of this setup.

If you have questions or see a mistake, tweet @iam_jrjohnson.

Updated:
5/6/15 – Changed the ember-try config to use a blank set of dependencies. Thanks @katiegengler for the suggestion.

Testing in Ember.js, Part 1

[Update] 10/28/17 – Thanks to @DanstonEvans, corrected the browser string for Firefox on Sauce Labs.

The Big Picture

The goal of automated testing is to find problems before your users do. Good tests do this by preventing bad code from being merged. A great continuous integration (CI) setup can catch problems in beta browsers and libraries in time to report them to their authors or fix your code before a release happens. By the end of this three part series you will have a great CI setup. Tests will automatically run against any browser you support and any future version of your dependencies.

Requirements for this guide are Ember.js > 1.10 and Ember CLI > 0.2.3. It may be entirely possible to do this without Ember CLI, but I wouldn’t know how.

In Part One, we will cover using Sauce Labs and Travis CI to create your test matrix.

Getting Set Up

If you’ve never used Ember CLI before, you should follow their instructions to install all dependencies.

Now let’s create a new sandbox to play in:

$ ember new testing-sandbox
$ cd testing-sandbox
$ ember test --server

Congrats! You now have a brand new Ember.js app with tests running in both PhantomJS and Chrome. Go ahead and leave that console window open and create a new one. Tests will keep running in the original window and track all the changes we make.

$ cd testing-sandbox
$ ember g acceptance-test welcome-page

Your test console should now record a failure indicating:

✘ UnrecognizedURLError: /welcome-page

Open testing-sandbox/tests/acceptance/welcome-page-test.js in your favorite editor and make it look like this:

import Ember from 'ember';
import {
  module,
  test
} from 'qunit';
import startApp from 'testing-sandbox/tests/helpers/start-app';

var application;

module('Acceptance: WelcomePage', {
  beforeEach: function() {
    application = startApp();
  },

  afterEach: function() {
    Ember.run(application, 'destroy');
  }
});

test('we should be welcoming', function(assert) {
  visit('/');

  andThen(function() {
    assert.equal(currentURL(), '/');
    var title = find('#title');
    assert.equal(title.text(), 'Welcome to Ember.js');
  });
});

Save that and all of your tests should pass. We are ready to get started with multi-browser testing.

Test Multiple Browsers in the Cloud

Sauce Labs is a service for running your tests against a huge variety of browsers. We’re going to abstract a lot of the complexity of using Sauce Labs by taking advantage of the excellent ember-cli-sauce addon. First, you will need Sauce Labs credentials. You can start a free trial or, if your project is open source, you can sign up for Open Sauce. When you are done, take note of your user name and access key. You will need them later.

Let’s install the addon:

$ember install ember-cli-sauce

Now we can add additional browsers to our testem.json file. Testem calls these launchers:

  $ ember sauce --browser='firefox' --platform='Windows 7'
  $ ember sauce --browser='internet explorer' --version=11 --platform='Windows 8.1'

Let’s run some tests!

First we have to export our sauce credentials as environment variables.

$ export SAUCE_USERNAME="YOUR_USERNAME"
$ export SAUCE_ACCESS_KEY="YOUR_ACCESS_KEY"

Then we fire up a proxy tunnel so Sauce Labs browsers can get to our local Ember.js server.

$ ember start-sauce-connect

Then we launch the actual tests.

$ ember test --launch='SL_firefox_public_Windows_7,SL_internet_explorer_11_Windows_8_1'

You should see something like:

ok 1 Firefox 37.0 - Acceptance: WelcomePage: we should be welcoming
ok 2 Firefox 37.0 - JSHint - acceptance: acceptance/welcome-page-test.js should pass jshint
ok 3 Firefox 37.0 - JSHint - .: app.js should pass jshint
ok 4 Firefox 37.0 - JSHint - helpers: helpers/resolver.js should pass jshint
ok 5 Firefox 37.0 - JSHint - helpers: helpers/start-app.js should pass jshint
ok 6 Firefox 37.0 - JSHint - .: router.js should pass jshint
ok 7 Firefox 37.0 - JSHint - .: test-helper.js should pass jshint
ok 8 IE 11.0 - Acceptance: WelcomePage: we should be welcoming
ok 9 IE 11.0 - JSHint - acceptance: acceptance/welcome-page-test.js should pass jshint
ok 10 IE 11.0 - JSHint - .: app.js should pass jshint
ok 11 IE 11.0 - JSHint - helpers: helpers/resolver.js should pass jshint
ok 12 IE 11.0 - JSHint - helpers: helpers/start-app.js should pass jshint
ok 13 IE 11.0 - JSHint - .: router.js should pass jshint
ok 14 IE 11.0 - JSHint - .: test-helper.js should pass jshint

1..14
# tests 14
# pass  14
# fail  0

# ok

Wasn’t that awesome? You just tested your code in two browsers. You can add anything you want to testem.json. Go nuts!

When you are done testing remember to kill the tunnel we opened.

$ ember stop-sauce-connect

Making It Automatic with Travis CI

The last piece of this puzzle is to use Travis CI to run these tests for you every time you commit code. Update your .travis.yml file to run Sauce Labs tests. You will need to tell Travis CI what your Sauce Labs credentials are in the env section:

---
language: node_js
node_js:
  - "0.12"

sudo: false

env:
  global:
    #set these here because they get pulled out by testem saucie
    - SAUCE_USERNAME="YOUR_USER_NAME"
    - SAUCE_ACCESS_KEY="YOUR_ACCESS_KEY"

cache:
  directories:
    - node_modules

before_install:
  - "npm config set spin false"
  - "npm install -g npm@^2"

install:
  - npm install -g bower
  - npm install
  - bower install

before_script:
  # Create a sauce tunnel
  - ember start-sauce-connect

script:
  - ember test --launch='SL_firefox_Windows_7,SL_internet_explorer_11_Windows_8_1' --port=8080

after_script:
  # Destroy the sauce tunnel
  - ember stop-sauce-connect

You are well on your way to being a cross-browser testing hero! In my next post I will take you through using the ember-try addon to test your code against upcoming Ember.js versions.

If you have questions or see a mistake, you can use the comments here or tweet @iam_jrjohnson.

Ember CLI, Heroku, and You

Warning: this is old information and way more work than you need to do. The solution we are using now is the excellent heroku-buildpack-ember-cli.

A disclaimer: This is not for use in production. Doing this for a production app would be a bad decision.

The problem: Developers on the Ilios Project need to be able to share their changes with other team members. While it is possible to deploy a static Ember CLI app nearly anywhere, we want to include our mock API so everyone is looking at the same data.

The solution: Use Heroku to host an Ember CLI application running its built in Express server.
