Production Packer Passwords: Securing the Root User

Packer is a tool for creating identical machine images for multiple platforms from a single source configuration. It is mostly used to create base images for developers using Vagrant. However, it’s just as useful for creating virtual machine (VM) images to deploy in production. Used this way, Packer gives you a consistent starting point for VMs, which are then provisioned further with, for example, Puppet or Chef, creating a ready-to-deploy image with your application already installed.

One minor headache for using Packer in this way is how to safely create a root account with a known password without exposing that password in configuration files.

The key to this process is hooking into the scripted install process. For Debian this is known as Preseed. Red Hat calls it Kickstart. Most Packer VMs are built with some kind of Preseed/Kickstart file.


See your failures: Taking screenshots with PhantomJS running Behat tests

PhantomJS is a headless browser which, when combined with Behat, can run the same tests you can run in a browser like Chrome or Firefox. Headless testing is faster because it doesn’t actually render anything for a user. This makes it ideal for rapid test driven development.  

A downside to headless testing is that when it fails it can be hard to see exactly why. Figuring out whether it is your code or your test that is causing the issue can be extremely frustrating, and you might find yourself asking: why can’t I just see it? The frustration mounts when the same test runs just fine via Selenium in a GUI browser.

OK, was that enough buildup?  Are you ready for the good stuff?  Right then, here we go.


Online Learning and Dispersed Teams

I recently started taking my first online course. In the course, one must work on projects with other students. Members of our work group are not always online at the same time. The level of communication and coordination needed is vastly greater than would be if a group gathered to work together in person.

The similarities to working on a software project with a geographically dispersed team are obvious.

Ken Haycock’s talk on working in teams is a tour de force. It includes strategies for clarifying goals, dealing with the stages of team growth, and navigating dysfunctions.

It takes a great deal of tact and care to deal with conflict in person.  It takes much more tact and care to navigate troubled waters in a project online.

For me, Haycock pulled together everything I had learned about teams and more into one 50 minute talk.  The most salient point for me was when he described the student in the team who did all the work, and, as it was being turned in, said “… and of course, I did all the work”.

Haycock describes how one might almost want to say in reply “Well, more fool you!”  The importance of negotiating standards at the start was the point he was making.  Unstated performance expectations must be brought to the fore.  If someone wants to get an A+, someone else is ok with a B-, and they make it clear from the start, then a source of tension is resolved before it gets too late to do anything about it.  To work in a team is to have everyone contribute fairly, and to get the benefit of the combined wisdom and experience of the group.

Haycock also talks about conflict in teams.  Conflict, unfortunately, seems to be an integral part of becoming a team that performs well. It’s the storming of the forming, storming, norming and performing stages of team development.  Yet conflict is all the more difficult to deal with when you can’t see the non-verbal cues of the participants.

In her talk, The Monster Inside Library School: Student Teams, Enid Irwin talks about her survey on the worries that online students have.  These worries include having nothing to offer, getting things wrong, others taking control, and others not contributing fairly.

Such worries can stem from a perceived lack of control of time and grades, and a lack of enthusiasm or trust for teamwork. When everyone shares such feelings, they feed on each other and drag the team down. The antidote? Enthusiasm!

The point is to gain experience working in teams in a world where cross-functional geographically-dispersed teamwork is more important than ever. Irwin talks about how a good attitude can make a team a success.  Staying silent or stubborn, on the other hand, can be disastrous.

Irwin’s talk highlights the importance of engaged participation. She discusses the different types of teamwork and characterizes what success would look like for each:

  • teams that solve problems present a consensus if successful
  • teams that set policies will be a success if the policies stand on their own after the team has dispersed
  • teams that build a product will be successful if the product that is built is tested and meets requirements

To perform well and achieve success for their team, team members must first of all bring a positive attitude and participate!

It won’t be easy to put in practice all the good advice distilled in the resources above, but a dedication to participation and an eagerness to meet the challenge will go a long way toward doing so.  I look forward to learning more in practice, for that is what it takes: practice.

It’s All About Audience

Back in February, the new Web Projects Team made known our purpose and guiding principles. All of that still holds true, but we realized that “Support education and meet the research needs of our users regardless of location or device” might need some clarification. UCSF is a somewhat unique academic institution having more staff than students and no undergraduates, among other things. So who is the primary audience that the library supports?

Primary Audiences served by the Library

  1. Teaching faculty
    • usually also involved in clinical research or practice or basic science research
  2. Students in degree programs
    • professional students in medicine, pharmacy, nursing, and dentistry
    • graduate students in basic science
    • graduate students in social sciences, nursing, and history
  3. Researchers in basic science or clinical medicine
    • faculty
    • postdocs
    • PhD students
    • lab managers/research staff

Notice that there is a fair amount of overlap between audiences with some people wearing multiple hats.

Of course there are others who use the Library too, for example, alumni, the public, visitors, Library staff, outside librarians, etc. They can all still benefit from parts of our site, but their needs will not drive decisions about how to structure our web pages and services. Ultimately, everything about the UCSF Library web should make it easier and more intuitive for the three audiences listed above to meet their research and education needs. All else is secondary, though not necessarily unimportant.

UCSF by the numbers

To define these audiences, we began by simply consulting the counts already provided by UCSF. However, those completely ignore Lab Managers and Research Assistants who have many of the same library needs as postdocs. There are also other staff members who do a lot of legwork for faculty, and therefore, reflect the library needs of faculty even though they are not counted as such. And if you talk about “students,” you must realize that the library needs of a medical student are completely different from those of a social sciences PhD. This means that the numbers are a rough estimate for our purposes.

These less obvious realities were gleaned from talking to people. The Library already tends to focus a lot on the Service Desk and subject liaisons when thinking about user interactions. To balance that, we decided to interview a variety of other library employees who act as liaisons to various user segments with library needs. A big thank you goes out to these individuals who took the time to share their super-valuable insights about user work patterns, language, and challenges!

  • Megan Laurance on basic science researchers
  • Art Townsend on Mission Bay users
  • Ben Stever and Kirk Hudson on Tech Commons users
  • Polina Ilieva and Maggie Hughes on researchers of special collections and archives
  • Dylan Romero on those who use multimedia stations and equipment and the CLE

A few other sources of insight came from meetings of the Student Advisory Board to the Library, LibQual feedback, and the Resource Access Improvement group.

We also came to the conclusion that it is helpful to think about users in terms of what they DO rather than by title alone. It’s the nature of their work that really defines their needs regarding library support. Once again the numbers are a rough estimate, but the segmentation they reveal is still helpful.

[Figure: Library Users]

Next Steps

The Web Projects Team will continue to make iterative improvements to the Library web presence, some small and some larger, driven by our now established Purpose and Guiding Principles and through the lens of our primary audiences.

We will also be regularly checking feedback from end users via usage statistics and quick user tests, and that will, in turn, drive further improvements. In addition, we’ll continue to share updates about the evolution of the Library web and improvements to the user experience. If you have questions or comments on any of this, we’re all ears!

photo credit: Reuver via photopin cc

Solr/Blacklight highlighting and upgrading Blacklight from 5.1.0 to 5.3.0

Last week, I ran into a highlighting issue with Blacklight where clicking on a facet blanks out the values of the fields that have highlighting turned on. I stepped through the Blacklight 5.3.0 gem and found that document_presenter.rb displays the highlight snippet from the highlighting section of the Solr response. If Solr returns no highlighting for the field, the presenter returns nil to the view.

when (field_config and field_config.highlight)
   # retrieve the document value from the highlighting response
   @document.highlight_field(field_config.field).map { |x| x.html_safe } if @document.has_highlight_field? field_config.field

This seemed strange to me because I couldn’t always guarantee that Solr returned something for the highlighting field.  So I posted to the Blacklight user’s group with my question.  I got a response right away (thank you!) and it turns out Blacklight inherits Solr’s highlighting behavior.  In order to always return a value for the highlighting field, an hl.alternateField is needed in the Solr configuration.
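To make the behavior concrete, here is a minimal sketch in plain Ruby of the lookup described above. It is not Blacklight’s or Solr’s actual code: the highlighted_value helper, the sample hashes, and the document id are hypothetical, and the hashes only imitate the general shape of the highlighting section of a Solr response.

```ruby
# Hypothetical helper imitating the presenter logic: return the highlight
# snippets for a field, or nil when Solr sent none for that document.
def highlighted_value(highlighting, doc_id, field)
  snippets = highlighting.fetch(doc_id, {})[field]
  return nil if snippets.nil? || snippets.empty?
  snippets
end

# Assumed response shape without hl.alternateField: nothing in the field
# matched, so the document's highlighting entry is empty.
without_alternate = { "doc1" => {} }

# Assumed response shape with f.dt.hl.alternateField=dt: Solr falls back
# to the value of the alternate field instead of sending nothing.
with_alternate = { "doc1" => { "dt" => ["email"] } }

p highlighted_value(without_alternate, "doc1", "dt")
p highlighted_value(with_alternate, "doc1", "dt")
```

The first call yields nil, which is what leaves the field blank in the view; the second yields a usable value.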

Here’s my code in the catalog_controller.rb that enables highlighting:

configure_blacklight do |config|
    ## Default parameters to send to solr for all search-like requests. See also SolrHelper#solr_search_params
    config.default_solr_params = { 
      :qt => 'search',
      :rows => 10,
      :fl => 'dt pg bn source dd ti id score',
      :"hl.fl" => 'dt pg bn source',
      :"f.dt.hl.alternateField" => 'dt',
      :"f.pg.hl.alternateField" => 'pg',
      :"f.bn.hl.alternateField" => 'bn',
      :"f.source.hl.alternateField" => 'source',
      :"hl.simple.pre" => '',
      :"hl.simple.post" => '',
      :hl => true
    }

...

    config.add_index_field 'dt', :label => 'Document type', :highlight => true
    config.add_index_field 'bn', :label => 'Bates number', :highlight => true
    config.add_index_field 'source', :label => 'Source', :highlight => true
    config.add_index_field 'pg', :label => 'Pages', :highlight => true

 

Another issue I ran into was upgrading from Blacklight 5.1.0 to 5.3.0. It does have an impact on the solrconfig.xml file.  It took me a bit of time to figure out the change that’s needed.

In the solrconfig.xml that ships with Blacklight 5.1.0, the standard requestHandler is set as the default.

<requestHandler name="standard" default="true" />

This means if the qt parameter is not passed in, Solr will use this request handler.  In fact, with version 5.1.0, which request handler is set as default is not important at all. In my solrconfig.xml, my own complex request handler is set as default and it did not cause any issues.

But in 5.3.0 the search request handler must be set as the default:

<requestHandler name="search" default="true">

This is because Blacklight now issues a Solr request like this: [Solr_server]:8983/solr/[core_name]/select?wt=ruby. Notice the absence of the qt parameter. Without it, the request is routed to the default request handler, which therefore must be the search handler that retrieves and facets records.
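The routing rule can be sketched in a few lines of plain Ruby. This is only an illustration of the behavior described in this post, not Solr’s dispatch code; pick_handler and the HANDLERS list are hypothetical.

```ruby
# Hypothetical model of the request handlers defined in solrconfig.xml.
HANDLERS = [
  { name: "search",   default: true  },
  { name: "document", default: false }
].freeze

# Pick the handler named by qt, or the one flagged default="true"
# when the request carries no qt parameter.
def pick_handler(params, handlers = HANDLERS)
  if params[:qt]
    handlers.find { |h| h[:name] == params[:qt] }
  else
    handlers.find { |h| h[:default] }
  end
end

# A 5.3.0-style request without qt lands on the default handler.
puts pick_handler({ wt: "ruby" })[:name]

# A request that names its handler explicitly is unaffected by the default.
puts pick_handler({ qt: "document", wt: "ruby" })[:name]
```

If some other handler carried the default flag, the first request would be answered by that handler instead of search.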

On Metrics

Collecting metrics is important. But we all know that many metrics are chosen for collection because they are inexpensive and obvious, not because they are actually useful.

(Quick pre-emptive strike #1: I’m using metrics very broadly here. Yes, sometimes I really mean measurements, etc. For better or for worse, this is the way metrics is used in the real world. Oh well.)

(Quick pre-emptive strike #2: Sure, if you’re Google or Amazon, you probably collect crazy amounts of data that allow highly informative and statistically valid metrics through sophisticated tools. I’m not talking about you.)

I try to avoid going the route of just supplying whatever numbers I can dig up and hope that it meets the person’s need. Instead, I ask the requester to tell me what it is they are trying to figure out and how they think they will interpret the data they receive. If pageviews have gone up 10% from last year, what does that tell us? How will we act differently if pageviews have only gone up 3%?

This has helped me avoid iterative metric fishing expeditions. People often ask for statistics hoping that, when the data comes back, it will tell an obvious story that they like. Usually it doesn’t tell any obvious story or tells a story they don’t like, so they start fishing. “Now can you also give me the same numbers for our competitors?” “Now can you divide these visitors into various demographics?”

When I first started doing this, I was afraid that people would get frustrated with my push-back on their requests. For the most part, that didn’t happen.

Instead, people started asking better questions as they thought through and explained how the data would be interpreted. And I felt better about spending resources getting people the information they need because I understood its value.

Just like IT leaders need to “consistently articulate the business value of IT”, it is healthy for data requesters to articulate the value of their data requests.

Headless JavaScript Testing, Continuous Integration, and Jasmine 2.0

Earlier this month, my attention was caught by a short article entitled “Headless Javascript testing with Jasmine 2.0” by Lorenzo Planas. Integrating our Jasmine tests on Ilios with our Travis continuous integration had been on my list of things to procrastinate on. The time had come to address it.

After integrating Lorenzo’s very helpful sample into our code base, we ran into a couple of issues. First, the script was exiting before the specs were finished running. The sample code had a small number of specs to run so it never ran into that problem. Ilios has hundreds of specs, and the script seemed to exit after running around 13 or so.

I patched the code to have it wait until it saw an indication in the DOM that the specs had finished running. Now we ran into the second issue: The return code from the script indicated success even when one of the specs failed. For Travis, it needed to supply a return code indicating failure. That was an easy enough patch, although I received props for cleverness from a teammate.
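Both patches follow a common pattern: poll until the runner reports that it is done, then turn the failure count into a process exit status. The sketch below shows that pattern in plain Ruby. It is not the PhantomJS script from Lorenzo’s project; wait_until, exit_status_for, and the poll counter are all hypothetical stand-ins.

```ruby
# Poll the block until it returns a truthy value or the timeout expires.
def wait_until(timeout: 5.0, interval: 0.01)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# Map a failure count to an exit status that a CI service can act on:
# zero for success, non-zero for failure.
def exit_status_for(failures)
  failures.zero? ? 0 : 1
end

# Stand-in for "the DOM says the specs are finished": true on the third poll.
polls = 0
finished = wait_until { (polls += 1) >= 3 }

puts finished
puts exit_status_for(0)
puts exit_status_for(2)
```

In a real runner, the final step would pass the computed status to the process exit call so that the CI build fails when any spec fails.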

I sent a pull request to the original project so others could benefit from the changes. Lorenzo not only merged the pull request but put a nice, prominent note on the article letting people know, even linking to my GitHub page (which I then hurriedly updated).

So, if you’re using Jasmine and Travis but don’t have the two yet integrated, check out Lorenzo’s repo on GitHub and stop procrastinating!

Working with Blacklight Part 3 – Linking to Your Solr Index

We are using Blacklight to provide a search interface for a Solr index. I expected it to be super straightforward to plug our Solr index into the Blacklight configuration. It wasn’t quite the case! Most of the basic features do plug in nicely, but if you use more advanced Solr features (like facet pivot) or if your solrconfig.xml differs from the Blacklight example solrconfig.xml file, then you are out of luck. There is not currently much documentation to help you out.

SolrConfig.xml – requestDispatcher

After 3.6, Solr ships with <requestDispatcher handleSelect="false"> in the solrconfig.xml file. But Blacklight works with <requestDispatcher handleSelect="true"> and passes in the qt (request handler) parameter explicitly. An example of a Solr request sent by Blacklight looks like this: http://example.com:8983/solr/ltdl3test/select?q=dt:email&qt=document&wt=xml.

A /select request handler should not be defined in solrconfig.xml. This allows the request dispatcher to dispatch to the request handler specified in the qt parameter. Blacklight, by default, expects a search and a document request handler (note the absence of /).

We could override the controller code for Blacklight to call our request handlers.  But a simpler solution is to update the solrconfig.xml to follow the Blacklight convention.

The ‘document’ Request Handler and id Passing

Blacklight expects there to be a document request handler defined in the solrconfig.xml file like this:

<!-- for requests to get a single document; use id=666 instead of q=id:666-->   
<requestHandler name="document" class="solr.SearchHandler">
    <lst name="defaults">
        <str name="echoParams">all</str>       
        <str name="fl">*</str>       
        <str name="rows">1</str>       
        <str name="q">{!raw f=id v=$id}</str> <!-- use id=666 instead of q=id:666 -->
    </lst>
</requestHandler>

As the comment says, Blacklight will pass the request to Solr in the format id=666 instead of q=id:666. It achieves this by using the Solr raw query parser. However, this only works if your unique id is a String. In our case, the unique id is a long, and passing in id=666 does not return anything in the Solr response.

There are two ways to solve this issue. The first is to rebuild the index and change the id type from long to String. The other is to override solr_helper.rb to pass in q=id:xxx instead of id=xxx. The code snippet for the second approach is below.

require "#{Blacklight.root}/lib/blacklight/solr_helper.rb"

module Blacklight::SolrHelper
  extend ActiveSupport::Concern

  # Returns a params hash for finding a single Solr document (CatalogController #show action).
  # If the id arg is nil, then the value is fetched from params[:id].
  # This method is primarily called by the get_solr_response_for_doc_id method.
  def solr_doc_params(id=nil)
    id ||= params[:id]
    p = blacklight_config.default_document_solr_params.merge({
      # :id => id # this assumes the document request handler will map the 'id' param to the unique key field
      # Query the id field directly so that a non-String (long) id still matches.
      :q => "id:" + id.to_s
    })
    p[:qt] ||= 'document'

    p
  end
end

Getting Facet Pivot to work

In our index, we have a top-level facet called industry and a child facet called source that should be displayed in a hierarchical tree, with each source nested under its industry.

The correct configuration is in the code snippet below.

#Industry  
    config.add_facet_field 'industry', :label => 'Industry', :show => false
# Source
    config.add_facet_field 'source_facet', :label => 'Source', :show => false
#Industry -> Source
    config.add_facet_field 'industry_source_pivot_field', :label => 'Industry/Source',   :pivot => ['industry', 'source_facet']

You must add the two base fields (Industry and Source) to the catalog_controller.rb file, and set :show => false if they should not be displayed on their own. That is usually the case, since the data is already displayed in the pivot tree. The current documentation on Blacklight facet pivot support makes it seem like only the last line is needed. If only the last line is defined, the facet pivot renders correctly in the refine panel, which makes you think that facet pivot is working. But when you click on the facet, you get the error “undefined method ‘label’ for nil:NilClass”.
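For context, Solr takes a pivot facet request through the facet.pivot parameter, whose value is a comma-separated list of field names. Here is a minimal sketch of how the :pivot array from the configuration would map onto that parameter, assuming the fields are joined in order. The pivot_param helper is hypothetical and is not a Blacklight method.

```ruby
require "uri"

# Hypothetical helper: turn a list of pivot fields into Solr facet parameters.
def pivot_param(fields)
  { "facet" => "true", "facet.pivot" => fields.join(",") }
end

# The same two base fields used in the :pivot option of the configuration.
params = pivot_param(["industry", "source_facet"])

# Print the parameters as they would appear in a query string.
puts URI.encode_www_form(params)
```

The pivot is defined purely in terms of the two base fields, which fits with the observation that clicking a pivot value fails unless each base field has its own add_facet_field entry.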

Improving Code Quality: Together. Like We Used To. Like a Family.

We had a day-long session of hacking trying to improve the code quality and test coverage of Ilios.

This post is clearly not a step-by-step instruction manual on transforming an intimidatingly large pile of spaghetti code into a software engineering masterpiece. I hope video of Dan Gribbin’s jQuery Conference presentation from last month is posted soon so I can see what wisdom and techniques I can steal, er, acquire.

In the meantime, here are a few small things we learned doing the first session.

  1. Give everyone concrete instructions ahead of time regarding how to get the app set up and how to get all the existing tests running. Have new folks arrive early for a get-setup session. This allows new folks to hit the ground running and lets experienced folks begin coding or help people with interesting problems, rather than help people just get started.
  2. Decide on a focus ahead of time. Try to fix one class of bugs, write one type of test, complete coverage for one large component, or whatever. This allows for more collaboration as people are working on similar things.
  3. Do it often or you lose momentum! I suspect that weekly is too often. We’re trying once every two-to-three weeks.

P. S. If you recognized the reference in this post’s title, then clearly you have all the skills required to work here. We’re hiring a Front-End Engineer and the job is so good that HR pasted the description twice into the ad. Submit your résumé and knock our socks off.

Running Behat Tests via Sauce Labs on Travis-CI

We use Behat for testing the Ilios code base. (We also use PHPUnit and Jasmine.) We started out using Cucumber but switched to Behat on the grounds that we should use PHP tools with our PHP project. Our thinking was that someone needing to dig in deep to write step code shouldn’t have to learn Ruby when the rest of the code was PHP.

We use Travis for continuous integration. Naturally, we needed to get our Behat tests running on Travis. Fortunately, there is already a great tutorial from about a year ago explaining how to do this.

Now let’s say you want to take things a step further. Let’s say you want your Behat tests to run on a variety of browsers and operating systems, not just whatever you can pull together on the Linux host running your Travis tests. One possibility is Sauce Labs, which is free for open source projects like Ilios.

Secure Environment Variables

Use the travis Ruby gem to generate secure environment variable values for your .travis.yml file containing your SAUCE_USERNAME and your SAUCE_ACCESS_KEY. See the helpful Travis documentation for more information.

Sauce Connect

You may be tempted to use the Travis addon for Sauce Connect. I don’t because, using the addon, Travis builds hang (and thus fail) when running the CI tests in a fork. This is because forks cannot read the secure environment variables generated in the previous step.

Instead, I check to see if SAUCE_USERNAME is available and, if so, then I run Sauce Connect using the same online bash script (located in a GitHub gist) used by the addon provided by Travis. (By the way, you can check for TRAVIS_SECURE_ENV_VARS if that feels better than checking for SAUCE_USERNAME.)

The specific line in .travis.yml that does this is:

  - if [ "$SAUCE_USERNAME" ] ; then (curl -L https://gist.github.com/santiycr/5139565/raw/sauce_connect_setup.sh | bash); fi
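One detail is worth spelling out if you switch to TRAVIS_SECURE_ENV_VARS: Travis sets that variable to the string true or false, so it has to be compared against true rather than merely tested for presence. Here is a sketch of both checks in Ruby, taking the environment as an argument so it can be exercised outside Travis. The sauce_available? helper is a hypothetical name, not part of any Travis or Sauce Labs tooling.

```ruby
# Hypothetical helper: decide whether Sauce credentials are usable in this build.
# env is any hash-like object with string keys; it defaults to the real environment.
def sauce_available?(env = ENV)
  username = env["SAUCE_USERNAME"]
  # Same idea as the bash test: a non-empty SAUCE_USERNAME means the
  # secure variables were decrypted for this build.
  return true if username && !username.empty?
  # Alternative check: the value is a string, so "false" must not count as set.
  env["TRAVIS_SECURE_ENV_VARS"] == "true"
end

puts sauce_available?({ "SAUCE_USERNAME" => "example-user" })
puts sauce_available?({ "TRAVIS_SECURE_ENV_VARS" => "false" })
```

In a build of a fork, neither condition holds, so the Sauce Connect step is skipped instead of hanging.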

Use the Source, Luke

Now it’s time to get Behat/Mink to play nicely with Sauce Labs.

The good news is that there is a saucelabs configuration option. The bad news is that, as far as I can tell, it is not documented at the current time. So you may need to read the source code if you want to find out about configuration options or troubleshoot. Perhaps it’s intended to be released and documented in the next major release. Regardless, we’re using it and it’s working for us. Enable it in your behat.yml file:

default:
    extensions:
        Behat\MinkExtension\Extension:
            saucelabs: ~

Special Sauce

We keep our Behat profile for Sauce in its own file, because it’s special. Here’s our sauce.yml file:

# Use this profile to run tests on Sauce against the Travis-CI instance
default:
    context:
        class: "FeatureContext"
    extensions:
        Behat\MinkExtension\Extension:
            base_url: https://localhost
            default_session: saucelabs
            javascript_session: saucelabs
            saucelabs:
                browser: "firefox"
                capabilities:
                    platform: "Windows 7"
                    version: 26

Note that we configured our app within Travis-CI to run over HTTPS. In a typical setting, you will want the protocol of your base_url to specify HTTP instead.

Here’s the line in our .travis.yml to run our Behat tests using the Sauce profile:

  - if [ "$SAUCE_USERNAME" ] ; then (cd tests/behat && bin/behat -c sauce.yml); fi

Of course, if you’re using a different directory structure, you will need to adjust the command to reflect it.

That’s All, Folks!

I hope this has been helpful. It will no doubt be out of date within a few months, as things move quickly with Behat/Mink, Sauce Labs, and Travis-CI. I will try to keep it up to date and put a change log here at the bottom. Or if a better resource for this information pops up, I’ll just put a link at the top. Thank you for reading!