Academic libraries offer many resources, but users cannot be expected to search in, say, a half dozen different interfaces to find what they’re looking for. So academic libraries typically offer federated search.
Sometimes, a solution is purchased. Many libraries, for example, use 360 Search.
Here at UCSF, we are among the libraries that have built our own federated search. Twice.
There are (at least) three ways to pull data out of other resources in real time.
Cool, they have an API for that!
This almost never happens.
I will screen-scrape the #*%!?@ out of your website!
This is by far the most common scenario.
Sites that build their content with JavaScript in the browser are an edge case, but one that is becoming more important all the time. Make friends with PhantomJS to scrape these sites.
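To make the most common scenario concrete, here is a minimal screen-scraping sketch using only Python's standard library. The markup in `SAMPLE_HTML`, the `result` class name, and the `ResultScraper` and `scrape` names are all invented for illustration; a real target site will have its own layout, and this is not the code we run in production.

```python
from html.parser import HTMLParser

# Hypothetical results page; real catalog markup will differ.
SAMPLE_HTML = """
<ul>
  <li class="result"><a href="/record/1">Moby-Dick</a></li>
  <li class="result"><a href="/record/2">Bleak House</a></li>
  <li class="ad"><a href="/promo">Not a search result</a></li>
</ul>
"""


class ResultScraper(HTMLParser):
    """Collect link text and href from <li> items carrying a given class."""

    def __init__(self, item_class="result"):
        super().__init__()
        self.item_class = item_class
        self.results = []
        self._in_item = False   # True while inside a matching <li>
        self._current = None    # the link currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            classes = (attrs.get("class") or "").split()
            self._in_item = self.item_class in classes
        elif tag == "a" and self._in_item:
            self._current = {"url": attrs.get("href"), "name": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["name"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["name"] = self._current["name"].strip()
            self.results.append(self._current)
            self._current = None
        elif tag == "li":
            self._in_item = False


def scrape(html, item_class="result"):
    """Return a list of {"url", "name"} dicts for each matching result link."""
    scraper = ResultScraper(item_class)
    scraper.feed(html)
    return scraper.results


if __name__ == "__main__":
    for hit in scrape(SAMPLE_HTML):
        print(hit["name"], hit["url"])
```

Note how much of this depends on the target's markup: if the site renames the `result` class, the scraper silently returns nothing.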
A common way to implement these solutions is to build your screen-scraping federated search tool in a traditional server-side language like Java or PHP.
These strategies and technologies bring with them pitfalls to be avoided. Recompiling your WAR every time one of your target systems modifies its HTML layout, anyone?
Here’s another pitfall: our group’s first solution, built years ago, is implemented in Drupal with no external-facing API. So if I want to experiment with a different results interface, I need to write it in Drupal. This tight coupling prevents experimentation with other technologies or with anything that doesn’t fit neatly into the Drupal paradigm.
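One way to picture the alternative to that tight coupling is a search core that hands back plain JSON, with any number of interfaces built on top. The sketch below is an illustration only: `search_as_json`, `render_html`, `render_text`, and the result fields are hypothetical names, not part of our Drupal solution or of any real API.

```python
import json
from html import escape


def search_as_json(term):
    """Stand-in for the search core: returns JSON and knows nothing about presentation."""
    results = [{"name": f"Result for {term}", "url": "/record/1"}]  # hypothetical data
    return json.dumps({"term": term, "results": results})


def render_html(payload):
    """One possible front end: an HTML list."""
    data = json.loads(payload)
    items = "".join(
        f'<li><a href="{escape(r["url"])}">{escape(r["name"])}</a></li>'
        for r in data["results"]
    )
    return f"<ul>{items}</ul>"


def render_text(payload):
    """Another front end, written without touching the search core."""
    data = json.loads(payload)
    return "\n".join(f'{r["name"]} ({r["url"]})' for r in data["results"])


if __name__ == "__main__":
    payload = search_as_json("flu")
    print(render_html(payload))
    print(render_text(payload))
```

Because both renderers consume the same payload, trying out a new results interface means writing a new consumer rather than changing the search code.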
A lot of the pitfalls can be avoided by following sound software architecture principles. But one thing should be uncontroversial:
That is our approach this second time around.
First, we wrote a pluggable, extensible federated search tool called Amalgamatic.
Second, we wrote the plugins that we needed to search the resources we were interested in:
- Millennium OPAC
- Drupal 6 (to include our own website contents in the search results)
- our curated list of databases
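The pluggable idea can be sketched in a few lines. This is an illustration of the general pattern in Python, not Amalgamatic's actual API: the `add` and `search` functions, the result shape, and the two stand-in plugins are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Registry mapping a collection name to a search function.
_plugins = {}


def add(name, search_fn):
    """Register a plugin: any callable that takes a search term and returns a list."""
    _plugins[name] = search_fn


def search(term, collections=None):
    """Query the selected plugins concurrently; return results keyed by collection."""
    names = list(collections) if collections is not None else list(_plugins)
    # Leaving the "with" block waits for every submitted search to finish.
    with ThreadPoolExecutor(max_workers=max(len(names), 1)) as pool:
        futures = {name: pool.submit(_plugins[name], term) for name in names}
    merged = {}
    for name, future in futures.items():
        try:
            merged[name] = {"data": future.result(), "error": None}
        except Exception as exc:  # one failing source should not sink the whole search
            merged[name] = {"data": [], "error": str(exc)}
    return merged


# Stand-in plugins; real ones would query the catalog, the website, and so on.
def fake_catalog(term):
    return [{"name": f"Catalog hit for {term}", "url": "/catalog/1"}]


def broken_source(term):
    raise RuntimeError("upstream layout changed")


add("catalog", fake_catalog)
add("databases", broken_source)

if __name__ == "__main__":
    print(search("medicine"))
```

Each plugin is isolated behind the same small contract, so supporting a new resource means registering one more function, and a source that breaks reports an error without taking the other results down with it.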
Lastly, because we could, we used Browserify to create a demo showing how to use Amalgamatic so that all the retrieval and processing happens in the browser—no need for an intermediary API or search server! (source code)
(Or tell me about your project that already does this better, so that I can fold up shop or at least steal all your ideas.)