ALA TechSource
How OPACs Suck, Part 2: The Checklist of Shame

Submitted by Karen G. Schneider on April 3, 2006 - 3:02pm

In my first article in this series, I wrassled with the biggest bear in the forest: how most online catalogs lack relevance ranking. That's one big hairy bear, but as some readers pointed out, it's a little forced to pick on relevance ranking out of the context of all the other important features most online catalogs don't offer—or offer in implementations so bad that librarians disable these features rather than further confuse the poor user, who just wants to find a book or DVD, for crying out loud.

So rather than plunge into another specific feature, I'm backtracking just enough to give you the Checklist of Shame—key features common to most search engines (even the least expensive), features often missing in online catalogs. Even this is an abbreviated list; the search-engine test instrument I've developed for My Place Of Work (MPOW) is seven pages long.

I agree with Eric Lease Morgan's comment on my last piece that librarians tend to ask for esoteric features at the expense of core functionality. I continue to be surprised at the people who tell me how a catalog "should" work but haven't done a lick of user analysis, forensic, heuristic, academic, or otherwise, to back their theories.

But here's a rule of thumb: in general, if the 800-pound gorillas, such as Google and Ask.com, offer a feature (like default setting), you should mimic the gorillas and offer the same feature—and give that feature priority in your considerations. Furthermore, it's common-sense usability practice that you should offer that big-gorilla search-engine feature the way the gorillas offer it—because users will come to your catalog with user behavior learned from such search engines as Google and Ask.com. (Don't ever rely on help files to "teach" people. In last year's usability testing at MPOW, the only person who read our help files, out of a group of techies, librarians, and academics, was the 25-year-old soccer mom.)

I also list features used primarily by aficionados. This group—ranging from in-house librarians to information super-users—can be influential, and they are often engaged with your catalog at a level that can prove hugely informative. So many search engines support aficionado features that it's easy enough to support their preferences. Furthermore, in my experience, aficionados will also tell you when an esoteric feature is completely pointless, even for them. Just don't let the aficionado input drown out common-sense decisions.

Features Your OPAC Wishes It Had

  • Relevance ranking—As I explained earlier, relevance ranking—typically built on TF/IDF (term frequency/inverse document frequency)—is the essential building block for ensuring the most likely search results rise to the top. Every search engine on the planet relies on relevance ranking. Many online catalogs don't offer it ("system sorted," anyone?) or implement it bizarrely. (I agree with comments that relevance ranking and online catalogs can be hard to do well, but I disagree that adding relevance ranking cannot be done at all; the NCSU catalog makes that clear.)
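For the curious, here's a toy Python sketch of TF/IDF scoring—invented three-record corpus and all, nothing like production code—that shows why terms frequent in one record but rare across the catalog float that record to the top:

```python
import math
from collections import Counter

def tfidf_scores(query, docs):
    """Score each document against the query with plain TF/IDF.

    docs: list of token lists; query: list of tokens.
    Toy illustration only -- real engines add length normalization and more.
    """
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for doc in docs:
        for term in set(doc):
            df[term] += 1
    scores = []
    for doc in docs:
        tf = Counter(doc)  # term frequency within this document
        score = sum(
            tf[t] * math.log(n / df[t])  # frequent here, rare overall = high
            for t in query if df[t]
        )
        scores.append(score)
    return scores

docs = [
    ["butterfly", "conservation", "handbook"],
    ["butterfly", "butterfly", "field", "guide"],
    ["cheese", "making", "handbook"],
]
print(tfidf_scores(["butterfly"], docs))
```

The record mentioning butterfly twice outscores the one mentioning it once; the cheese book scores zero.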


  • Stemming—To steal from a couple of good Web definitions, stemming is "a method by which Search Engines associate words with prefixes and suffixes to [a] word stem to make the search broader," such as returning the same results for "applies, applying, and applied."

    After relevance ranking, stemming is arguably one of the most important search features for an online catalog, where search success hinges precipitously on searching the relatively scanty metadata of MARC records. Yet even huge search engines such as Google, which have the luxury of massive amounts of full text to improve matching, use stemming. (I've watched Google turn stemming off and on and tinker with it—clearly they think about stemming a lot.)
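If you want to see the idea in miniature, here's a deliberately naive suffix-stripping stemmer in Python. Real engines use something like the Porter algorithm; these few hand-picked rules exist only to show how "applies, applying, and applied" can collapse to one stem:

```python
def naive_stem(word):
    """Very rough suffix-stripping stemmer (a sketch, not Porter's algorithm)."""
    # (suffix, replacement) pairs, tried in order; the length guard avoids
    # mangling short words like "sing" or "bed".
    for suffix, repl in [("ying", "i"), ("ies", "i"), ("ied", "i"),
                         ("ing", ""), ("ed", ""), ("es", ""), ("s", "")]:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + repl
    return word

print([naive_stem(w) for w in ["applies", "applying", "applied"]])
```

All three variants reduce to the stem "appli", so a search on any one of them can match records containing the others.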


  • Field weighting—First runner-up for second most important feature in a search engine. You can tweak field weighting to give more or less prominence to fields. For example, titles are often given more importance, so that the first few hits for the search term million are books with million in the title.
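A sketch of the idea in Python, with made-up records and weights—every number here is a knob you'd tune through testing, not a recommendation:

```python
# Hypothetical weights: a title hit counts three times as much as a notes hit.
FIELD_WEIGHTS = {"title": 3.0, "subject": 2.0, "notes": 1.0}

def weighted_score(record, term):
    """Sum per-field matches, scaled by each field's weight."""
    return sum(
        weight * record.get(field, "").lower().split().count(term)
        for field, weight in FIELD_WEIGHTS.items()
    )

books = [
    {"title": "A Million Little Pieces", "subject": "memoir", "notes": ""},
    {"title": "Budget Basics", "subject": "finance",
     "notes": "saving your first million"},
]
print([weighted_score(b, "million") for b in books])
```

The title match outranks the notes match, which is exactly the behavior the million example above describes.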


  • Spell-checking—Essential, not because people are dumb, but because people make mistakes. If anyone gets snobby with you when you bring up spell-check, just tell them Jane Austen was a notoriously bad speller; she misspelled one of her teenage works as “Love and Freindship.” (Thank goodness for edditors, er, editors.)
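Even the Python standard library can fake a bare-bones "did you mean" against your indexed vocabulary. This toy uses difflib and an invented word list:

```python
import difflib

# Hypothetical vocabulary -- in practice, the terms actually in your index.
vocabulary = ["friendship", "butterfly", "conservation", "austen"]

def did_you_mean(term, vocab=vocabulary):
    """Suggest the closest indexed term for a likely misspelling."""
    matches = difflib.get_close_matches(term.lower(), vocab, n=1, cutoff=0.8)
    return matches[0] if matches else None

print(did_you_mean("freindship"))
```

Miss Austen's "freindship" comes back as "friendship"; a term with no close neighbor returns None instead of a bad guess.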


  • Refining original queries—If you type in a term such as butterfly, after viewing the results, you may want to tweak that search to add a term such as conservation. A good search engine will present the search terms in the search box or otherwise make it very easy to view and modify the original search.


  • Support for popular query operators—For example, support for + and – for "required" and "not." It's also okay to offer older query operators, such as and, for backward compatibility for people who have been searching your catalog since Melvil Dewey was a circ clerk, but those older query operators are not substitutes for what people are using today. For that matter, things change over time, so the ability to add a new query-operator synonym is valuable.
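Parsing + and - is genuinely cheap, which is part of my point. A minimal Python sketch (the tokenizing is deliberately simple-minded):

```python
def parse_operators(query):
    """Split a query into required (+term), excluded (-term), and plain terms."""
    required, excluded, plain = [], [], []
    for token in query.split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            plain.append(token)
    return required, excluded, plain

print(parse_operators("butterfly +conservation -moths"))
```

A real parser would also accept the older and/not operators as synonyms, per the backward-compatibility point above.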


  • The Boolean bag o' goods—Can the search engine support quoted searching ("declaration of independence"), wildcard searching (appl*), proximity searching (cheese near cheddar), or give preference to case (AIDS versus aids)? Most people don't use these features, but your aficionado users will look for them, and nearly all search engines, even the entry-level products, offer these features. Any vendor who moans these are difficult and expensive to offer is blowing smoke in your ear.
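Proximity, to pick one item from the bag, is no dark art either. Here's a toy Python version of near searching over a tokenized record; the window size and sample text are mine, not any vendor's:

```python
def near(doc_tokens, a, b, window=5):
    """True if terms a and b occur within `window` tokens of each other."""
    positions_a = [i for i, t in enumerate(doc_tokens) if t == a]
    positions_b = [i for i, t in enumerate(doc_tokens) if t == b]
    return any(abs(i - j) <= window for i in positions_a for j in positions_b)

doc = "aged cheddar is a firm cheese made in somerset".split()
print(near(doc, "cheddar", "cheese"))               # within the default window
print(near(doc, "cheddar", "somerset", window=2))   # too far apart
```

Quoted-phrase search is just this with a window of one and order enforced, which is why vendors pleading hardship deserve your skepticism.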


  • Flexible default query processing—Basically, can you decide whether search results will be "anded" (meaning that all terms must be matched) or "orred" (meaning that a match on any term suffices)? Google changes its features over time, but Google's settings might not be the best choice for your catalog (something to keep in mind if you evaluate the Google Appliance). You'll only know through usability testing, and the search engine shouldn't make that decision for you.
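The and/or logic itself is nearly a one-liner—the hard part is choosing the default, which is exactly why it should be your call and not the engine's. A sketch:

```python
def matches(doc_tokens, terms, mode="AND"):
    """Default query combination: AND requires every term, OR accepts any term."""
    hits = [t in doc_tokens for t in terms]
    return all(hits) if mode == "AND" else any(hits)

doc = {"butterfly", "conservation", "handbook"}
print(matches(doc, ["butterfly", "gardens"], mode="AND"))
print(matches(doc, ["butterfly", "gardens"], mode="OR"))
```

Same record, same query, opposite answers—which default serves your users better is an empirical question for usability testing.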


  • In-line query limiters—The ability to search in-line by a field, the way in Google you can limit your searches, for example, with site:edu. This is a capability that will be used by a tiny fraction of your users. I wouldn't trade it for relevance ranking and field weighting, but then, every search engine I've evaluated this spring offers this feature. Extra credit for being able to select and label the limiters any way you want.
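Here's a rough Python sketch of pulling field:value limiters out of a query string. The field names are illustrative; a real engine would map them onto its own indexes (and let you label them however you like, per the extra credit):

```python
def parse_limiters(query):
    """Separate field:value limiters (e.g. site:edu) from ordinary terms."""
    limiters, terms = {}, []
    for token in query.split():
        if ":" in token and not token.startswith(":"):
            field, _, value = token.partition(":")
            limiters[field] = value
        else:
            terms.append(token)
    return limiters, terms

print(parse_limiters("wine cheese site:edu"))
```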


  • Duplicate detection—This is an interesting search-engine feature to discuss for online catalogs. It raises the issue of FRBR (pronounced FER-ber)—Functional Requirements for Bibliographic Records—which is, to be grossly reductive, duplicate management for online catalogs, so that a user isn't stumped by five records for what is essentially the same item. But in a search engine, duplicate detection simply flags multiple records for the same item and ideally gives you control over how to handle search results when duplicates are detected.
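A grossly reductive Python sketch of the flagging idea—real duplicate detection (let alone FRBR-style work-grouping) uses far richer matching than a normalized title/author key, but the shape is the same:

```python
def flag_duplicates(records):
    """Group records sharing a normalized (title, author) key.

    Groups with more than one record are the detected duplicates.
    """
    seen = {}
    groups = []
    for rec in records:
        key = (rec["title"].strip().lower(), rec["author"].strip().lower())
        if key in seen:
            groups[seen[key]].append(rec)   # duplicate of an earlier record
        else:
            seen[key] = len(groups)
            groups.append([rec])
    return groups

records = [
    {"title": "Close Range", "author": "Annie Proulx"},
    {"title": "close range ", "author": "annie proulx"},
    {"title": "The Shipping News", "author": "Annie Proulx"},
]
print([len(group) for group in flag_duplicates(records)])
```

With the groups in hand, the interface can collapse, interleave, or annotate duplicates—that policy decision is the control you want the engine to give you.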


  • Sort flexibility—You don't want to overwhelm users with options for sorting search results, but can you at least offer them the capability to switch between relevance and date? Also, can you offer other sorting that might be a nice local option (the way some store Web sites offer sorting by price or user rating)? Even more crucially, can you control where the search engine pulls its date information—ensuring that the indexed "date" comes from a locally controlled field, rather than simply the HTTP header?
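The switch itself is trivial once the engine exposes a score and a locally controlled date field. A toy sketch with invented records:

```python
def sort_results(results, by="relevance"):
    """Switch between relevance (highest score first) and date (newest first)."""
    if by == "date":
        # "date" here should come from a locally controlled field,
        # not something scraped from an HTTP header.
        return sorted(results, key=lambda r: r["date"], reverse=True)
    return sorted(results, key=lambda r: r["score"], reverse=True)

results = [
    {"title": "A", "score": 0.9, "date": 1998},
    {"title": "B", "score": 0.4, "date": 2005},
]
print([r["title"] for r in sort_results(results, by="relevance")])
print([r["title"] for r in sort_results(results, by="date")])
```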


  • Character sets—Although most search engines offer flexible support for other languages, many online catalogs can barely handle one character set. I recently observed ALA Council debating a resolution on non-Roman characters in online catalogs that was ultimately shot down because it didn't come from ALCTS—a classic example of NIH (Not Invented Here). Forget ALA subcommittees: the pressure needs to come from you, gentle reader.


  • Faceting—This is a "21st-century search engine" feature that some search engines grew up around and that older search engines are scrambling to add. Faceting manipulates search results to make it easy to browse by category. Search the NCSU catalog for the phrase civil war, and browse by LCSH or publisher; search Landsend.com with the term pants, and see choices arranged by size, cost, and other metadata.

    Avi Rappoport, search guru extraordinaire, explains faceting thoroughly in www.searchtools.com/info/faceted-metadata.html. Online catalogs offer such rich metadata that it's a shame not to offer faceting.
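Counting facet values over a result set is the easy half of faceting (the interface is the hard half). A toy Python sketch with invented records and LCSH-ish subject strings:

```python
from collections import Counter

def facet_counts(results, field):
    """Tally the values of a multi-valued field across a result set."""
    counts = Counter()
    for rec in results:
        counts.update(rec.get(field, []))
    return counts

results = [
    {"title": "Battle Cry of Freedom",
     "subjects": ["United States--History--Civil War"]},
    {"title": "The Civil War: A Narrative",
     "subjects": ["United States--History--Civil War"]},
    {"title": "Gettysburg",
     "subjects": ["Battles", "United States--History--Civil War"]},
]
print(facet_counts(results, "subjects").most_common(2))
```

The counts become the clickable "browse by subject (3)" links alongside the result list—the same pattern whether the metadata is LCSH or pants sizes.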

  • Advanced search—My favorite chimera! In most search engines, most notably Google (www.google.com/advanced_search?hl=en), the "advanced search" page is largely a "junior" search page that walks the user through fielded and Boolean searches. At MPOW, we shamelessly stole their page for our own (http://lii.org/pub/htdocs/adv_search_home.htm). There's nothing wrong with that, and the "advanced search" page can be a convenient place to offer popular date-searching options or other nice tweaks. But users should be able to perform most truly advanced searches through inline operators in the search engine's basic search box, so that the handful of hopeless nerds like me who think it's bang-up fun to do a search such as wine -cheese site:edu won't have to plod through a fielded page to do so.


  • Easily customized search-result pages—The word easily should be understood to refer to people with respectable HTML skills, not to people who pay people to do that kind of work (for about the same reason I don't give myself root canals). Still, good search engines provide strong templating systems for developing search-results pages that integrate well with your overall design. Extra credit for default templates that validate to published HTML standards and meet Priority 2 accessibility requirements.


  • Human suggestions (also called "best bets," etc.)—Can you force an item to the top of search results? (Can you then charge publishers for premium results? Just kidding, just kidding…) This smart discussion of best bets (www.steptwo.com.au/papers/cmb_bestbets/) has a great screen capture of this feature in action. Best bets are particularly nice when you have good search analysis to indicate what people are searching for most frequently, which brings up…


  • Search logging and reports—You need to know what's working for your users and what isn't. Your basic transaction logs (how many hits to the server and where the hits come from) aren't adequate for this. A good search engine will, at minimum, log top queries by frequency and top queries with no hits. Also look for trend reports you can use to tweak the search engine, for example, by adding terms to records to make them more findable (the way I saw librarians add Brokeback Mountain to the notes field of records for Annie Proulx's short story collection, Close Range: www.worldcatlibraries.org/wcpa/ow/e4d1df37de10d114a19afeb4da09e526.html).
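If your search engine only hands you raw logs, even a few lines of Python can produce the two reports that matter most: top queries and zero-hit queries. A sketch over an invented (query, hit count) log:

```python
from collections import Counter

def search_report(log):
    """Return top queries by frequency and top queries that found nothing.

    log: iterable of (query, hit_count) pairs.
    """
    frequency = Counter(q for q, _ in log)
    zero_hits = Counter(q for q, hits in log if hits == 0)
    return frequency.most_common(3), zero_hits.most_common(3)

log = [
    ("brokeback mountain", 0),
    ("brokeback mountain", 0),
    ("annie proulx", 12),
    ("close range", 4),
]
top, zeros = search_report(log)
print(top[0], zeros[0])
```

A frequent zero-hit query like this is exactly the signal that prompted those librarians to add Brokeback Mountain to the notes field.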


  • Well-rounded administrative interface—Does every tweak to the search engine require begging some techie to tweak a feature, observing the results, and then begging some more until it's right? Are the search engine's features hidden in largely undocumented mystery meat? Is it impossible to determine the settings at a glance, or at least through intelligent perusal of the administrative section? (Yes, this is a roman à clef…one of several drivers in our search for a better search engine at MPOW.)

These are just the high notes of search functionality, and this checklist doesn't cover how well, or badly, vendors provide these features (or how well or badly customers implement them)—topics I'll tackle in future installments of this series. Even after all that, the checklist doesn't address the much more difficult problem to solve: the sparse, hard-to-search nature of citation indexes. People are now accustomed to full-text searching. Can we make them like an OPAC, no matter how much we fix its search functions?

But think about your own catalog: are these features available? It may well be, as some users wrote me privately, that the OPAC (as separate software purchased by local libraries) is near death's door. I think that's very likely. But if so, anything else we use for a catalog—who's betting on Open WorldCat?—will need good search functionality as well, or it too will suck, only more consistently and on a much larger scale. In the end, as uber-librarian and user champion Marvin Scilken told me many times, the bottom line is public service.


Comments (5)


If the OPAC was easy enough to use, simple, and logical, then help files would not be needed. A ten-year-old can use Google without any help files because it is easy to use, simple, and logical. Why do libraries insist on building complexity into OPACs? Is it to give others the impression that what we do is important, and that we are scholarly and educated? Rather, it has the opposite effect. Google and Yahoo are used by millions precisely because they are not complex!


I agree with a lot of what Karen says regarding the functionality an OPAC should provide. But I think the title '...how OPACs suck...' is a little misleading.

I believe that part of the issue with OPACs is that libraries don't spend enough time thinking about, learning, or managing their OPAC after they buy into a product. Current-generation OPACs from the major vendors (such as Sirsi and Triple I) offer many (not all) of the features that Karen suggests: sorting options, Boolean searching, search logging, (primitive) relevance and 'most popular' rankings, and other cool things like dynamic list generation, patron bibliographies, and enriched content. I do feel that Karen's exercise of identifying the functional needs of an OPAC is important and constructive. And I'm not suggesting that any vendor's OPAC is perfect. But I do feel that there is a lot of room for libraries to better maximize the benefit that they're getting from their existing OPACs. (And, as importantly, to do more in terms of competitive analysis *before* buying into a product. Libraries, or anyone, really, should complain about a product before they buy it, not after.)

My feeling is, it's ok to say 'it sucks' (about your OPAC, or OPACs in general), if you're willing to then do some work to make it better (while, of course, still challenging your vendor to create the product you 'really' want).


Oops, I forgot to close an somewhere.


I couldn't agree more with most of your comments and am going to bring them to my Director's attention; however, I disagree with your comment about help pages:

(Don't ever rely on help files to 'teach' people. In last year's usability testing at MPOW, the only person who read our help files, out of a group of techies, librarians, and academics, was the 25-year-old soccer mom.)

First of all, 'techies, librarians, and academics' are notorious know-it-alls (I'm a librarian so I know that to be true). Obviously, they don't need to use a help page. The fact a 'soccer mom,' i.e. an average citizen, used the help pages is not a reason to dismiss them, but rather is a reason to embrace them.

I work at a small public library, and here are some unscientific statistics from Google Analytics. Last week our main database page was visited 163 times with 286 page views. This includes librarians using the databases for reference, etc. We have also made a general help page and some help pages for the various database vendors. Both our FAQ page and our ProQuest page were visited 3 times with 3 page views and a 0% exit rate. Our EBSCO help page was visited twice but had seven page views. In other words, the patron came back to the page for help more than once. It had an exit rate of 14%.

I'm not saying our help pages are particularly good. I am saying they are being used. Keep in mind, too, that repeat users of the databases won't necessarily return to the help pages. I think that if we have two patrons a week that start using our databases and are encouraged to do so by our help pages, that is pretty good.


Yay! A list I can love. It can become painfully clear when weighting isn't being done properly. In the catch-all search, title words should still be given a heavy weight, even if you do have a separate title search. I'd say this plays a vital role in relevance ranking as well, due to the small number of words in the surrogate docs. I'd probably include it under relevance ranking.

I'd just elaborate a little more on 'Refining original queries' to suggest that it could also include thesaurus-driven suggestions. I can't remember where now, but I've seen some good research indicating people like the suggestions after a search but not necessarily the idea of an 'expanded search' (a search using all thesaurus terms).