Turning a Document Search Engine into a Product Search Engine (Part 1)

Quite often, people install an advanced, sophisticated search engine on their e-commerce site, and are heartbroken to discover that it returns awful results.  Chances are that’s because it’s a document search engine, not a product search engine.  In this article, I’ll explain what the difference is, and how to remedy the problem.

Document Search Engines

Most search engines are designed to search documents like web pages.  The methods they use to determine which results to show are often simpler than you might think: usually they just count the words in a document that match the search query and return the documents with the most matches.

Web pages are very different from the structured data most e-commerce sites use.  For instance, many document search engines have put a lot of work into improving relevancy through strategies like TF/IDF, which stands for term frequency/inverse document frequency.  This is a clever way of finding the most relevant results in a big set of documents.

Term frequency is a measure of how many times a term (that is, a word) is used in a document.  If you’re searching for a document about platypuses, chances are that a document that uses the word “platypus” 10 times is going to be more relevant than a document that only uses it once.

The problem with using term frequency alone as a relevance measure is that certain common words (like “is”) occur very frequently in almost every document.  Inverse document frequency corrects for this by giving more weight to words that appear in fewer documents across the whole collection, which further improves relevance.
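To make the two measures concrete, here is a minimal Python sketch of one common TF/IDF formulation: the raw term count multiplied by the log of the inverse document frequency.  The sample sentences and the function names (tokenize, tf_idf) are made up for illustration, and real search engines typically use smoothed and normalized variants of this formula.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    stripped = (w.strip('.,!?"').lower() for w in text.split())
    return [w for w in stripped if w]

def tf_idf(term, doc_tokens, all_docs):
    """Score one term against one document.

    tf  = how often the term occurs in this document
    idf = log(total documents / documents containing the term)
    """
    tf = Counter(doc_tokens)[term]
    docs_with_term = sum(1 for d in all_docs if term in d)
    if docs_with_term == 0:
        return 0.0
    idf = math.log(len(all_docs) / docs_with_term)
    return tf * idf

# Illustrative mini-collection (not from any real catalog).
docs = [tokenize(t) for t in [
    "The platypus is odd. The platypus is a mammal, and the platypus lays eggs.",
    "The kangaroo is a mammal.",
    "The emu is a bird.",
]]

platypus_score = tf_idf("platypus", docs[0], docs)  # rare word, used 3 times
is_score = tf_idf("is", docs[0], docs)              # common word, in every doc

print(platypus_score, is_score)
```

In this sketch “is” scores zero because it appears in every document, while “platypus” scores well because it is both frequent in the first document and rare across the collection.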

E-Commerce and Document Search Engines

While all this is useful for document search engines, it can produce counter-intuitive results in product search engines on e-commerce sites.  Consider, for example, indexing these two product descriptions:

“This comfortable jacket fits well and is a perfect match for our line of shoes.”

“These shoes are easy to wear and feel great.”

When someone searches for “comfortable shoes”, the search engine notices that a) the word “comfortable” appears only in the first description and b) the word “shoes” appears once in each description, so on “shoes” alone the two are equally relevant.  Because it finds both search words in the first description, it ranks the jacket first, which looks woefully wrong to a shopper.
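The ranking described here can be reproduced with a few lines of Python.  This is only a sketch of the simplest match-counting approach, not how any particular engine is implemented; the names match_count and descriptions are invented for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and return its words."""
    return re.findall(r"[a-z]+", text.lower())

def match_count(query, description):
    """Number of query words that appear in the description."""
    words = set(tokenize(description))
    return sum(1 for term in tokenize(query) if term in words)

descriptions = {
    "jacket": "This comfortable jacket fits well and is a perfect match "
              "for our line of shoes.",
    "shoes": "These shoes are easy to wear and feel great.",
}

scores = {name: match_count("comfortable shoes", text)
          for name, text in descriptions.items()}

# Highest score first: the jacket outranks the shoes.
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

The jacket matches both query words and the shoes match only one, so a pure match count puts the jacket on top.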

Fortunately, knowing how a document search engine works can help you modify how your search engine returns your data.  Here are a few ways to use your search engine’s logic to your advantage:

1) Clean your data

Before all else, make sure the data going into your search engine is as clean as possible.  Consider hiring people (through sites like oDesk) to go through your product names and descriptions to ensure all information is relevant and up-to-date.  Otherwise, you’ll find that descriptions like “Band-Aids — OUT OF STOCK, BACK IN MARCH” will mysteriously show up for searches for “Marching Band Uniform” (because “March” and “Band” match).
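To see why that stray stock note causes trouble, here is a small Python sketch.  It assumes the engine lowercases text, splits on punctuation, and stems words; the crude_stem function is a deliberately simplistic stand-in for a real stemmer, and all names are hypothetical.

```python
import re

def crude_stem(word):
    """Strip a trailing 'ing' or 's'. A stand-in for a real stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Set of stemmed, lowercased words in the text."""
    return {crude_stem(w) for w in re.findall(r"[a-z]+", text.lower())}

query = index_terms("Marching Band Uniform")

dirty = index_terms("Band-Aids - OUT OF STOCK, BACK IN MARCH")
clean = index_terms("Band-Aids")

dirty_overlap = dirty & query   # terms shared before cleaning
clean_overlap = clean & query   # terms shared after removing the stock note

print(sorted(dirty_overlap), sorted(clean_overlap))
```

Under these assumptions the uncleaned name shares two terms with the query and the cleaned name shares only one, which is enough to push it down the results.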

With search engines, the old adage is truer than ever: Garbage In, Garbage Out.

2) Separate titles, descriptions, and keywords in your search engine indexes

If you’re currently combining titles, descriptions, and keywords into a single searchable field, your results are suffering.  Descriptions, in particular, should be weighted much lower than titles and keywords.  Consider the description, “This printer is the best on the market.  It will print beautiful color images in vivid reds, blues, and greens on all weights of paper.”  Your search engine will think this is a great match for a query for “red paper,” because the words “reds” and “paper” appear in the description.

Instead, index titles and keywords separately, and boost matches in those fields much more than matches in the description.  That will cause results whose titles and keywords match the search terms to appear first in the results.
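As a rough illustration of field boosting, the Python sketch below scores each field separately and multiplies by a per-field weight.  The weights (5 for titles and keywords, 1 for descriptions), the product records, and the function names are all assumptions chosen for the example; most search engines expose their own boosting settings, so check your engine’s documentation for the real mechanism.

```python
import re

# Hypothetical boosts: tune these against your own queries.
FIELD_WEIGHTS = {"title": 5.0, "keywords": 5.0, "description": 1.0}
# Equal weights behave like one combined searchable field.
FLAT_WEIGHTS = {"title": 1.0, "keywords": 1.0, "description": 1.0}

def terms(text):
    """Lowercase, split into words, and crudely stem by dropping a final 's'."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]

def weighted_score(query, product, weights=FIELD_WEIGHTS):
    """Sum over fields of (field weight x query-term occurrences in the field)."""
    query_terms = set(terms(query))
    score = 0.0
    for field, weight in weights.items():
        field_terms = terms(product.get(field, ""))
        score += weight * sum(1 for t in field_terms if t in query_terms)
    return score

# Made-up product records for illustration.
printer = {
    "title": "ColorJet Printer",
    "keywords": "inkjet color printer",
    "description": "This printer is the best on the market. It will print "
                   "beautiful color images in vivid reds, blues, and greens "
                   "on all weights of paper.",
}
paper = {
    "title": "Red Paper",
    "keywords": "",
    "description": "Fifty sheets for school projects.",
}

for product in (printer, paper):
    print(product["title"],
          weighted_score("red paper", product, FLAT_WEIGHTS),
          weighted_score("red paper", product))
```

With equal weights the printer and the paper tie on “red paper”; with boosted titles the paper clearly wins, because its matches are in the title rather than buried in a description.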

3) Add product names to keyword lists multiple times

Remember TF/IDF?  If your keywords for two products are “adult red shorts” and “short tennis socks”, a search engine that stems words (treating “shorts” and “short” as the same term) considers those equally valid results for a search for “shorts”, because that term appears once in each set of keywords.  But you can trick your search engine into understanding your products better by repeating particularly important words in the keyword field.  So you might index those adult red shorts using “adult red shorts shorts shorts shorts”.  This raises the term frequency for that product and tells the search engine to return it first in a search for “shorts”.
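The effect of repeating a keyword can be sketched in Python as a plain term-frequency count.  The stem function here is a crude placeholder that only drops a final “s”, and term_frequency is a hypothetical helper; engines that normalize scores by field length may reward repetition less than this sketch suggests.

```python
def stem(word):
    """Crude stemmer: drop a final 's' from words longer than three letters."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word

def term_frequency(query_word, keywords):
    """How many keywords match the query word after stemming."""
    target = stem(query_word.lower())
    return sum(1 for w in keywords.lower().split() if stem(w) == target)

# Before the trick: both products look equally good for "shorts".
shorts_before = term_frequency("shorts", "adult red shorts")
socks = term_frequency("shorts", "short tennis socks")

# After the trick: the repeated keyword raises the term frequency.
shorts_after = term_frequency("shorts", "adult red shorts shorts shorts shorts")

print(shorts_before, socks, shorts_after)
```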

4) Add synonyms to keyword lists

Indexing a “keywords” field that’s independent of the title and description of a product gives you a lot of good ways to improve your search results.  Suppose you’re selling “WalkMaster Shoes” that are “perfect for tennis, walking, and strolling around.”  You examine your list of zero-result searches and notice that people are searching for “sneakers” but not getting any results.  You can add this keyword to your WalkMaster Shoes without changing the title or description of the product, and now searches for “sneakers” will return a relevant result.
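Here is a minimal Python sketch of that workflow using a toy in-memory keyword index.  The KeywordIndex class and the product ID are invented for illustration; a real search engine would handle this through its own indexing or synonym features.

```python
from collections import defaultdict

class KeywordIndex:
    """Toy inverted index: maps each keyword to the products that carry it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add_keywords(self, product_id, keywords):
        """Index every whitespace-separated keyword for the product."""
        for word in keywords.lower().split():
            self._postings[word].add(product_id)

    def search(self, query):
        """Return the IDs of products matching any word in the query."""
        results = set()
        for word in query.lower().split():
            results |= self._postings.get(word, set())
        return results

index = KeywordIndex()
index.add_keywords("walkmaster-shoes", "shoes tennis walking strolling")

before = index.search("sneakers")   # a zero-result search

# Add the synonym as a keyword; title and description stay untouched.
index.add_keywords("walkmaster-shoes", "sneakers")
after = index.search("sneakers")

print(before, after)
```

Reviewing the zero-result log regularly and feeding the missing terms back in as keywords is the loop this sketch is meant to show.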

These are a few ways you can transform your document search engine into a great product search engine.  Stay tuned for more ideas to help you improve your search results…