Sunday, May 3, 2020

How to enable search synonyms in AEM with Lucene?

This tutorial explains how to enable search synonyms in AEM with Lucene.

Search Synonyms

Synonyms tell the search engine that a search for one word should also match related words, e.g. searching for gigabyte should also consider gigabytes, gib and gb.

An analyzer must be configured on a custom Oak index to support search synonyms.

Refer to the following tutorial to configure a custom Oak index and analyzers

I have a data node whose id property has the value "gigabyte". The node is returned when searching for "gigabyte" but not when searching for "gigabytes", "gib" or "gb".




Configure Analyzer

Let us now configure the Analyzer to support the search synonyms.

Create a node "Synonym" with the primary type "nt:unstructured" under analyzers/default/filters (refer to the sample configuration package at the git link posted at the bottom of the tutorial)


Add the following properties to the Synonym node

format - solr or wordnet
synonyms - synonyms.txt, the file with the synonym definitions

There are two possible formats for the dictionary. I am using the solr format for the demo.
  • wordnet: based on the popular WordNet project. It requires the synonyms to be configured in a specific format
  • solr: plain text with comma-separated values
The synonyms.txt file is a simple comma-separated list of synonyms. All matching terms should be on a single row. Searching for any word in a row will match all the other words in that same row. A common use for synonyms is matching on variations of a word.


Television, Televisions, TV, TVs
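To make the row semantics concrete, here is a minimal sketch in plain Java of how a row in the solr format groups equivalent terms. This is not how Oak or Lucene implement synonyms internally. The class SynonymRows, its methods, and the "gigabyte, gigabytes, gib, gb" row are assumptions made up for illustration (the row is derived from the gigabyte example above), and the lower-casing only approximates what a case-insensitive analyzer chain would do.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of AEM, Oak, or Lucene.
class SynonymRows {

    // Splits one comma-separated row into trimmed, lower-cased terms.
    static Set<String> parseRow(String row) {
        Set<String> terms = new LinkedHashSet<>();
        for (String raw : row.split(",")) {
            String term = raw.trim().toLowerCase(Locale.ROOT);
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Returns every term that shares a row with the searched word,
    // or only the word itself when no row contains it.
    static Set<String> expand(List<String> rows, String word) {
        String needle = word.trim().toLowerCase(Locale.ROOT);
        Set<String> result = new LinkedHashSet<>();
        result.add(needle);
        for (String row : rows) {
            Set<String> terms = parseRow(row);
            if (terms.contains(needle)) {
                result.addAll(terms);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed contents of synonyms.txt, one synonym group per row.
        List<String> rows = Arrays.asList(
            "Television, Televisions, TV, TVs",
            "gigabyte, gigabytes, gib, gb");
        System.out.println(expand(rows, "gib"));
    }
}
```

A word that appears in no row expands only to itself, which mirrors the behaviour described earlier: without a matching row, "gib" would not find the "gigabyte" node.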

The configuration is ready now, so let us re-index the data. Change the value of the reindex property under the custom index to true. This triggers the re-indexing, and Oak sets the property back to false on its own.


Wait a few minutes for the indexing to complete

Let us now search with "gib". There is no node with the value "gib" for the id property under /content/sampledata, but "gib" is configured as a synonym for "gigabyte", so the node with the id property value "gigabyte" is returned as a result.
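A search like the one above could be expressed as a JCR-SQL2 full-text query. The sketch below only builds the query string in plain Java. The class SynonymQueryExample and its methods are hypothetical. The node type nt:unstructured and the use of CONTAINS on the id property are assumptions: they suppose the data nodes are unstructured and that the custom index analyzes the id property for full-text search.

```java
// Hypothetical helper for illustration only; it builds a query string and
// does not talk to a repository.
class SynonymQueryExample {

    // Wraps a value in single quotes, doubling embedded quotes as
    // JCR-SQL2 string literals require.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Builds a full-text query on one property below a root path.
    // Assumes the nodes are nt:unstructured and the property is analyzed.
    static String buildQuery(String rootPath, String property, String term) {
        return "SELECT * FROM [nt:unstructured] AS s"
            + " WHERE ISDESCENDANTNODE(s, " + quote(rootPath) + ")"
            + " AND CONTAINS(s.[" + property + "], " + quote(term) + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("/content/sampledata", "id", "gib"));
    }
}
```

The resulting string can be pasted into a query tool of your choice. Note that the helper only escapes quotes; the full-text expression inside CONTAINS has its own special characters, which this sketch does not handle.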


Synonyms Configuration -

Define a custom index with a synonym filter and add all the required synonyms to the synonyms.txt file. Once the data is re-indexed, it can be found by searching for any of the configured synonyms.
