Sunday, May 3, 2020

How to enable search synonyms in AEM with Lucene?



This tutorial explains how to enable search synonyms in AEM with Lucene.


Search Synonyms


Synonyms tell the search engine that a search for one word should also match other words. For example, a search for "gigabyte" should also consider "gigabytes", "gib" and "gb".

To support search synonyms, an analyzer must be configured on a custom Oak index.
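Conceptually, the analyzer is a chain: a tokenizer splits text into terms, filters then transform those terms, and the same chain runs on indexed values and on full-text search terms. The toy below is a minimal plain-Java sketch of that idea (the ToyAnalyzer class and its methods are hypothetical and do not use the Lucene or Oak API). It shows why adding a synonym step lets a search for "gib" match a stored value of "gigabyte".

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy analyzer chain (tokenizer -> LowerCase -> Synonym) in plain Java.
// Hypothetical class; this is NOT the Lucene or Oak analyzer API.
class ToyAnalyzer {
    // Maps each term to the full group of equivalent terms it belongs to.
    private final Map<String, Set<String>> synonyms = new HashMap<>();

    // Registers one group of equivalent terms, like one line of a solr-format file.
    void addGroup(String... terms) {
        Set<String> group = new HashSet<>();
        for (String term : terms) {
            group.add(term.trim().toLowerCase(Locale.ROOT));
        }
        for (String term : group) {
            synonyms.put(term, group);
        }
    }

    // Tokenizes on non-alphanumeric characters, lowercases each token and
    // replaces it with its synonym group (or keeps it as-is if it has none).
    Set<String> analyze(String text) {
        Set<String> terms = new HashSet<>();
        for (String token : text.split("[^\\p{Alnum}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            String lower = token.toLowerCase(Locale.ROOT);
            terms.addAll(synonyms.getOrDefault(lower, Set.of(lower)));
        }
        return terms;
    }

    // The same chain runs on the indexed value and on the search term;
    // a match needs at least one term in common.
    boolean matches(String indexedValue, String searchTerm) {
        Set<String> common = new HashSet<>(analyze(indexedValue));
        common.retainAll(analyze(searchTerm));
        return !common.isEmpty();
    }

    public static void main(String[] args) {
        ToyAnalyzer analyzer = new ToyAnalyzer();
        System.out.println("before: " + analyzer.matches("gigabyte", "gib")); // prints "before: false"
        analyzer.addGroup("GB", "gib", "gigabyte", "gigabytes");
        System.out.println("after: " + analyzer.matches("gigabyte", "gib"));  // prints "after: true"
    }
}
```

Without a synonym group the two values share no term; once the group is registered, both sides expand to the same set of terms, which mirrors the behavior this tutorial configures in Oak.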

Refer to the following tutorials to configure a custom Oak index and analyzers:

https://www.albinsblog.com/2020/04/oak-lucene-index-improve-query-in-aem-configure-lucene-index.html#.Xu7oD2hKjb1

https://www.albinsblog.com/2020/05/how-to-enable-case-insensitive-search-in-aem-lucene.html#.Xu7nAWhKjb1

I have a data node whose id property is "gigabyte". The node is returned when searching for "gigabyte", but not when searching for "gigabytes", "gib" or "gb".
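The exact query behind this test is not given in the text. As an assumption, the sketch below builds a JCR-SQL2 full-text query for the id property under /content/sampledata. Analyzers, including the Synonym filter, apply to analyzed properties searched with CONTAINS; an exact comparison such as s.[id] = 'gib' is matched against the stored value without analysis. The SynonymQuery class and the nt:unstructured node type are placeholders.

```java
// Builds a JCR-SQL2 full-text query on the "id" property under /content/sampledata.
// Assumptions: "id" is an analyzed property of the custom index; nodes are nt:unstructured.
final class SynonymQuery {
    private SynonymQuery() {}

    // In JCR-SQL2 string literals a single quote is escaped by doubling it.
    static String escapeLiteral(String value) {
        return value.replace("'", "''");
    }

    // CONTAINS runs the search term through the index analyzer (and so the Synonym filter).
    // Full-text operators inside the term (quotes, leading '-', OR) are not escaped here.
    static String fullTextQuery(String term) {
        return "SELECT * FROM [nt:unstructured] AS s"
                + " WHERE ISDESCENDANTNODE(s, [/content/sampledata])"
                + " AND CONTAINS(s.[id], '" + escapeLiteral(term) + "')";
    }

    public static void main(String[] args) {
        System.out.println(fullTextQuery("gib"));
    }
}
```

The resulting string could be run from the CRXDE Lite query tool or through the JCR QueryManager with Query.JCR_SQL2; it only uses the Lucene index if the index definition covers the id property.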


Configure Analyzer


Let us now configure the analyzer to support search synonyms.

Create a node named "Synonym" with the primary type "nt:unstructured" under analyzers/default/filters of the custom index (refer to the sample configuration package in the Git repository linked at the bottom of this tutorial).



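Add the following properties to the Synonym node:

format - solr or wordnet
synonyms - synonyms.txt, the name of the file that holds the synonym definitions

The synonyms.txt file itself is added as an nt:file child node of the Synonym node.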

There are two possible formats for the dictionary; I am using the solr format for this demo.
  • wordnet: based on the WordNet lexical database; the synonyms must be defined in the WordNet prolog (wn_s.pl) format.
  • solr: plain text with comma-separated values.
The synonyms.txt file is a simple list of comma-separated synonyms. Terms that should match each other must be on the same line: searching for any word on a line also matches all other words on that line. A common use of synonyms is matching variations of a word, such as plural forms and abbreviations.

synonyms.txt

GB,gib,gigabyte,gigabytes
MB,mib,megabyte,megabytes
Television, Televisions, TV, TVs
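To make the row semantics concrete, here is a minimal sketch that reads the comma-separated form of the solr format and returns, for a search term, every term on the same line. SolrSynonyms is a hypothetical helper, not Lucene's SolrSynonymParser: it ignores the explicit "a => b" mappings that the real solr format also supports, and it lowercases every entry, which is an assumption made for illustration. In Lucene, case handling depends on the ignoreCase option of the synonym filter factory and on where the Synonym filter sits relative to a LowerCase filter, so mixed-case entries such as GB or TV are worth testing.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Minimal reader for the comma-separated ("equivalent synonyms") form of the solr format.
// Hypothetical helper, not Lucene's SolrSynonymParser: "a => b" mappings and escapes are not handled.
final class SolrSynonyms {
    // Maps each term to the (read-only) set of terms on its line.
    private final Map<String, Set<String>> groups = new HashMap<>();

    static SolrSynonyms parse(List<String> lines) {
        SolrSynonyms result = new SolrSynonyms();
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip blank lines and '#' comment lines.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            Set<String> group = new LinkedHashSet<>();
            for (String term : trimmed.split(",")) {
                // Whitespace around commas is trimmed; lowercasing is an assumption of this sketch.
                String t = term.trim().toLowerCase(Locale.ROOT);
                if (!t.isEmpty()) {
                    group.add(t);
                }
            }
            Set<String> readOnly = Collections.unmodifiableSet(group);
            for (String t : group) {
                result.groups.put(t, readOnly);
            }
        }
        return result;
    }

    // Returns every term a search for `term` should also match, including the term itself.
    Set<String> expand(String term) {
        String t = term.trim().toLowerCase(Locale.ROOT);
        return groups.getOrDefault(t, Set.of(t));
    }

    public static void main(String[] args) {
        SolrSynonyms syn = parse(List.of(
                "GB,gib,gigabyte,gigabytes",
                "MB,mib,megabyte,megabytes",
                "Television, Televisions, TV, TVs"));
        System.out.println(syn.expand("gib"));
        System.out.println(syn.expand("TV"));
    }
}
```

A search for "gib" expands to the whole first line, while a term that appears on no line is left unchanged.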

The configuration is now ready, so let us re-index the data. Set the reindex property of the custom index to true. This initiates the re-indexing, and Oak sets the property back to false once the re-indexing has finished.
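Instead of editing the property in CRXDE Lite, the same change can be scripted. The sketch below assumes the Sling POST servlet is reachable on a local author instance; the host, the admin credentials and the index path /oak:index/sampleDataIndex are placeholders. The reindex@TypeHint=Boolean parameter asks Sling to store the value as a Boolean rather than a String. Depending on the AEM version and its CSRF or referrer filter settings, such a request may be rejected, in which case editing the property manually remains the simple option.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sets reindex=true (as a Boolean) on an Oak index definition via the Sling POST servlet.
// Host, credentials and index path are placeholders, not values from the tutorial.
final class ReindexRequest {
    private ReindexRequest() {}

    // Form body: reindex=true plus a Sling @TypeHint so the value is stored as Boolean.
    static String formBody() {
        return "reindex=true&"
                + URLEncoder.encode("reindex@TypeHint", StandardCharsets.UTF_8) + "=Boolean";
    }

    // Builds a form-encoded POST with HTTP basic authentication.
    static HttpRequest build(String baseUrl, String indexPath, String user, String password) {
        String auth = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + indexPath))
                .header("Authorization", "Basic " + auth)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody()))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical values for a local author instance; adjust before use.
        HttpRequest request = build("http://localhost:4502", "/oak:index/sampleDataIndex", "admin", "admin");
        if (args.length > 0 && args[0].equals("--send")) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP " + response.statusCode());
        } else {
            // Dry run: only print what would be sent.
            System.out.println("POST " + request.uri() + " body: " + formBody());
        }
    }
}
```

Run it without arguments to print the request, or with --send to actually post it.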


Wait a few minutes for the re-indexing to complete.
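Rather than guessing how long "a few minutes" is, the flag can be polled. This minimal sketch assumes the reindex property stays true until Oak finishes and that the index definition can be fetched as JSON (for example through the Sling default GET servlet at the index path plus .json); ReindexWait and the fetch function you pass in are hypothetical. The JSON is inspected with a simple regular expression to keep the example dependency-free.

```java
import java.util.Optional;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Waits until an index definition's "reindex" flag reads false again.
// Hypothetical helper; how the JSON is fetched is left to the caller.
final class ReindexWait {
    // Matches "reindex": true/false but not other keys such as "reindexCount".
    private static final Pattern REINDEX = Pattern.compile("\"reindex\"\\s*:\\s*(true|false)");

    private ReindexWait() {}

    // Extracts the reindex flag from a JSON rendering such as {"reindex":false,...}.
    static Optional<Boolean> reindexFlag(String json) {
        Matcher m = REINDEX.matcher(json);
        return m.find() ? Optional.of(Boolean.parseBoolean(m.group(1))) : Optional.empty();
    }

    // Polls fetchJson until the flag is false (returns true) or attempts run out (returns false).
    static boolean waitForReindex(Supplier<String> fetchJson, int maxAttempts, long sleepMillis) {
        for (int i = 0; i < maxAttempts; i++) {
            if (reindexFlag(fetchJson.get()).equals(Optional.of(false))) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt status and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

A missing reindex property is treated as "not finished yet", so the loop only succeeds on an explicit false.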

Let us now search for "gib". No node under /content/sampledata has "gib" as its id value, but because "gib" is configured as a synonym of "gigabyte", the node whose id is "gigabyte" is returned as a result.



Synonyms Configuration - https://github.com/techforum-repo/youttubedata/tree/master/lucene


Define a custom index with a Synonym filter and add all the required synonyms to the synonyms.txt file. The indexed data can then be searched using any of the configured synonyms.

