Friday, April 10, 2020

Oak Lucene Index - Improve the query performance in AEM(Adobe Experience Manager) | Configure Oak Lucene Index in AEM


This tutorial explains how to enable an Oak Lucene index to improve query performance in AEM (Adobe Experience Manager).

OAK Lucene Index


For queries to perform well, Oak supports indexing of content that is stored in the repository. When a JCR query is executed, it normally looks up an index first. If no index is available, the query traverses the repository content, which is time-consuming and adds overhead for AEM. A query can run without an index, but on large data sets it will execute very slowly or even abort.

There are three indexing modes, which define how the content comparison is performed and when the index content gets updated:

Synchronous Indexing - Under synchronous indexing, the index content gets updated as part of the commit itself. Changes to both the main content and the index content are done atomically in a single commit. New content is added to the index as soon as it is available.

Asynchronous Indexing - Asynchronous indexing (also called async indexing) is performed using periodic scheduled jobs. As part of the setup, Oak schedules periodic jobs that perform a diff of the repository content and update the index content based on it. This gives better commit performance, but new content is not available in the index immediately.

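Near Real Time (NRT) Indexing - This mode marks an asynchronous index as a near real time index: the persisted index is still updated by the async job, while a local copy on each cluster node is refreshed shortly after every commit, so new content becomes searchable on that node within a few seconds.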
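
The difference between the synchronous and asynchronous modes can be sketched with a toy model. This is only an illustration in Python, not Oak code: the class ToyRepository, its methods and the sample path are hypothetical names chosen for the example.

```python
class ToyRepository:
    """Toy content store with one synchronous and one asynchronous index."""

    def __init__(self):
        self.content = {}       # path -> property value
        self.sync_index = {}    # value -> set of paths, updated inside commit()
        self.async_index = {}   # value -> set of paths, updated by the periodic job
        self._pending = []      # changes the async indexer has not processed yet

    def commit(self, path, value):
        # Synchronous indexing: content and index change in the same commit.
        self.content[path] = value
        self.sync_index.setdefault(value, set()).add(path)
        # The async index is not touched here; the change is only recorded.
        self._pending.append((path, value))

    def run_async_indexer(self):
        # Asynchronous indexing: a scheduled job applies the accumulated diff.
        for path, value in self._pending:
            self.async_index.setdefault(value, set()).add(path)
        self._pending.clear()


repo = ToyRepository()
repo.commit("/content/sampledata/node1", "1111-a")

in_sync_before = set(repo.sync_index.get("1111-a", set()))
in_async_before = set(repo.async_index.get("1111-a", set()))

repo.run_async_indexer()
in_async_after = set(repo.async_index.get("1111-a", set()))

print(in_sync_before, in_async_before, in_async_after)
```

Right after the commit the synchronous index already contains the node, while the asynchronous index only sees it once the scheduled job has run.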

Indexing uses Commit Editors. Some of the editors are of type IndexEditor and are responsible for updating index content based on changes in the main content. Currently, Oak has the following built-in editors:

  • PropertyIndexEditor
  • ReferenceEditor
  • LuceneIndexEditor
  • SolrIndexEditor

There are three main types of indexes available in AEM:

  • Lucene – asynchronous (full text and property) - Recommended
  • Property – synchronous [ Prefer only when you need synchronous results ]
  • Solr – asynchronous


Configure Lucene Index in AEM



Oak supports Lucene-based indexes for both property constraints and full-text constraints. Depending on the configuration, a Lucene index can be used to evaluate property constraints, full-text constraints, path restrictions and sorting.

If multiple indexers are available for a query, each available indexer estimates the cost of executing the query. Oak then chooses the indexer with the lowest estimated cost.
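
The selection rule can be sketched in a few lines of Python. This is a simplified illustration, not Oak's query engine: the IndexCandidate type, the candidate names and the cost numbers are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class IndexCandidate:
    name: str
    estimated_cost: float  # hypothetical cost reported by the indexer


def choose_index(candidates):
    """Return the candidate with the lowest estimated cost."""
    return min(candidates, key=lambda c: c.estimated_cost)


# Made-up estimates: a full traversal is expensive, a matching index is cheap.
candidates = [
    IndexCandidate("traverse", 100000.0),
    IndexCandidate("testindex", 50.0),
    IndexCandidate("otherLuceneIndex", 800.0),
]

chosen = choose_index(candidates)
print(chosen.name)
```

In this model a traversal is just another candidate with a very high cost, which is why a well-matching index wins as soon as one exists.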

I have a large data set (12k) under "/content/sampledata" where each node has an id property, and the id value of every node starts with '1111'.

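Let me now execute a query to fetch all the nodes under "/content/sampledata" whose id property value starts with '1111'. Note that the '%1111%' pattern used here matches '1111' anywhere in the value, which also covers the prefix case.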

select * from [nt:unstructured] where [jcr:path] like '/content/sampledata/%' and id LIKE '%1111%'
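
The effect of the leading '%' in the LIKE pattern can be checked with a small Python sketch. It assumes only the two standard SQL wildcards ('%' for any sequence, '_' for one character) and ignores escape characters; like_to_regex is a hypothetical helper, not part of Oak or AEM.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into an anchored regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any sequence of characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts) + "$", re.DOTALL)


contains = like_to_regex("%1111%")   # pattern used in the query above
prefix = like_to_regex("1111%")      # pattern for a strict "starts with"

for value in ["11112345", "x1111y"]:
    print(value, bool(contains.match(value)), bool(prefix.match(value)))
```

For this data set both patterns select the same nodes, because every id value starts with '1111'.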

The query execution failed with the following exception: "The query read or traversed more than 100000 nodes. To avoid affecting other tasks, processing was stopped"

The value 100000 is the queryLimitReads setting. It can be raised, but the query fails again once the new limit is reached, and a higher limit also impacts the overall system performance.
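
Why raising the limit only postpones the failure can be shown with a toy traversal. The sketch below is plain Python with small made-up numbers (250 nodes, limits of 100, 200 and 250); traverse_query and ReadLimitExceeded are hypothetical names, not Oak APIs.

```python
class ReadLimitExceeded(Exception):
    """Raised when a traversal reads more nodes than the configured limit."""


def traverse_query(nodes, predicate, limit_reads):
    """Scan nodes one by one and abort once more than limit_reads were read."""
    matches = []
    for reads, node in enumerate(nodes, start=1):
        if reads > limit_reads:
            raise ReadLimitExceeded(f"read more than {limit_reads} nodes")
        if predicate(node):
            matches.append(node)
    return matches


# Toy data set: every id starts with '1111', as in the sample content.
nodes = [{"id": f"1111{i}"} for i in range(250)]


def run(limit_reads):
    """Return the number of matches, or None if the traversal was aborted."""
    try:
        return len(traverse_query(nodes, lambda n: "1111" in n["id"], limit_reads))
    except ReadLimitExceeded:
        return None


result_low = run(100)      # aborted
result_higher = run(200)   # still aborted, just later
result_enough = run(250)   # succeeds only when the limit covers every node
print(result_low, result_higher, result_enough)
```

Without an index, the traversal has to read every node, so the limit must grow with the content. An index avoids the scan instead of making it more tolerable.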

The queryLimitReads value can be changed through the following OSGi configuration: http://localhost:4502/system/console/configMgr/org.apache.jackrabbit.oak.query.QueryEngineSettingsService

The query execution behavior can be reviewed through the Query Performance tool.

The tool displays the slow queries and the popular queries, and the Explain Query option shows the query execution details.

No index is defined for the query, so it executes as a full traversal; it traversed more than 100000 nodes and was aborted to avoid affecting other activities.

Let us now see how to define a Lucene index to improve the query performance.

Create a node named "testindex" under oak:index with the following properties:

jcr:primaryType - oak:QueryIndexDefinition
type - lucene
includedPaths - /content/sampledata
fullTextEnabled - false
evaluatePathRestrictions - true
compatVersion - 2
async - async, nrt


On save, this creates a child node named "indexRules".

By default, a node named nt:base is created under indexRules. Rename this node to the primary type of the nodes that need to be indexed, in our case "nt:unstructured".

A default node named prop0 is created under properties. Rename prop0 to the property that needs to be indexed, in our case "id", and set the properties below:

id:
propertyIndex - true
ordered - true
name - id
isRegexp - false
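
Put together, the resulting node structure can be pictured as a nested document. The sketch below only mirrors the property names and values listed above as a Python dictionary and prints them as JSON to make the hierarchy visible; it is an illustration, not an import format, and representing includedPaths and async as lists is an assumption about how multi-valued properties are stored.

```python
import json

# testindex node with the properties described above.
index_definition = {
    "jcr:primaryType": "oak:QueryIndexDefinition",
    "type": "lucene",
    "includedPaths": ["/content/sampledata"],  # assumed multi-valued
    "fullTextEnabled": False,
    "evaluatePathRestrictions": True,
    "compatVersion": 2,
    "async": ["async", "nrt"],                 # multi-valued: async plus NRT
    "indexRules": {
        # Renamed from nt:base to the node type being indexed.
        "nt:unstructured": {
            "properties": {
                # Renamed from prop0 to the indexed property.
                "id": {
                    "propertyIndex": True,
                    "ordered": True,
                    "name": "id",
                    "isRegexp": False,
                }
            }
        }
    },
}

rendered = json.dumps({"testindex": index_definition}, indent=2)
print(rendered)
```

Reading the structure top-down gives the path testindex/indexRules/nt:unstructured/properties/id, which is where the per-property settings live.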

The https://oakutils.appspot.com/generate/index utility can be used to generate index definitions.

Let us now reindex the data. Change the reindex property to true to initiate the asynchronous indexing.

The reindex property value changes back to false after the indexing has been initiated; wait for some time for the indexing to complete.

Re-execute the query. It now runs without any issues and with better performance, using the defined Lucene index.

It is always best practice to review slow-running custom queries and configure the required indexes to improve performance. If multiple indexes are defined, Oak picks the index with the lowest estimated cost to execute the query.
