Tuesday, September 4, 2018

How to use Luke (Lucene Index Toolbox) to analyze a Lucene index in AEM (Adobe Experience Manager)


This post explains how to analyze a Lucene index created in AEM (Adobe Experience Manager).

Retrieve the Lucene Index:


By default, Oak stores the Lucene index files in the NodeStore, so they are not directly accessible on disk. If "Enable CopyOnRead" or "Enable CopyOnWrite" is enabled in the "Apache Jackrabbit Oak LuceneIndexProvider" configuration, the Lucene index is also copied to a path on the local file system. If "Local index storage path" (localIndexDir) is not specified, the indexes are stored in the 'index' directory under the repository home.

Lucene_index_provider

Index_on_local_file_system_aem

Index_on_local_file_system_aem

If the index has been copied to the local file system, it can be accessed directly. The index is stored in the folder that contains the files whose names start with "segments".

Index_on_local_file_system_aem
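
To locate the copied index folders without browsing by hand, a small script can walk the local index directory and list every folder that holds a file whose name starts with "segments". This is only a sketch: the function name find_lucene_index_dirs is hypothetical, and the root folder to pass in is whatever "Local index storage path" (or the default 'index' directory under the repository home) resolves to on your instance.

```python
import os


def find_lucene_index_dirs(root):
    """Return the folders under `root` that contain a Lucene 'segments*' file.

    `root` is assumed to be the local index directory, for example the
    'index' folder under the repository home (an assumption, adjust as needed).
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # A folder counts as an index folder if any file name starts with "segments".
        if any(name.startswith("segments") for name in filenames):
            found.append(dirpath)
    return sorted(found)


# Example usage (path is illustrative only):
# for folder in find_lucene_index_dirs(r"crx-quickstart\repository\index"):
#     print(folder)
```

Each folder returned this way is a candidate to open in Luke.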
The mapping between the local path and the index can be found here - localhost:4502/system/console/jmx/org.apache.jackrabbit.oak%3Aname%3DIndexCopier+support+statistics%2Ctype%3DIndexCopierStats (this URL is available only if one of the properties mentioned above is enabled)

Index_status_oak_aem
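
The last part of that console URL is a URL-encoded JMX MBean name. The sketch below decodes it, which helps when looking for the same MBean in another JMX client. It assumes that "+" stands for a space, as in form encoding; the helper names mbean_name_from_url and jmx_url are hypothetical.

```python
from urllib.parse import quote_plus, unquote_plus

JMX_PREFIX = "/system/console/jmx/"


def mbean_name_from_url(url):
    """Decode the MBean name that follows /system/console/jmx/ in the URL."""
    encoded = url.split(JMX_PREFIX, 1)[1]
    # unquote_plus turns %3A into ':', %3D into '=', %2C into ',' and '+' into ' '.
    return unquote_plus(encoded)


def jmx_url(base, mbean_name):
    """Build the console URL for an MBean name (the inverse of the function above)."""
    return base.rstrip("/") + JMX_PREFIX + quote_plus(mbean_name)


url = ("localhost:4502/system/console/jmx/"
       "org.apache.jackrabbit.oak%3Aname%3DIndexCopier+support+statistics"
       "%2Ctype%3DIndexCopierStats")
print(mbean_name_from_url(url))
# prints: org.apache.jackrabbit.oak:name=IndexCopier support statistics,type=IndexCopierStats
```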


The steps below can be followed to retrieve the index files if they are not stored on the local file system.

Download the oak-run-x.x.x.jar that corresponds to the AEM Oak version; the Oak version can be identified from CRXDE. (oak-run 1.4.1 did not work for me, so I downloaded version 1.8.0 - https://repository.apache.org/service/local/artifact/maven/redirect?r=releases&g=org.apache.jackrabbit&a=oak-run&v=1.8.0)

Execute java -jar oak-run-1.8.0.jar index <Node Store Path>, e.g. java -jar oak-run-1.8.0.jar index C:\Albin\Development\AEM\6.2\crx-quickstart\repository\segmentstore, to identify the available indexes. The index stats and index definitions are written under the folder from which the command is executed.

The index is stored in the folder that contains the files whose names start with "segments".


oak_run_available_index
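
When the node store path contains spaces or backslashes, it can be easier to launch oak-run from a script that passes each argument separately instead of typing the command by hand. The following is a minimal sketch that only wraps the command line shown above; the function names oak_run_command and run_oak_run are hypothetical, and the jar name and path in the comment are the ones used in this post.

```python
import subprocess


def oak_run_command(jar, mode, node_store_path):
    """Build the argument list for: java -jar <jar> <mode> <Node Store Path>."""
    # Only the two oak-run modes used in this post are accepted here.
    if mode not in ("index", "console"):
        raise ValueError("unsupported mode: " + mode)
    return ["java", "-jar", jar, mode, node_store_path]


def run_oak_run(jar, mode, node_store_path):
    """Run oak-run from the current folder and fail loudly on a non-zero exit code."""
    return subprocess.run(oak_run_command(jar, mode, node_store_path), check=True)


# Example usage (requires Java and the jar in the current folder):
# run_oak_run("oak-run-1.8.0.jar", "index",
#             r"C:\Albin\Development\AEM\6.2\crx-quickstart\repository\segmentstore")
```

Passing the arguments as a list avoids quoting problems, since no shell is involved in splitting the command.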

The indexing status and the index definitions can also be accessed from the following URL - http://localhost:4502/system/console/jmx/org.apache.jackrabbit.oak%3Aname%3DLucene+Index+statistics%2Ctype%3DLuceneIndex


oak_lucene_index_status

Execute java -jar oak-run-1.8.0.jar console <Node Store Path> to open the oak-run console, e.g. java -jar oak-run-1.8.0.jar console C:\Albin\Development\AEM\6.2\crx-quickstart\repository\segmentstore

In the console, run lc dump <Target path to dump the index file> <Index Path> to dump the index files, e.g. lc dump C:\Albin\Development\Oak /oak:index/users

oak_run_dump_index


Analyze the Index File:


Download https://github.com/DmitryKey/luke/releases/download/4.7.0/luke-with-deps.jar and oak-lucene-1.4.1.jar (matching the AEM Oak version)
Place both jar files in the same folder

Execute java -cp luke-with-deps.jar;oak-lucene-1.4.1.jar org.getopt.luke.Luke (the ; classpath separator applies to Windows; use : on Linux or macOS)
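
The classpath separator in the command above depends on the operating system. As a rough sketch, the command can be assembled with the platform's own separator; the function name luke_command is hypothetical, while the jar names and the main class are the ones from this post.

```python
import os


def luke_command(jars, separator=os.pathsep):
    """Build the java command that starts Luke with the given jars on the classpath.

    os.pathsep is ';' on Windows and ':' on Linux and macOS.
    """
    classpath = separator.join(jars)
    return ["java", "-cp", classpath, "org.getopt.luke.Luke"]


command = luke_command(["luke-with-deps.jar", "oak-lucene-1.4.1.jar"])
print(" ".join(command))
```

Run the printed command from the folder that holds both jar files.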

Select the parent folder of the index file

For index generated through oak-run-x.x.x.jar

Lucene_index_toolbox

For index created in local file system

Lucene_index_toolbox

This will display the overview of available documents, terms and fields

Lucene_index_toolbox_overview

The Documents tab shows the indexed documents; clicking on Reconstruct & Edit displays the field-level details
Lucene_index_toolbox_documents

The documents in the index can be searched in the Search tab - enter the search expression and select the search field

Lucene_index_toolbox_resource_search

Click on "Explain Structure" to display the query structure

Lucene_index_toolbox_explain_structure

Select a document in the results and click on "Explain Query" to display the query execution details
Lucene_index_toolbox_resource_explain


