Tuesday, September 13, 2022

Coveo Search Implementation with AEM (Adobe Experience Manager)

In one of the earlier posts, we discussed "Selecting a Search Engine for on-site Search: Open Source vs. Search as a Service" (by Albin Issac, on Medium).

In this post, I will share my experience integrating the Coveo Search platform with AEM, based on a migration from Adobe Search & Promote to the Coveo platform.

Coveo is a software-as-a-service search engine powered by artificial intelligence. Refer to the Coveo Overview document for more details on the Coveo Platform.

Setting Up the Sources:

A source is a connector configuration and an index container that holds all of the items related to a repository, e.g., Web, Sitemap, YouTube, etc.

Coveo has an AEM connector that pushes the AEM content to Coveo at regular intervals for indexing; refer to Index With the Adobe Experience Manager Connector (coveo.com) for more details on the Coveo AEM connector and its configurations.

The AEM connector is flexible to configure and easy to manage; in my view, if your AEM content is organized correctly, the AEM connector is a sensible way to enable indexing. If you are migrating an existing website and don't want to make significant changes to the current indexing process, you can use the other Coveo connectors instead: Sitemap (uses the website sitemap to index the content) and Web (crawls and indexes pages and assets; mainly used for indexing assets such as PDFs, although PDFs can also be indexed through an asset sitemap enabled in AEM). Please note that the AEM connector is still in development, and its support for AEM as a Cloud Service needs to be reviewed.

You can use the other available connectors to index content from other sources; the Generic REST API connector can pull content from any source that exposes a standard REST API endpoint. You can also enable optical character recognition (OCR) to index the text within images and PDFs.

Web and Sitemap sources can include a web scraping configuration to control which content gets indexed; refer to Web Scraping Configuration (coveo.com) for more details on web scraping. For example, to exclude the header and footer content from indexing, specify the configuration as JSON.
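As a rough sketch of the header and footer exclusion mentioned above, the configuration can be assembled as a JavaScript object and serialized to JSON before it is pasted into the source. The `header` and `footer` CSS selectors are assumptions about the page markup, and the `for`/`exclude` structure reflects my reading of the Coveo web scraping documentation, so verify it there before use. The function name `buildScrapingConfig` is hypothetical.

```javascript
// Sketch: build a web scraping configuration that drops header/footer markup.
// The selectors "header" and "footer" are assumed; adjust them to your templates.
function buildScrapingConfig(excludedSelectors) {
  return [
    {
      // Apply the rule to every URL crawled by the source.
      for: { urls: [".*"] },
      // Each entry removes the matching elements before indexing.
      exclude: excludedSelectors.map(function (selector) {
        return { type: "CSS", path: selector };
      }),
    },
  ];
}

const scrapingConfig = buildScrapingConfig(["header", "footer"]);
console.log(JSON.stringify(scrapingConfig, null, 2));
```

Keeping the selectors in one list makes it easier to add further exclusions, e.g., cookie banners or breadcrumbs, without hand-editing the JSON.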

Whenever the source configuration is modified, e.g., Sitemap URL, Fields, etc., you should rebuild the source to reflect those changes.

Crawl settings can be enabled if required.

Web Source

Sitemap Source

Some of the hidden configurations can be directly enabled by editing the JSON.

For example, IndexHtmlMetadata for the Sitemap source: by default, the HTML metadata is not indexed; set the value to “true” to index the HTML metadata for the required sitemap resources.

Refer to https://docs.coveo.com/en/l31h4512/index-content/web-source-json-modification and
https://onlinehelp.coveo.com/en/ces/7.0/administrator/modifying_hidden_sitemap_source_parameters.htm for more details on Web and Sitemap source hidden configurations.

Extensions:

An indexing pipeline extension (IPE) is a Python script to customize how one or more sources index content (see Indexing Pipeline Extension Overview).

For example, you can use an extension to identify the language or country based on the URL and add the required metadata to the item, change date values, reject items that don't meet certain criteria, change an item body, or enable facets with minimal available data.

Refer to Indexing Pipeline Extension Script Samples (coveo.com) for Extension samples.

The Extensions can be associated with the required Sources.

An extension can be applied at the pre-conversion or post-conversion stage; refer to Indexing Pipeline Extension Overview (coveo.com) for more details.

Scheduler:

The source can be scheduled to index the content at specific intervals.

Refresh: During a refresh operation, Coveo crawls the items and permission models identified by your content system as modified since the last source update. Then, Coveo retrieves the changes and updates your index. A refresh factors in only the modified content.

Rescan: During a rescan operation, Coveo crawls all items in your content system. A rescan factors in added, deleted, and modified content.

Fields:

Define the list of custom metadata fields applicable for all the sources.

The Fields can be marked as a Facet/Multi-Value Facet or Sortable based on the business requirement. The supported Fields for the individual sources can be mapped at the Source level.

Map the Fields to the actual metadata available in the source.

You can enable a custom field, e.g., commonsource, with a source-specific value to support multiple search interfaces (sites) that display data from specific sources; the filtering logic can be enabled in the Query Pipeline.

Conditions:

A query pipeline condition is a set of predefined requirements that a query must meet to go through a query pipeline or trigger the execution of a query pipeline statement. Each query pipeline and query pipeline rule/ML model association can only be associated with a single condition.

You can define the condition on different parameters; at a minimum, use the Search Hub to associate a specific pipeline with the website. The search hub is passed as a parameter when the website fetches the search results. (Search Hub is a Search API query parameter whose value is a descriptive name of the search interface from which the query originates. This lets you create an optimized query pipeline for a specific search interface.)

Configure Query Pipeline:

You can define query pipelines in your Coveo organization when you have more than one search interface (Search Hub) with distinct users and purposes, and you want to apply different rules or models for each search interface.

You can use a condition created in the earlier step to associate the query pipeline with specific scenarios, e.g., a Search Hub (search interface).

You can also define:

  • A/B tests
  • Campaigns
  • Search Terms: Thesaurus rules (define synonyms) and Stop words (words that are filtered out from a query entered by an end user before it's sent to the index)
  • Result Ranking: change the ranking score of the content or display specific content as featured content
  • Machine Learning model associations
  • Advanced: additional configurations such as Filters, Query Parameter Rules, Ranking Weights, and Triggers, described below

Filters: define the scope of the search, e.g., enable country code and language code as a filter to support multi-country and multi-language sites (there is no need to define multiple sources for this use case; instead, use a Filter along with context data from the search interface to scope the search results). The scope can also be restricted based on a Field defined in the source, e.g., commonsource.

Query Parameter Rules: you can configure query parameter rules to override the default query parameters when a certain condition is met, e.g., enable or disable enableMLDidYouMean/enableDidYouMean based on a specific condition.

Ranking Weights: the Coveo Platform index uses default ranking factors to evaluate the relevance score of each search result for each query. When you see that the ranking of specific search results isn't ideal in a particular context of search or case, you can adjust the value of one or more ranking factors.

Triggers: you can configure Notify, Query, Execute, and Redirect triggers to execute actions in your search interfaces when a specific query is performed.

Enable Search Result Page on AEM:

There are multiple approaches to enabling the search interface; refer to Choose the Right Approach (coveo.com) for more details on the interface options.

Coveo JavaScript Search Framework — This open-source, component-based framework allows developers to easily customize and deploy feature-rich, client-side search interfaces on any web page, site, or application.

We used Coveo Search UI (Coveo JavaScript Search Framework) to build our search interfaces, considering the project timeline and effort; you can create multiple search interfaces to support your business needs, e.g., different themes, fonts, structures, etc. The framework ships with various out-of-the-box (OOTB) components, and custom or extended components can be added based on your business needs.

Build the search interface with the required components, e.g., Pagination, Facets, Sorting, Filters, Search Suggestions, AutoComplete, etc., and apply the required look and feel. You can add interface-specific context data to the search UI; the context data can be used in a Query Pipeline Filter to restrict the data to a specific scenario, e.g., filter the data based on country and language for multi-country/language websites.

// Identify the language and country based on the URL pattern and set them as context variables.
Coveo.$$(document.getElementById('coveo-search')).on('buildingQuery', function(e, args) {
  args.queryBuilder.addContextValue('country', country);
  args.queryBuilder.addContextValue('locale', locale);
});
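The handler above expects `country` and `locale` to be available already. A minimal sketch of deriving them from the page URL follows, assuming a path pattern of `/<country>/<language>/...` such as `/us/en/search.html`. The pattern, the fallback defaults, and the function name `getSiteContext` are all hypothetical; adapt them to your site structure.

```javascript
// Sketch: derive the country and locale from a path such as /us/en/search.html.
// The /<country>/<language>/ pattern and the defaults are assumptions.
function getSiteContext(pathname, defaults) {
  const match = /^\/([a-z]{2})\/([a-z]{2})(\/|$)/i.exec(pathname);
  if (!match) {
    // Fall back to the defaults when the path does not follow the pattern.
    return { country: defaults.country, locale: defaults.locale };
  }
  return { country: match[1].toLowerCase(), locale: match[2].toLowerCase() };
}

const siteContext = getSiteContext("/us/en/search.html", { country: "us", locale: "en" });
console.log(siteContext);
```

In the browser, you would pass `window.location.pathname` and use the returned `country` and `locale` values in the `buildingQuery` handler.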

The page metadata is another critical factor in enabling the facets; if possible, tag the AEM pages with different metadata that can be used as a facet.

You can host the i18n key updates in AEM, for example as a client library resource, e.g., /etc.clientlibs/project/clientlibs/assets/resources/i18n/search.coveo.js.

window.coveo_dict = {
"en": {
"Relevance": "Relevance",
"Date": "Date",
....
},
"es": {
"Relevance": "Relevancia",
"Date": "Fecha",
...
},
...
}

This keeps the translations easy to manage in the future. The CSS file can also be served through AEM as a client library resource (inline CSS is not recommended), e.g., /etc.clientlibs/project/clientlibs/assets/resources/css/sitea/search.coveo.css

Ensure accessibility is factored in during the development and testing of the search interfaces.

The search interface can be integrated into AEM through Coveo Hosted Search Page Component.

Download the Coveo for Adobe package (coveo-aem-components.all-1.6.0-beta.zip) from https://docs.coveo.com/en/l89g0153/adobe/releases-and-downloads#coveo-hosted-search-page-package

Install the package in author and publisher instances.

Enable the hosted search page component to the required templates.

Add the hosted search page component to the search-result pages.

Configure the Coveo organization ID (most implementations have two organizations, Non-Prod and Prod), the search page ID, the API keys for access and search, the Search Hub, and the platform endpoint URL.

You can copy the ID of the search page.

Enable the Access and Search API keys with the required access.


Now the search results are displayed on the result page; activate the page to the publisher.

The search interfaces will send the Search Page name as the Search hub (coveo.com) as part of the API request to fetch the search results; the Search Hub value can be used in Query Pipelines/Conditions to restrict the section of data that is specific to an interface. The SearchHub value can be used as part of the APIs to fetch the particular data section.

The search box on the header:

You can follow the Create a Standalone Search Box (coveo.com) approach to enable the search box on the page headers; modify the search hub value in the HTML based on your configuration.

<div class="CoveoAnalytics" data-search-hub="my_main_search_page_search_hub"></div>

If required, a custom search box can be enabled on the header, with the Query Suggest API used to display the suggestions, but enabling the Coveo standalone search box makes more sense in the long run: the standalone search box has built-in query suggestions and can be extended with additional components, e.g., recommendations. You should configure the Coveo token (make this configurable in AEM and inject it into the pages through a hidden field) and the search result URL. You should also include the required Coveo CSS and JS as part of the site client library. Refer to JavaScript Search Framework Events (coveo.com) to attach handlers to specific events and enable custom functionality, e.g., disabling the search box until the user types 3 characters.
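The 3-character rule can be sketched as a small guard plus an event handler. This is an illustration under assumptions rather than a tested integration: the element id `coveo-search-box` is hypothetical, and the use of the `newQuery` event with `args.cancel` and `Coveo.state` to read the current query is based on my recollection of the framework's event documentation, so confirm the behavior for your framework version, especially for a standalone search box.

```javascript
// Sketch: only let a query through once the user has typed enough characters.
const MIN_QUERY_LENGTH = 3; // threshold taken from the example in the text

function hasMinimumLength(text, minLength) {
  return typeof text === "string" && text.trim().length >= minLength;
}

// Browser-only wiring; skipped when the Coveo framework is not loaded.
// Assumes the search box root element has the hypothetical id "coveo-search-box".
if (typeof Coveo !== "undefined" && typeof document !== "undefined") {
  const root = document.getElementById("coveo-search-box");
  if (root) {
    Coveo.$$(root).on("newQuery", function (e, args) {
      const query = Coveo.state(root, "q");
      if (!hasMinimumLength(query, MIN_QUERY_LENGTH)) {
        args.cancel = true; // assumed to stop the query before it is sent
      }
    });
  }
}
```

Keeping the length check in a separate function makes it easy to unit test and to reuse if you later switch to a custom search box.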

Create an access token with limited access to support the header search box functionalities, e.g., query suggest.

Users will be directed to the configured search result page when they click a search term or the search button.

Machine Learning:

Coveo Machine Learning (Coveo ML) models are algorithms that leverage usage analytics data to provide contextually relevant recommendations to the users. Multiple OOTB ML models are available; refer to Machine Learning Overview (coveo.com) for more details. Custom models can be enabled if your team can support them.

We mainly used Query Suggestions (QS) and the Automatic Relevance Tuning (ART) model to improve the relevancy of the query suggestions and the results based on user behavior.

You can create a model specific to your scenario (e.g., site-specific) and associate it with the Query Pipeline.

Configure the learning interval.

Also, you can enable the required filters to restrict the data specific to a scenario, e.g., limiting data to a particular country and language on multi-language websites.

You can test the models through the model testing page.

Associate the model with the Query Pipeline.

Sharing Data through API:

Sometimes, you may need to share the search data through an API in addition to the Search interfaces.

You can use the GET or POST Search API; a sample GET URL:

https://platform.cloud.coveo.com/rest/search/v2?organizationId=<<Organization ID>>&searchHub=<<xxx>>&access_token=<<Access Token>>&context={"country":"us","locale":"en"}&q=test

Specify the organization ID, the Search Hub, and the necessary context data, e.g., country/language values.

Specify the query string to search the index, e.g., q=test

Enable a dedicated Access API key with minimal access to retrieve the search data.

The response contains the records matching the input query, returned one page at a time (see the numberOfResults and firstResult parameters).
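When the GET URL is assembled in code, the context JSON and the query text need to be URL-encoded. The following is a minimal sketch of that step; the function name `buildSearchRequest` and all sample values are placeholders, and it sends the token as a bearer `Authorization` header rather than the `access_token` query parameter shown in the sample URL, which is a choice made for this sketch.

```javascript
// Sketch: build a Search API GET request with URL-encoded parameters.
// Every value passed in below is a placeholder, not a real ID or token.
const SEARCH_ENDPOINT = "https://platform.cloud.coveo.com/rest/search/v2";

function buildSearchRequest(options) {
  const params = new URLSearchParams({
    organizationId: options.organizationId,
    searchHub: options.searchHub,
    context: JSON.stringify(options.context), // context travels as a JSON string
    q: options.query,
  });
  return {
    url: SEARCH_ENDPOINT + "?" + params.toString(),
    // Assumption: the token is sent as a bearer header instead of access_token.
    headers: { Authorization: "Bearer " + options.accessToken },
  };
}

const searchRequest = buildSearchRequest({
  organizationId: "my-org-id",
  searchHub: "my-search-hub",
  accessToken: "my-access-token",
  context: { country: "us", locale: "en" },
  query: "test",
});
console.log(searchRequest.url);
```

In a browser or a recent Node.js version, `fetch(searchRequest.url, { headers: searchRequest.headers })` followed by `response.json()` would then return the result payload.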

Access Management:

Multiple OOTB groups are available, e.g., Administrators, Analytics managers, Relevance Managers, etc.; the custom groups can be defined based on the use cases. Refer to Manage Groups (coveo.com) for more details on managing groups.

You can integrate your SSO (SAML) system to enable single sign-on behavior to the Coveo interfaces. Refer to Configure Coveo SAML SSO for more details on SSO configurations.

To add a member, select the provider (Single sign-on if enabled; otherwise, one of the other applicable providers), enter the username, send the notification if required, and select the group.

Analytics:

Coveo analytics is designed to help you measure and optimize your Coveo solution. The Coveo-powered search interface captures and stores data in your Coveo organization. The data is then processed and made accessible through dashboards and explorers.

In addition to helping you measure your search solution adoption, usage analytics data also feeds Coveo Machine Learning (Coveo ML) algorithms to provide a more intuitive and personalized experience. Refer to Coveo Usage Analytics Overview for more details on Coveo analytics.

You can define the required dimensions and dashboards to review the usage data; the usage data can be exported as CSV, and the exports can be scheduled at a specific interval.

The data can also be shared with the organization’s Snowflake account.

You can review the visitor details through the Visitor Browser.

Content Browser/Log Browser:

The content browser page is a search interface that allows you to filter and inspect the content indexed in your Coveo organization. This page helps validate changes made to your organization and troubleshoot issues.

The Log Browser page of the Coveo Administration Console allows members with the required privileges to inspect the status of an item sent through the Coveo indexing pipeline.

Since each log corresponds to a stage of the indexing pipeline for a single item, you can retrieve precise information from this page after narrowing down your search. The Log Browser page is handy when troubleshooting indexing issues that apply to specific items. Custom messages can be logged from Python extensions; see Logging Messages From an Indexing Pipeline Extension (coveo.com).

The following diagram shows the process of a query being sent to a given Coveo organization and the execution order of query pipeline features.

Whenever a query is made in your search hub, i.e., in one of your search interfaces, it passes through a query pipeline for optimization. The optimized query is then sent to the Coveo index, which returns matching items to display in the search interface.

The changes are initially enabled on the non-prod organization; the required configurations can be moved from non-prod to the prod organization after successful validation. You can use Resource Snapshots to push the changes from the non-prod organization to the prod organization, or you can apply the required changes manually to the prod organization (source changes should mostly be moved manually, since the source URLs are different for non-prod and prod). Refer to Manage Resource Snapshots (coveo.com) for more details; you should be able to adjust prod-specific values while moving the changes and move only the selected resources.