Thursday, May 30, 2024

Optimizing SEO Headers for Digital Assets — Adobe Experience Manager (AEM)

Search Engine Optimization (SEO) is crucial for ensuring that your digital content is easily discoverable by search engines and, consequently, your target audience. While much attention is often given to optimizing content pages, it’s equally important to focus on the SEO of digital assets such as images, videos, and documents. This is especially true for platforms like Adobe Experience Manager (AEM), where managing and delivering a vast range of digital assets is a core function. In this blog, we’ll explore the best practices for applying SEO headers to assets in AEM, focusing on the use of noindex and hreflang.

When it comes to digital assets, adding SEO headers such as noindex or hreflang can be more complex than for standard content pages. Unlike content pages, assets often serve a supporting role and may not need to be indexed by search engines. However, when they do, precise handling is required to ensure proper indexing and language targeting.

SEO Configurations for Content Pages:

For content pages, there are two primary methods to enable SEO configurations:

1. HTML Meta Tags:

SEO settings like noindex and hreflang can be added within the HTML metadata of content pages. This involves including these meta tags directly into the page’s HTML structure, either manually or using content authoring tools that automate their insertion.

2. HTTP Headers:

SEO directives such as noindex and hreflang can also be set at the server level using HTTP headers. This method provides an alternative to HTML meta tags for controlling SEO settings, allowing for centralized management of SEO configurations.

SEO Configurations for Assets:

For assets, the only viable method to implement SEO configurations is through HTTP headers, as assets do not have HTML structures where meta tags can be embedded:

1. HTTP Headers for Assets:

HTTP headers such as X-Robots-Tag for noindex and Link for hreflang can be used to manage SEO settings for digital assets like images, PDFs, and videos. These headers must be configured at the server or dispatcher level, ensuring that the appropriate SEO directives are applied to assets.

Example Syntax for Assets:

noindex:

The noindex directive tells search engines not to index a particular asset. This can be crucial for preventing duplicate content issues or for keeping non-essential assets out of search engine results.

X-Robots-Tag: noindex

Selective Application: Apply noindex to assets that do not provide direct value in search results, such as decorative images, icons, or internal documentation files.

hreflang:

The hreflang attribute is essential for assets that have multiple language versions. It helps search engines understand which version of an asset to serve based on the user's language preference.

Link: <https://example.com/assets/test-en.pdf>; rel="alternate"; hreflang="en", <https://example.com/assets/test-es.pdf>; rel="alternate"; hreflang="es"

Consistent Implementation: Ensure that hreflang annotations are applied consistently across all versions of an asset. Each language version should list every alternate, including itself.
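As a small illustration of the syntax above, the Link header value can be assembled from a language-to-URL map. This is only a sketch: the HreflangLinkHeader class and its build method are hypothetical names, and the URLs are the example ones used above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class HreflangLinkHeader {

    private HreflangLinkHeader() {
    }

    /** Builds a Link header value with one rel="alternate" entry per language. */
    static String build(Map<String, String> urlsByLanguage) {
        StringJoiner joiner = new StringJoiner(", ");
        for (Map.Entry<String, String> entry : urlsByLanguage.entrySet()) {
            joiner.add(String.format("<%s>; rel=\"alternate\"; hreflang=\"%s\"",
                    entry.getValue(), entry.getKey()));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the insertion order of the language versions
        Map<String, String> alternates = new LinkedHashMap<>();
        alternates.put("en", "https://example.com/assets/test-en.pdf");
        alternates.put("es", "https://example.com/assets/test-es.pdf");
        System.out.println("Link: " + build(alternates));
    }
}
```

Sending the same full list with every language version keeps the annotations consistent across versions.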

Implementing SEO Headers for Assets in AEM:

Implementing these headers in AEM involves a combination of configuration and customization:

1. Dispatcher/CDN Configuration:

The Dispatcher/CDN configuration can be used to enable SEO headers based on specific patterns or directories. This configuration is usually done at the web server level (e.g., Apache) and is effective for broad, pattern-based rules.

Apache Configuration Examples

Folder Level Noindex Header:

<Directory /path/to/your/noindex-directory>
Header set X-Robots-Tag "noindex"
</Directory>

File Level Noindex and Hreflang Headers:

# <Files> matches on the file name only, not on a full path
<Files "specific-file.html">
Header set X-Robots-Tag "noindex"
Header set Link "<http://example.com/specific-file-en.html>; rel=\"alternate\"; hreflang=\"en\",<http://example.com/specific-file-es.html>; rel=\"alternate\"; hreflang=\"es\""
</Files>

2. Custom Approach through Authoring

For more granular control, such as enabling SEO headers at the individual asset level, a custom approach is required. This involves using AEM’s metadata schema to allow authors to control SEO headers and implementing a custom Java filter to apply these headers to asset responses.

Step-by-Step Implementation

Step 1: Metadata Schema Customization

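1. Create or Edit Metadata Schema:

  • Navigate to Tools > Assets > Metadata Schemas in AEM.
  • Create a new schema or edit an existing one.
  • Add fields for noindex and hreflang, mapped to the noindex and hreflang metadata properties that the filter in Step 2 reads. For example:
      • noindex: Checkbox
      • hreflang: Multi-value text field, with one value per language in the form language:URL (for example, en:https://example.com/assets/test-en.pdf)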

2. Apply Metadata Schema:

  • Apply the schema to the relevant asset folders or types.

Step 2: Custom Java Filter

Create a custom Sling filter to read the metadata and set the appropriate headers.

1. Create a Sling Filter:

package com.example.aem.filters;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ValueMap;
import org.apache.sling.engine.EngineConstants;
import org.osgi.service.component.annotations.Component;

/**
 * Sets the X-Robots-Tag and Link (hreflang) response headers for DAM assets,
 * based on the noindex and hreflang properties in the asset metadata.
 */
@Component(service = Filter.class, property = {
        EngineConstants.SLING_FILTER_SCOPE + "=" + EngineConstants.FILTER_SCOPE_REQUEST,
        EngineConstants.SLING_FILTER_PATTERN + "=/content/dam/.*"
})
public class SEOHeadersFilter implements Filter {

    @Override
    public void doFilter(final ServletRequest request, final ServletResponse response,
            final FilterChain filterChain) throws IOException, ServletException {

        final SlingHttpServletRequest slingRequest = (SlingHttpServletRequest) request;
        final SlingHttpServletResponse slingResponse = (SlingHttpServletResponse) response;

        // Asset metadata is stored under <asset path>/jcr:content/metadata
        String metadataPath = slingRequest.getResource().getPath() + "/jcr:content/metadata";
        Resource resource = slingRequest.getResourceResolver().getResource(metadataPath);

        if (resource != null) {
            ValueMap metadata = resource.getValueMap();
            String noindex = metadata.get("noindex", String.class);
            String[] hreflangArray = metadata.get("hreflang", String[].class);

            if ("true".equals(noindex)) {
                slingResponse.setHeader("X-Robots-Tag", "noindex");
            }

            if (hreflangArray != null && hreflangArray.length > 0) {
                List<String> hreflangList = new ArrayList<>();
                for (String hreflangEntry : hreflangArray) {
                    // Each entry is expected in the form "language:URL";
                    // the limit of 2 keeps the colon inside "https://" intact
                    String[] parts = hreflangEntry.split(":", 2);
                    if (parts.length == 2) {
                        String lang = parts[0].trim();
                        String url = parts[1].trim();
                        hreflangList.add(
                                String.format("<%s>; rel=\"alternate\"; hreflang=\"%s\"", url, lang));
                    }
                }
                if (!hreflangList.isEmpty()) {
                    slingResponse.setHeader("Link", String.join(", ", hreflangList));
                }
            }
        }

        // Set the headers before passing the request on, while the response is still uncommitted
        filterChain.doFilter(request, response);
    }

    @Override
    public void init(FilterConfig filterConfig) {
        // No initialization required
    }

    @Override
    public void destroy() {
        // Nothing to clean up
    }
}

2. Deploy the Filter:

  • Deploy the custom filter bundle to your AEM instance.
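Because the filter copies author-entered values straight into a response header, it can help to validate each hreflang value before using it. The following is a minimal sketch, assuming the language:URL convention that the filter parses. The HreflangEntry class and the simplified language-tag pattern are hypothetical and not part of AEM.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

final class HreflangEntry {

    final String language;
    final String url;

    private HreflangEntry(String language, String url) {
        this.language = language;
        this.url = url;
    }

    /** Parses "language:URL"; returns empty when the value is malformed. */
    static Optional<HreflangEntry> parse(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        // Limit of 2 keeps the colon inside "https://" as part of the URL
        String[] parts = raw.split(":", 2);
        if (parts.length != 2) {
            return Optional.empty();
        }
        String language = parts[0].trim();
        String url = parts[1].trim();
        // Simplified shape check for tags such as "en", "es-MX", or "x-default"
        if (!language.matches("[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")) {
            return Optional.empty();
        }
        try {
            // Only absolute URLs with a host are accepted; this also rejects
            // values with spaces or line breaks, which are illegal in a URI
            URI uri = new URI(url);
            if (!uri.isAbsolute() || uri.getHost() == null) {
                return Optional.empty();
            }
        } catch (URISyntaxException e) {
            return Optional.empty();
        }
        return Optional.of(new HreflangEntry(language, url));
    }
}
```

In the filter, entries for which parse returns an empty Optional would simply be skipped, the same way entries without a colon are skipped today.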

3. Caching SEO headers on dispatcher:

To make SEO headers available to subsequent requests after caching, the headers need to be cached at the dispatcher. This involves adding X-Robots-Tag and Link headers to the cache headers in the Dispatcher farm file configurations.

/headers {
"Cache-Control"
"Content-Type"
"Expires"
"Last-Modified"
"X-Content-Type-Options"
"X-Robots-Tag"
"Link"
}

Authors can now enable the noindex or hreflang configuration on individual assets through the asset metadata.
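One way to confirm the setup is to request an asset and inspect the response headers. The sketch below uses the JDK 11+ HttpClient. The SeoHeaderCheck class is a hypothetical helper and the default URL is a placeholder, so pass the URL of an asset served through your own Dispatcher.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpHeaders;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class SeoHeaderCheck {

    private SeoHeaderCheck() {
    }

    /** Summarizes the SEO-related headers of a response. */
    static String summarize(HttpHeaders headers) {
        String robots = headers.firstValue("X-Robots-Tag").orElse("(not set)");
        String link = headers.firstValue("Link").orElse("(not set)");
        return "X-Robots-Tag: " + robots + "\n" + "Link: " + link;
    }

    /** Sends a HEAD request and returns the response headers. */
    static HttpHeaders fetchHeaders(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
        return client.send(request, HttpResponse.BodyHandlers.discarding()).headers();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL (assumption); replace with a real asset URL
        String url = args.length > 0 ? args[0] : "https://example.com/content/dam/test-en.pdf";
        System.out.println(summarize(fetchHeaders(url)));
    }
}
```

Running the check twice against the same URL is useful: the second response normally comes from the Dispatcher cache, so it shows whether the headers were cached along with the asset.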

[The AEM as a Cloud Service OOTB Fastly CDN blocks custom headers set from AEM and allows only some standard headers. While the X-Robots-Tag header is currently supported, the Link header is not. Please contact Adobe through a support ticket to enable the Link header for your environments.]

Conclusion:

Optimizing SEO headers for assets in AEM is a nuanced process that differs significantly from optimizing content pages. By utilizing headers like noindex and hreflang, you can ensure that your digital assets are correctly indexed, served in appropriate languages, and managed efficiently.

Wednesday, May 22, 2024

GeoLocation Redirection in AEM as a Cloud

 In this post, we will explore how to enable GeoLocation redirection in AEM as a Cloud Service.

Sometimes, we may need to enable visitor country-based redirects to direct users to the appropriate page when they access a domain. Multiple approaches are possible to handle geolocation redirection. For example, client-side redirects using the Google Geocoder API or other Geocoder APIs can identify the visitor’s country. Additionally, server-side redirects can be enabled, such as using Apache in conjunction with geocoder services. Most CDN services also provide geo headers that capture the visitor’s country, which can be used to enable redirects.

Let’s now explore different approaches in AEM as a Cloud:

Option 1: Redirect through Apache Using CDN Geo Country Header

AEM as a Cloud uses Fastly CDN out of the box (OOTB). The CDN adds geo headers, such as x-aem-client-country, to every request, supplying an ISO 3166-1 alpha-2 country code (e.g., US, AR). This header can be used in Apache (Dispatcher) to redirect users to the appropriate country-specific URL. For instance, if a request for the domain www.test.com originates from the US, the user can be redirected to the corresponding country-specific homepage URL. To prevent the redirects from being cached in the CDN and browser, caching should be disabled for the root path with the Cache-Control directives max-age=0, no-cache, and no-store.

RewriteCond %{REQUEST_URI} ^/$
RewriteCond %{HTTP:x-aem-client-country} ^US$
RewriteRule ^.*$ https://www.test.com/us/en/home.html [R=301,L]

Option 2: Redirect through CDN Using clientCountry

Another option is to handle the redirect directly in the CDN using the clientCountry request property. The AEM as a Cloud Service OOTB CDN now offers multiple capabilities that customers can manage themselves, such as origin selectors and request and response transformations. For more details, refer to "A Deep Dive into CDN Capabilities Within AEM as a Cloud" by Albin Issac (Tech Learnings, Medium, May 2024).

The latest addition to these capabilities is client redirects. The CDN now allows customers to configure different URL redirects like 301 and 302. This client redirect feature can be combined with the existing geo-location capability to redirect users to country-specific URLs. Note that this client redirect feature is not yet generally available. To join the early-adopter program, email [email protected].

You can add the following rule configuration to the cdn.yaml file and add further rules to meet your criteria:

experimental_redirects:
  rules:
    - name: country-redirect-us
      when:
        allOf:
          - { reqProperty: clientCountry, equals: "US" }
          - { reqProperty: domain, equals: "www.test.com" }
          - { reqProperty: path, equals: "/" }
      action:
        type: redirect
        location: https://www.test.com/us/en/home.html

    - name: country-redirect-ca
      when:
        allOf:
          - { reqProperty: clientCountry, equals: "CA" }
          - { reqProperty: domain, equals: "www.test.com" }
          - { reqProperty: path, equals: "/" }
      action:
        type: redirect
        location: https://www.test.com/ca/en/home.html

    - name: country-redirect-default
      when:
        allOf:
          - { reqProperty: domain, equals: "www.test.com" }
          - { reqProperty: path, equals: "/" }
      action:
        type: redirect
        location: https://www.test.com/default/en/home.html

This configuration sets up rules to redirect users based on their country. For example, users from the US accessing www.test.com are redirected to the country-specific URL https://www.test.com/us/en/home.html, and users from Canada accessing www.test.com are redirected to https://www.test.com/ca/en/home.html. The default rule sends visitors from all other countries to the default website URL https://www.test.com/default/en/home.html. You can add more rules to the cdn.yaml file to meet other redirection criteria.
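To make the matching logic concrete, the three rules can be read as the decision sketched below. This is purely an illustration in Java: the CDN evaluates the YAML itself, the CountryRedirect class is hypothetical, and the sketch assumes the default rule applies only when no country-specific rule has matched.

```java
import java.util.Map;
import java.util.Optional;

final class CountryRedirect {

    private static final String DOMAIN = "www.test.com";
    private static final String DEFAULT_PATH = "/default/en/home.html";

    // Country code -> country-specific home page, mirroring the rules above
    private static final Map<String, String> COUNTRY_PATHS = Map.of(
            "US", "/us/en/home.html",
            "CA", "/ca/en/home.html");

    private CountryRedirect() {
    }

    /** Returns the redirect location, or empty when no rule applies to the request. */
    static Optional<String> resolve(String domain, String path, String clientCountry) {
        // Every rule requires the domain and the root path to match
        if (!DOMAIN.equals(domain) || !"/".equals(path)) {
            return Optional.empty();
        }
        // Map.of does not accept null lookups, so handle a missing country first
        String target = clientCountry == null
                ? DEFAULT_PATH
                : COUNTRY_PATHS.getOrDefault(clientCountry, DEFAULT_PATH);
        return Optional.of("https://" + DOMAIN + target);
    }
}
```

Requests for any path other than the root fall through without a redirect, which matches the path condition present in all three rules.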

Please note that since the redirect is an experimental feature, experimental_redirects: is used in the configuration. The experimental_ prefix should be removed once this capability becomes generally available (GA).

Tuesday, May 14, 2024

A Deep Dive into CDN Capabilities Within AEM as a Cloud

 In this post, we’ll explore some of the recent enhancements to the out-of-the-box CDN capabilities in AEM as a Cloud.

The AEM as a Cloud CDN (Fastly) has recently enabled several key features that help us quickly address various issues. Some of these features are not yet generally available but are accessible through the Early Adopter Program.

[Image: request and response flow in the AEM as a Cloud CDN, from Adobe documentation]

The above image outlines the process flow for request and response handling in AEM as a Cloud CDN. Let’s examine some key features that are either currently enabled or available at the early adopter stage. These CDN configurations can be set through the cdn.yaml file and deployed separately via the Cloud Manager config pipeline.

For example, create config folders like the ones below in your code repository and configure the required features in the cdn.yaml file:

config-dev/cdn.yaml

config-prod/cdn.yaml

kind: "CDN"
version: "1"
metadata:
  envTypes: ["prod", "stage"]
data:
  trafficFilters:
    rules:

  originSelectors:
    rules:

Then define a config pipeline in Cloud Manager to deploy the CDN changes.

Origin Selector:

The Origin Selector feature enables the CDN to route traffic to non-AEM backends according to the configuration, functioning similarly to a reverse proxy.

You can define rules that determine when requests should be directed to a specific origin. For each origin, you specify the backend domain or IP address to connect to, along with options such as useCache (default is true), forwardHost (default is false), forwardCookie (default is false), forwardAuthorization (default is false), and timeout (default is 60).

Request Transformations:

Request transformation rules enable you to modify incoming requests by setting, unsetting, and altering paths, query parameters, and headers (including cookies) based on a variety of matching conditions, such as regular expressions. The supported actions (set, unset, and transform) can be chained together. Additionally, these rules allow you to set variables and reference them later in the pipeline, such as in response transformations.

Response Transformations:

Response transformation rules allow you to set and unset headers on the CDN’s outgoing responses. The supported actions are set, which sets a specified header to a given value in the response, and unset, which removes a specified header from the response.

Client-side Redirects (Early Adopter Program):

This feature is not yet generally available. To join the early-adopter program, email [email protected].

Client-side redirect rules configure 301, 302, and similar client-side redirects. If a rule matches, the CDN responds with a status line that includes the status code and message (for example, HTTP/1.1 301 Moved Permanently), along with the Location header. Both absolute and relative locations with fixed values are allowed.

CDN Error Pages:

When the CDN is unable to reach the AEM origin, a generic, unbranded error page is displayed. However, this default error page can be overridden by hosting static files in self-hosted storage solutions such as Amazon S3, Azure Blob Storage, or any external servers. These can then be referenced through CDN configuration.


kind: "CDN"
version: "1"
metadata:
  envTypes: ["dev"]
data:
  errorPages:
    spa:
      title: the error page
      icoUrl: https://www.example.com/error.ico
      cssUrl: https://www.example.com/error.css
      jsUrl: https://www.example.com/error.js

Edge Side Includes (ESI) for Loading Dynamic Content (Early Adopter Program):

This feature is not yet generally available. To join the early-adopter program, email [email protected].

The Adobe Managed CDN now supports Edge Side Includes (ESI), enabling dynamic content replacement directly at the CDN level. Sling Dynamic Include supports several include types: SSI (Server Side Include), ESI (Edge Side Include), and JavaScript includes. With SSI, the page can be fully cached only at the Dispatcher (not at the CDN): the Dispatcher retains the full page with dynamic include placeholders and substitutes them with actual content on each request. With ESI, the complete page, including the dynamic placeholders, is stored at the CDN, and the placeholders are replaced with real content as needed, which permits full-page caching at the CDN. Leveraging ESI requires specific CDN configuration. For a more detailed understanding, refer to the article "Sling Dynamic Include - Deep Dive | Dynamically Include Page Components in AEM" by Albin Issac (Tech Learnings, Medium).
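The placeholder substitution that ESI performs at the edge can be pictured with a small sketch. This is a conceptual illustration only, not how the CDN is implemented: the EsiSketch class is hypothetical, the fragment loader stands in for a request to the origin, and only the self-closing esi:include form is handled.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EsiSketch {

    // Matches self-closing tags such as <esi:include src="/path/to/fragment.html"/>
    private static final Pattern INCLUDE =
            Pattern.compile("<esi:include\\s+src=\"([^\"]+)\"\\s*/>");

    private EsiSketch() {
    }

    /** Replaces each include placeholder in a cached page with freshly loaded content. */
    static String render(String cachedPage, Function<String, String> fragmentLoader) {
        Matcher matcher = INCLUDE.matcher(cachedPage);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String fragment = fragmentLoader.apply(matcher.group(1));
            // quoteReplacement keeps "$" and "\" in the fragment literal
            matcher.appendReplacement(out, Matcher.quoteReplacement(fragment));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

The cached page itself never changes. Only the fragments are fetched per request, which is what allows the full page to stay in the CDN cache.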

Traffic Filter Rules Alerts (Early Adopter Program):

Traffic filter rules at the CDN layer can be employed to either block or allow requests, providing crucial control in various scenarios. These include:

  • Restricting access to specific domains exclusively to internal company traffic before a new site goes live.
  • Establishing rate limits to reduce susceptibility to volumetric DoS attacks.
  • Blocking access from IP addresses known to be associated with malicious activities.

Rate limit rules block traffic when it exceeds a certain rate of incoming requests, based on a specific condition. Setting a value for the rateLimit property limits the rate of those requests that match the rule condition.
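For readers new to rate limiting, the general idea can be sketched with a fixed-window counter per client key. This is a generic illustration and an assumption on my part: it does not describe how the CDN implements the rateLimit property, and the FixedWindowRateLimiter class is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class FixedWindowRateLimiter {

    private final int limit;
    private final long windowMillis;
    // key (for example, a client IP) -> {window start in millis, request count}
    private final Map<String, long[]> counters = new HashMap<>();

    FixedWindowRateLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request is allowed, false once the key exceeds the limit. */
    synchronized boolean allow(String key, long nowMillis) {
        long[] state = counters.computeIfAbsent(key, k -> new long[] {nowMillis, 0});
        if (nowMillis - state[0] >= windowMillis) {
            // The previous window has elapsed: start a new one
            state[0] = nowMillis;
            state[1] = 0;
        }
        state[1]++;
        return state[1] <= limit;
    }
}
```

With a limit of 2 requests per second, the third request from the same key within one second is rejected, while requests from other keys are unaffected.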

Additionally, you can implement WAF (Web Application Firewall) traffic filter rules, which utilize WAF flags. These rules require either an Enhanced Security license or a WAF-DDoS Protection license.

Please note that there is a size limit on the configuration file (its cumulative size cannot exceed 100KB), so organizations with larger requirements should define rules in the Apache/Dispatcher layer.

References:

https://experienceleague.adobe.com/en/docs/experience-manager-cloud-service/content/implementing/content-delivery/cdn-configuring-traffic

Configuring CDN Error Pages | Adobe Experience Manager

Traffic Filter Rules including WAF Rules | Adobe Experience Manager