Showing posts with label Adobe Experience Manager. Show all posts

Thursday, October 31, 2024

How to Resolve org.apache.poi.util.RecordFormatException: "Tried to Allocate an Array of Length, but the Maximum Length is 100,000,000"

When uploading and processing a large .xls file through Java in an AEM backend, I encountered the following exception:

Exception in thread "main" org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 159,135,035, but the maximum length for this record type is 100,000,000.
If the file is not corrupt and not large, please open an issue on bugzilla to request increasing the maximum allowable size for this record type.
You can set a higher override value with IOUtils.setByteArrayMaxOverride()

Stack Trace:

at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:596)
at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:281)
...
at ExcelReader.readSpreadsheet(ExcelReader.java:17)
at ExcelReader.main(ExcelReader.java:52)



Explanation of the Issue

This exception occurs because Apache POI tries to allocate a large byte array to process the Excel file, exceeding the default size limit of 100,000,000 bytes. The issue is often due to the size or structure of the .xls file, which requires a larger array than Apache POI allows by default.

Solutions

There are two primary ways to resolve this issue:


Option 1: Increase the Byte Array Limit

You can override the maximum allowable byte array size using IOUtils.setByteArrayMaxOverride(). Setting it to a higher value, such as 200,000,000, lets POI allocate the larger array that the file requires. The override is a static, JVM-wide setting, so it applies to all POI parsing in the same JVM, which matters on a shared AEM instance. It also increases memory usage and can lead to performance problems or out-of-memory errors with very large files. Before raising the limit, make sure the error isn’t caused by a corrupt file or an incorrect file format.

Updated Code Snippet:


import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.util.IOUtils;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Iterator;

public class ExcelReader {

    public void readSpreadsheet(String filePath) {
        Workbook workBook = null;
        try (FileInputStream fis = new FileInputStream(new File(filePath))) {
            // Increase the byte array limit for larger files
            IOUtils.setByteArrayMaxOverride(200_000_000);

            // Create Workbook instance
            workBook = WorkbookFactory.create(fis);
            Sheet sheet = workBook.getSheetAt(0);
            Iterator<Row> rowIter = sheet.rowIterator();
            boolean firstRow = true;
            while (rowIter.hasNext()) {
                Row row = rowIter.next();
                if (!firstRow) {
                    Iterator<Cell> cellIter = row.cellIterator();
                    while (cellIter.hasNext()) {
                        Cell cell = cellIter.next();
                        System.out.print(cell + "\t"); // Print cell content
                    }
                    System.out.println(); // Newline for each row
                } else {
                    // Skip the first (header) row
                    firstRow = false;
                }
            }
        } catch (IOException e) {
            System.err.println("IO Exception: " + e.getMessage());
        } finally {
            if (workBook != null) {
                try {
                    workBook.close();
                } catch (IOException e) {
                    System.err.println("Unable to close Workbook object: " + e.getMessage());
                }
            }
        }
    }

    public static void main(String[] args) {
        String filePath = "Test.xlsx";
        ExcelReader reader = new ExcelReader();
        reader.readSpreadsheet(filePath);
    }
}

Command to Compile and Run:

javac -cp ".;poi-5.3.0.jar;poi-ooxml-5.3.0.jar;poi-ooxml-full-5.3.0.jar;poi-ooxml-schemas-5.3.0.jar;xmlbeans-5.1.1.jar;commons-collections4-4.5.0-M2.jar;commons-io-2.17.0.jar;log4j-api-2.24.1.jar;commons-compress-1.27.1.jar;log4j-core-2.24.1.jar" ExcelReader.java

java -cp ".;poi-5.3.0.jar;poi-ooxml-5.3.0.jar;poi-ooxml-full-5.3.0.jar;poi-ooxml-schemas-5.3.0.jar;xmlbeans-5.1.1.jar;commons-collections4-4.5.0-M2.jar;commons-io-2.17.0.jar;log4j-api-2.24.1.jar;commons-compress-1.27.1.jar;log4j-core-2.24.1.jar" ExcelReader
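Because raising the limit trades a safety check for memory, it can help to decide on the override value deliberately instead of hard-coding a large number. The following is a minimal sketch of one way to do that, using only the JDK. The class name ByteArrayOverrideBudget, the method chooseOverride, and the "at most a quarter of the heap" policy are my own assumptions for illustration; they are not part of Apache POI.

```java
final class ByteArrayOverrideBudget {

    /** Hypothetical policy: a single record may use at most 1/4 of the max heap. */
    private static final int HEAP_DIVISOR = 4;

    /**
     * Returns the override value to use for a record of requiredBytes,
     * or throws if that would exceed the budget derived from maxHeapBytes.
     */
    static int chooseOverride(long requiredBytes, long maxHeapBytes) {
        long budget = Math.min(maxHeapBytes / HEAP_DIVISOR, Integer.MAX_VALUE);
        if (requiredBytes <= 0 || requiredBytes > budget) {
            throw new IllegalStateException(
                    "Record of " + requiredBytes + " bytes exceeds budget of " + budget + " bytes");
        }
        return (int) requiredBytes;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        try {
            // 159,135,035 is the array length reported in the exception above
            int override = chooseOverride(159_135_035L, maxHeap);
            System.out.println("OK to call IOUtils.setByteArrayMaxOverride(" + override + ")");
        } catch (IllegalStateException e) {
            System.out.println("Not raising the limit: " + e.getMessage());
        }
    }
}
```

If the check passes, the returned value is what you would hand to IOUtils.setByteArrayMaxOverride(); if it fails, increasing the JVM heap or moving to a streaming approach is probably the safer choice.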

Option 2: Use the Streaming API (SXSSFWorkbook)

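SXSSFWorkbook is Apache POI's streaming extension of XSSF for .xlsx files. It keeps only a sliding window of rows in memory and flushes older rows to temporary files, which makes it memory-efficient when writing large workbooks. Be aware of two limits when reading: the XSSFWorkbook that it wraps is still loaded in full, and rows that already exist in the wrapped workbook are not accessible through the SXSSF sheet. For reading very large .xlsx files with a small memory footprint, POI's SAX-based XSSF event API (XSSFReader) is the better fit. The snippet below shows the SXSSFWorkbook wrapping pattern.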

Code Using SXSSFWorkbook:


import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Iterator;

public class ExcelReaderStream {

    public void readSpreadsheet(String filePath) {
        SXSSFWorkbook streamingWorkbook = null;
        try (FileInputStream fis = new FileInputStream(new File(filePath));
             XSSFWorkbook workbook = new XSSFWorkbook(fis)) {

            // Wrap XSSFWorkbook in SXSSFWorkbook to enable streaming.
            // Note: rows already present in the wrapped XSSFWorkbook are not
            // exposed through the SXSSF sheet (see the SXSSFWorkbook Javadoc).
            streamingWorkbook = new SXSSFWorkbook(workbook);
            Sheet sheet = streamingWorkbook.getSheetAt(0);

            // Iterate over the rows available in the SXSSF window
            for (Row row : sheet) {
                Iterator<Cell> cellIter = row.cellIterator();
                while (cellIter.hasNext()) {
                    Cell cell = cellIter.next();
                    System.out.print(getCellValue(cell) + "\t");
                }
                System.out.println(); // Newline for each row
            }
        } catch (IOException e) {
            System.err.println("IO Exception: " + e.getMessage());
        } finally {
            if (streamingWorkbook != null) {
                try {
                    streamingWorkbook.dispose(); // Dispose of temporary files
                    streamingWorkbook.close();
                } catch (IOException e) {
                    System.err.println("Unable to close SXSSFWorkbook object: " + e.getMessage());
                }
            }
        }
    }

    // Utility method to get the value of a cell as a String
    private String getCellValue(Cell cell) {
        switch (cell.getCellType()) {
            case STRING:
                return cell.getStringCellValue();
            case NUMERIC:
                if (DateUtil.isCellDateFormatted(cell)) {
                    return cell.getDateCellValue().toString();
                } else {
                    return String.valueOf(cell.getNumericCellValue());
                }
            case BOOLEAN:
                return String.valueOf(cell.getBooleanCellValue());
            case FORMULA:
                return cell.getCellFormula();
            default:
                return "";
        }
    }

    public static void main(String[] args) {
        String filePath = "Test.xlsx";
        ExcelReaderStream reader = new ExcelReaderStream();
        reader.readSpreadsheet(filePath);
    }
}

Command to Compile and Run:

javac -cp ".;poi-5.3.0.jar;poi-ooxml-5.3.0.jar;poi-ooxml-full-5.3.0.jar;poi-ooxml-schemas-5.3.0.jar;xmlbeans-5.1.1.jar;commons-collections4-4.5.0-M2.jar;commons-io-2.17.0.jar;log4j-api-2.24.1.jar;commons-compress-1.27.1.jar;log4j-core-2.24.1.jar" ExcelReaderStream.java

java -cp ".;poi-5.3.0.jar;poi-ooxml-5.3.0.jar;poi-ooxml-full-5.3.0.jar;poi-ooxml-schemas-5.3.0.jar;xmlbeans-5.1.1.jar;commons-collections4-4.5.0-M2.jar;commons-io-2.17.0.jar;log4j-api-2.24.1.jar;commons-compress-1.27.1.jar;log4j-core-2.24.1.jar" ExcelReaderStream
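To show what event-based (streaming) reading looks like, here is a minimal sketch that uses only the JDK's StAX parser on worksheet XML, the format stored inside an .xlsx package. It is not Apache POI code; with POI, the read-side counterpart is the XSSF event API (org.apache.poi.xssf.eventusermodel.XSSFReader). The class name SheetXmlStreamer and the method streamRows are hypothetical, and the sketch deliberately skips shared-string lookup and cell types.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class SheetXmlStreamer {

    /**
     * Streams worksheet XML and hands the raw <v> values of each row to rowHandler.
     * Only the current row is held in memory. For string cells the raw value is an
     * index into the shared-strings part, which this sketch does not resolve.
     */
    static void streamRows(InputStream sheetXml, Consumer<List<String>> rowHandler) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Uploaded files are untrusted input: disable DTDs and external entities
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(sheetXml);
            List<String> currentRow = null;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("row".equals(name)) {
                        currentRow = new ArrayList<>();
                    } else if ("v".equals(name) && currentRow != null) {
                        currentRow.add(reader.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "row".equals(reader.getLocalName())
                        && currentRow != null) {
                    // The row is complete: pass it on and let it be garbage collected
                    rowHandler.accept(currentRow);
                    currentRow = null;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed sheet XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
                + "<sheetData>"
                + "<row r=\"1\"><c r=\"A1\"><v>1</v></c><c r=\"B1\"><v>2.5</v></c></row>"
                + "<row r=\"2\"><c r=\"A2\"><v>7</v></c></row>"
                + "</sheetData></worksheet>";
        streamRows(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                row -> System.out.println(String.join("\t", row)));
    }
}
```

The point of the callback is that no row outlives its handler, so memory use stays roughly constant regardless of how many rows the sheet contains.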

Conclusion

  • Option 1: Increasing the byte array limit using IOUtils.setByteArrayMaxOverride() can solve the immediate issue but may impact memory usage.
  • Option 2: SXSSFWorkbook keeps memory usage low when writing large .xlsx files; for reading them, an event-based approach such as POI's XSSF event API scales better than loading the whole workbook into memory.

Choose the approach that best suits your file size and memory requirements. Let me know if you have any questions!

Thursday, July 11, 2024

Enhancing Global Reach: Implementing RTL Support in Adobe Experience Manager (AEM)

 

In today’s globalized world, providing an inclusive digital experience is crucial for reaching diverse audiences. One essential aspect of this inclusivity is supporting Right-to-Left (RTL) languages, such as Arabic, Hebrew, and Persian. Adobe Experience Manager (AEM) offers some out-of-the-box (OOTB) capabilities for RTL support, but customization is required to ensure your content is accessible and user-friendly for all language speakers.

RTL support involves more than just text alignment; it encompasses the entire user interface, ensuring that elements like navigation menus, buttons, and icons are appropriately mirrored. This holistic approach guarantees a seamless and intuitive user experience for RTL language speakers.

When we refer to RTL support in Adobe Experience Manager (AEM), it involves addressing two different perspectives: supporting RTL for end users and supporting RTL for content authors. Content authors need to be able to create and manage content in an RTL-supported way, ensuring the text and interface align correctly during the authoring process. On the other hand, end users should experience the site in a fully RTL-supported manner, with the entire interface, including navigation menus, buttons, and icons, appropriately mirrored and aligned for a seamless and intuitive user experience.

This post explains the approach to support RTL websites on AEM from both perspectives.

End Users/Authoring View — Pages

We will be setting the direction at the page level, assuming the same templates will be used for both RTL and LTR-based pages. Our approach involves setting the direction at the HTML tag level based on the language configured in the page properties. This ensures that the page’s layout and text direction automatically adjust according to the specified language, providing a seamless experience for both RTL and LTR users.

Create a Sling Model that fetches the language configuration from the page properties and returns the direction value.

package com.adobe.aem.guides.wknd.core.models;

import org.apache.sling.api.resource.Resource;
import org.apache.sling.models.annotations.Model;
import org.apache.sling.models.annotations.injectorspecific.SlingObject;

import java.util.Arrays;
import java.util.List;
import java.util.Locale;

@Model(adaptables = Resource.class)
public class LanguageHelper {

    // Primary language codes that are written right-to-left
    private static final List<String> RTL_LANGUAGES = Arrays.asList("ar", "he", "fa", "ur");

    @SlingObject
    private Resource currentResource;

    public String getDirection() {
        // Read the language configured in the page properties
        String language = currentResource.getValueMap().get("jcr:language", String.class);
        if (language != null) {
            // Accept both "ar_SA" and "ar-SA"; only the primary language matters
            String primaryLanguage = language.split("[_-]", 2)[0].toLowerCase(Locale.ROOT);
            return RTL_LANGUAGES.contains(primaryLanguage) ? "rtl" : "ltr";
        }
        // Default to left-to-right when no language is configured
        return "ltr";
    }
}
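The language-to-direction mapping is the part of the model that is easiest to get wrong, so it can be handy to try it outside AEM. The following is a minimal, JDK-only sketch of that mapping; the class name DirectionResolver and the method directionFor are hypothetical and exist only for this illustration. It assumes the language value may use either "_" or "-" as the separator.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class DirectionResolver {

    // Same RTL language list as in the Sling Model above
    private static final List<String> RTL_LANGUAGES = Arrays.asList("ar", "he", "fa", "ur");

    /** Returns "rtl" for right-to-left languages and "ltr" for everything else. */
    static String directionFor(String language) {
        if (language == null || language.trim().isEmpty()) {
            return "ltr"; // No language configured: fall back to left-to-right
        }
        // Keep only the primary language, e.g. "ar" from "ar_SA" or "ar-SA"
        String primary = language.trim().toLowerCase(Locale.ROOT).split("[_-]", 2)[0];
        return RTL_LANGUAGES.contains(primary) ? "rtl" : "ltr";
    }

    public static void main(String[] args) {
        for (String language : new String[] {"ar", "he_IL", "fa-IR", "en_US"}) {
            System.out.println(language + " -> " + directionFor(language));
        }
    }
}
```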

Override page.html in your page rendering component, e.g., /apps/wknd/components/page/page.html, and invoke the LanguageHelper model created in the previous step to set the dir attribute on the html element.

<html data-sly-use.page="com.adobe.cq.wcm.core.components.models.Page" lang="${page.language}"
data-sly-use.pwa="com.adobe.cq.wcm.core.components.models.PWA"
data-sly-use.head="head.html"
data-sly-use.footer="footer.html"
data-sly-use.redirect="redirect.html"
data-sly-use.langHelper="com.adobe.aem.guides.wknd.core.models.LanguageHelper" dir="${langHelper.direction}">

Now, when users open the pages in the authoring or end-user view, the dir attribute is set on the root html element based on the site/page language, and RTL pages render in the right-to-left direction.

CSS/ClientLib to Support RTL

The default CSS (ClientLib) created supports LTR, but to support RTL pages, the clientlibs should be adapted for RTL. Considering that the templates are shared between RTL and LTR pages, we need to associate the RTL client library with those pages based on the language selection.

Once dir is set to rtl on the root html element, the website begins adapting to RTL. However, some styles and alignments managed through the CSS may break. Therefore, we need to generate an RTL version of the CSS (ClientLib).

You can create two separate versions of the CSS (ClientLib) to support LTR and RTL. Alternatively, build tools like Webpack have plugins, such as rtlcss-webpack-plugin, that can help convert the existing LTR CSS to support RTL. This way, two versions of CSS (ClientLibs) are generated: one to support LTR and another to support RTL.

Install rtlcss-webpack-plugin to the ui.frontend module:

npm install rtlcss-webpack-plugin --save-dev

Modify webpack.common.js to add the RtlCssPlugin:

const RtlCssPlugin = require('rtlcss-webpack-plugin');

module.exports = {
    // existing configuration
    plugins: [
        // other plugins
        new RtlCssPlugin({
            filename: 'clientlib-[name]/[name]-rtl.css'
        })
    ]
};

Modify clientlib.config.js to generate the RTL version of the clientlib:

{
    ...libsBaseConfig,
    name: 'clientlib-site-rtl',
    categories: ['wknd.site.rtl'],
    assets: {
        css: {
            cwd: 'clientlib-site',
            files: ['**/*rtl.css'],
            flatten: false
        },
        resources: {
            cwd: 'clientlib-site',
            files: ['**/*.*'],
            flatten: false,
            ignore: ['**/*.js', '**/*.css']
        }
    },
}

Override headlibs.html from your page rendering component, e.g., /apps/wknd/components/page/headlibs.html.

Change the existing logic to include the clientlibs from:

<sly data-sly-test="${clientlibCategories}"
data-sly-call="${clientlib.css @ categories=clientlibCategories}"></sly>

to include the corresponding LTR or RTL client library based on the direction:

<sly data-sly-use.langHelper="com.adobe.aem.guides.wknd.core.models.LanguageHelper"></sly>
<sly data-sly-test="${clientLibCategories}"
data-sly-use.rtlClientLibCSSProvider="${'com.adobe.aem.guides.wknd.core.models.RTLClientLibCSSProvider' @ cssClientLibs=clientLibCategories, dir=langHelper.direction}"
data-sly-unwrap></sly>
<sly data-sly-test="${rtlClientLibCSSProvider.rtlClientLibs}" data-sly-call="${clientlib.css @ categories=rtlClientLibCSSProvider.rtlClientLibs}"></sly>

Create a model to configure the correct clientlib based on the page direction:

package com.adobe.aem.guides.wknd.core.models;

import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.models.annotations.Model;
import javax.inject.Inject;
import javax.annotation.PostConstruct;

@Model(adaptables = SlingHttpServletRequest.class)
public class RTLClientLibCSSProvider {

    // Clientlib categories passed in from the HTL data-sly-use call
    @Inject
    private Object[] cssClientLibs;

    // Page direction ("rtl" or "ltr") passed in from the HTL data-sly-use call
    @Inject
    private String dir;

    private String[] rtlClientLibs;

    @PostConstruct
    protected void init() {
        if (cssClientLibs != null) {
            // Null-safe comparison in case no direction was passed
            boolean isRtl = "rtl".equals(dir);
            rtlClientLibs = new String[cssClientLibs.length];
            for (int i = 0; i < cssClientLibs.length; i++) {
                if (cssClientLibs[i] instanceof String) {
                    String category = (String) cssClientLibs[i];
                    // Append ".rtl" so that "wknd.site" resolves to "wknd.site.rtl"
                    rtlClientLibs[i] = isRtl ? category + ".rtl" : category;
                }
            }
        }
    }

    public String[] getRtlClientLibs() {
        return rtlClientLibs;
    }
}
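The category mapping done by this model can also be tried without AEM. Below is a minimal, JDK-only sketch of the same idea; the class name ClientLibCategoryMapper and the method map are hypothetical names used only for this illustration. Unlike the model above, this sketch skips non-String entries instead of leaving null slots in the result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ClientLibCategoryMapper {

    // Suffix that matches the 'wknd.site.rtl' category defined in clientlib.config.js
    private static final String RTL_SUFFIX = ".rtl";

    /** Returns the categories unchanged for LTR and with ".rtl" appended for RTL. */
    static String[] map(Object[] categories, String dir) {
        if (categories == null) {
            return new String[0];
        }
        boolean rtl = "rtl".equals(dir); // Null-safe: a missing dir means LTR
        List<String> result = new ArrayList<>();
        for (Object category : categories) {
            if (category instanceof String) {
                String name = (String) category;
                result.add(rtl ? name + RTL_SUFFIX : name);
            }
        }
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        Object[] categories = {"wknd.base", "wknd.site"};
        System.out.println(Arrays.toString(map(categories, "rtl")));
        System.out.println(Arrays.toString(map(categories, "ltr")));
    }
}
```

Note that every category receives the suffix, so each category configured on the template needs an RTL counterpart; otherwise no CSS is loaded for it on RTL pages.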

If some styles are not adapted for RTL, you may need to enable manual overrides to apply specific styles only to RTL. For example:

/* Default LTR styles */
.example {
    margin-left: 20px;
}

/* RTL specific overrides */
[dir="rtl"] .example {
    margin-left: 0;
    margin-right: 20px;
}

Using Logical Properties — CSS logical properties and values provide a way to define styles that are agnostic to text direction. This can help reduce the need for manual overrides.

/* Instead of using left/right */
.container {
    padding-inline-start: 20px;
}

Control Directives — Use /*rtl:ignore*/ or /*rtl:begin:ignore*/ ... /*rtl:end:ignore*/ comments to prevent the plugin from modifying certain parts of your CSS. The RTL plugin uses these directives while outputting the CSS for RTL.

Example:

.example {
    /*rtl:ignore*/
    margin-left: 20px;
}

By combining manual overrides, logical properties, and control directives, you can ensure that your styles are correctly applied for both LTR and RTL layouts.
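To make the conversion step more concrete, here is a toy, JDK-only sketch of the kind of left/right flipping an RTL converter performs and of how an ignore directive protects a declaration. This is not the rtlcss implementation, which handles far more cases (shorthand values, directions in values, block-level ignores, and so on). The class name RtlFlipSketch and its methods are hypothetical, and the sketch assumes one declaration per line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RtlFlipSketch {

    private static final Pattern SIDE = Pattern.compile("\\b(left|right)\\b");
    private static final String IGNORE = "/*rtl:ignore*/";

    /** Swaps left/right in declaration lines, skipping the one after an ignore directive. */
    static String flip(String css) {
        StringBuilder out = new StringBuilder();
        boolean ignoreNext = false;
        for (String line : css.split("\n", -1)) {
            String trimmed = line.trim();
            if (trimmed.equals(IGNORE)) {
                ignoreNext = true;              // Protect the next declaration
                out.append(line);
            } else if (ignoreNext && !trimmed.isEmpty()) {
                ignoreNext = false;             // Emit the protected line as written
                out.append(line);
            } else if (trimmed.endsWith(";")) {
                out.append(swapSides(line));    // Only touch declarations, not selectors
            } else {
                out.append(line);
            }
            out.append('\n');
        }
        out.setLength(out.length() - 1);        // Remove the extra trailing newline
        return out.toString();
    }

    private static String swapSides(String line) {
        Matcher matcher = SIDE.matcher(line);
        StringBuffer flipped = new StringBuffer();
        while (matcher.find()) {
            matcher.appendReplacement(flipped, "left".equals(matcher.group(1)) ? "right" : "left");
        }
        matcher.appendTail(flipped);
        return flipped.toString();
    }

    public static void main(String[] args) {
        System.out.println(flip(".example {\n    margin-left: 20px;\n}"));
        System.out.println(flip(".example {\n    /*rtl:ignore*/\n    margin-left: 20px;\n}"));
    }
}
```

The first call prints the rule with margin-right, while the second leaves margin-left untouched because of the directive.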

Enabling RTL Support for Core Text Component (RTE) for Authoring

The core text component by default does not support RTL and always uses LTR during authoring. The direction we set at the page level is not applied to authoring dialogs. To enable RTL support for the Core Text Component during authoring, follow these steps:

Create a Custom ClientLibrary: Create a custom client library under your project, for example, /apps/wknd/clientlibs/set-direction-clientlib, and add the category cq.authoring.dialog.

Add JavaScript: Add the following JavaScript to the JS file. This script reads the language of the page being edited and sets the text direction on the rich text editor element. When the user authors the text component, the direction is then set automatically based on the language configured at the site/page level.

js/set-direction-clientlib.js


// File: js/set-direction-clientlib.js
(function ($, document, ns) {
    $(document).on("dialog-ready", function () {
        var language =
            Granite.author.ContentFrame.contentWindow.document.documentElement.lang;
        if (language) {
            var primaryLanguageCode = language.split("-")[0]; // Get the language part only
            var rtlLanguages = ["ar", "he", "fa", "ur"]; // List of RTL language codes
            var isRtl = rtlLanguages.indexOf(primaryLanguageCode) !== -1;

            if (isRtl) {
                // The language is right-to-left
                var richtextElement = document.querySelector(".coral-RichText");

                // Check if the element exists
                if (richtextElement) {
                    // Add additional styles for RTL
                    richtextElement.style.textAlign = "right";
                    richtextElement.style.direction = "rtl";
                }
            }
        }
    });
})(Granite.$, document, Granite.author);

js.txt

#base=js

set-direction-clientlib.js

Experience Fragment for Authoring

Considering that Experience Fragments (XF) are authored independently from pages, the direction configurations enabled at the page level will not apply to XFs. Additionally, XFs use the foundation page component, not the core page component.

You can adapt the logic to other criteria, such as deriving the language from the Experience Fragment path. Here I use the language property set at the XF level, assuming the same Experience Fragment template is used to create XFs in different languages. The language needs to be set at the XF level at least for RTL languages; in all other cases, the direction defaults to LTR.

Add the following logic to xfpage.html (e.g., /apps/wknd/components/xfpage/xfpage.html) to add the dir attribute based on the language selected in the XF properties:

<!DOCTYPE HTML>
<html data-sly-use.langHelper="com.adobe.aem.guides.wknd.core.models.LanguageHelper" dir="${langHelper.direction}">
<head data-sly-include="head.html"></head>
<body data-sly-use.body="body.js" class="${body.cssClasses}"
data-sly-include="body.html">
</body>
</html>

Additionally, override /libs/wcm/foundation/components/page/author.html to the xfpage component (e.g., /apps/wknd/components/xfpage/author.html). Replace:

<sly data-sly-test="${wcmInit.templateCategories.length > 0}" data-sly-call="${clientLib.css @ categories=wcmInit.templateCategories}" />

with

<sly data-sly-use.langHelper="com.adobe.aem.guides.wknd.core.models.LanguageHelper"></sly>

<sly data-sly-test="${wcmInit.templateCategories.length > 0}"
data-sly-use.rtlClientLibCSSProvider="${'com.adobe.aem.guides.wknd.core.models.RTLClientLibCSSProvider' @ cssClientLibs=wcmInit.templateCategories, dir=langHelper.direction}"
data-sly-unwrap></sly>

<sly data-sly-test="${rtlClientLibCSSProvider.rtlClientLibs}" data-sly-call="${clientlib.css @ categories=rtlClientLibCSSProvider.rtlClientLibs}"></sly>

Now, when authoring the XF, the direction is set based on the language configured in the XF properties.
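If you would rather derive the direction from the Experience Fragment path than from a language property, the idea could look like the following JDK-only sketch. The path layout /content/experience-fragments/<site>/<country or language-masters>/<language>/... is an assumption on my side, and the class name XfPathDirection and the method directionForPath are hypothetical; adjust the segment index to your own content structure.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class XfPathDirection {

    private static final List<String> RTL_LANGUAGES = Arrays.asList("ar", "he", "fa", "ur");

    // Assumed layout: /content/experience-fragments/<site>/<country>/<language>/...
    private static final int LANGUAGE_SEGMENT_INDEX = 4;

    /** Returns "rtl" when the assumed language segment is an RTL language, else "ltr". */
    static String directionForPath(String xfPath) {
        if (xfPath == null) {
            return "ltr";
        }
        String[] segments = xfPath.replaceAll("^/+", "").split("/");
        if (segments.length <= LANGUAGE_SEGMENT_INDEX) {
            return "ltr"; // Path is too short to contain a language segment
        }
        String primary = segments[LANGUAGE_SEGMENT_INDEX]
                .toLowerCase(Locale.ROOT)
                .split("[_-]", 2)[0];
        return RTL_LANGUAGES.contains(primary) ? "rtl" : "ltr";
    }

    public static void main(String[] args) {
        System.out.println(directionForPath(
                "/content/experience-fragments/wknd/language-masters/ar/site/header/master"));
        System.out.println(directionForPath(
                "/content/experience-fragments/wknd/us/en/site/header/master"));
    }
}
```

A fixed segment index is used on purpose: scanning every segment for a two-letter code would confuse country codes with language codes (for example, "ar" for Argentina).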

Content Fragment for Authoring

Content Fragments are used to author headless content independently. The new Content Fragment Editor (Universal Editor) based OOTB enables the option to configure the direction for the MultiLine Rich Text Editor, but the direction is not supported for other types, including SingleLine text.

Unfortunately, I could not find an approach to extend the Universal Editor by attaching custom JavaScript.

If you still use the old CF editor, you can create a custom ClientLibrary that sets the direction based on some logic, for example, by passing the direction parameter while accessing the CF, e.g., http://localhost:4502/editor.html/content/dam/cf/test?direction=rtl.

Create a ClientLibrary with the category dam.cfm.authoring.contenteditor.v2.

Add the following JavaScript to the file js/cf-direction.js:

// File: js/cf-direction.js
(function ($, $document) {

    $document.on("foundation-contentloaded", onContentLoad);

    function onContentLoad(event) {
        var urlParams = new URLSearchParams(window.location.search);
        var direction = urlParams.get('direction'); // Expected to be 'rtl' or 'ltr'

        // Do nothing when the parameter is missing or has an unexpected value
        if (direction !== 'rtl' && direction !== 'ltr') {
            return;
        }

        // Apply direction to specific elements
        var elements = document.querySelectorAll('.coral-Form-fieldwrapper, .cfm-multieditor-fullscreen-richtext-editor-wrapper');

        elements.forEach(function (element) {
            element.setAttribute('dir', direction);
        });
    }

}(jQuery, jQuery(document)));

For Core Text Component or CF RTEs, you can also develop a custom plugin that helps the author set the direction.

Internationalization

Internationalization (i18n) can be used to define labels and text in different languages to support multilingual websites. You should define the i18n mappings for custom labels displayed on dialogs, as well as any end-user tokens that require translation. While the actual authored content can follow the translation process, labels, tokens, and some content can be translated through i18n based on the selected language.

The corresponding language-specific value is displayed through i18n based on the website’s language.

From the authoring perspective, AEM allows users to set their language preference to specific languages. Currently, only a limited number of languages are supported (English, German, Spanish, French, Italian, Portuguese (Brazil), Chinese, Japanese, and Korean) and RTL (Right-to-Left) languages are not supported at this time. Once the user sets their language preference, the entire authoring interface, including dialog labels, is displayed in the selected language. Content that does not use i18n and is translated separately will be displayed in the language in which the content is available.

The core components’ dialogs already support i18n and have the required tokens defined. However, if you introduce new labels for your custom components, you should enable i18n for all languages so that the fields display in the corresponding language set by the user as a preference.

Unfortunately, /libs/granite/ui/content/userproperties is marked as granite:InternalArea, so we will not be able to overlay /libs/granite/ui/content/userproperties/preferences/form/items/language/items to add additional language preference values. We hope Adobe will support additional languages for the authoring view in the future.

Templates

RTL-Specific Templates: If you need unique structures for Right-to-Left (RTL) pages, consider defining RTL-specific templates. Although templates are often created for Left-to-Right (LTR) structured websites by default, there are instances where creating RTL-specific templates is necessary to support the desired structure effectively.

Content Translation

The content needs to be translated into the required languages, including RTL languages. Use the AEM Content Translation Framework along with a translation provider to translate the content into the required languages. You can also use the Multi-Site Manager (MSM) structure to simplify the management of multilingual websites.

Summary

Supporting multiple languages, including RTL (Right-to-Left) languages, helps extend global reach and can improve customer engagement and ROI. In AEM, it’s important that not only the end-user experience but also the authoring view supports multiple languages, including RTL.

AEM provides some out-of-the-box features to support RTL for both authoring and end-user views. However, customization is often required to fully support these features, and there are known limitations.

The approach I followed is based on certain requirements, such as sharing templates for both LTR and RTL websites and ensuring that pages and components adapt to RTL based on the language settings. Depending on your specific needs, this approach can be adapted to solve your challenges. Feel free to use this method or review and adapt it to suit your own requirements.