Thursday, October 31, 2024

How to Resolve org.apache.poi.util.RecordFormatException: "Tried to Allocate an Array of Length, but the Maximum Length is 100,000,000

When uploading and processing a large .xls file through Java in an AEM backend, I encountered the following exception:

Exception in thread "main" org.apache.poi.util.RecordFormatException:
Tried to allocate an
array of length 159,135,035, but the maximum length for this record type is 100,000,000.
If the file is not corrupt and not large, please open an issue on bugzilla to
request increasing the maximum allowable size for this record type. You can set a higher override value with IOUtils.setByteArrayMaxOverride()

Stack Trace:

at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:596)
at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:281) ... at ExcelReader.readSpreadsheet(ExcelReader.java:17) at ExcelReader.main(ExcelReader.java:52)



Explanation of the Issue

This exception occurs because Apache POI tries to allocate a large byte array to process the Excel file, exceeding the default size limit of 100,000,000 bytes. The issue is often due to the size or structure of the .xls file, which requires a larger array than Apache POI allows by default.

Solutions

There are two primary ways to resolve this issue:


Option 1: Increase the Byte Array Limit

You can override the maximum allowable byte array size using IOUtils.setByteArrayMaxOverride(). Setting this to a higher value, such as 200,000,000, may help process larger Excel files. However, this approach can impact memory usage and may lead to performance issues with very large files. Before increasing the size, make sure the issue isn’t due to a corrupted or incorrect file format.

Updated Code Snippet:


import org.apache.poi.ss.usermodel.*; import org.apache.poi.ss.usermodel.WorkbookFactory; import org.apache.poi.util.IOUtils; import java.io.File; import java.io.FileInputStream; import java.io.IOException; import java.util.Iterator; public class ExcelReader { public void readSpreadsheet(String filePath) { Workbook workBook = null; try (FileInputStream fis = new FileInputStream(new File(filePath))) { // Increase the byte array limit for larger files IOUtils.setByteArrayMaxOverride(200_000_000); // Create Workbook instance workBook = WorkbookFactory.create(fis); Sheet sheet = workBook.getSheetAt(0); Iterator<Row> rowIter = sheet.rowIterator(); boolean firstRow = true; while (rowIter.hasNext()) { Row row = rowIter.next(); if (!firstRow) { Iterator<Cell> cellIter = row.cellIterator(); while (cellIter.hasNext()) { Cell cell = cellIter.next(); System.out.print(cell + "\t"); // Print cell content } System.out.println(); // Newline for each row } else { firstRow = false; } } } catch (IOException e) { System.err.println("IO Exception: " + e.getMessage()); } finally { if (workBook != null) { try { workBook.close(); } catch (IOException e) { System.err.println("Unable to close Workbook object: "
                     + e.getMessage()); } } } } public static void main(String[] args) { String filePath = "Test.xlsx"; ExcelReader reader = new ExcelReader(); reader.readSpreadsheet(filePath); } }

Command to Compile and Run:

javac -cp ".;poi-5.3.0.jar;poi-ooxml-5.3.0.jar;poi-ooxml-full-5.3.0.jar;poi-ooxml-schemas-5.3.0.jar;xmlbeans-5.1.1.jar;commons-collections4-4.5.0-M2.jar;commons-io-2.17.0.jar;log4j-api-2.24.1.jar;commons-compress-1.27.1.jar;log4j-core-2.24.1.jar" ExcelReader.java

java -cp ".;poi-5.3.0.jar;poi-ooxml-5.3.0.jar;poi-ooxml-full-5.3.0.jar;poi-ooxml-schemas-5.3.0.jar;xmlbeans-5.1.1.jar;commons-collections4-4.5.0-M2.jar;commons-io-2.17.0.jar;log4j-api-2.24.1.jar;commons-compress-1.27.1.jar;log4j-core-2.24.1.jar" ExcelReader

Option 2: Use the Streaming API (SXSSFWorkbook)

Using SXSSFWorkbook, a streaming API in Apache POI, allows you to process large .xlsx files in a memory-efficient way. This approach loads only a small part of the file into memory at any time, making it suitable for processing large Excel files.

Code Using SXSSFWorkbook:


import org.apache.poi.ss.usermodel.*; import org.apache.poi.xssf.usermodel.XSSFWorkbook; import org.apache.poi.xssf.streaming.SXSSFWorkbook; import java.io.File; import java.io.FileInputStream; import java.io.IOException; import java.util.Iterator; public class ExcelReaderStream { public void readSpreadsheet(String filePath) { SXSSFWorkbook streamingWorkbook = null; try (FileInputStream fis = new FileInputStream(new File(filePath)); XSSFWorkbook workbook = new XSSFWorkbook(fis)) { // Wrap XSSFWorkbook in SXSSFWorkbook to enable streaming streamingWorkbook = new SXSSFWorkbook(workbook); Sheet sheet = streamingWorkbook.getSheetAt(0); // Iterate through rows in a memory-efficient way for (Row row : sheet) { Iterator<Cell> cellIter = row.cellIterator(); while (cellIter.hasNext()) { Cell cell = cellIter.next(); System.out.print(getCellValue(cell) + "\t"); } System.out.println(); // Newline for each row } } catch (IOException e) { System.err.println("IO Exception: " + e.getMessage()); } finally { if (streamingWorkbook != null) { try { streamingWorkbook.dispose(); // Dispose of temporary files streamingWorkbook.close(); } catch (IOException e) { System.err.println("Unable to close SXSSFWorkbook object: "
                    + e.getMessage()); } } } } // Utility method to get the value of a cell as a String private String getCellValue(Cell cell) { switch (cell.getCellType()) { case STRING: return cell.getStringCellValue(); case NUMERIC: if (DateUtil.isCellDateFormatted(cell)) { return cell.getDateCellValue().toString(); } else { return String.valueOf(cell.getNumericCellValue()); } case BOOLEAN: return String.valueOf(cell.getBooleanCellValue()); case FORMULA: return cell.getCellFormula(); default: return ""; } } public static void main(String[] args) { String filePath = "Test.xlsx"; ExcelReaderStream reader = new ExcelReaderStream(); reader.readSpreadsheet(filePath); } }

Command to Compile and Run:

javac -cp ".;poi-5.3.0.jar;poi-ooxml-5.3.0.jar;poi-ooxml-full-5.3.0.jar;poi-ooxml-schemas-5.3.0.jar;xmlbeans-5.1.1.jar;commons-collections4-4.5.0-M2.jar;commons-io-2.17.0.jar;log4j-api-2.24.1.jar;commons-compress-1.27.1.jar;log4j-core-2.24.1.jar" ExcelReaderStream.java

java -cp ".;poi-5.3.0.jar;poi-ooxml-5.3.0.jar;poi-ooxml-full-5.3.0.jar;poi-ooxml-schemas-5.3.0.jar;xmlbeans-5.1.1.jar;commons-collections4-4.5.0-M2.jar;commons-io-2.17.0.jar;log4j-api-2.24.1.jar;commons-compress-1.27.1.jar;log4j-core-2.24.1.jar" ExcelReaderStream

Conclusion

  • Option 1: Increasing the byte array limit using IOUtils.setByteArrayMaxOverride() can solve the immediate issue but may impact memory usage.
  • Option 2: Using SXSSFWorkbook with the streaming API is a more scalable solution for large .xlsx files, allowing memory-efficient processing.

Choose the approach that best suits your file size and memory requirements. Let me know if you have any questions!

Friday, October 11, 2024

Syncing Segments from Adobe Target to Adobe Experience Manager (AEM)

While integrating Adobe Target with AEM, you can synchronize segments from Adobe Target to AEM. Basic configurations include:

  • Enabling the project with the Adobe Target API in the Adobe Developer Console.
  • Enabling the required workspaces and ensuring proper access for the API credentials via the Admin Console Product Profile (Workspace).
  • Setting up the IMS configuration in AEM.
  • Enabling the Adobe Target Cloud Configuration with segment syncing.


(I will be posting a detailed guide on the integration process soon.)

Issue: Segments Not Syncing

Unfortunately, even after completing all the steps, the segments were not syncing, and I encountered the following exception:

11.10.2024 12:08:37.246 *WARN* [sling-cq-polling-importer-2-/conf/wknd/settings/cloudconfigs/target] com.day.cq.polling.importer.impl.PollingImporterImpl importData: Failed to import PollConfig(/conf/wknd/settings/cloudconfigs/target/jcr:content/segmentsPollConfig): scheme=adobe-target-segments,source=/conf/wknd/settings/cloudconfigs/target,target=/etc/segmentation/adobe-target/tenant,interval=300s,enabled=true

java.lang.NullPointerException: null

at com.day.cq.analytics.testandtarget.impl.SegmentImporter.importData(SegmentImporter.java:159) [com.adobe.cq.cq-target-integration:1.4.60]

at com.day.cq.polling.importer.impl.PollingImporterImpl.importData(PollingImporterImpl.java:561) [com.day.cq.cq-polling-importer:5.13.2]

at com.day.cq.polling.importer.impl.PollingImporterImpl.access$000(PollingImporterImpl.java:72) [com.day.cq.cq-polling-importer:5.13.2]

at com.day.cq.polling.importer.impl.PollingImporterImpl$1.run(PollingImporterImpl.java:463) [com.day.cq.cq-polling-importer:5.13.2]

at org.apache.sling.commons.scheduler.impl.QuartzJobExecutor.execute(QuartzJobExecutor.java:349) [org.apache.sling.commons.scheduler:2.7.12]

at org.quartz.core.JobRunShell.run(JobRunShell.java:202) [org.apache.sling.commons.scheduler:2.7.12]

at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)

at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)

at java.base/java.lang.Thread.run(Thread.java:834)


Resolution

To resolve this issue, ensure that the Day CQ Analytics Segment Importer configuration is enabled.


Additionally, segments will only sync from the Default Workspace at present. Ensure that segments are created under the Default Workspace. In my case, there were no segments under the Default Workspace, which caused the sync to fail.

You can also adjust the polling interval in the Adobe Target Cloud Configuration.

You can adjust the polling interval on Adobe Target Cloud Configuration



Successful Sync

Once I created segments in the Default Workspace, they were successfully synced to the AEM Audience Library. These segments can now be used for A/B testing or Experience Targeting for AEM pages using Adobe Target as the targeting engine.


11.10.2024 14:23:02.976 *INFO* [sling-cq-polling-importer-3-/conf/wknd/settings/cloudconfigs/target] com.day.cq.analytics.testandtarget.impl.SegmentImporter Importing in /etc/segmentation/adobe-target/tenant with dataSource /conf/wknd/settings/cloudconfigs/target

11.10.2024 14:23:02.978 *DEBUG* [sling-cq-polling-importer-3-/conf/wknd/settings/cloudconfigs/target] com.day.cq.analytics.testandtarget.impl.service.WebServiceImpl Target API Request: GET https://mc.adobe.io/tenant/target/audiences?offset=0&modifiedAt=2024-10-10%2F2024-10-12&limit=2000&type=reusable 

11.10.2024 14:23:03.517 *DEBUG* [sling-cq-polling-importer-3-/conf/wknd/settings/cloudconfigs/target] com.day.cq.analytics.testandtarget.impl.service.WebServiceImpl Target API Request: GET https://mc.adobe.io/tenant/target/audiences?deletedAt=2024-10-10%2F2024-10-12&offset=0&limit=2000&type=reusable&status=deleted 

11.10.2024 14:23:03.985 *INFO* [sling-cq-polling-importer-3-/conf/wknd/settings/cloudconfigs/target] com.day.cq.analytics.testandtarget.impl.SegmentImporter Wrote 4 segments under /etc/segmentation/adobe-target/tenant 



Creating Adobe Target Audiences

You can create Adobe Target Audiences in AEM, and they will sync to Adobe Target's Default Workspace.







11.10.2024 14:55:59.150 *DEBUG* [[0:0:0:0:0:0:0:1] [1728676559144] POST /conf/wknd/settings/cloudconfigs/target/jcr:content.audienceupdate.json HTTP/1.1] com.day.cq.analytics.testandtarget.impl.service.WebServiceImpl Target API Request: POST https://mc.adobe.io/tenant/target/audiences?includeMarketingCloudMetadata=true {"name":"Windows","description":"--","targetRule":{"script":"return (((user.browser!=null) && (user.browser.indexOf(\"Windows\")>-1)));"},"type":"reusable","marketingCloudMetadata":{"sourceProductName":"Adobe Experience Manager","editURL":"http://localhost:4502/libs/cq/personalization/touch-ui/content/audiences.html"}}