Thursday, October 31, 2024

How to Resolve org.apache.poi.util.RecordFormatException: "Tried to Allocate an Array of Length, but the Maximum Length is 100,000,000

When uploading and processing a large .xls file through Java in an AEM backend, I encountered the following exception:

Exception in thread "main" org.apache.poi.util.RecordFormatException:
Tried to allocate an
array of length 159,135,035, but the maximum length for this record type is 100,000,000.
If the file is not corrupt and not large, please open an issue on bugzilla to
request increasing the maximum allowable size for this record type. You can set a higher override value with IOUtils.setByteArrayMaxOverride()

Stack Trace:

at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:596)
at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:281) ... at ExcelReader.readSpreadsheet(ExcelReader.java:17) at ExcelReader.main(ExcelReader.java:52)



Explanation of the Issue

This exception occurs because Apache POI tries to allocate a large byte array to process the Excel file, exceeding the default size limit of 100,000,000 bytes. The issue is often due to the size or structure of the .xls file, which requires a larger array than Apache POI allows by default.

Solutions

There are two primary ways to resolve this issue:


Option 1: Increase the Byte Array Limit

You can override the maximum allowable byte array size using IOUtils.setByteArrayMaxOverride(). Setting this to a higher value, such as 200,000,000, may help process larger Excel files. However, this approach can impact memory usage and may lead to performance issues with very large files. Before increasing the size, make sure the issue isn’t due to a corrupted or incorrect file format.

Updated Code Snippet:


import org.apache.poi.ss.usermodel.*; import org.apache.poi.ss.usermodel.WorkbookFactory; import org.apache.poi.util.IOUtils; import java.io.File; import java.io.FileInputStream; import java.io.IOException; import java.util.Iterator; public class ExcelReader { public void readSpreadsheet(String filePath) { Workbook workBook = null; try (FileInputStream fis = new FileInputStream(new File(filePath))) { // Increase the byte array limit for larger files IOUtils.setByteArrayMaxOverride(200_000_000); // Create Workbook instance workBook = WorkbookFactory.create(fis); Sheet sheet = workBook.getSheetAt(0); Iterator<Row> rowIter = sheet.rowIterator(); boolean firstRow = true; while (rowIter.hasNext()) { Row row = rowIter.next(); if (!firstRow) { Iterator<Cell> cellIter = row.cellIterator(); while (cellIter.hasNext()) { Cell cell = cellIter.next(); System.out.print(cell + "\t"); // Print cell content } System.out.println(); // Newline for each row } else { firstRow = false; } } } catch (IOException e) { System.err.println("IO Exception: " + e.getMessage()); } finally { if (workBook != null) { try { workBook.close(); } catch (IOException e) { System.err.println("Unable to close Workbook object: "
                     + e.getMessage()); } } } } public static void main(String[] args) { String filePath = "Test.xlsx"; ExcelReader reader = new ExcelReader(); reader.readSpreadsheet(filePath); } }

Command to Compile and Run:

javac -cp ".;poi-5.3.0.jar;poi-ooxml-5.3.0.jar;poi-ooxml-full-5.3.0.jar;poi-ooxml-schemas-5.3.0.jar;xmlbeans-5.1.1.jar;commons-collections4-4.5.0-M2.jar;commons-io-2.17.0.jar;log4j-api-2.24.1.jar;commons-compress-1.27.1.jar;log4j-core-2.24.1.jar" ExcelReader.java

java -cp ".;poi-5.3.0.jar;poi-ooxml-5.3.0.jar;poi-ooxml-full-5.3.0.jar;poi-ooxml-schemas-5.3.0.jar;xmlbeans-5.1.1.jar;commons-collections4-4.5.0-M2.jar;commons-io-2.17.0.jar;log4j-api-2.24.1.jar;commons-compress-1.27.1.jar;log4j-core-2.24.1.jar" ExcelReader

Option 2: Use the Streaming API (SXSSFWorkbook)

Using SXSSFWorkbook, a streaming API in Apache POI, allows you to process large .xlsx files in a memory-efficient way. This approach loads only a small part of the file into memory at any time, making it suitable for processing large Excel files.

Code Using SXSSFWorkbook:


import org.apache.poi.ss.usermodel.*; import org.apache.poi.xssf.usermodel.XSSFWorkbook; import org.apache.poi.xssf.streaming.SXSSFWorkbook; import java.io.File; import java.io.FileInputStream; import java.io.IOException; import java.util.Iterator; public class ExcelReaderStream { public void readSpreadsheet(String filePath) { SXSSFWorkbook streamingWorkbook = null; try (FileInputStream fis = new FileInputStream(new File(filePath)); XSSFWorkbook workbook = new XSSFWorkbook(fis)) { // Wrap XSSFWorkbook in SXSSFWorkbook to enable streaming streamingWorkbook = new SXSSFWorkbook(workbook); Sheet sheet = streamingWorkbook.getSheetAt(0); // Iterate through rows in a memory-efficient way for (Row row : sheet) { Iterator<Cell> cellIter = row.cellIterator(); while (cellIter.hasNext()) { Cell cell = cellIter.next(); System.out.print(getCellValue(cell) + "\t"); } System.out.println(); // Newline for each row } } catch (IOException e) { System.err.println("IO Exception: " + e.getMessage()); } finally { if (streamingWorkbook != null) { try { streamingWorkbook.dispose(); // Dispose of temporary files streamingWorkbook.close(); } catch (IOException e) { System.err.println("Unable to close SXSSFWorkbook object: "
                    + e.getMessage()); } } } } // Utility method to get the value of a cell as a String private String getCellValue(Cell cell) { switch (cell.getCellType()) { case STRING: return cell.getStringCellValue(); case NUMERIC: if (DateUtil.isCellDateFormatted(cell)) { return cell.getDateCellValue().toString(); } else { return String.valueOf(cell.getNumericCellValue()); } case BOOLEAN: return String.valueOf(cell.getBooleanCellValue()); case FORMULA: return cell.getCellFormula(); default: return ""; } } public static void main(String[] args) { String filePath = "Test.xlsx"; ExcelReaderStream reader = new ExcelReaderStream(); reader.readSpreadsheet(filePath); } }

Command to Compile and Run:

javac -cp ".;poi-5.3.0.jar;poi-ooxml-5.3.0.jar;poi-ooxml-full-5.3.0.jar;poi-ooxml-schemas-5.3.0.jar;xmlbeans-5.1.1.jar;commons-collections4-4.5.0-M2.jar;commons-io-2.17.0.jar;log4j-api-2.24.1.jar;commons-compress-1.27.1.jar;log4j-core-2.24.1.jar" ExcelReaderStream.java

java -cp ".;poi-5.3.0.jar;poi-ooxml-5.3.0.jar;poi-ooxml-full-5.3.0.jar;poi-ooxml-schemas-5.3.0.jar;xmlbeans-5.1.1.jar;commons-collections4-4.5.0-M2.jar;commons-io-2.17.0.jar;log4j-api-2.24.1.jar;commons-compress-1.27.1.jar;log4j-core-2.24.1.jar" ExcelReaderStream

Conclusion

  • Option 1: Increasing the byte array limit using IOUtils.setByteArrayMaxOverride() can solve the immediate issue but may impact memory usage.
  • Option 2: Using SXSSFWorkbook with the streaming API is a more scalable solution for large .xlsx files, allowing memory-efficient processing.

Choose the approach that best suits your file size and memory requirements. Let me know if you have any questions!

No comments:

Post a Comment