Pagination-based Report Document Assembly

Pagination-based Report Document Assembly is a mechanism for generating large reports by processing data in discrete pages, writing each page as a separate document chunk, and merging the chunks into the final document once a merge task is dispatched through a Message Queue^[001-TODO__code.md].

System Workflow

The assembly process follows a distributed, asynchronous workflow designed to handle potentially massive data volumes without blocking system resources or causing timeouts^[001-TODO__code.md].

  1. Report Initialization: A user requests a report, creating a record in a RUNNING state^[001-TODO__code.md].
  2. Paginated Data Processing: An external data service queries data page-by-page^[001-TODO__code.md]. For each page, it sends a request (via PUT /v1/report) containing the current page number and total page count^[001-TODO__code.md].
  3. Document Chunk Creation: Upon receiving a page, the system generates a document segment (e.g., CSV or PDF) for that specific page and stores it in a temporary directory (e.g., {id}-temp/)^[001-TODO__code.md].
  4. Completion Check: The system checks whether the number of generated files in the temporary directory equals the expected total page count^[001-TODO__code.md].
  5. Merge Trigger: Once all pages are received, a merge task (ReportDocumentCombineVo) is dispatched to a Message Queue (COMBINE_FILE_Q)^[001-TODO__code.md].
  6. Final Assembly: A consumer listens to the queue, retrieves all temporary file segments, combines them into a single document, saves the final file, updates the record status to SUCCESS, and cleans up temporary files^[001-TODO__code.md].
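
Steps 3 to 5 can be sketched as follows. This is a minimal in-memory illustration, not the system's actual code: the class PageAssemblyTracker, its method names, the 1-based page numbering, and keying chunks by page number (so a retried page does not inflate the count) are all assumptions. A map stands in for the {id}-temp/ directory and a list stands in for COMBINE_FILE_Q.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the temp directory plus the merge queue.
class PageAssemblyTracker {
    // reportId -> (page number -> chunk bytes); stands in for files under {id}-temp/.
    private final Map<String, Map<Integer, byte[]>> tempChunks = new ConcurrentHashMap<>();
    // Stands in for merge tasks published to COMBINE_FILE_Q.
    private final List<String> dispatchedMerges = new ArrayList<>();

    /** Stores one page chunk and returns true if this call triggered the merge. */
    synchronized boolean acceptPage(String reportId, int currentPage, int totalPage, byte[] chunk) {
        if (totalPage < 1 || currentPage < 1 || currentPage > totalPage) {
            throw new IllegalArgumentException(
                    "page " + currentPage + " of " + totalPage + " is out of range");
        }
        Map<Integer, byte[]> chunks =
                tempChunks.computeIfAbsent(reportId, id -> new ConcurrentHashMap<>());
        // Keyed by page number: a retried page overwrites its earlier chunk.
        chunks.put(currentPage, chunk);

        // Completion check: stored chunk count must equal the expected page count.
        boolean complete = chunks.size() == totalPage;
        if (complete && !dispatchedMerges.contains(reportId)) {
            dispatchedMerges.add(reportId); // merge trigger, dispatched at most once
            return true;
        }
        return false;
    }

    /** Returns a copy of the report ids for which a merge was dispatched. */
    synchronized List<String> dispatchedMerges() {
        return new ArrayList<>(dispatchedMerges);
    }
}
```

Because the check compares a count rather than looking for the last page number, the sketch does not depend on pages arriving in order.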

Data Model

The state of a report is managed via the ReportDownloadRecordEntity, typically stored in a database^[001-TODO__code.md].

Key fields include:

  • Status: Tracks the lifecycle (e.g., RUNNING, SUCCESS, FAIL)^[001-TODO__code.md].
  • Search Param Hash: An MD5 hash of the query parameters to prevent duplicate requests within a specific time window^[001-TODO__code.md].
  • File Path: The storage location of the final, assembled document^[001-TODO__code.md].
  • Error Message: Stores failure details if the generation or assembly process crashes^[001-TODO__code.md].
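
As an illustration of the Search Param Hash field, the sketch below computes an MD5 hex digest over query parameters using the standard java.security.MessageDigest API. The class name SearchParamHasher, the key=value& canonical form, and sorting by key are assumptions; the source does not say how the parameters are serialized before hashing.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; the serialization format is an assumption.
class SearchParamHasher {
    /** Returns a 32-character lowercase MD5 hex digest of the parameters. */
    static String md5Hex(Map<String, String> params) {
        // Sort by key so the same parameters hash identically in any insertion order.
        StringBuilder canonical = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(params).entrySet()) {
            canonical.append(e.getKey()).append('=').append(e.getValue()).append('&');
        }
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff)); // mask to avoid sign extension
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(ex);
        }
    }
}
```

A duplicate check could then look for an existing record with the same hash created inside the time window. MD5 is adequate for this kind of deduplication key, but it is not collision-resistant and should not be relied on for security.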

The data transfer object ReportUpdateDto is used to transmit page data, including metadata like currentPage and totalPage, along with the row data^[001-TODO__code.md].
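
A possible shape for such a DTO is sketched below. Only currentPage and totalPage are named in the source; the class name ReportUpdateDtoSketch, the reportId field, the row representation, and the 1-based page numbering are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Illustrative stand-in for ReportUpdateDto, not the real class.
final class ReportUpdateDtoSketch {
    final long reportId;                   // assumed: links the page to its report record
    final int currentPage;                 // assumed to be 1-based
    final int totalPage;
    final List<Map<String, Object>> rows;  // assumed row shape: column name -> value

    ReportUpdateDtoSketch(long reportId, int currentPage, int totalPage,
                          List<Map<String, Object>> rows) {
        if (totalPage < 1) {
            throw new IllegalArgumentException("totalPage must be at least 1");
        }
        if (currentPage < 1 || currentPage > totalPage) {
            throw new IllegalArgumentException("currentPage must be between 1 and totalPage");
        }
        if (rows == null) {
            throw new IllegalArgumentException("rows must not be null");
        }
        this.reportId = reportId;
        this.currentPage = currentPage;
        this.totalPage = totalPage;
        // Defensive copy so later changes by the caller do not alter the DTO.
        this.rows = Collections.unmodifiableList(new ArrayList<>(rows));
    }
}
```

Validating the page metadata at construction keeps an out-of-range page from ever reaching the chunk-writing step.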

Implementation Details

The system uses an interface ReportDocumentService with implementations for different file types (e.g., CSVDocumentServiceImpl, PDFDocumentServiceImpl)^[001-TODO__code.md].

  • CSV Strategy: When merging CSVs, the system typically preserves the header from the first chunk and removes headers from subsequent chunks before concatenating the byte arrays^[001-TODO__code.md].
  • File Management: A file management service (e.g., GcsFileManageServiceImpl for Google Cloud Storage) handles the creation, listing, and deletion of both temporary and final files^[001-TODO__code.md].
  • Failure Handling: If the merge process fails, the record status is updated to FAIL, and the error is logged^[001-TODO__code.md].
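
The CSV strategy above can be sketched as a byte-level merge. This is an illustration under stated assumptions rather than the code of CSVDocumentServiceImpl: the class CsvChunkMerger is hypothetical, chunks are assumed to arrive already sorted by page, every chunk is assumed to start with a single-line header terminated by \n, and a newline is added between chunks when one is missing.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

// Hypothetical helper illustrating the keep-first-header merge.
class CsvChunkMerger {
    /** Concatenates chunks, keeping the header line of the first chunk only. */
    static byte[] merge(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < chunks.size(); i++) {
            byte[] chunk = chunks.get(i);
            // The first chunk is copied whole; later chunks skip their header line.
            int start = (i == 0) ? 0 : indexAfterFirstLine(chunk);
            out.write(chunk, start, chunk.length - start);
            // Terminate the last row so the next chunk starts on a fresh line.
            if (chunk.length > start && chunk[chunk.length - 1] != '\n') {
                out.write('\n');
            }
        }
        return out.toByteArray();
    }

    /** Returns the index just past the first '\n', or the length if there is none. */
    static int indexAfterFirstLine(byte[] chunk) {
        for (int i = 0; i < chunk.length; i++) {
            if (chunk[i] == '\n') {
                return i + 1;
            }
        }
        return chunk.length; // header only, no data rows
    }
}
```

Scanning for the first newline is only safe when the header has no quoted field containing a line break; a header like that would need a real CSV parser. In the workflow described here, an exception thrown during this merge would be caught by the consumer, which sets the record to FAIL and logs the error.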

Sources

^[001-TODO__code.md]