Pagination-based Report Document Assembly
Pagination-based Report Document Assembly is a mechanism for generating large reports by processing data in discrete pages, writing each page as a separate document chunk, and merging the chunks into the final document through a task dispatched on a Message Queue.^[001-TODO__code.md]
System Workflow
The assembly process follows a distributed, asynchronous workflow designed to handle potentially massive data volumes without blocking system resources or causing timeouts^[001-TODO__code.md].
- Report Initialization: A user requests a report, creating a record in a `RUNNING` state^[001-TODO__code.md].
- Paginated Data Processing: An external data service queries data page-by-page^[001-TODO__code.md]. For each page, it sends a request (via `PUT /v1/report`) containing the current page number and total page count^[001-TODO__code.md].
- Document Chunk Creation: Upon receiving a page, the system generates a document segment (e.g., CSV or PDF) for that specific page and stores it in a temporary directory (e.g., `{id}-temp/`)^[001-TODO__code.md].
- Completion Check: The system verifies if the number of generated files in the temporary directory matches the expected total page count^[001-TODO__code.md].
- Merge Trigger: Once all pages are received, a merge task (`ReportDocumentCombineVo`) is dispatched to a Message Queue (COMBINE_FILE_Q)^[001-TODO__code.md].
- Final Assembly: A consumer listens to the queue, retrieves all temporary file segments, combines them into a single document, saves the final file, updates the record status to `SUCCESS`, and cleans up temporary files^[001-TODO__code.md].
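The workflow above can be sketched in a few lines of Java. This is a minimal, single-threaded illustration, not the system's actual code: the class `ReportAssemblySketch`, its method names, and the in-memory maps and queue are all hypothetical stand-ins for the database record, the `{id}-temp/` directory, and COMBINE_FILE_Q.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;

// Hypothetical in-memory model of the workflow; every name here is an assumption.
final class ReportAssemblySketch {
    final Map<String, String> status = new HashMap<>();                       // stands in for the record table
    final Map<String, TreeMap<Integer, byte[]>> tempChunks = new HashMap<>(); // stands in for {id}-temp/
    final Map<String, byte[]> finalFiles = new HashMap<>();                   // stands in for final storage
    final Queue<String> combineQueue = new ArrayDeque<>();                    // stands in for COMBINE_FILE_Q

    // Report Initialization: create the record in the RUNNING state.
    void initReport(String id) {
        status.put(id, "RUNNING");
        tempChunks.put(id, new TreeMap<>());
    }

    // Chunk Creation, Completion Check and Merge Trigger for one incoming page.
    void receivePage(String id, int currentPage, int totalPage, byte[] chunk) {
        TreeMap<Integer, byte[]> chunks = tempChunks.get(id);
        int before = chunks.size();
        chunks.put(currentPage, chunk); // a resent page overwrites its earlier copy
        // Dispatch the merge task only on the transition to "all pages present".
        if (before < totalPage && chunks.size() == totalPage) {
            combineQueue.add(id);
        }
    }

    // Final Assembly: consume one merge task, if any is waiting.
    boolean consumeOne() {
        String id = combineQueue.poll();
        if (id == null) {
            return false;
        }
        ByteArrayOutputStream merged = new ByteArrayOutputStream();
        // TreeMap iterates in page order, regardless of arrival order.
        for (byte[] chunk : tempChunks.get(id).values()) {
            merged.write(chunk, 0, chunk.length);
        }
        finalFiles.put(id, merged.toByteArray());
        status.put(id, "SUCCESS");
        tempChunks.remove(id); // clean up temporary chunks
        return true;
    }
}
```

Keying chunks by page number lets pages arrive out of order and makes a resent page harmless, which is one plausible reason to compare a file count against the total page count rather than waiting for the last page number.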
Data Model
The state of a report is managed via the ReportDownloadRecordEntity, typically stored in a database^[001-TODO__code.md].
Key fields include:
* Status: Tracks the lifecycle (e.g., RUNNING, SUCCESS, FAIL)^[001-TODO__code.md].
* Search Param Hash: An MD5 hash of the query parameters to prevent duplicate requests within a specific time window^[001-TODO__code.md].
* File Path: The storage location of the final, assembled document^[001-TODO__code.md].
* Error Message: Stores failure details if the generation or assembly process crashes^[001-TODO__code.md].
The data transfer object ReportUpdateDto is used to transmit page data, including metadata like currentPage and totalPage, along with the row data^[001-TODO__code.md].
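As an illustration of the data model, the sketch below shows one possible shape for the record and the page DTO, plus a way to derive the search-parameter hash. Only the names `ReportDownloadRecordEntity`, `ReportUpdateDto`, `currentPage`, `totalPage`, and the status values come from the description above. The remaining field names, the wrapper class `ReportModelSketch`, and the canonicalization step (sorting parameters by key before hashing) are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the data model; field names other than
// currentPage and totalPage are assumptions.
final class ReportModelSketch {
    enum Status { RUNNING, SUCCESS, FAIL }

    static final class ReportDownloadRecordEntity {
        String id;
        Status status = Status.RUNNING;
        String searchParamHash; // MD5 of the query parameters
        String filePath;        // location of the final, assembled document
        String errorMessage;    // populated only on failure
    }

    static final class ReportUpdateDto {
        String reportId;
        int currentPage;
        int totalPage;
        List<Map<String, String>> rows; // row data for this page
    }

    // Hashes the parameters in sorted key order so that logically equal
    // queries produce the same hash (the sorting is an assumption).
    static String searchParamHash(Map<String, String> params) {
        StringBuilder canonical = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(params).entrySet()) {
            canonical.append(e.getKey()).append('=').append(e.getValue()).append('&');
        }
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

A duplicate check could then look up an existing record with the same hash created inside the time window before starting a new report. MD5 is adequate here because the hash serves as a lookup key for deduplication, not as a security control.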
Implementation Details
The system uses an interface ReportDocumentService with implementations for different file types (e.g., CSVDocumentServiceImpl, PDFDocumentServiceImpl)^[001-TODO__code.md].
- CSV Strategy: When merging CSVs, the system typically preserves the header from the first chunk and removes headers from subsequent chunks before concatenating the byte arrays^[001-TODO__code.md].
- File Management: A file management service (e.g., `GcsFileManageServiceImpl` for Google Cloud Storage) handles the creation, listing, and deletion of both temporary and final files^[001-TODO__code.md].
- Failure Handling: If the merge process fails, the record status is updated to `FAIL`, and the error is logged^[001-TODO__code.md].
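The CSV strategy and the failure rule can be sketched as follows. The real method signatures of `ReportDocumentService` are not given above, so the `combine` method, the `CsvDocumentService` class, and the `mergeStatus` helper are hypothetical. The sketch also assumes that every chunk begins with the same single-line header and ends with a newline.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

// Hypothetical sketch; the interface method and class names are assumptions.
final class CsvMergeSketch {
    interface ReportDocumentService {
        byte[] combine(List<byte[]> chunks);
    }

    static final class CsvDocumentService implements ReportDocumentService {
        // Keeps the header of the first chunk and skips the first line
        // of every later chunk before concatenating the bytes.
        @Override
        public byte[] combine(List<byte[]> chunks) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (int i = 0; i < chunks.size(); i++) {
                byte[] chunk = chunks.get(i);
                int start = (i == 0) ? 0 : indexAfterFirstLine(chunk);
                out.write(chunk, start, chunk.length - start);
            }
            return out.toByteArray();
        }

        // Returns the offset just past the first newline, or the chunk
        // length when the chunk holds a header and nothing else.
        private static int indexAfterFirstLine(byte[] chunk) {
            for (int i = 0; i < chunk.length; i++) {
                if (chunk[i] == '\n') {
                    return i + 1;
                }
            }
            return chunk.length;
        }
    }

    // Mirrors the failure rule: any exception during the merge maps to FAIL
    // and the error is logged (here, to standard error).
    static String mergeStatus(ReportDocumentService service, List<byte[]> chunks) {
        try {
            service.combine(chunks);
            return "SUCCESS";
        } catch (RuntimeException e) {
            System.err.println("merge failed: " + e);
            return "FAIL";
        }
    }
}
```

Scanning for the first newline byte is only safe when the header has no quoted field containing a line break. A header with embedded newlines would need a real CSV parser instead.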
Sources
^[001-TODO__code.md]