
Asynchronous Report Generation Architecture

The Asynchronous Report Generation Architecture is a system designed to handle long-running data export tasks (such as large CSV or PDF reports) without blocking the primary application thread.^[001-TODO__code_copy.md, 001-todo-code.md] It does this by decoupling request initiation from file generation and data retrieval, using a message queue (RabbitMQ) to drive state transitions and retry logic, and storing generated files in Google Cloud Storage (GCS).^[001-TODO__code_copy.md, 001-todo-code.md]

Core Workflow

The report generation lifecycle follows a specific state transition model managed by a ReportDownloadRecord entity.^[001-TODO__code_copy.md, 001-todo-code.md]

  1. Initiation: A user requests a report via the ReportManageController.^[001-TODO__code_copy.md, 001-todo-code.md] The system creates a database record with a RUNNING status and sends a message (ReportQryVo) to a RabbitMQ topic exchange.^[001-TODO__code_copy.md, 001-todo-code.md]
  2. Processing: An external service (or listener) consumes the query message.^[001-TODO__code_copy.md, 001-todo-code.md] This service fetches the data and sends the results back in chunks (pages) via PUT requests to the makeReportDocument endpoint.^[001-TODO__code_copy.md, 001-todo-code.md] Each chunk is converted to a byte array in the target format (e.g., CSV) and stored temporarily in GCS.^[001-TODO__code_copy.md, 001-todo-code.md]
  3. Completion: Once all pages are received, a ReportDocumentCombineVo message is sent to a combineFileQ queue.^[001-TODO__code_copy.md, 001-todo-code.md] A listener merges the temporary files into a single document and updates the database status to SUCCESS.^[001-TODO__code_copy.md, 001-todo-code.md]
  4. Download: The user polls or triggers a download via the /download endpoint.^[001-TODO__code_copy.md, 001-todo-code.md] If the status is SUCCESS, the file is served from GCS.^[001-TODO__code_copy.md, 001-todo-code.md]
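Step 3 depends on knowing when "all pages are received". The source does not say how that is detected, so the sketch below assumes each PUT carries a page index and the total page count; PageTracker, CombinePublisher and the fields on this stand-in ReportDocumentCombineVo are hypothetical. It counts distinct pages per report, so a redelivered page is not counted twice, and publishes exactly one combine message when the last missing page arrives.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: detect the last page of a report and trigger the combine step.
class PageTrackerDemo {
    public static void main(String[] args) {
        List<ReportDocumentCombineVo> published = new ArrayList<>();
        PageTracker tracker = new PageTracker(published::add);
        tracker.onPageStored(42L, 0, 3);
        tracker.onPageStored(42L, 2, 3);
        tracker.onPageStored(42L, 2, 3); // redelivered page is ignored
        tracker.onPageStored(42L, 1, 3); // last missing page triggers the combine message
        System.out.println(published);
    }
}

// Stand-in for the real message; only the fields this sketch needs (assumed).
record ReportDocumentCombineVo(long reportId, int totalPages) {}

// Hypothetical hook; in the real system this would publish to the combine queue.
interface CombinePublisher {
    void publish(ReportDocumentCombineVo message);
}

class PageTracker {
    private final Map<Long, Set<Integer>> receivedPages = new HashMap<>();
    private final CombinePublisher publisher;

    PageTracker(CombinePublisher publisher) {
        this.publisher = publisher;
    }

    /** Returns true if this page completed the report and a combine message was published. */
    // synchronized keeps check-then-publish atomic within one JVM only;
    // several app instances would need a shared counter (e.g. in the database).
    synchronized boolean onPageStored(long reportId, int pageIndex, int totalPages) {
        Set<Integer> pages = receivedPages.computeIfAbsent(reportId, id -> new HashSet<>());
        boolean isNewPage = pages.add(pageIndex);
        if (isNewPage && pages.size() == totalPages) {
            receivedPages.remove(reportId);
            publisher.publish(new ReportDocumentCombineVo(reportId, totalPages));
            return true;
        }
        return false;
    }
}
```

Counting distinct page indexes rather than incrementing a counter makes the check safe against at-least-once redelivery, which RabbitMQ consumers should expect.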

Key Components

Message Queue Topology

The architecture relies on RabbitMQ to manage the asynchronous flow and to implement a "Retry with Delay" pattern, so that failures do not block processing.^[001-TODO__code_copy.md, 001-todo-code.md]

  • Topic Exchange (plt.basic.report.topic.ex): Receives the initial query request.^[001-TODO__code_copy.md, 001-todo-code.md]
  • Delay Queue (plt.basic.report.delay.q): A queue with a TTL (Time To Live) that forwards messages to a Dead Letter Exchange after a set delay (5 minutes).^[001-TODO__code_copy.md, 001-todo-code.md] This allows the system to retry processing if the initial attempt times out or fails.^[001-TODO__code_copy.md, 001-todo-code.md]
  • Dead Letter Queue (plt.basic.report.dead.q): If the message expires from the Delay Queue without successful processing, it lands here.^[001-TODO__code_copy.md, 001-todo-code.md] A listener on this queue marks the report status as FAIL in the database.^[001-TODO__code_copy.md, 001-todo-code.md]
  • Combine Queue (plt.basic.report.combine.file.q): Dedicated queue for triggering the final merging of temporary file chunks.^[001-TODO__code_copy.md, 001-todo-code.md]
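The delay-then-dead-letter behaviour of plt.basic.report.delay.q comes from queue arguments rather than application code. The sketch below builds those arguments with plain Java. x-message-ttl, x-dead-letter-exchange and x-dead-letter-routing-key are standard RabbitMQ queue arguments, but the dead-letter exchange name and routing key are hypothetical because the source does not give them.

```java
import java.util.Map;

// Sketch of the queue arguments behind the 5-minute delay queue.
// The dead-letter exchange and routing key names are hypothetical.
class DelayQueueArgsDemo {
    static final String DELAY_QUEUE = "plt.basic.report.delay.q";
    static final String DEAD_QUEUE = "plt.basic.report.dead.q";
    static final String DEAD_LETTER_EXCHANGE = "plt.basic.report.dead.ex"; // assumed name
    static final String DEAD_LETTER_ROUTING_KEY = "report.dead";          // assumed key
    static final int DELAY_MILLIS = 5 * 60 * 1000; // 5 minutes, in milliseconds

    /** Arguments that make expired messages move to the dead-letter exchange. */
    static Map<String, Object> delayQueueArguments() {
        return Map.of(
                "x-message-ttl", DELAY_MILLIS,
                "x-dead-letter-exchange", DEAD_LETTER_EXCHANGE,
                "x-dead-letter-routing-key", DEAD_LETTER_ROUTING_KEY);
    }

    public static void main(String[] args) {
        // With the RabbitMQ Java client the map would be passed as:
        //   channel.queueDeclare(DELAY_QUEUE, true, false, false, delayQueueArguments());
        // and DEAD_QUEUE would be bound to DEAD_LETTER_EXCHANGE with DEAD_LETTER_ROUTING_KEY,
        // so the listener on DEAD_QUEUE only sees messages that expired after 5 minutes.
        System.out.println(DELAY_QUEUE + " -> " + delayQueueArguments());
    }
}
```

Because the TTL is set per queue, every message in the delay queue waits the same 5 minutes; a per-message expiration property would be needed if different reports required different timeouts.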

File Management Strategy

The system uses Google Cloud Storage (GCS) to handle raw file data, structured around specific prefixes and temporary directories.^[001-TODO__code_copy.md, 001-todo-code.md]

  • Temporary Storage: During generation, partial files are stored in a temporary directory (e.g., /doc/report/csv/{id}-temp/).^[001-TODO__code_copy.md, 001-todo-code.md]
  • Final Storage: Once combined, the final document is moved to a permanent path (e.g., /doc/report/csv/{id}/{id}.csv).^[001-TODO__code_copy.md, 001-todo-code.md]
  • Cleanup: After successful merging, the system automatically deletes the temporary directory to save storage costs.^[001-TODO__code_copy.md, 001-todo-code.md]
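The temporary and final paths above can be built from the report id alone. In the sketch below, only the two CSV path shapes come from the source. Generalising the "csv" segment to a format parameter and naming chunks part-00000.csv, part-00001.csv, and so on are assumptions. Zero-padding is a deliberate choice: listing objects by prefix returns names in lexical order, which then matches page order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical path helpers for the GCS layout described above.
class ReportPaths {
    private static final String ROOT = "/doc/report/";

    /** Temporary prefix holding the page chunks, e.g. /doc/report/csv/42-temp/ */
    static String tempDir(String format, long id) {
        return ROOT + format + "/" + id + "-temp/";
    }

    /** Assumed chunk naming; zero-padding keeps lexical order equal to page order. */
    static String tempChunk(String format, long id, int page) {
        return tempDir(format, id) + String.format("part-%05d.%s", page, format);
    }

    /** Permanent location, e.g. /doc/report/csv/42/42.csv */
    static String finalPath(String format, long id) {
        return ROOT + format + "/" + id + "/" + id + "." + format;
    }

    public static void main(String[] args) {
        List<String> chunks = new ArrayList<>();
        for (int page = 0; page < 3; page++) {
            chunks.add(tempChunk("csv", 42L, page));
        }
        System.out.println(chunks);
        System.out.println(finalPath("csv", 42L));
        // Cleanup would delete every object whose name starts with tempDir("csv", 42L).
    }
}
```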

Database State Machine

The report_download_record table tracks the progress of every report.^[001-TODO__code_copy.md, 001-todo-code.md]

  • RUNNING: The initial state. When a report is requested, the system also checks for duplicate queries within a 5-minute cache window, keyed by searchParamHash, to avoid generating the same report twice.^[001-TODO__code_copy.md, 001-todo-code.md]
  • SUCCESS: Indicates the file path is populated and ready for download.^[001-TODO__code_copy.md, 001-todo-code.md]
  • FAIL: Indicates an error occurred (timeout or generation failure). The error_message column stores the reason.^[001-TODO__code_copy.md, 001-todo-code.md]
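The status rules and the duplicate-query window can be sketched in memory as follows. ReportStatus and DuplicateQueryGuard are hypothetical names. Computing searchParamHash as SHA-256 over the sorted parameters is also an assumption, since the source names the hash but not how it is derived. RUNNING is treated as the only non-terminal state, and a repeated query inside the 5-minute window is reported as a duplicate.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the report status rules and the 5-minute duplicate check.
class StateMachineDemo {
    public static void main(String[] args) {
        DuplicateQueryGuard guard = new DuplicateQueryGuard(Duration.ofMinutes(5));
        Map<String, String> params = Map.of("from", "2024-01-01", "to", "2024-01-31");
        Instant t0 = Instant.parse("2024-02-01T10:00:00Z");
        System.out.println(guard.isDuplicate(params, t0));                 // first request
        System.out.println(guard.isDuplicate(params, t0.plusSeconds(60))); // inside the window
        System.out.println(ReportStatus.FAIL.canTransitionTo(ReportStatus.SUCCESS));
    }
}

enum ReportStatus {
    RUNNING, SUCCESS, FAIL;

    /** Only RUNNING may move on, and only to a terminal state. */
    boolean canTransitionTo(ReportStatus next) {
        return this == RUNNING && (next == SUCCESS || next == FAIL);
    }
}

class DuplicateQueryGuard {
    private final Duration window;
    private final Map<String, Instant> lastSeen = new HashMap<>();

    DuplicateQueryGuard(Duration window) {
        this.window = window;
    }

    /** Assumed derivation: SHA-256 of the parameters sorted by key, as hex. */
    static String searchParamHash(Map<String, String> params) {
        String canonical = new TreeMap<>(params).toString(); // key order no longer matters
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(canonical.getBytes(StandardCharsets.UTF_8));
            return String.format("%064x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every JDK", e);
        }
    }

    /** True if the same query was seen less than `window` ago; otherwise records it. */
    synchronized boolean isDuplicate(Map<String, String> params, Instant now) {
        String hash = searchParamHash(params);
        Instant previous = lastSeen.get(hash);
        if (previous != null && Duration.between(previous, now).compareTo(window) < 0) {
            return true;
        }
        lastSeen.put(hash, now);
        return false;
    }
}
```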

Implementation Details

  • Document Strategy: Document generation follows the Strategy pattern through the ReportDocumentService interface.^[001-TODO__code_copy.md, 001-todo-code.md] Concrete implementations such as CSVDocumentServiceImpl handle format-specific logic (e.g., joining row values with commas and removing repeated headers when merging chunks).^[001-TODO__code_copy.md, 001-todo-code.md]
  • Status Verification: The makeReportDocument endpoint checks isRunningStatus to ensure it only accepts data for records currently in the RUNNING state.^[001-TODO__code_copy.md, 001-todo-code.md]
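The two implementation details can be illustrated together. ReportDocumentService, CSVDocumentServiceImpl and isRunningStatus are named in the source, but the method names, signatures and string-based status below are assumptions made for this sketch. Each page is serialised with its own header, and merging keeps only the first chunk's header.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

// Strategy-pattern sketch; method names and signatures are assumed, not from the source.
class StrategyDemo {
    // Mirrors the isRunningStatus guard: pages are accepted only while the record is RUNNING.
    static boolean isRunningStatus(String status) {
        return "RUNNING".equals(status);
    }

    public static void main(String[] args) {
        // Strategies keyed by format; a PDF implementation would be registered the same way.
        Map<String, ReportDocumentService> strategies = Map.of("csv", new CSVDocumentServiceImpl());
        ReportDocumentService csv = strategies.get("csv");
        List<String> header = List.of("id", "name");
        byte[] page1 = csv.toBytes(header, List.of(List.of("1", "alice")));
        byte[] page2 = csv.toBytes(header, List.of(List.of("2", "bob")));
        System.out.print(new String(csv.merge(List.of(page1, page2)), StandardCharsets.UTF_8));
    }
}

interface ReportDocumentService {
    /** Serialises one page of rows, including the header line. */
    byte[] toBytes(List<String> header, List<List<String>> rows);

    /** Combines page chunks, given in page order, into the final document. */
    byte[] merge(List<byte[]> chunks);
}

class CSVDocumentServiceImpl implements ReportDocumentService {
    @Override
    public byte[] toBytes(List<String> header, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder(String.join(",", header)).append('\n');
        for (List<String> row : rows) {
            // Naive join: values containing commas or quotes are not escaped.
            sb.append(String.join(",", row)).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public byte[] merge(List<byte[]> chunks) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < chunks.size(); i++) {
            String text = new String(chunks.get(i), StandardCharsets.UTF_8);
            // Keep the header from the first chunk only; drop the first line of the rest.
            sb.append(i == 0 ? text : text.substring(text.indexOf('\n') + 1));
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

Adding a new output format then means adding one implementation of the interface, without touching the queue or storage code.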

Sources

  • 001-TODO__code_copy.md
  • 001-todo-code.md