Asynchronous Report Download Pattern

The Asynchronous Report Download Pattern is a backend architectural design used to handle long-running data export and report generation tasks. Instead of blocking the HTTP request while the file is generated, the system returns a record identifier immediately, processes the report in the background, and updates the record's status when the file is ready for download.^[001-todo.md]

Workflow

The standard implementation flow typically involves the following stages:

  1. Request Initiation: The user requests a report export via the API Gateway.^[001-todo.md]
  2. Record Creation: The system returns a record ID (e.g., FileDownloadRecordEntity.id) immediately, while the actual processing begins asynchronously.^[001-todo.md]
  3. Background Processing: Data is queried, possibly joined from multiple projects, and transformed.^[001-todo.md]
  4. Storage: The generated file (e.g., CSV) is uploaded to cloud storage (e.g., GCP Cloud Storage Bucket).^[001-todo.md]
  5. Notification: The status of the download record is updated, and the user is notified with a download URL or can retrieve it from the report center.^[001-todo.md]
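The request/response split described above can be sketched as follows. This is a minimal in-memory illustration, not the source's implementation: the names `records`, `request_report`, and `generate_report` are assumptions, the dict stands in for the database, and the placeholder file address stands in for a cloud storage upload.

```python
import threading
import uuid

# Stand-in for the persistence layer holding FileDownloadRecordEntity rows.
records = {}

def generate_report(record_id: str) -> None:
    """Background worker: query the data, build the file, mark the record done."""
    records[record_id]["status"] = "Processing"
    rows = [("alice", 100), ("bob", 200)]  # stand-in for the actual data query
    csv_body = "\n".join(f"{name},{amount}" for name, amount in rows)
    # A real system would upload csv_body to cloud storage here; we only
    # record a placeholder file address.
    records[record_id]["file_address"] = f"/bucket/{record_id}.csv"
    records[record_id]["status"] = "Completed"

def request_report() -> str:
    """API handler: create the download record and return its id immediately."""
    record_id = uuid.uuid4().hex
    records[record_id] = {"status": "Pending", "file_address": None}
    threading.Thread(target=generate_report, args=(record_id,)).start()
    return record_id
```

The caller receives the record id right away and polls (or is notified of) the record's status until it reads `Completed`, at which point `file_address` points at the downloadable file.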

Data Model

The core of this pattern relies on a persistence entity to track the lifecycle of the download. A typical entity, such as FileDownloadRecordEntity, includes the following fields^[001-todo.md]:

  • id: Unique identifier for the download record.
  • Report Source: The origin or type of report.
  • Report Enum: The specific report enumeration or type.
  • Data Retrieval Key: A key (often an MD5 hash of the query conditions) representing the search criteria.^[001-todo.md]
  • File Address: The URL or path to the file in cloud storage.
  • Department ID / Administrator ID: Ownership and scope metadata.
  • Status: The current state of the job (e.g., Pending, Processing, Completed, Failed).
  • Timestamps: Created Time and Completed Time.
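The fields above can be sketched as a record type. This is a hedged illustration: the class name, field names, and the `retrieval_key` helper are adaptations for the sketch, not definitions from the source; only the field list and the MD5-of-query-conditions idea come from the document.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional

class DownloadStatus(Enum):
    PENDING = "Pending"
    PROCESSING = "Processing"
    COMPLETED = "Completed"
    FAILED = "Failed"

def retrieval_key(conditions: dict) -> str:
    """MD5 hash of the (canonicalized) query conditions, per the pattern."""
    return hashlib.md5(json.dumps(conditions, sort_keys=True).encode()).hexdigest()

@dataclass
class FileDownloadRecord:
    id: str
    report_source: str              # origin or type of report
    report_enum: str                # specific report enumeration
    data_retrieval_key: str         # MD5 of the search criteria
    department_id: str
    administrator_id: str
    status: DownloadStatus = DownloadStatus.PENDING
    file_address: Optional[str] = None   # cloud storage URL once uploaded
    created_time: datetime = field(default_factory=datetime.utcnow)
    completed_time: Optional[datetime] = None
```

Hashing the query conditions gives a stable key, so two requests with identical criteria can be deduplicated or served from a previously generated file.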

Processing Strategy

For complex reports involving data from multiple sources or heavy computations, a distributed task queue (such as RabbitMQ) and a cache (Redis) are often utilized to manage the workflow^[001-todo.md].

Multi-Stage Queues

The workload is often broken down into sequential message queues (MQ) to handle dependencies and error recovery^[001-todo.md]:

  • MQ1 (Coordinator): Listens for the initial record ID. It checks if all prerequisite data is ready by checking a counter in Redis (e.g., queryDoneCount). If the count indicates completion (e.g., >= 2), it proceeds to merge data or convert to CSV.^[001-todo.md]
  • MQ2 (Fanout/Data Gathering): Receives the record ID and fans out tasks to specific listeners. Each listener queries a specific database table (e.g., plt_user, plt_fund) and writes the partial results to Redis.^[001-todo.md]
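The two-queue interplay can be simulated without a broker. In this sketch a plain dict stands in for Redis, `gather` plays the MQ2 listeners, and `coordinate` plays MQ1; `EXPECTED_PARTS` and the function names are assumptions, while the key format, `queryDoneCount`, and the `plt_user`/`plt_fund` tables come from the document.

```python
from typing import Optional

EXPECTED_PARTS = 2   # one per data-gathering listener (assumed value)
store = {}           # stand-in for Redis, keyed as FileDownloadRecordEntity-{id}

def gather(record_id: str, table: str, rows: list) -> None:
    """MQ2 listener: query one source table and stash the partial result."""
    key = f"FileDownloadRecordEntity-{record_id}"
    entry = store.setdefault(key, {"queryDoneCount": 0})
    entry[table] = rows
    entry["queryDoneCount"] += 1   # HINCRBY against real Redis

def coordinate(record_id: str) -> Optional[str]:
    """MQ1 listener: once all parts are in, merge them and emit CSV text."""
    entry = store.get(f"FileDownloadRecordEntity-{record_id}", {})
    if entry.get("queryDoneCount", 0) < EXPECTED_PARTS:
        return None   # not ready yet; a real consumer would requeue the message
    merged = entry["plt_user"] + entry["plt_fund"]
    return "\n".join(",".join(map(str, row)) for row in merged)
```

Until both listeners have reported, `coordinate` declines the message; once `queryDoneCount` reaches the expected count, the partial results are merged into the CSV body.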

State Management via Redis

Redis is used as a temporary store for intermediate data and state tracking with a Time-To-Live (TTL) to expire stale data^[001-todo.md]:

  • Key Structure: Typically formatted as FileDownloadRecordEntity-{id}.
  • Payload: Contains JSON data chunks for specific tables and a queryDoneCount.
  • Atomic Increment: Listeners increment queryDoneCount using HINCRBY to signal completion of their sub-task.^[001-todo.md]
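The three Redis operations the pattern leans on (hash writes, atomic increment, TTL) can be mimicked in a few lines. `MiniRedis` is a toy stand-in for illustration only; in production these map to the real HINCRBY, HGETALL, and EXPIRE commands.

```python
import time

class MiniRedis:
    """Tiny in-memory stand-in for the Redis operations used by the pattern."""

    def __init__(self):
        self._hashes = {}
        self._expiry = {}

    def hincrby(self, key: str, field: str, amount: int = 1) -> int:
        """Atomically increment a hash field (HINCRBY) and return the new value."""
        h = self._hashes.setdefault(key, {})
        h[field] = int(h.get(field, 0)) + amount
        return h[field]

    def expire(self, key: str, ttl_seconds: float) -> None:
        """Schedule the key to expire (EXPIRE), evicting stale intermediate data."""
        self._expiry[key] = time.time() + ttl_seconds

    def hgetall(self, key: str) -> dict:
        """Read the whole hash (HGETALL), honoring any expiry."""
        exp = self._expiry.get(key)
        if exp is not None and time.time() > exp:
            self._hashes.pop(key, None)
            self._expiry.pop(key, None)
            return {}
        return self._hashes.get(key, {})
```

Because HINCRBY returns the post-increment value, each listener can tell from its own write whether it was the last one to finish, without a separate read.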

Error Handling

The pattern includes a retry mechanism (e.g., processMessageOrRequeue) with a configured limit (e.g., 6 retries) and delay (e.g., 1 minute).^[001-todo.md] If retries are exhausted, the FileDownloadRecordEntity.status is updated to reflect failure, ensuring the system does not hang indefinitely.
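A bounded retry of this shape can be sketched as below. The function name mirrors the source's processMessageOrRequeue but the signature is an assumption, and the 1-minute requeue delay is elided: a real consumer would republish the message with a delay rather than recurse in-process.

```python
MAX_RETRIES = 6  # retry limit named by the pattern

def process_or_requeue(record: dict, handler, attempt: int = 0) -> str:
    """Run the handler; retry up to MAX_RETRIES, then mark the record failed."""
    try:
        handler(record)
        return "Completed"
    except Exception:
        if attempt + 1 >= MAX_RETRIES:
            return "Failed"  # surfaces as FileDownloadRecordEntity.status
        # A real consumer would requeue with a delay (e.g., 1 minute) here.
        return process_or_requeue(record, handler, attempt + 1)
```

Capping attempts and recording a terminal Failed status is what keeps the download record from sitting in Processing forever when the job cannot succeed.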

Sources

^[001-todo.md]