Asynchronous Report Download Pattern¶
The Asynchronous Report Download Pattern is a backend architectural design used to handle long-running data export and report generation tasks. Instead of blocking the HTTP request until the file is ready, the system returns a record identifier immediately, processes the report in the background, and updates the record's status when the file is ready for download.^[001-todo.md]
Workflow¶
The standard implementation flow typically involves the following stages:
- Request Initiation: The user requests a report via the API Gateway.^[001-todo.md]
- Record Creation: The system returns a record ID (e.g., FileDownloadRecordEntity.id) immediately, while the actual processing begins asynchronously.^[001-todo.md]
- Background Processing: Data is queried, possibly joined from multiple projects, and transformed.^[001-todo.md]
- Storage: The generated file (e.g., CSV) is uploaded to cloud storage (e.g., GCP Cloud Storage Bucket).^[001-todo.md]
- Notification: The status of the download record is updated, and a download URL is notified to the user or made available in the report center.^[001-todo.md]
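The request-initiation and background-processing stages above can be sketched as follows. This is a minimal illustration, not the source's implementation: a Python thread stands in for a real task queue, an in-memory dict stands in for the database and cloud bucket, and the names `RECORDS`, `generate_report`, `request_report`, and the storage URL are all hypothetical.

```python
import threading
import uuid

# In-memory stand-in for the persistence layer and cloud storage; a real
# system would write a FileDownloadRecordEntity row and upload to a bucket.
RECORDS: dict[str, dict] = {}

def generate_report(record_id: str) -> None:
    """Background worker: query, transform, 'upload', then mark Completed."""
    RECORDS[record_id]["status"] = "Processing"
    rows = [("alice", 100), ("bob", 200)]  # stand-in for the real query
    csv_body = "\n".join(f"{name},{value}" for name, value in rows)
    # The upload to cloud storage would happen here; we keep the "address".
    RECORDS[record_id]["file_address"] = f"https://storage.example/{record_id}.csv"
    RECORDS[record_id]["status"] = "Completed"

def request_report(report_type: str) -> str:
    """API handler: create the record and return its ID without blocking."""
    record_id = uuid.uuid4().hex
    RECORDS[record_id] = {"report_type": report_type, "status": "Pending"}
    threading.Thread(target=generate_report, args=(record_id,)).start()
    return record_id  # caller polls this record or checks the report center
```

The caller receives the record ID immediately and later retrieves the file address once the status reaches Completed.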
Data Model¶
The core of this pattern relies on a persistence entity to track the lifecycle of the download. A typical database schema, such as FileDownloadRecordEntity, includes the following fields^[001-todo.md]:
- id: Unique identifier for the download record.
- Report Source: The origin or type of report.
- Report Enum: The specific report enumeration or type.
- Data Retrieval Key: A key (often an MD5 hash of the query conditions) representing the search criteria.^[001-todo.md]
- File Address: The URL or path to the file in cloud storage.
- Department ID / Administrator ID: Ownership and scope metadata.
- Status: The current state of the job (e.g., Pending, Processing, Completed, Failed).
- Timestamps: Created Time and Completed Time.
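The fields above can be captured in a small entity sketch. The field names and the MD5-based retrieval key follow the description; the class name `FileDownloadRecord`, the enum values, and the hashing helper are illustrative assumptions.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional

class DownloadStatus(Enum):
    PENDING = "Pending"
    PROCESSING = "Processing"
    COMPLETED = "Completed"
    FAILED = "Failed"

def data_retrieval_key(conditions: dict) -> str:
    """MD5 of the (canonicalized) query conditions, per the schema above."""
    return hashlib.md5(json.dumps(conditions, sort_keys=True).encode()).hexdigest()

@dataclass
class FileDownloadRecord:
    """In-memory mirror of the FileDownloadRecordEntity fields."""
    id: str
    report_source: str
    report_enum: str
    data_retrieval_key: str
    department_id: str
    administrator_id: str
    status: DownloadStatus = DownloadStatus.PENDING
    file_address: Optional[str] = None          # set once the upload finishes
    created_time: datetime = field(default_factory=datetime.utcnow)
    completed_time: Optional[datetime] = None
```

Canonicalizing the conditions (sorted keys) before hashing means the same search criteria always map to the same retrieval key, which allows duplicate export requests to be detected.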
Processing Strategy¶
For complex reports involving data from multiple sources or heavy computations, a distributed task queue (such as RabbitMQ) and a cache (Redis) are often utilized to manage the workflow^[001-todo.md].
Multi-Stage Queues¶
The workload is often broken down into sequential message queues (MQ) to handle dependencies and error recovery^[001-todo.md]:
- MQ1 (Coordinator): Listens for the initial record ID. It checks whether all prerequisite data is ready by reading a counter in Redis (e.g., queryDoneCount). If the count indicates completion (e.g., >= 2), it proceeds to merge the data or convert it to CSV.^[001-todo.md]
- MQ2 (Fanout/Data Gathering): Receives the record ID and fans out tasks to specific listeners. Each listener queries a specific database table (e.g., plt_user, plt_fund) and writes the partial results to Redis.^[001-todo.md]
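The coordinator step can be sketched as below. The counter name `queryDoneCount` and the table names `plt_user` and `plt_fund` come from the description; the handler signature, the merge logic, and the dict standing in for the Redis hash (which a real listener would fetch via HGETALL) are assumptions for illustration.

```python
import json

EXPECTED_SOURCES = 2  # e.g. the plt_user and plt_fund sub-queries

def handle_coordinator_message(redis_hash: dict, record_id: str) -> str:
    """MQ1 listener: merge partial results once every sub-query is done.

    `redis_hash` stands in for the Redis hash stored under
    FileDownloadRecordEntity-{id}. Returns "requeue" when prerequisites are
    missing, otherwise the merged CSV text.
    """
    if int(redis_hash.get("queryDoneCount", 0)) < EXPECTED_SOURCES:
        return "requeue"  # not ready yet; put the message back on MQ1
    users = json.loads(redis_hash["plt_user"])  # partial result, listener A
    funds = json.loads(redis_hash["plt_fund"])  # partial result, listener B
    fund_by_user = {f["user_id"]: f["amount"] for f in funds}
    lines = ["user_id,name,amount"]
    for u in users:
        lines.append(f'{u["id"]},{u["name"]},{fund_by_user.get(u["id"], 0)}')
    return "\n".join(lines)
```

Returning a "requeue" signal rather than blocking lets the broker redeliver the message after a delay, which is what makes the counter-based handshake between MQ1 and MQ2 tolerant of slow sub-queries.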
State Management via Redis¶
Redis is used as a temporary store for intermediate data and state tracking with a Time-To-Live (TTL) to expire stale data^[001-todo.md]:
- Key Structure: Typically formatted as FileDownloadRecordEntity-{id}.
- Payload: Contains JSON data chunks for specific tables and a queryDoneCount.
- Atomic Increment: Listeners increment queryDoneCount using HINCRBY to signal completion of their sub-task.^[001-todo.md]
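A data-gathering listener's write-then-signal step looks roughly like this. The key format, field name, and HINCRBY usage follow the description; `FakeRedis` is a tiny in-memory stand-in (with redis-py the same calls would be `r.hset(...)` and `r.hincrby(...)` against a real server, where HINCRBY is atomic).

```python
import json

class FakeRedis:
    """Minimal in-memory stand-in for the two Redis hash commands used here."""
    def __init__(self):
        self._hashes: dict[str, dict[str, str]] = {}

    def hset(self, key: str, field: str, value: str) -> None:
        self._hashes.setdefault(key, {})[field] = value

    def hincrby(self, key: str, field: str, amount: int = 1) -> int:
        h = self._hashes.setdefault(key, {})
        h[field] = str(int(h.get(field, "0")) + amount)
        return int(h[field])

def publish_partial_result(r: FakeRedis, record_id: int,
                           table: str, rows: list) -> int:
    """Write this listener's JSON chunk, then atomically bump the counter.

    Because HINCRBY is atomic on real Redis, concurrent listeners cannot
    lose an increment; whichever listener sees the final count can safely
    notify MQ1 that merging may begin.
    """
    key = f"FileDownloadRecordEntity-{record_id}"
    r.hset(key, table, json.dumps(rows))
    return r.hincrby(key, "queryDoneCount", 1)
```

A TTL (e.g. `r.expire(key, ttl_seconds)` in redis-py) would be set on the same key so that abandoned jobs do not leave stale chunks behind.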
Error Handling¶
The pattern includes a retry mechanism (e.g., processMessageOrRequeue) with a configured limit (e.g., 6 retries) and delay (e.g., 1 minute).^[001-todo.md] If retries are exhausted, the FileDownloadRecordEntity.status is updated to reflect failure, ensuring the system does not hang indefinitely.
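The retry-or-fail decision can be sketched as below. The name mirrors the `processMessageOrRequeue` idea and the limit of 6 retries from the text; the function signature is an assumption, and the one-minute delay would come from the broker (e.g. a RabbitMQ dead-letter or delayed exchange) rather than from this code, so the sketch only reports the decision.

```python
def process_message_or_requeue(record: dict, handler,
                               retry_count: int, max_retries: int = 6) -> str:
    """Run the handler; requeue on failure until retries are exhausted.

    retry_count is carried on the message (starting at 0). When the limit
    is reached, the record's status is set to Failed so the job never
    hangs indefinitely; otherwise the caller republishes the message with
    retry_count + 1 after the configured delay.
    """
    try:
        handler(record)
        return "processed"
    except Exception:
        if retry_count + 1 >= max_retries:
            record["status"] = "Failed"  # retries exhausted: surface failure
            return "failed"
        return "requeued"                # republish with retry_count + 1
```

Marking the record Failed on the final attempt is what keeps the user-facing download record from sitting in Processing forever.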
Sources¶
^[001-todo.md]
Related Concepts¶
- [[AsyncAPI]]
- Message Queue
- [[Background Jobs]]