Asynchronous Report Download Pattern

The Asynchronous Report Download Pattern is a backend architectural design used to handle long-running data export and report generation tasks. Instead of blocking the HTTP request while the file is generated, the system returns a record identifier immediately, processes the report in the background, and updates the record's status when the file is ready for download.^[001-todo.md]

Workflow

The standard implementation flow typically involves the following stages:

  1. Request Initiation: The user requests a report export via the API Gateway.^[001-todo.md]
  2. Record Creation: The system returns a record ID (e.g., FileDownloadRecordEntity.id) immediately, while the actual processing begins asynchronously.^[001-todo.md]
  3. Background Processing: Data is queried, possibly joined from multiple projects, and transformed.^[001-todo.md]
  4. Storage: The generated file (e.g., CSV) is uploaded to cloud storage (e.g., GCP Cloud Storage Bucket).^[001-todo.md]
  5. Notification: The status of the download record is updated, and the user is notified with a download URL or can retrieve it from the report center.^[001-todo.md]
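The request/response split described above can be sketched as follows. This is a minimal in-memory illustration, not the source's implementation: the names `records`, `request_report`, and `generate_report` are assumptions, the dict stands in for the database, and the placeholder file address stands in for a cloud storage upload.

```python
import threading
import uuid

# Stand-in for the persistence layer holding FileDownloadRecordEntity rows.
records = {}

def generate_report(record_id: str) -> None:
    """Background worker: query the data, build the file, mark the record done."""
    records[record_id]["status"] = "Processing"
    rows = [("alice", 100), ("bob", 200)]  # stand-in for the actual data query
    csv_body = "\n".join(f"{name},{amount}" for name, amount in rows)
    # A real system would upload csv_body to cloud storage here; we only
    # record a placeholder file address.
    records[record_id]["file_address"] = f"/bucket/{record_id}.csv"
    records[record_id]["status"] = "Completed"

def request_report() -> str:
    """API handler: create the download record and return its id immediately."""
    record_id = uuid.uuid4().hex
    records[record_id] = {"status": "Pending", "file_address": None}
    threading.Thread(target=generate_report, args=(record_id,)).start()
    return record_id
```

The caller receives the record id right away and polls (or is notified of) the record's status until it reads `Completed`, at which point `file_address` points at the downloadable file.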

Data Model

The core of this pattern relies on a persistence entity to track the lifecycle of the download. A typical entity, such as FileDownloadRecordEntity, includes the following fields^[001-todo.md]:

  • id: Unique identifier for the download record.
  • Report Source: The origin or type of report.
  • Report Enum: The specific report enumeration or type.
  • Data Retrieval Key: A key (often an MD5 hash of the query conditions) representing the search criteria.^[001-todo.md]
  • File Address: The URL or path to the file in cloud storage.
  • Department ID / Administrator ID: Ownership and scope metadata.
  • Status: The current state of the job (e.g., Pending, Processing, Completed, Failed).
  • Timestamps: Created Time and Completed Time.
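The fields above can be sketched as a record type. This is a hedged illustration: the class name, field names, and the `retrieval_key` helper are adaptations for the sketch, not definitions from the source; only the field list and the MD5-of-query-conditions idea come from the document.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional

class DownloadStatus(Enum):
    PENDING = "Pending"
    PROCESSING = "Processing"
    COMPLETED = "Completed"
    FAILED = "Failed"

def retrieval_key(conditions: dict) -> str:
    """MD5 hash of the (canonicalized) query conditions, per the pattern."""
    return hashlib.md5(json.dumps(conditions, sort_keys=True).encode()).hexdigest()

@dataclass
class FileDownloadRecord:
    id: str
    report_source: str              # origin or type of report
    report_enum: str                # specific report enumeration
    data_retrieval_key: str         # MD5 of the search criteria
    department_id: str
    administrator_id: str
    status: DownloadStatus = DownloadStatus.PENDING
    file_address: Optional[str] = None   # cloud storage URL once uploaded
    created_time: datetime = field(default_factory=datetime.utcnow)
    completed_time: Optional[datetime] = None
```

Hashing the query conditions gives a stable key, so two requests with identical criteria can be deduplicated or served from a previously generated file.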

Processing Strategy

For complex reports involving data from multiple sources or heavy computations, a distributed task queue (such as RabbitMQ) and a cache (Redis) are often utilized to manage the workflow^[001-todo.md].

Multi-Stage Queues

The workload is often broken down into sequential message queues (MQ) to handle dependencies and error recovery^[001-todo.md]:

  • MQ1 (Coordinator): Listens for the initial record ID. It checks if all prerequisite data is ready by checking a counter in Redis (e.g., queryDoneCount). If the count indicates completion (e.g., >= 2), it proceeds to merge data or convert to CSV.^[001-todo.md]
  • MQ2 (Fanout/Data Gathering): Receives the record ID and fans out tasks to specific listeners. Each listener queries a specific database table (e.g., plt_user, plt_fund) and writes the partial results to Redis.^[001-todo.md]
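The two-queue interplay can be simulated without a broker. In this sketch a plain dict stands in for Redis, `gather` plays the MQ2 listeners, and `coordinate` plays MQ1; `EXPECTED_PARTS` and the function names are assumptions, while the key format, `queryDoneCount`, and the `plt_user`/`plt_fund` tables come from the document.

```python
from typing import Optional

EXPECTED_PARTS = 2   # one per data-gathering listener (assumed value)
store = {}           # stand-in for Redis, keyed as FileDownloadRecordEntity-{id}

def gather(record_id: str, table: str, rows: list) -> None:
    """MQ2 listener: query one source table and stash the partial result."""
    key = f"FileDownloadRecordEntity-{record_id}"
    entry = store.setdefault(key, {"queryDoneCount": 0})
    entry[table] = rows
    entry["queryDoneCount"] += 1   # HINCRBY against real Redis

def coordinate(record_id: str) -> Optional[str]:
    """MQ1 listener: once all parts are in, merge them and emit CSV text."""
    entry = store.get(f"FileDownloadRecordEntity-{record_id}", {})
    if entry.get("queryDoneCount", 0) < EXPECTED_PARTS:
        return None   # not ready yet; a real consumer would requeue the message
    merged = entry["plt_user"] + entry["plt_fund"]
    return "\n".join(",".join(map(str, row)) for row in merged)
```

Until both listeners have reported, `coordinate` declines the message; once `queryDoneCount` reaches the expected count, the partial results are merged into the CSV body.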

State Management via Redis

Redis is used as a temporary store for intermediate data and state tracking with a Time-To-Live (TTL) to expire stale data^[001-todo.md]:

  • Key Structure: Typically formatted as FileDownloadRecordEntity-{id}.
  • Payload: Contains JSON data chunks for specific tables and a queryDoneCount.
  • Atomic Increment: Listeners increment queryDoneCount using HINCRBY to signal completion of their sub-task.^[001-todo.md]
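The three Redis operations the pattern leans on (hash writes, atomic increment, TTL) can be mimicked in a few lines. `MiniRedis` is a toy stand-in for illustration only; in production these map to the real HINCRBY, HGETALL, and EXPIRE commands.

```python
import time

class MiniRedis:
    """Tiny in-memory stand-in for the Redis operations used by the pattern."""

    def __init__(self):
        self._hashes = {}
        self._expiry = {}

    def hincrby(self, key: str, field: str, amount: int = 1) -> int:
        """Atomically increment a hash field (HINCRBY) and return the new value."""
        h = self._hashes.setdefault(key, {})
        h[field] = int(h.get(field, 0)) + amount
        return h[field]

    def expire(self, key: str, ttl_seconds: float) -> None:
        """Schedule the key to expire (EXPIRE), evicting stale intermediate data."""
        self._expiry[key] = time.time() + ttl_seconds

    def hgetall(self, key: str) -> dict:
        """Read the whole hash (HGETALL), honoring any expiry."""
        exp = self._expiry.get(key)
        if exp is not None and time.time() > exp:
            self._hashes.pop(key, None)
            self._expiry.pop(key, None)
            return {}
        return self._hashes.get(key, {})
```

Because HINCRBY returns the post-increment value, each listener can tell from its own write whether it was the last one to finish, without a separate read.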

Error Handling

The pattern includes a retry mechanism (e.g., processMessageOrRequeue) with a configured limit (e.g., 6 retries) and delay (e.g., 1 minute).^[001-todo.md] If retries are exhausted, the FileDownloadRecordEntity.status is updated to reflect failure, ensuring the system does not hang indefinitely.
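A bounded retry of this shape can be sketched as below. The function name mirrors the source's processMessageOrRequeue but the signature is an assumption, and the 1-minute requeue delay is elided: a real consumer would republish the message with a delay rather than recurse in-process.

```python
MAX_RETRIES = 6  # retry limit named by the pattern

def process_or_requeue(record: dict, handler, attempt: int = 0) -> str:
    """Run the handler; retry up to MAX_RETRIES, then mark the record failed."""
    try:
        handler(record)
        return "Completed"
    except Exception:
        if attempt + 1 >= MAX_RETRIES:
            return "Failed"  # surfaces as FileDownloadRecordEntity.status
        # A real consumer would requeue with a delay (e.g., 1 minute) here.
        return process_or_requeue(record, handler, attempt + 1)
```

Capping attempts and recording a terminal Failed status is what keeps the download record from sitting in Processing forever when the job cannot succeed.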

Sources

^[001-todo.md]