Asynchronous Report Download System

The Asynchronous Report Download System is a backend architecture designed to handle large-scale data reporting and file generation without blocking user operations. It relies on a decoupled workflow where report generation is processed in the background, utilizing message queues and temporary storage to manage multi-project data joins and eventual retrieval^[001-TODO__報表下載.md].

System Workflow

The request lifecycle begins when a user requests a report^[001-TODO__報表下載.md]. Instead of returning the file immediately, the Gateway returns a record identifier, initiating an asynchronous process^[001-TODO__報表下載.md]. The system records the specific query conditions, hashed into an MD5 key, so the data can be retrieved later^[001-TODO__報表下載.md]. Once the data is processed, the resulting file is uploaded to a GCP Cloud Storage bucket, at which point the status in the Report Download Center is updated to notify the user^[001-TODO__報表下載.md].
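The query-key derivation might look like the following sketch. The source only states that the query conditions are hashed with MD5; serializing them as sorted JSON for a canonical form is an assumption, as are the example field names.

```python
import hashlib
import json

def query_key(conditions: dict) -> str:
    """Derive a deterministic MD5 key from report query conditions.

    Sorting the keys before hashing ensures logically identical queries
    map to the same key (canonicalization is assumed; the source only
    specifies that the conditions are hashed with MD5).
    """
    canonical = json.dumps(conditions, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

# Two orderings of the same conditions yield the same key.
a = query_key({"project": "plt_user", "status": "active"})
b = query_key({"status": "active", "project": "plt_user"})
```

A deterministic key of this kind also allows the system to detect that an identical report request is already in flight or cached.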

Data Persistence

The state of each report job is managed using a database entity, specifically FileDownloadRecordEntity^[001-TODO__報表下載.md]. This entity tracks critical information throughout the job lifecycle, including:

  • Report Source & Enumeration: Identifying the origin and type of report^[001-TODO__報表下載.md].
  • Query Key: The MD5 hash representing the data retrieval conditions^[001-TODO__報表下載.md].
  • Status & Timestamps: Tracking the current state (e.g., processing, completed) alongside creation and completion times^[001-TODO__報表下載.md].
  • Location: The file address in storage^[001-TODO__報表下載.md].
  • Context: Department and Administrator IDs for access control^[001-TODO__報表下載.md].
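The tracked fields could be modeled roughly as below. The source lists what the entity records but not its actual schema, so the field names, types, and status values here are illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional

class ReportStatus(Enum):
    # Status values are assumed; the source only mentions states
    # such as "processing" and "completed".
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"

@dataclass
class FileDownloadRecord:
    """Rough stand-in for FileDownloadRecordEntity."""
    report_source: str                    # origin of the report
    report_type: str                      # report enumeration
    query_key: str                        # MD5 hash of query conditions
    department_id: int                    # access-control context
    admin_id: int                         # access-control context
    status: ReportStatus = ReportStatus.PROCESSING
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    completed_at: Optional[datetime] = None
    file_location: Optional[str] = None   # storage address once uploaded

record = FileDownloadRecord("plt_user", "user_tag", "0" * 32, 7, 42)
```

A new record starts in the processing state with no file location; both are filled in when the upload completes.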

Processing Pipeline

To handle complex reporting requirements, such as joining data from multiple projects, the system employs a multi-stage Message Queue (MQ) and Redis architecture^[001-TODO__報表下載.md].

1. Distributed Data Collection

The system uses a FanoutExchange pattern (labeled mq2 in the source) to distribute the workload^[001-TODO__報表下載.md]. Listeners subscribe to specific data queries (e.g., plt_user.vs_user_tag_relation or plt_fund.vs_withdraw) and write the resulting data into a shared temporary store^[001-TODO__報表下載.md].
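The fanout step can be sketched with an in-memory stand-in for the exchange: one published message reaches every bound listener, and each listener writes its partial result into a shared store. The listener bodies and store layout are assumptions; in the real system these would be AMQP consumers running the named database queries.

```python
from typing import Callable, Dict, List

class FanoutExchangeSketch:
    """Minimal in-memory stand-in for the mq2 fanout exchange:
    every published message is delivered to every bound listener."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[dict], None]] = []

    def bind(self, listener: Callable[[dict], None]) -> None:
        self._listeners.append(listener)

    def publish(self, message: dict) -> None:
        for listener in self._listeners:
            listener(message)

shared_store: Dict[str, dict] = {}  # stands in for the Redis hash

def user_tag_listener(msg: dict) -> None:
    # Would run the plt_user.vs_user_tag_relation query in the real system.
    shared_store.setdefault(msg["record_id"], {})["user_tags"] = ["vip"]

def withdraw_listener(msg: dict) -> None:
    # Would run the plt_fund.vs_withdraw query in the real system.
    shared_store.setdefault(msg["record_id"], {})["withdrawals"] = [100.0]

exchange = FanoutExchangeSketch()
exchange.bind(user_tag_listener)
exchange.bind(withdraw_listener)
exchange.publish({"record_id": "42"})  # both listeners receive the message
```

Because the exchange is fanout rather than direct, adding a new data source is just binding another listener; the publisher does not change.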

2. Intermediate Storage

A Redis Hash structure acts as a temporary data cache with a Time-To-Live (TTL) of 10 minutes^[001-TODO__報表下載.md]. The key is typically derived from the FileDownloadRecordEntity ID^[001-TODO__報表下載.md]. This storage holds the intermediate JSON data and a queryDoneCount counter, which is incremented (hincrby) whenever a listener finishes its sub-query task^[001-TODO__報表下載.md].
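The cache interaction might look like this sketch, which uses an in-memory stand-in with HSET/HINCRBY/EXPIRE semantics so it runs without a Redis server. The key format and hash field names are assumptions; the source specifies only the record-ID-derived key, the 10-minute TTL, and the hincrby'd queryDoneCount.

```python
import json
import time
from typing import Dict

class MiniRedisHash:
    """In-memory stand-in for the Redis hash commands used here
    (HSET, HINCRBY, EXPIRE)."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, str]] = {}
        self._expiry: Dict[str, float] = {}

    def hset(self, key: str, field: str, value: str) -> None:
        self._data.setdefault(key, {})[field] = value

    def hincrby(self, key: str, field: str, amount: int = 1) -> int:
        h = self._data.setdefault(key, {})
        h[field] = str(int(h.get(field, "0")) + amount)
        return int(h[field])

    def expire(self, key: str, ttl_seconds: int) -> None:
        self._expiry[key] = time.time() + ttl_seconds

cache = MiniRedisHash()
key = "report:42"        # key derived from the FileDownloadRecordEntity ID
cache.expire(key, 600)   # 10-minute TTL on the intermediate data

# A listener stores its intermediate JSON, then bumps the counter.
cache.hset(key, "user_tags", json.dumps(["vip"]))
done = cache.hincrby(key, "queryDoneCount")
```

Keeping the counter in the same hash as the data means a single HGETALL by the aggregator sees both the partial results and how many sub-queries have finished.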

3. Aggregation and Requeue Logic

A primary processor (labeled mq1 or mq3) monitors the Redis queryDoneCount^[001-TODO__報表下載.md].

  • Success Condition: If the count reaches the required threshold (indicating all listeners have finished), the system aggregates the data, converts it to CSV, and uploads it to GCP^[001-TODO__報表下載.md].
  • Failure Handling: If the condition is not met, a processMessageOrRequeue mechanism attempts to retry the task^[001-TODO__報表下載.md]. The system typically allows up to 6 retries with a 1-minute delay between attempts^[001-TODO__報表下載.md].
  • Final Failure: If all retries are exhausted, the FileDownloadRecordEntity status is updated to reflect the failure^[001-TODO__報表下載.md].
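The three outcomes above can be sketched as a single decision function. The control flow (aggregate when done, requeue up to 6 times, then fail) follows the source; the function signature and the injected callbacks are assumptions standing in for the real processMessageOrRequeue, CSV/GCP upload, and entity-update code.

```python
from typing import Callable

MAX_RETRIES = 6  # up to 6 retries, roughly 1 minute apart, per the source

def process_or_requeue(message: dict,
                       all_queries_done: Callable[[str], bool],
                       aggregate_and_upload: Callable[[str], None],
                       requeue_with_delay: Callable[[dict], None],
                       mark_failed: Callable[[str], None]) -> str:
    """Sketch of the processMessageOrRequeue decision: aggregate once
    every sub-query has reported in, otherwise retry or give up."""
    record_id = message["record_id"]
    if all_queries_done(record_id):
        aggregate_and_upload(record_id)   # CSV conversion + GCP upload
        return "completed"
    if message.get("retries", 0) < MAX_RETRIES:
        message["retries"] = message.get("retries", 0) + 1
        requeue_with_delay(message)       # redelivered after the delay
        return "requeued"
    mark_failed(record_id)                # update FileDownloadRecordEntity
    return "failed"

# Simulate a job whose sub-queries never finish: 6 requeues, then failure.
events = []
msg = {"record_id": "42"}
while True:
    outcome = process_or_requeue(
        msg,
        all_queries_done=lambda _rid: False,
        aggregate_and_upload=lambda _rid: events.append("upload"),
        requeue_with_delay=lambda _m: events.append("requeue"),
        mark_failed=lambda _rid: events.append("failed"),
    )
    if outcome != "requeued":
        break
```

Carrying the retry count inside the message itself keeps the processor stateless: any worker instance can pick up the redelivered message and make the same decision.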

Sources

^[001-TODO__報表下載.md]