Search parameter deduplication via hashing
Search parameter deduplication via hashing prevents identical report generation requests from being processed more than once within a short time window.^[001-todo-code.md]
The technique computes an MD5 hash of the request parameters and stores it alongside the record.^[001-todo-code.md] By comparing this hash against existing records, the system can tell whether a report with exactly the same parameters is already being processed or was recently completed, which avoids duplicate work and saves resources.^[001-todo-code.md]
Implementation
The implementation is primarily found within the ReportDomainService and ReportDownloadRecordService.^[001-todo-code.md]
When a new report request is received, the system serializes the search parameters into a JSON string using an ObjectMapper.^[001-todo-code.md] This serialized string is then hashed using the MD5 algorithm via DigestUtils.md5Hex().^[001-todo-code.md] The resulting searchParamHash is stored in the report_download_record table as a VARCHAR(32) field.^[001-todo-code.md]
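The serialize-then-hash step can be sketched as follows. This is a minimal, dependency-free illustration rather than the project's code: the JDK's `MessageDigest` stands in for `DigestUtils.md5Hex()`, and a hypothetical `canonicalize` helper (sorted `key=value` pairs) stands in for the `ObjectMapper` JSON serialization. The class and method names are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class SearchParamHashSketch {

    // Hypothetical stand-in for JSON serialization: keys are sorted so that
    // equal parameter sets always produce the same string.
    static String canonicalize(Map<String, String> params) {
        return new TreeMap<>(params).entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
    }

    // JDK-only equivalent of an MD5 hex digest: 32 lowercase hex characters,
    // which is why a VARCHAR(32) column is wide enough to hold it.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Serialize the parameters, then hash the serialized form.
    static String searchParamHash(Map<String, String> params) {
        return md5Hex(canonicalize(params));
    }

    public static void main(String[] args) {
        Map<String, String> params = Map.of("from", "2024-01-01", "to", "2024-01-31");
        System.out.println(searchParamHash(params));
    }
}
```

Whatever serializer is used, the hash is only useful for deduplication if equal parameters always serialize to the same string; the sketch makes that explicit by sorting keys before hashing.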
Duplicate Check Logic
To identify a duplicate request, the system runs a query that checks for an existing record matching all three of the following criteria:
1. Creator ID: The user initiating the request.
2. Search Param Hash: The calculated MD5 hash of the parameters.
3. Status and Time: The record status must be RUNNING or SUCCESS, and the create_time must be within a recent window (e.g., the last 5 minutes).^[001-todo-code.md]
If a matching record is found, the system prevents the creation of a new task, often throwing a BusinessException indicating that data is already being created.^[001-todo-code.md] This logic is encapsulated in the duplicateQry method.^[001-todo-code.md]
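The three criteria above can be sketched as a single predicate. This is an in-memory illustration under stated assumptions, not the actual `duplicateQry` implementation: a list stands in for the `report_download_record` table, `IllegalStateException` stands in for `BusinessException`, the `FAILED` status is hypothetical, and the 5-minute window is the example value from the text.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

class DuplicateCheckSketch {

    // RUNNING and SUCCESS come from the text; FAILED is a hypothetical extra state.
    enum Status { RUNNING, SUCCESS, FAILED }

    // Hypothetical in-memory stand-in for one row of report_download_record.
    static final class DownloadRecord {
        final long creatorId;
        final String searchParamHash;
        final Status status;
        final Instant createTime;

        DownloadRecord(long creatorId, String searchParamHash, Status status, Instant createTime) {
            this.creatorId = creatorId;
            this.searchParamHash = searchParamHash;
            this.status = status;
            this.createTime = createTime;
        }
    }

    // Assumed deduplication window (the text gives 5 minutes as an example).
    static final Duration WINDOW = Duration.ofMinutes(5);

    final List<DownloadRecord> records = new ArrayList<>();

    // True if the same creator already has a RUNNING or SUCCESS record with the
    // same parameter hash created within the window ending at 'now'.
    boolean isDuplicate(long creatorId, String searchParamHash, Instant now) {
        Instant cutoff = now.minus(WINDOW);
        return records.stream().anyMatch(r ->
                r.creatorId == creatorId
                        && r.searchParamHash.equals(searchParamHash)
                        && (r.status == Status.RUNNING || r.status == Status.SUCCESS)
                        && !r.createTime.isBefore(cutoff));
    }

    // Rejects duplicates; otherwise records a new RUNNING task.
    void submit(long creatorId, String searchParamHash, Instant now) {
        if (isDuplicate(creatorId, searchParamHash, now)) {
            // Stand-in for the BusinessException mentioned in the text.
            throw new IllegalStateException("data is already being created");
        }
        records.add(new DownloadRecord(creatorId, searchParamHash, Status.RUNNING, now));
    }
}
```

Passing `now` as a parameter instead of reading the system clock inside the method keeps the window logic easy to test with fixed timestamps.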
Database Optimization
To support efficient deduplication queries, the table has an index, index_c_s, on the creator_id and search_param_hash columns.^[001-todo-code.md] The index lets the database locate a creator's records for a given hash directly, so the status and create_time conditions are evaluated only on those few rows instead of through a full table scan.
Related Concepts
- [[Caching]]
- [[Hash functions]]
- [[Idempotency]]
Sources
001-todo-code.md