Search parameter deduplication via hashing

Search parameter deduplication via hashing is a mechanism that prevents identical report generation requests from being processed more than once within a short time window.^[001-todo-code.md]

The technique calculates an MD5 hash of the request parameters and stores it alongside the download record.^[001-todo-code.md] By comparing this hash against existing records, the system can tell whether a report with exactly the same parameters is already being processed or was recently completed, which avoids duplicate work and saves resources.^[001-todo-code.md]

Implementation

The implementation is primarily found within the ReportDomainService and ReportDownloadRecordService.^[001-todo-code.md]

When a new report request is received, the system serializes the search parameters into a JSON string using an ObjectMapper.^[001-todo-code.md] This serialized string is then hashed using the MD5 algorithm via DigestUtils.md5Hex().^[001-todo-code.md] The resulting searchParamHash is stored in the report_download_record table as a VARCHAR(32) field, which matches the 32 hexadecimal characters of an MD5 digest.^[001-todo-code.md]
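The serialize-then-hash step can be sketched roughly as follows. This is a minimal illustration, not the code from ReportDomainService: to stay free of third-party dependencies it uses the JDK's MessageDigest in place of DigestUtils.md5Hex(), and a hypothetical canonicalize helper (sorted keys, no escaping) in place of ObjectMapper. The class name, the method names, and the Map<String, String> parameter shape are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class; all names here are illustrative.
class SearchParamHasher {

    // Stand-in for ObjectMapper serialization. Keys are sorted so that the
    // same parameters give the same string regardless of insertion order.
    // Assumption: values need no escaping, so this is not a general JSON writer.
    static String canonicalize(Map<String, String> params) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : new TreeMap<>(params).entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append('"').append(e.getKey()).append("\":\"")
              .append(e.getValue()).append('"');
        }
        return sb.append('}').toString();
    }

    // Stand-in for DigestUtils.md5Hex(): lowercase hex of the MD5 digest.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(32);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Value that would be stored in the search_param_hash column.
    static String searchParamHash(Map<String, String> params) {
        return md5Hex(canonicalize(params));
    }
}
```

Whatever serializer is used, the hash is only useful for deduplication if identical parameters always serialize to the same string. Sorting the keys is one way to get that property in this sketch.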

Duplicate Check Logic

To identify a duplicate request, the system executes a query that checks for the existence of a record matching three criteria:

  1. Creator ID: The user initiating the request.
  2. Search Param Hash: The calculated MD5 hash of the parameters.
  3. Status and Time: The record status must be RUNNING or SUCCESS, and the create_time must be within a recent window (e.g., the last 5 minutes).^[001-todo-code.md]

If a matching record is found, the system prevents the creation of a new task, often throwing a BusinessException indicating that data is already being created.^[001-todo-code.md] This logic is encapsulated in the duplicateQry method.^[001-todo-code.md]
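The check can be illustrated with an in-memory sketch. It is not the actual duplicateQry method, which runs as a database query. Here a list stands in for the report_download_record table, IllegalStateException stands in for BusinessException, and the class name, the FAILED status, the exact exception message, and the fixed 5-minute window are assumptions made for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of the duplicate check; names are illustrative.
class DuplicateCheckSketch {

    // RUNNING and SUCCESS come from the text above; FAILED is assumed.
    enum Status { RUNNING, SUCCESS, FAILED }

    static final class DownloadRecord {
        final long creatorId;
        final String searchParamHash;
        final Status status;
        final Instant createTime;

        DownloadRecord(long creatorId, String searchParamHash,
                       Status status, Instant createTime) {
            this.creatorId = creatorId;
            this.searchParamHash = searchParamHash;
            this.status = status;
            this.createTime = createTime;
        }
    }

    // Assumed deduplication window, taken from the "last 5 minutes" example.
    static final Duration WINDOW = Duration.ofMinutes(5);

    // Stand-in for the report_download_record table.
    final List<DownloadRecord> records = new ArrayList<>();

    // True if a record matches all three criteria: creator, hash, and
    // (status in RUNNING/SUCCESS with create_time inside the window).
    boolean isDuplicate(long creatorId, String hash, Instant now) {
        Instant cutoff = now.minus(WINDOW);
        for (DownloadRecord r : records) {
            boolean active = r.status == Status.RUNNING || r.status == Status.SUCCESS;
            if (r.creatorId == creatorId
                    && r.searchParamHash.equals(hash)
                    && active
                    && !r.createTime.isBefore(cutoff)) {
                return true;
            }
        }
        return false;
    }

    // Rejects the request when a duplicate exists, otherwise records a new task.
    void submit(long creatorId, String hash, Instant now) {
        if (isDuplicate(creatorId, hash, now)) {
            // Stand-in for the BusinessException described above.
            throw new IllegalStateException("Data is already being created");
        }
        records.add(new DownloadRecord(creatorId, hash, Status.RUNNING, now));
    }
}
```

Passing the current time in as a parameter keeps the window logic easy to test. In this sketch, a request repeated one minute after the first is rejected, while the same request ten minutes later, or the same hash from a different creator, is accepted.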

Database Optimization

To support efficient deduplication queries, the table has a composite index, index_c_s, on the creator_id and search_param_hash columns.^[001-todo-code.md] The index lets the database narrow the lookup to the rows for a given creator and hash without performing a full table scan, so the status and create_time conditions only need to be evaluated against those few rows.

  • [[Caching]]
  • [[Hash functions]]
  • [[Idempotency]]

Sources

  • 001-todo-code.md