Guava BloomFilter implementation¶
The Guava BloomFilter is a practical implementation of the Bloom filter probabilistic data structure, provided by the Google Guava library.^[600-developer-principle-bloomfilter.md]
Basic Usage¶
To utilize a BloomFilter in Guava, one must import the necessary classes and define a Funnel (which describes how to map an object to primitives) and a Charset.^[600-developer-principle-bloomfilter.md] The filter is instantiated using the create method, which typically accepts the expected number of insertions (n) and the desired false positive probability (p).^[600-developer-principle-bloomfilter.md]
Elements are added to the set using the put method.^[600-developer-principle-bloomfilter.md]
import java.nio.charset.Charset;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
// Create a filter expecting 1000 insertions with a low [False Positive Rate](<./false-positive-rate.md>)
BloomFilter<String> filter = BloomFilter.create(
Funnels.stringFunnel(Charset.forName("utf-8")),
1000,
0.000001
);
filter.put("123");
Testing Membership¶
To check if an element might exist in the filter, the mightContain method is used.^[600-developer-principle-bloomfilter.md] This method returns true if the element is definitely in the set or is a false positive, and false only if the element is definitely not in the set.
boolean exists = filter.mightContain("123");
Combining Filters¶
The implementation supports merging two filters together using the putAll method.^[600-developer-principle-bloomfilter.md] This is useful for distributed scenarios or batch processing where separate filter instances need to be consolidated.
BloomFilter<String> filterA = ...;
BloomFilter<String> filterB = ...;
filterA.putAll(filterB); // Absorbs filterB into filterA
Use Case: Cache penetration¶
A primary application of this implementation is mitigating cache penetration.^[600-developer-principle-bloomfilter.md] By placing a Bloom filter in front of a storage layer (like a database), the system can quickly determine if a key exists. If the filter indicates the key is not present, the database access is skipped entirely, preventing the load from overwhelming the database when querying for non-existent data.
Related Concepts¶
- [[Guava]]
- Cache Penetration
- [[Hash Funnel]]