bloom_filter
A simple BloomFilter implementation, usable in-process or as an EventMachine daemon.
If you don't know what a bloom filter is, you should read up on it: http://en.wikipedia.org/wiki/Bloom_filter
Usage
You can use it as an in-process data structure:
bloom_filter = BloomFilter.new(100, 3) # 100 = bits, 3 = hash functions
bloom_filter.add("hello")
bloom_filter.include?("hello") #=> true
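If the two constructor arguments feel abstract, here is a rough sketch of what a bloom filter does with them. This is not this gem's implementation: `SketchBloomFilter` and its MD5-based `positions` helper are hypothetical names made up for illustration, and the real class may hash and store bits differently.

```ruby
require "digest"

# Hypothetical illustration only; not the gem's BloomFilter class.
class SketchBloomFilter
  def initialize(bits, hashes)
    @bits = bits                      # size of the bit field
    @hashes = hashes                  # number of hash functions
    @field = Array.new(bits, false)   # every bit starts unset
  end

  # Set one bit per hash function.
  def add(item)
    positions(item).each { |i| @field[i] = true }
  end

  # True only if every bit for this item is set. False positives are
  # possible, false negatives are not.
  def include?(item)
    positions(item).all? { |i| @field[i] }
  end

  private

  # Derive one position per hash function by salting the item with the
  # hash index (an assumption chosen for brevity, not for speed).
  def positions(item)
    (0...@hashes).map do |seed|
      Digest::MD5.hexdigest("#{seed}:#{item}").to_i(16) % @bits
    end
  end
end

filter = SketchBloomFilter.new(100, 3)
filter.add("hello")
puts filter.include?("hello")
```

An item that was added is always reported as present. An item that was never added is usually reported as absent, but may collide with bits set by other items, which is where the false positive probability comes from.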
Or you can use it as a service, talking to a running bloom_filter_server:
bloom_filter = BloomFilter::Client.new("localhost", 4111)
bloom_filter.add("hello") # weeeee bits flying over network IO
bloom_filter.include?("hello") #=> true
To start the server, run:
bloom_filter_server -i localhost:4111 -n 1000000 -p 0.05
# -i is the interface (host:port) to listen on
# -n is the estimated number of elements
# -p is the desired false positive probability
You can also size your in-process bloom filter from an estimated number of elements and a desired false positive probability:
BloomFilter.new(*BloomFilter.optimal_values(1000000, 0.05))
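For a feel of what numbers come out of a calculation like this, the sketch below applies the standard bloom filter sizing formulas: bits = -n * ln(p) / (ln 2)^2 and hash functions = (bits / n) * ln 2. The method name `optimal_bloom_parameters` is hypothetical, and whether `BloomFilter.optimal_values` uses exactly these formulas or the same rounding is an assumption.

```ruby
# Hypothetical helper using the textbook sizing formulas; the gem's own
# optimal_values may round differently.
def optimal_bloom_parameters(n, p)
  raise ArgumentError, "n must be positive" unless n.positive?
  raise ArgumentError, "p must be between 0 and 1" unless p > 0 && p < 1

  # Number of bits that keeps the false positive rate near p for n items.
  bits = (-n * Math.log(p) / (Math.log(2)**2)).ceil
  # Number of hash functions that minimises false positives for that size.
  hashes = [(bits.to_f / n * Math.log(2)).round, 1].max
  [bits, hashes]
end

bits, hashes = optimal_bloom_parameters(1_000_000, 0.05)
puts "bits=#{bits} hashes=#{hashes}"
```

For one million elements at a 5% false positive rate this comes to roughly 6.2 million bits (under 1 MB) and 4 hash functions.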
Saving your bloom filter
You can dump/load your bloom filter:
In process:
dumped = bloom_filter.dump
new_bloom_filter = BloomFilter.load(dumped)
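To keep an in-process filter across restarts, the dumped value has to go somewhere. Below is a minimal sketch that writes it to a file. It assumes `dump` returns a String and that `load` accepts that same String, which this README does not confirm; `save_filter` and `load_filter` are hypothetical helpers, not part of the gem.

```ruby
# Hypothetical helpers. They assume `filter.dump` returns a String and
# that `filter_class.load` accepts that String back.
def save_filter(filter, path)
  # binwrite avoids any newline or encoding translation of the raw bytes.
  File.binwrite(path, filter.dump)
end

def load_filter(filter_class, path)
  filter_class.load(File.binread(path))
end
```

With the gem loaded, usage would look like `save_filter(bloom_filter, "filter.dump")` followed later by `load_filter(BloomFilter, "filter.dump")`, where the file name is just an example.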
Remote service:
bloom_filter.dump(path_to_file)
bloom_filter.load(path_to_file)
TODO
- Better documentation
- CLI errors
- CLI help command
- Improve load/dump workflow
- periodic dumps
bloom filters are awesome btw.