bloom_filter

A simple BloomFilter implementation, usable in-process or as an EventMachine daemon.

If you don't know what a bloom filter is, you should read up on it: http://en.wikipedia.org/wiki/Bloom_filter

Usage

You can use it as an in-process data structure:

bloom_filter = BloomFilter.new(100, 3) # 100 = bits, 3 = hash functions
bloom_filter.add("hello")
bloom_filter.include?("hello") #=> true
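To make the `100 = bits, 3 = hash functions` arguments concrete, here is a minimal, self-contained Ruby sketch of the data structure. It is not the gem's code. The class name `MiniBloom` is hypothetical, and deriving the bit positions by double hashing a SHA-256 digest is an assumption chosen for illustration.

```ruby
require "digest"

# Hypothetical illustration class; NOT the gem's BloomFilter internals.
class MiniBloom
  attr_reader :size, :hashes

  def initialize(size, hashes)
    @size = size                   # number of bits (m)
    @hashes = hashes               # number of hash functions (k)
    @bits = Array.new(size, false) # every bit starts unset
  end

  # Set the k bit positions derived from the key.
  def add(key)
    positions(key).each { |i| @bits[i] = true }
    self
  end

  # true  => all k bits are set, so the key is "probably present"
  # false => at least one bit is unset, so the key was never added
  def include?(key)
    positions(key).all? { |i| @bits[i] }
  end

  private

  # Assumed scheme: split a SHA-256 digest into two 64-bit integers
  # h1 and h2, then use h1 + i * h2 (mod m) for i = 0...k.
  def positions(key)
    h1, h2 = Digest::SHA256.digest(key.to_s).unpack("Q>Q>")
    Array.new(@hashes) { |i| (h1 + i * h2) % @size }
  end
end

filter = MiniBloom.new(100, 3)
filter.add("hello")
filter.include?("hello") # => true
```

A `false` from `include?` means the key was definitely never added; a `true` only means it probably was, because other keys may have set the same bits.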

Or you can use it as a service:

bloom_filter = BloomFilter::Client.new("localhost", 4111)
bloom_filter.add("hello") # weeeee bits flying over network IO
bloom_filter.include?("hello") #=> true

To run it as a service, start the server:

bloom_filter_server -i localhost:4111 -n 1000000 -p 0.05
# -i is the interface to listen on (host:port)
# -n is the estimated number of elements
# -p is the desired false positive probability
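For reference, the standard sizing formulas relating the estimated element count n (`-n`) and the false positive probability p (`-p`) to the number of bits m and the number of hash functions k are given below. Whether `bloom_filter_server` applies exactly these formulas, and how it rounds the results, is an assumption, not something this README states.

```latex
m = -\frac{n \ln p}{(\ln 2)^2}, \qquad
k = \frac{m}{n} \ln 2 = -\log_2 p
```

With n = 1,000,000 and p = 0.05 this gives m ≈ 6.24 × 10^6 bits (roughly 780 KB) and k ≈ 4.32, i.e. about 4 hash functions once rounded to an integer.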

You can also size an in-process bloom filter from your estimated number of elements and desired false positive probability:

BloomFilter.new(*BloomFilter.optimal_values(1000000, 0.05))
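`BloomFilter.optimal_values` returns the arguments that `BloomFilter.new` expects, but its internals aren't shown here. The following is a hypothetical Ruby helper (`optimal_bloom_values` is an illustrative name, not part of the gem) that computes the bit count and hash-function count from the textbook formulas m = -n ln p / (ln 2)^2 and k = (m / n) ln 2. The rounding choices (ceiling for m, nearest integer with a minimum of 1 for k) are assumptions.

```ruby
# Hypothetical helper; the gem's optimal_values may round differently.
def optimal_bloom_values(n, p)
  raise ArgumentError, "n must be positive" unless n > 0
  raise ArgumentError, "p must be between 0 and 1" unless p > 0 && p < 1

  # Number of bits: m = -n * ln(p) / (ln 2)^2, rounded up.
  m = (-n * Math.log(p) / (Math.log(2)**2)).ceil
  # Number of hash functions: k = (m / n) * ln 2, at least 1.
  k = [(m.to_f / n * Math.log(2)).round, 1].max
  [m, k]
end

bits, hash_count = optimal_bloom_values(1_000_000, 0.05)
puts "#{bits} bits, #{hash_count} hash functions"
```

For 1,000,000 elements at 5% this yields about 6.24 million bits and 4 hash functions, which matches the worked numbers from the formulas.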

Saving your bloom filter

You can dump/load your bloom filter:

In-process:

dumped = bloom_filter.dump
new_bloom_filter = BloomFilter.load(dumped)
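The README doesn't specify what `dump` returns beyond the fact that `BloomFilter.load` can rebuild a filter from it. As a hypothetical illustration only, a filter's parameters and bit array could be serialized into a compact binary string like this. `dump_bits` and `load_bits` are illustrative names, not the gem's API, and the header layout (two 32-bit big-endian integers) is an assumption.

```ruby
# Hypothetical serialization sketch; not the gem's real dump format.

# Header: size and hash count as two 32-bit big-endian integers ("NN"),
# followed by the bits packed 8 per byte, least significant bit first ("b*").
def dump_bits(size, hashes, bits)
  bit_string = bits.map { |b| b ? "1" : "0" }.join
  [size, hashes].pack("NN") + [bit_string].pack("b*")
end

# Reverse of dump_bits: read the header, then unpack the bit string and
# drop the zero padding added to fill the final byte.
def load_bits(data)
  size, hashes = data.unpack("NN")
  bit_string = data.byteslice(8..-1).unpack1("b*")
  bits = bit_string[0, size].chars.map { |c| c == "1" }
  [size, hashes, bits]
end

bits = Array.new(100, false)
[3, 17, 99].each { |i| bits[i] = true }
dumped = dump_bits(100, 3, bits)
size, hashes, restored = load_bits(dumped)
```

Packing with `b*` stores 8 bits per byte, so a 100-bit filter takes 13 bytes plus the 8-byte header.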

Remote service:

bloom_filter.dump(path_to_file)
bloom_filter.load(path_to_file)

TODO

  • Better documentation
  • CLI errors
  • CLI help command
  • Improve load/dump workflow
    • periodic dumps

Bloom filters are awesome, btw.