WeightedSampler
Main repository is https://gitlab.com/oleksiy/weighted_sampler
Weighted Sampler helps you to pick a random samples from a collection with defined probabilities or weights
Installation
Add this line to your application's Gemfile:
gem 'weighted_sampler'
And then execute:
$ bundle
Or install it yourself as:
$ gem install weighted_sampler
Usage
Module or sampler instance modes available
Module
>> WeightedSampler.sample([P0, P1, ...])
=> INDEX
>> WeightedSampler.sample({K0 => P0, K1 => P1, ...})
=> Ki
Input as an Array
You can provide Array
of probabilities in a form of weights for each option.
Equal probabilities: [50, 50]
or [1, 1]
or [0.5, 0.5]
Different probabilities: [99, 1]
, or [0.001, 0.1]
(index 1 is 100x times more likely to be chosen than 0)
If your input probabilies are not normalized WeightedSampler
will do it for you
OUTPUT
will be an index of selected value, so that you can match it to your more complex data structure
Input as an Hash
To simplify dome workflows you can provide Hash
structure in a way
{ K0 => P0, ...}
{ a: 1, b: 1, c: 2} # c has 0.5, a and b - 0.25
{ 0 => 50, 150 => 1} # 105 key is 50 times less probable to be picked
where values
are probabilities with requirements similar to Array
approach
OUTPUT
in this case will be picked key
Class (::Base
)
Class is the recommended way to use of sampler becuase it's performance is ~10x better than Module
You need to initialize sampler:
sampler = WeightedSampler::Base.new([P0, P1, ...])
# OR
sampler = WeightedSampler::Base.new({K0 => P0, K1 => P1, ...})
after that you can get samples via
sampler.sample # => index (for Array) or key (for Hash)
Input parameter to initialization of an instance are similar to Module use case.
Plus, you can you seed
option for repeatable results
Options
skip_normalization
You do not have to normalize input probabilities
But for some reason you may want to normalize yourself, for this
you have an option skip_normalization
WeightedSampler.sample([...], skip_normalization: true)
WeightedSampler.sample({...}, skip_normalization: true)
if we will not be able to sum provided probabilities into 1
you'll get RuntimeError
exception with some information about this
seed
(¡ Class use case only !)
If you need to get repeatable sequence of samples you can initialize sampler with seed Integer (similar to ruby`s Random
WeightedSampler::Base.new([...], seed: SEED)
WeightedSampler::Base.new([...], seed: SEED)
Please, note that if seed
is not provided, sampler will use generic rand
functionality without any seed initialization
Performance
Once initialized, solution complexity is O(n)
. Array#index
is used to find a rand
match to intervals (see Math section below).
Math
Consider normalized probabilities P0, P1, .., Pn, 𝚺Pi = 1, 0 <= i <= n
To select random index i
with these probabilities we convert them into half-open intervals [0, P0)
, [P0, P1)
, [P0+P1, P2)
... [𝚺Pi, 1), 0 <= i < n
and place on a half-open interval [0, 1)
[<- P1 ->)[<- P2 ->) ... [<- Pn ->)
[--------------------------------------------------)
0 ^- rand (sample pick) 1
As you can see any random value from [0, 1)
will hit into one of the half-open intervals with probability equal to the "length" of the interval
rand
(seeded or not) will does exactly what is needed and returns a value in the same half-open interval [0, 1)
NOTE about Terminilogy and beauty in Ruby: in ruby Range class with ...
definition is equivalent to half-open interval in the end (i.e. [0...1]
in Ruby is [0, 1)
in math
NOTE about precision and computer algebra ugliness: in programming we cannot be sure that 𝚺Pi
will end up exactly equivalent to 1. And I do not like idea to use Ruby's Rational due to performance implications and not much benefitss here
That's why ERROR_ALLOWANCE = 10**-8
is accepted in normalization logic in WeightedSampler
Development
After checking out the repo, run bin/setup
to install dependencies. Then, run rspec
to run the tests. You can also run bin/console
for an interactive prompt that will allow you to experiment
To install this gem onto your local machine, run bundle exec rake install
. To release a new version, update the version number in version.rb
, and then run bundle exec rake release
, which will create a git tag for the version, push git commits and tags, and push the .gem
file to rubygems.org
Contributing
Bug reports and pull requests are welcome on GitHub at https://gitlab.com/[USERNAME]/weighted_sampler. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct
License
The gem is available as open source under the terms of the MIT License
Code of Conduct
Everyone interacting in the WeightedSampler project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct