Module: Pearson
Defined in: lib/pearson.rb
Defined Under Namespace
Classes: EntityNotFound
Class Method Summary
- .closest_entities(scores, entity, opts = {}) ⇒ Array
  Returns the entities closest to a given entity.
- .coefficient(scores, entity1, entity2) ⇒ Float
  Calculates the Pearson correlation coefficient between two entities.
- .recommendations(scores, entity, opts = {}) ⇒ Array
  Returns the best recommended items for a given entity.
Class Method Details
.closest_entities(scores, entity, opts = {}) ⇒ Array
Returns the entities closest to a given entity. The distance between entities is based on the Pearson correlation coefficient, and the entity itself is excluded from the result.
    # File 'lib/pearson.rb', line 44

    def closest_entities(scores, entity, opts={})
      sort_desc(scores, opts) do |h, k, v|
        entity == k ? h : h.merge(k => coefficient(scores, entity, k))
      end
    end
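The ranking step can be illustrated in isolation. The following is a minimal sketch, not the module's implementation: `closest_sketch` is a hypothetical helper, the similarity measure is supplied by the caller's block, and the return shape (an Array of `[entity, similarity]` pairs) is an assumption, since `sort_desc` and the keys accepted in `opts` are not shown in the listing above. The ratings and coefficients are made up.

```ruby
# Hypothetical helper, not part of the Pearson module: ranks every other
# entity by a similarity score computed by the caller's block.
def closest_sketch(scores, entity)
  scores.keys
        .reject { |other| other == entity }             # never compare an entity with itself
        .map { |other| [other, yield(entity, other)] }  # pair each entity with its similarity
        .sort_by { |_, similarity| -similarity }        # highest similarity first
end

# Made-up ratings and made-up coefficients, for illustration only.
scores = {
  'alice' => { 'a' => 1.0 },
  'bob'   => { 'a' => 2.0 },
  'carol' => { 'a' => 3.0 },
  'dave'  => { 'a' => 4.0 }
}
coefficients = { 'bob' => 0.9, 'carol' => -0.4, 'dave' => 0.2 }

ranked = closest_sketch(scores, 'alice') { |_, other| coefficients.fetch(other) }
# ranked => [["bob", 0.9], ["dave", 0.2], ["carol", -0.4]]
```

With the real module, the block would presumably delegate to `Pearson.coefficient(scores, entity, other)`.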
.coefficient(scores, entity1, entity2) ⇒ Float
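Calculates the Pearson correlation coefficient between two entities, using only the items both entities have scored. Returns 0 when the entities share no items or when the denominator is zero.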
    # File 'lib/pearson.rb', line 11

    def coefficient(scores, entity1, entity2)
      shared_items = shared_items(scores, entity1, entity2)

      n = shared_items.length

      return 0 if n == 0

      sum1 = sum2 = sum1_sq = sum2_sq = psum = 0

      shared_items.each_key do |item|
        sum1 += scores[entity1][item]
        sum2 += scores[entity2][item]
        sum1_sq += scores[entity1][item]**2
        sum2_sq += scores[entity2][item]**2
        psum += scores[entity1][item]*scores[entity2][item]
      end

      num = psum - (sum1*sum2/n)
      den = ((sum1_sq - (sum1**2)/n) * (sum2_sq - (sum2**2)/n)) ** 0.5

      den == 0 ? 0 : num/den
    end
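To make the arithmetic concrete, here is a minimal standalone sketch of the same sums. `pearson_sketch` is a hypothetical helper, not part of the module, and the ratings are made up. It differs from the listing in two labeled ways: it converts every score to Float, because Ruby's `/` truncates when both operands are Integers (so the listing appears to assume Float scores), and it clamps the product under the square root at zero to guard against rounding error.

```ruby
# Hypothetical standalone helper, not part of the Pearson module.
# Mirrors the sums in the listing but forces Float arithmetic throughout.
def pearson_sketch(scores, entity1, entity2)
  # Only items scored by both entities take part in the correlation.
  shared = scores.fetch(entity1).keys & scores.fetch(entity2).keys
  n = shared.length.to_f
  return 0.0 if n.zero?

  xs = shared.map { |item| scores[entity1][item].to_f }
  ys = shared.map { |item| scores[entity2][item].to_f }

  sum1    = xs.sum
  sum2    = ys.sum
  sum1_sq = xs.sum { |x| x**2 }
  sum2_sq = ys.sum { |y| y**2 }
  psum    = xs.zip(ys).sum { |x, y| x * y }

  num = psum - (sum1 * sum2 / n)
  # Clamp at zero so rounding error can never produce a negative radicand.
  radicand = (sum1_sq - sum1**2 / n) * (sum2_sq - sum2**2 / n)
  den = Math.sqrt([radicand, 0.0].max)

  den.zero? ? 0.0 : num / den
end

# Made-up ratings: bob scores exactly twice what alice does,
# carol scores in the opposite order.
scores = {
  'alice' => { 'a' => 1, 'b' => 2, 'c' => 3 },
  'bob'   => { 'a' => 2, 'b' => 4, 'c' => 6 },
  'carol' => { 'a' => 3, 'b' => 2, 'c' => 1 }
}

same     = pearson_sketch(scores, 'alice', 'bob')    # perfectly correlated
opposite = pearson_sketch(scores, 'alice', 'carol')  # perfectly anti-correlated
```

For these inputs `same` is 1.0 and `opposite` is -1.0; two entities with no shared items yield 0.0.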
.recommendations(scores, entity, opts = {}) ⇒ Array
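Returns the best recommended items for a given entity. Only items the entity has not scored (or has scored 0) are considered; each is ranked by the similarity-weighted average of the scores given by positively correlated entities. Raises EntityNotFound if the entity has no scores.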
    # File 'lib/pearson.rb', line 57

    def recommendations(scores, entity, opts={})
      totals = {}
      similarity_sums = {}

      totals.default = 0
      similarity_sums.default = 0

      fail EntityNotFound unless scores[entity]

      scores.each do |other_entity|
        next if other_entity.first == entity

        similarity = coefficient(scores, entity, other_entity.first)

        next if similarity <= 0

        scores[other_entity.first].each do |item, score|
          if !scores[entity].keys.include?(item) || scores[entity][item] == 0
            totals[item] += score * similarity
            similarity_sums[item] += similarity
          end
        end
      end

      sort_desc(totals, opts) {|h, k, v| h.merge(k => v/similarity_sums[k]) }
    end
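The weighting step can be sketched on its own. This is an illustration under stated assumptions rather than the module's code: `recommend_sketch` is a hypothetical helper, the similarity comes from the caller's block instead of `coefficient`, an unknown entity raises `KeyError` rather than `EntityNotFound`, and the result is assumed to be an Array of `[item, predicted_score]` pairs sorted from highest to lowest. The ratings and similarities are made up.

```ruby
# Hypothetical helper, not part of the Pearson module: predicts scores for
# items the entity has not rated, as a similarity-weighted average.
def recommend_sketch(scores, entity)
  totals = Hash.new(0.0)           # item => sum of score * similarity
  similarity_sums = Hash.new(0.0)  # item => sum of contributing similarities
  own = scores.fetch(entity)       # KeyError here; the module raises EntityNotFound

  scores.each do |other, ratings|
    next if other == entity

    similarity = yield(entity, other)
    next if similarity <= 0        # ignore uncorrelated and anti-correlated entities

    ratings.each do |item, score|
      next unless own.fetch(item, 0) == 0  # only items unrated or rated 0
      totals[item] += score * similarity
      similarity_sums[item] += similarity
    end
  end

  totals.map { |item, total| [item, total / similarity_sums[item]] }
        .sort_by { |_, predicted| -predicted }
end

# Made-up ratings and similarities, for illustration only.
scores = {
  'alice' => { 'a' => 1.0, 'b' => 2.0 },
  'bob'   => { 'a' => 2.0, 'd' => 5.0, 'e' => 2.0 },
  'carol' => { 'b' => 1.0, 'd' => 1.0, 'e' => 5.0 },
  'dave'  => { 'f' => 4.0 }
}
similarities = { 'bob' => 1.0, 'carol' => 1.0, 'dave' => -0.5 }

recs = recommend_sketch(scores, 'alice') { |_, other| similarities.fetch(other) }
# recs => [["e", 3.5], ["d", 3.0]]
```

Item `f` is absent from the result because its only rater, dave, has a negative similarity. Dividing by the sum of similarities keeps an item rated by many entities from outranking one rated highly by a few.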