Module: Pearson

Defined in:
lib/pearson.rb

Defined Under Namespace

Classes: EntityNotFound

Class Method Summary

Class Method Details

.closest_entities(scores, entity, opts = {}) ⇒ Array

Returns the entities closest to a given entity. Closeness is measured by the Pearson correlation coefficient between each candidate and the given entity; the given entity itself is excluded from the result.

Parameters:

  • scores (Hash)

    entity-item scores, in the form { entity => { item => score } }

  • entity (String)

    the entity to find neighbours for

  • opts (Hash) (defaults to: {})

    options (limit)

Returns:

  • (Array)

    Top matches



# File 'lib/pearson.rb', line 44

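def closest_entities(scores, entity, opts={})
  # Build a hash of other_entity => coefficient, skipping the entity itself.
  # sort_desc is a helper not shown on this page; judging by its name it
  # orders the result in descending order and applies opts.
  sort_desc(scores, opts) do |h, k, _v|
    entity == k ? h : h.merge(k => coefficient(scores, entity, k))
  end
end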
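
The ranking step can be illustrated in isolation. The following is only a sketch: rank_desc is a hypothetical stand-in for the sort_desc helper (whose source is not shown here), and both the [entity, coefficient] pair shape and the way the limit is applied are assumptions.

```ruby
# Hypothetical stand-in for the ranking step; NOT the library's sort_desc.
# Takes a hash of entity => coefficient and returns [entity, coefficient]
# pairs, highest coefficient first, optionally truncated to `limit` pairs.
def rank_desc(similarities, limit = nil)
  ranked = similarities.sort_by { |_entity, coefficient| -coefficient }
  limit ? ranked.first(limit) : ranked
end

# Made-up coefficients for illustration only.
similarities = { "Bob" => 0.4, "Carol" => 0.9, "Dave" => -0.2 }

top_two = rank_desc(similarities, 2)
# top_two is [["Carol", 0.9], ["Bob", 0.4]]
```

Entities with a negative coefficient sort last, so a small limit naturally drops them.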

.coefficient(scores, entity1, entity2) ⇒ Float

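Calculates the Pearson correlation coefficient between two entities, using only the items returned by shared_items for the pair. Returns 0 when there are no shared items or when the denominator is zero (for example, when one entity gave every shared item the same score).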

Parameters:

  • scores (Hash)

    entity-item scores, in the form { entity => { item => score } }

  • entity1 (String)

    first entity

  • entity2 (String)

    second entity

Returns:

  • (Float)

    Coefficient
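
For reference, the source of this method computes the single-pass form of the Pearson coefficient. The symbols here are notation introduced for this note, not names from the source: n is the number of shared items, and x_i and y_i are the scores that entity1 and entity2 gave to shared item i.

```latex
r = \frac{\sum_i x_i y_i \;-\; \frac{1}{n} \sum_i x_i \sum_i y_i}
         {\sqrt{\left(\sum_i x_i^2 - \frac{1}{n}\Bigl(\sum_i x_i\Bigr)^2\right)
                \left(\sum_i y_i^2 - \frac{1}{n}\Bigl(\sum_i y_i\Bigr)^2\right)}}
```

In the source, psum, sum1, sum2, sum1_sq and sum2_sq hold the sums of x_i y_i, x_i, y_i, x_i^2 and y_i^2 respectively. The result lies between -1 and 1, and the method returns 0 instead of dividing when the denominator is zero.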



# File 'lib/pearson.rb', line 11

def coefficient(scores, entity1, entity2)
  shared = shared_items(scores, entity1, entity2)

  # Float count, so the divisions below do not truncate for Integer scores
  n = shared.length.to_f

  return 0.0 if n == 0

  sum1 = sum2 = sum1_sq = sum2_sq = psum = 0

  shared.each_key do |item|
    x = scores[entity1][item]
    y = scores[entity2][item]

    sum1 += x
    sum2 += y

    sum1_sq += x**2
    sum2_sq += y**2

    psum += x*y
  end

  # num is n times the covariance; den is n times the product of the
  # two standard deviations, so the factor n cancels in num/den
  num = psum - (sum1*sum2/n)
  den = ((sum1_sq - (sum1**2)/n) * (sum2_sq - (sum2**2)/n)) ** 0.5

  den == 0 ? 0.0 : num/den
end
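
A small worked example may help when checking results by hand. This is a standalone sketch, not the library's code: pearson_sketch is a hypothetical name, it takes two item => score hashes directly, and it assumes the shared items are the keys present in both hashes. The item count is converted to Float so that Integer scores divide exactly.

```ruby
# Hypothetical standalone version of the same single-pass computation.
# a and b are item => score hashes for two entities.
def pearson_sketch(a, b)
  shared = a.keys & b.keys            # assumption: items scored by both
  n = shared.length.to_f              # Float, so divisions never truncate
  return 0.0 if n.zero?

  xs = shared.map { |item| a[item] }
  ys = shared.map { |item| b[item] }

  sum1 = xs.sum
  sum2 = ys.sum
  sum1_sq = xs.sum { |x| x**2 }
  sum2_sq = ys.sum { |y| y**2 }
  psum = xs.zip(ys).sum { |x, y| x * y }

  num = psum - sum1 * sum2 / n
  den = Math.sqrt((sum1_sq - sum1**2 / n) * (sum2_sq - sum2**2 / n))

  den.zero? ? 0.0 : num / den
end

# Made-up scores: bob's scores are exactly double alice's on shared items.
alice = { "a" => 1, "b" => 2, "c" => 3, "d" => 5 }
bob   = { "a" => 2, "b" => 4, "c" => 6 }
carol = { "a" => 3, "b" => 2, "c" => 1 }
dave  = { "a" => 5, "b" => 5, "c" => 5 }

perfect  = pearson_sketch(alice, bob)    # 1.0: scores rise together
opposite = pearson_sketch(alice, carol)  # -1.0: scores move in opposite directions
flat     = pearson_sketch(alice, dave)   # 0.0: dave's scores have no variance
disjoint = pearson_sketch(alice, { "z" => 4 })  # 0.0: no shared items
```

For alice and bob the sums are sum1 = 6, sum2 = 12, sum1_sq = 14, sum2_sq = 56 and psum = 28, giving num = 28 - 72/3 = 4 and den = sqrt(2 * 8) = 4.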

.recommendations(scores, entity, opts = {}) ⇒ Array

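Returns the best recommended items for a given entity. Only items the entity has not scored (or has scored 0) are considered. Each is rated by the similarity-weighted average of the scores given by other entities, and only entities with a positive correlation to the given entity contribute. Raises EntityNotFound if the entity has no entry in scores.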

Parameters:

  • scores (Hash)

    entity-item scores, in the form { entity => { item => score } }

  • entity (String)

    the entity to recommend items for

  • opts (Hash) (defaults to: {})

    options (limit)

Returns:

  • (Array)

    Top matches [item, score]



# File 'lib/pearson.rb', line 57

def recommendations(scores, entity, opts={})
  fail EntityNotFound unless scores[entity]

  # Per item: similarity-weighted score total and the sum of similarities
  totals = Hash.new(0)
  similarity_sums = Hash.new(0)

  scores.each_key do |other_entity|
    next if other_entity == entity

    similarity = coefficient(scores, entity, other_entity)

    # Ignore entities that are uncorrelated or negatively correlated
    next if similarity <= 0

    scores[other_entity].each do |item, score|
      # Only consider items the entity has not scored (or scored 0)
      next if scores[entity].key?(item) && scores[entity][item] != 0

      totals[item] += score * similarity
      similarity_sums[item] += similarity
    end
  end

  # Turn each total into a weighted average before sorting
  sort_desc(totals, opts) {|h, k, v| h.merge(k => v/similarity_sums[k]) }
end
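
The final normalization step is a similarity-weighted average. The sketch below works through it for a single unseen item; the neighbours, their similarities and their scores are made-up values for illustration, not output from the library.

```ruby
# Hypothetical data: each neighbour's similarity to the target entity
# and the score that neighbour gave to one item the target has not scored.
neighbours = [
  { similarity: 0.5, score: 4.0 },
  { similarity: 1.0, score: 1.0 }
]

# Numerator: scores weighted by similarity (0.5 * 4.0 + 1.0 * 1.0).
total = neighbours.sum { |nb| nb[:similarity] * nb[:score] }

# Denominator: sum of the similarities that contributed (0.5 + 1.0).
similarity_sum = neighbours.sum { |nb| nb[:similarity] }

predicted = total / similarity_sum
# predicted is 2.0, whereas the unweighted mean of the scores is 2.5
```

Dividing by the sum of similarities keeps the predicted score on the same scale as the input scores, and it lets the more similar neighbour pull the prediction toward its own score.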