Class: Tetra::VersionMatcher

Inherits:
Object
Includes:
Logging
Defined in:
lib/tetra/version_matcher.rb

Overview

Heuristically matches version strings.

Instance Method Summary

Methods included from Logging

#log

Instance Method Details

#best_match(my_version, their_versions) ⇒ Object

Returns the “best match” between a version number and a set of available version numbers, using a heuristic criterion. The idea:

- split each version number into chunks at ., -, _, ~, comma or space
- compare chunks that share the same index; the differences add up to a score
- the "comparison" is a numeric difference if both chunks are integers, a string distance measure otherwise
- the score is weighted by chunk index (earlier chunks matter most)
- lowest score wins
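
The weighting described above can be illustrated with a small, integer-only sketch. The method name `sketch_scores` is hypothetical and not part of Tetra; non-numeric chunks are deliberately left out here, so this is an approximation of the idea rather than the library's code.

```ruby
# Hypothetical, integer-only sketch of the weighted scoring idea.
# Non-numeric chunks are not handled; they would need a string distance.
def sketch_scores(my_version, their_versions)
  separators = /[.\-_ ~,]/
  my_chunks = my_version.split(separators)
  their_chunks = their_versions.to_h { |v| [v, v.split(separators)] }
  # The longest chunk list decides how many weight levels exist.
  length = ([my_chunks.length] + their_chunks.values.map(&:length)).max

  their_chunks.transform_values do |chunks|
    (0...length).sum do |i|
      # Missing chunks count as 0 (nil.to_i is 0); each distance is capped at 99.
      distance = [(my_chunks[i].to_i - chunks[i].to_i).abs, 99].min
      # Earlier chunks get a larger weight: 100**(length - i - 1).
      distance * 100**(length - i - 1)
    end
  end
end

scores = sketch_scores("1.2.3", ["1.2.4", "1.3.0", "2.0.0"])
best = scores.min_by { |_version, score| score }.first
puts scores.inspect
puts best
```

With three chunks the weights are 10000, 100 and 1, so a mismatch in the first chunk outweighs any combination of mismatches in the later ones, because each per-chunk distance is capped at 99.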


# File 'lib/tetra/version_matcher.rb', line 28

def best_match(my_version, their_versions)
  log.debug("version comparison: #{my_version} vs #{their_versions.join(', ')}")

  my_chunks = my_version.split(/[\.\-\_ ~,]/)
  their_chunks_hash = Hash[
                      their_versions.map do |their_version|
                        their_chunks_for_version = (
                          if !their_version.nil?
                            their_version.split(/[\.\-\_ ~,]/)
                          else
                            []
                          end
                        )
                        chunks_count = [my_chunks.length - their_chunks_for_version.length, 0].max
                        their_chunks_for_version += [nil] * chunks_count
                        [their_version, their_chunks_for_version]
                      end
  ]

  max_chunks_length = ([my_chunks.length] + their_chunks_hash.values.map(&:length)).max

  scoreboard = []
  their_versions.each do |their_version|
    their_chunks = their_chunks_hash[their_version]
    score = 0
    their_chunks.each_with_index do |their_chunk, i|
      score_multiplier = 100**(max_chunks_length - i - 1)
      my_chunk = my_chunks[i]
      score += chunk_distance(my_chunk, their_chunk) * score_multiplier
    end
    scoreboard << { version: their_version, score: score }
  end

  scoreboard = scoreboard.sort_by { |element| element[:score] }

  log.debug("scoreboard: ")
  scoreboard.each_with_index do |element, i|
    log.debug("  #{i + 1}. #{element[:version]} (score: #{element[:score]})")
  end

  return scoreboard.first[:version] unless scoreboard.first.nil?
end

#chunk_distance(my_chunk, their_chunk) ⇒ Object

Returns a score representing the distance between two version chunks. For integers, the score is the absolute difference between their values; for strings, it is the Levenshtein distance. In either case the score is capped to the range 0 (identical) to 99 (very different or not comparable).



# File 'lib/tetra/version_matcher.rb', line 75

def chunk_distance(my_chunk, their_chunk)
  my_chunk = "0" if my_chunk.nil?
  their_chunk = "0" if their_chunk.nil?

  if i?(my_chunk) && i?(their_chunk)
    return [(my_chunk.to_i - their_chunk.to_i).abs, 99].min
  else
    return [Text::Levenshtein.distance(my_chunk.upcase, their_chunk.upcase), 99].min
  end
end
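
To try the chunk comparison without installing any gem, the following sketch swaps in a plain dynamic-programming Levenshtein distance for `Text::Levenshtein.distance`. Both `sketch_levenshtein` and `sketch_chunk_distance` are hypothetical names used only for illustration, and the sketch anchors the digit check with `\A` and `\z`, which is an assumption that differs slightly from the source.

```ruby
# Stand-in for Text::Levenshtein.distance so the sketch needs no gems.
def sketch_levenshtein(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      # Minimum of deletion, insertion and substitution.
      current << [previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost].min
    end
    previous = current
  end
  previous.last
end

# Hypothetical re-implementation of the comparison described above.
def sketch_chunk_distance(my_chunk, their_chunk)
  my_chunk ||= "0"      # a missing chunk is treated as "0"
  their_chunk ||= "0"
  digits = /\A[0-9]+\z/
  raw = if my_chunk.match?(digits) && their_chunk.match?(digits)
          (my_chunk.to_i - their_chunk.to_i).abs
        else
          sketch_levenshtein(my_chunk.upcase, their_chunk.upcase)
        end
  [raw, 99].min         # cap so one chunk cannot spill into the next weight
end

puts sketch_chunk_distance("3", "250")
puts sketch_chunk_distance("rc1", "RC2")
puts sketch_chunk_distance(nil, "1")
```

The cap at 99 matters for the scoring: since chunk weights are powers of 100, a capped distance can never exceed a single unit of difference in the chunk before it.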

#i?(string) ⇒ Boolean

True for integer strings, that is, strings made up only of decimal digits.

Returns:

  • (Boolean)


# File 'lib/tetra/version_matcher.rb', line 87

def i?(string)
  string =~ /^[0-9]+$/
end
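
One detail worth knowing when reusing this check: in Ruby, `^` and `$` anchor at line boundaries, not string boundaries, and `=~` returns a match index or `nil` rather than `true` or `false`. The sketch below contrasts the pattern above with a string-anchored variant; the two method names are hypothetical, and whether multi-line input can ever reach `i?` in practice is not something this page states.

```ruby
# Same pattern as the source: anchored per line, returns an index or nil.
def line_anchored_integer(string)
  string =~ /^[0-9]+$/
end

# Hypothetical stricter variant: anchored to the whole string, returns a Boolean.
def string_anchored_integer?(string)
  string.match?(/\A[0-9]+\z/)
end

puts line_anchored_integer("42").inspect
puts line_anchored_integer("1\nabc").inspect     # the first line alone matches
puts string_anchored_integer?("1\nabc").inspect
```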

#split_version(full_name) ⇒ Object

Heuristically splits a full name into an artifact name and a version string. Assumes that the version string begins with a numeric character and is separated from the name by a ., -, _, ~, comma or space. Returns a [name, version] pair; version is nil when no version string is found.



# File 'lib/tetra/version_matcher.rb', line 12

def split_version(full_name)
  matches = full_name.match(/(.*?)(?:[\.\-\_ ~,]?([0-9].*))?$/)
  if !matches.nil? && matches.length > 1
    [matches[1], matches[2]]
  else
    [full_name, nil]
  end
end
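
A few sample inputs make the splitting heuristic easier to picture. The sketch below reuses the regular expression from the source in a standalone method with the hypothetical name `sketch_split_version`; the artifact names are illustrative inputs, not data taken from Tetra.

```ruby
# Standalone copy of the splitting heuristic, for experimentation only.
def sketch_split_version(full_name)
  matches = full_name.match(/(.*?)(?:[.\-_ ~,]?([0-9].*))?$/)
  [matches[1], matches[2]]
end

puts sketch_split_version("commons-lang-2.6").inspect
puts sketch_split_version("junit").inspect
# The separator is optional, so the first digit in the name starts the version.
puts sketch_split_version("log4j-1.2.17").inspect
```

Because the name part is matched lazily and the separator is optional, the split happens at the first digit in the string. Names that contain digits themselves, such as the last input, are therefore cut earlier than a human reader would expect.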