Top Level Namespace

Defined Under Namespace

Modules: Semtools
Classes: Ontology

Instance Method Summary

#binom(n, k) ⇒ Object
#complex_text_similitude(textA, textB, splitChar = ";", charsToRemove = "") ⇒ Object

Applies the WhiteSimilarity from the ‘text’ package to two given complex texts.
#compute_hyper_prob(a, b, c, d, n) ⇒ Object
#ctext_AtoB(textsA, textsB) ⇒ Object

Applies the WhiteSimilarity from the ‘text’ package to two given text sets and returns, for each element of textsA, its maximum similarity in [0,1] against all elements of textsB.
#cummin(array) ⇒ Object
#get_benjaminiHochberg_pvalues(arr_pvalues) ⇒ Object

Computes Benjamini-Hochberg adjusted p-values; see rosettacode.org/wiki/P-value_correction#Ruby.
#get_fisher_exact_test(listA, listB, all_elements_count, tail = 'two_sided', weigths = nil) ⇒ Object

TODO: Make a pull request to rubygems.org/gems/ruby-statistics, with all the statistic code implemented here.
#get_less_tail(listA_listB_count, listA_nolistB_count, nolistA_listB_count, nolistA_nolistB_count, all_elements_count) ⇒ Object
#get_two_tail(listA_listB_count, listA_nolistB_count, nolistA_listB_count, nolistA_nolistB_count, all_elements_count) ⇒ Object
#order(array, decreasing = false) ⇒ Object
#pmin(array) ⇒ Object
#similitude_network(items_array, splitChar = ";", charsToRemove = "", unique = false) ⇒ Object

Applies the WhiteSimilarity from the ‘text’ package to all complex texts stored in an array.
#text_similitude(textA, textB) ⇒ Object

Applies the WhiteSimilarity from the ‘text’ package to two given texts and returns their similarity in [0,1].

Instance Method Details

#binom(n, k) ⇒ `Object`

# File 'lib/semtools/math_methods.rb', line 90

def binom(n,k)
  if k > 0 && k < n
    res = (1+n-k..n).inject(:*)/(1..k).inject(:*)
  else
    res = 1
  end
end
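
As a quick check of the behaviour above, the sketch below copies the method body and evaluates a few cases. Any k outside 0 < k < n falls through to the else branch, so k > n also yields 1 rather than 0; the example assumes callers pass 0 <= k <= n.

```ruby
# Copy of the method above, so the snippet runs on its own.
def binom(n, k)
  if k > 0 && k < n
    (1 + n - k..n).inject(:*) / (1..k).inject(:*)
  else
    1
  end
end

puts binom(5, 2)  # (4 * 5) / (1 * 2) = 10
puts binom(5, 0)  # edge case k == 0 gives 1
puts binom(5, 5)  # edge case k == n gives 1
puts binom(3, 5)  # k > n also gives 1, not 0
```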

#complex_text_similitude(textA, textB, splitChar = ";", charsToRemove = "") ⇒ `Object`

Applies the WhiteSimilarity from the ‘text’ package to two given complex texts. Each complex text is split into fragments, and the fragments are compared one by one from A to B and from B to A.

Params:

textA: text to be compared with textB
textB: text to be compared with textA
splitChar: character used to split the complex texts into fragments
charsToRemove: character (or set of characters) removed from each fragment before comparison

Returns the similarity in [0,1], obtained as the mean of the two directional all-vs-all similarities, or -1.0 if either input is nil, not a String, or empty.

# File 'lib/semtools/sim_handler.rb', line 61

def complex_text_similitude(textA, textB, splitChar = ";", charsToRemove = "")
  # Check special cases
  return -1.0 if (textA.nil?) | (textB.nil?)
  return -1.0 if (!textA.is_a? String) | (!textB.is_a? String)
  return -1.0 if (textA.length <= 0) | (textB.length <= 0)
  # Split&Clean both sets
  textA_splitted = textA.split(splitChar)
  textB_splitted = textB.split(splitChar)
  if !charsToRemove.empty?
    textA_splitted.map! {|str| str.gsub(/[#{charsToRemove}]/,'')}
    textA_splitted.select! {|str| str.length > 0}
    textB_splitted.map! {|str| str.gsub(/[#{charsToRemove}]/,'')}
    textB_splitted.select! {|str| str.length > 0}
  end
  # For each fragment of one text, compare against all fragments of the other
  similitudesA = ctext_AtoB(textA_splitted, textB_splitted)
  similitudesB = ctext_AtoB(textB_splitted, textA_splitted)
  # Average the per-fragment maxima in each direction
  similitudesA = similitudesA.inject{ |sum, el| sum + el }.to_f / similitudesA.size
  similitudesB = similitudesB.inject{ |sum, el| sum + el }.to_f / similitudesB.size
  # Obtain bidirectional similitude
  bidirectional_sim = (similitudesA + similitudesB) / 2
  # Return info
  return bidirectional_sim
end
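
The bidirectional averaging can be illustrated without the ‘text’ gem. The sketch below is not the library code: `directional_max`, `bidirectional_similarity` and the `EXACT_MATCH` scorer are hypothetical names, and the scorer is a deliberately trivial stand-in for text_similitude. It also omits the input guards and the charsToRemove step.

```ruby
# Hypothetical stand-in for text_similitude: 1.0 on exact match, else 0.0.
EXACT_MATCH = ->(a, b) { a == b ? 1.0 : 0.0 }

# For each fragment of xs, keep its best score against all fragments of ys.
def directional_max(xs, ys, sim)
  xs.map { |x| ys.map { |y| sim.call(x, y) }.max }
end

# Mean of the A-to-B and B-to-A directional means.
def bidirectional_similarity(text_a, text_b, sim, split_char = ";")
  frags_a = text_a.split(split_char)
  frags_b = text_b.split(split_char)
  a_to_b = directional_max(frags_a, frags_b, sim)
  b_to_a = directional_max(frags_b, frags_a, sim)
  mean_a = a_to_b.sum / a_to_b.size
  mean_b = b_to_a.sum / b_to_a.size
  (mean_a + mean_b) / 2
end

# "a" has no match in B and "c" has no match in A, so each direction averages 0.5.
puts bidirectional_similarity("a;b", "b;c", EXACT_MATCH)  # 0.5
```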

#compute_hyper_prob(a, b, c, d, n) ⇒ `Object`

# File 'lib/semtools/math_methods.rb', line 82

def compute_hyper_prob(a, b, c, d, n)
  # https://en.wikipedia.org/wiki/Fisher%27s_exact_test
  binomA = binom(a + b, a)
  binomC = binom(c + d, c)
  divisor = binom(n, a + c)
  return (binomA * binomC).fdiv(divisor)
end
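
The method evaluates the hypergeometric probability of one 2x2 table, C(a+b, a) * C(c+d, c) / C(n, a+c). As a worked example, the sketch below copies both helpers and uses an illustrative table (a=1, b=9, c=11, d=3, n=24) chosen only for this example.

```ruby
# Copies of the helpers above, so the snippet runs on its own.
def binom(n, k)
  if k > 0 && k < n
    (1 + n - k..n).inject(:*) / (1..k).inject(:*)
  else
    1
  end
end

def compute_hyper_prob(a, b, c, d, n)
  binom_a = binom(a + b, a)
  binom_c = binom(c + d, c)
  divisor = binom(n, a + c)
  (binom_a * binom_c).fdiv(divisor)
end

# C(10,1) * C(14,11) / C(24,12) = (10 * 364) / 2704156
prob = compute_hyper_prob(1, 9, 11, 3, 24)
puts prob
```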

#ctext_AtoB(textsA, textsB) ⇒ `Object`

Applies the WhiteSimilarity from the ‘text’ package to two given text sets and returns the similitudes of each element of the first set against the second set.

Params:

textsA: text set to be compared with textsB
textsB: text set to be compared with textsA

Returns an array with the maximum similarity in [0,1] for each element of textsA against all elements of textsB, or [-1.0] if either input is nil, not an Array, or empty.

# File 'lib/semtools/sim_handler.rb', line 28

def ctext_AtoB(textsA, textsB)
  # Check special cases
  return [-1.0] if (textsA.nil?) | (textsB.nil?)
  return [-1.0] if (!textsA.is_a? Array) | (!textsB.is_a? Array)
  return [-1.0] if (textsA.length <= 0) | (textsB.length <= 0)
  # Calculate similitude
  similitudesA = []
  textsA.each do |fragA|
    frag_A_similitudes = []
    textsB.each do |fragB|
      frag_A_similitudes << text_similitude(fragA, fragB)
    end
    begin 
      similitudesA << frag_A_similitudes.max
    rescue => e
      STDERR.puts frag_A_similitudes.inspect
      STDERR.puts textsA.inspect , textsB.inspect
      STDERR.puts e.message
      STDERR.puts e.backtrace
      Process.exit
    end 
  end
  return similitudesA
end

#cummin(array) ⇒ `Object`

# File 'lib/semtools/math_methods.rb', line 121

def cummin(array)
  cumulative_min = array.first
  arr_cummin = []
  array.each do |p|
    cumulative_min = [p, cumulative_min].min
    arr_cummin << cumulative_min
  end
  return arr_cummin
end

#get_benjaminiHochberg_pvalues(arr_pvalues) ⇒ `Object`

Computes Benjamini-Hochberg adjusted p-values; see rosettacode.org/wiki/P-value_correction#Ruby

# File 'lib/semtools/math_methods.rb', line 100

def get_benjaminiHochberg_pvalues(arr_pvalues)
  n = arr_pvalues.length
  arr_o = order(arr_pvalues, true)
  arr_cummin_input = []
  (0..(n - 1)).each do |i|
    arr_cummin_input[i] = (n / (n - i).to_f) * arr_pvalues[arr_o[i]]
  end
  arr_ro = order(arr_o)
  arr_cummin = cummin(arr_cummin_input)
  arr_pmin = pmin(arr_cummin)
  return arr_pmin.values_at(*arr_ro)
end
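
The same correction can be written as a single pass, which may make the steps easier to follow: walk the p-values from largest to smallest, scale each by n divided by its ascending rank, and keep a running minimum capped at 1. This is an independent sketch, not the library code; `bh_adjust` is a hypothetical name.

```ruby
# Benjamini-Hochberg adjustment, returned in the original input order.
def bh_adjust(pvalues)
  n = pvalues.length
  # Indices of the p-values, from largest to smallest value.
  by_decreasing_p = pvalues.each_index.sort_by { |i| -pvalues[i] }
  running_min = 1.0  # also caps every adjusted value at 1
  adjusted = Array.new(n)
  by_decreasing_p.each_with_index do |orig_index, position|
    rank = n - position  # ascending rank of this p-value
    scaled = pvalues[orig_index] * n / rank.to_f
    running_min = [running_min, scaled].min
    adjusted[orig_index] = running_min
  end
  adjusted
end

# Approximately [0.02, 0.04, 0.04, 0.02], up to floating-point rounding.
p bh_adjust([0.01, 0.04, 0.03, 0.005])
```

Because the indices are sorted directly, repeated p-values each keep their own position in the output.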

#get_fisher_exact_test(listA, listB, all_elements_count, tail = 'two_sided', weigths = nil) ⇒ `Object`

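TODO: Make a pull request to rubygems.org/gems/ruby-statistics, with all the statistic code implemented here.

Computes Fisher's exact test for the overlap between listA and listB; see www.biostathandbook.com/fishers.html. The tail argument accepts 'two_sided' (default) or 'less'; any other value makes the method return nil.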

# File 'lib/semtools/math_methods.rb', line 4

def get_fisher_exact_test(listA, listB, all_elements_count, tail ='two_sided', weigths=nil)
  listA_listB = listA & listB
  listA_nolistB = listA - listB
  nolistA_listB = listB - listA
  if weigths.nil?
    listA_listB_count = listA_listB.length
    listA_nolistB_count = listA_nolistB.length
    nolistA_listB_count = nolistA_listB.length
    nolistA_nolistB_count = all_elements_count - (listA | listB).length
  else
    # Fisher exact test weighted as proposed in "Improved scoring of functional groups from gene expression data by decorrelating GO graph structure"
    # https://academic.oup.com/bioinformatics/article/22/13/1600/193669
    listA_listB_count = listA_listB.map{|i| weigths[i]}.inject(0){|sum, n| sum + n}.ceil
    listA_nolistB_count = listA_nolistB.map{|i| weigths[i]}.inject(0){|sum, n| sum + n}.ceil
    nolistA_listB_count = nolistA_listB.map{|i| weigths[i]}.inject(0){|sum, n| sum + n}.ceil
    nolistA_nolistB_count = (weigths.keys - (listA | listB)).map{|i| weigths[i]}.inject(0){|sum, n| sum + n}.ceil
    all_elements_count = weigths.values.inject(0){|sum, n| sum + n}.ceil
  end
  if tail == 'two_sided'
    accumulated_prob = get_two_tail(listA_listB_count, listA_nolistB_count, nolistA_listB_count, nolistA_nolistB_count, all_elements_count)
  elsif tail == 'less' 
    accumulated_prob = get_less_tail(listA_listB_count, listA_nolistB_count, nolistA_listB_count, nolistA_nolistB_count, all_elements_count)
  end
  return accumulated_prob
end
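
In the unweighted branch, the four cells of the contingency table come from plain Array set operations. The sketch below reproduces only that step; the list contents and the universe size of 10 are made-up values for illustration.

```ruby
list_a = %w[g1 g2 g3 g4]
list_b = %w[g3 g4 g5]
all_elements_count = 10  # assumed size of the background universe

in_both = (list_a & list_b).length                       # g3, g4
only_a  = (list_a - list_b).length                       # g1, g2
only_b  = (list_b - list_a).length                       # g5
neither = all_elements_count - (list_a | list_b).length  # 10 - 5

p [in_both, only_a, only_b, neither]  # [2, 2, 1, 5]
```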

#get_less_tail(listA_listB_count, listA_nolistB_count, nolistA_listB_count, nolistA_nolistB_count, all_elements_count) ⇒ `Object`

# File 'lib/semtools/math_methods.rb', line 68

def get_less_tail(listA_listB_count, listA_nolistB_count, nolistA_listB_count, nolistA_nolistB_count, all_elements_count)
  accumulated_prob = 0
  [listA_listB_count, nolistA_nolistB_count].min.times do |n|
    accumulated_prob += compute_hyper_prob(
      listA_listB_count - n, 
      listA_nolistB_count + n, 
      nolistA_listB_count + n, 
      nolistA_nolistB_count - n, 
      all_elements_count
    )
  end
  return accumulated_prob
end

#get_two_tail(listA_listB_count, listA_nolistB_count, nolistA_listB_count, nolistA_nolistB_count, all_elements_count) ⇒ `Object`

# File 'lib/semtools/math_methods.rb', line 30

def get_two_tail(listA_listB_count, listA_nolistB_count, nolistA_listB_count, nolistA_nolistB_count, all_elements_count)
  #https://www.sheffield.ac.uk/polopoly_fs/1.43998!/file/tutorial-9-fishers.pdf
  accumulated_prob = 0
  ref_prob = compute_hyper_prob(
    listA_listB_count, 
    listA_nolistB_count, 
    nolistA_listB_count, 
    nolistA_nolistB_count, 
    all_elements_count
  )
  accumulated_prob += ref_prob
  [listA_listB_count, nolistA_nolistB_count].min.times do |n| #less
    n += 1
    prob = compute_hyper_prob(
      listA_listB_count - n, 
      listA_nolistB_count + n, 
      nolistA_listB_count + n, 
      nolistA_nolistB_count - n, 
      all_elements_count
    )
    prob <= ref_prob ? accumulated_prob += prob : break
  end

  [listA_nolistB_count, nolistA_listB_count].min.times do |n| #greater
    n += 1
    prob = compute_hyper_prob(
      listA_listB_count + n, 
      listA_nolistB_count - n, 
      nolistA_listB_count - n, 
      nolistA_nolistB_count + n, 
      all_elements_count
    )
    accumulated_prob += prob if prob <= ref_prob
  end

  return accumulated_prob
end
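
For comparison, the usual definition of the two-sided p-value sums the probabilities of every table with the same margins that is no more likely than the observed one. The sketch below follows that definition directly instead of the two loops above, and compares integer weights so that equal probabilities are never separated by rounding. `choose` and `two_sided_fisher` are hypothetical names, and the table is illustrative.

```ruby
# Binomial coefficient; returns 0 outside 0..n (unlike the binom helper).
def choose(n, k)
  return 0 if k < 0 || k > n
  (1..k).inject(1) { |acc, i| acc * (n - k + i) / i }
end

def two_sided_fisher(a, b, c, d)
  row1 = a + b
  row2 = c + d
  col1 = a + c
  n = row1 + row2
  # Unnormalised weight of the table whose top-left cell is x.
  weight = ->(x) { choose(row1, x) * choose(row2, col1 - x) }
  observed = weight.call(a)
  lowest = [0, col1 - row2].max
  highest = [row1, col1].min
  extreme = (lowest..highest).map { |x| weight.call(x) }.select { |w| w <= observed }
  extreme.sum.fdiv(choose(n, col1))
end

# Tables with top-left cell 0, 1, 9 and 10 qualify: (91 + 3640 + 3640 + 91) / 2704156
puts two_sided_fisher(1, 9, 11, 3)
```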

#order(array, decreasing = false) ⇒ `Object`

# File 'lib/semtools/math_methods.rb', line 113

def order(array, decreasing = false)
  if decreasing == false
    array.sort.map { |n| array.index(n) }
  else
    array.sort.map { |n| array.index(n) }.reverse
  end
end
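
The method returns the indices that would sort the array. Because it looks each sorted value up with Array#index, repeated values all map to the first matching index. The sketch below copies the method to show this, next to a hypothetical tie-safe variant (`order_tie_safe` is not part of the library).

```ruby
# Copy of the method above, so the snippet runs on its own.
def order(array, decreasing = false)
  if decreasing == false
    array.sort.map { |n| array.index(n) }
  else
    array.sort.map { |n| array.index(n) }.reverse
  end
end

# Hypothetical tie-safe variant: sort the indices themselves by value.
def order_tie_safe(array, decreasing = false)
  idx = array.each_index.sort_by { |i| [array[i], i] }
  decreasing ? idx.reverse : idx
end

p order([0.3, 0.1, 0.2])           # [1, 2, 0]
p order([0.3, 0.1, 0.2], true)     # [0, 2, 1]
p order([0.1, 0.3, 0.1])           # [0, 0, 1]: index 0 repeats, index 2 is lost
p order_tie_safe([0.1, 0.3, 0.1])  # [0, 2, 1]
```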

#pmin(array) ⇒ `Object`

# File 'lib/semtools/math_methods.rb', line 131

def pmin(array)
  x = 1
  pmin_array = []
  array.each_index do |i|
    pmin_array[i] = [array[i], x].min
    abort if pmin_array[i] > 1
  end
  return pmin_array
end

#similitude_network(items_array, splitChar = ";", charsToRemove = "", unique = false) ⇒ `Object`

Applies the WhiteSimilarity from the ‘text’ package to all complex texts stored in an array. Each complex text is split into fragments and compared one by one from A to B and from B to A.

Params:

items_array: text elements to be compared, each against all the others
splitChar: character used to split the complex texts into fragments
charsToRemove: character (or set of characters) removed from the texts before comparison
unique: boolean flag indicating whether repeated elements must be removed

Returns a nested hash with the similarity of every pair of elements in the array (sims[itemA][itemB]), or nil if items_array is nil, not an Array, or empty. Note that items_array is modified in place.

# File 'lib/semtools/sim_handler.rb', line 95

def similitude_network(items_array, splitChar = ";", charsToRemove = "", unique = false)
  # Special cases
  return nil if items_array.nil?
  return nil if !items_array.is_a? Array
  return nil if items_array.length <= 0
  # Remove repeated elements
  items_array.uniq! if unique
  # Define hash to be filled
  sims = {}
  # Per each item into array => Calculate similitude
  while(items_array.length > 1)
    current = items_array.shift
    sims[current] = {}
    items_array.each do |item|
      sims[current][item] = complex_text_similitude(current,item,splitChar,charsToRemove)
    end
  end 
  return sims
end
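
The loop above builds an upper-triangle hash: each item is paired only with the items that follow it, and the input array is consumed by shift. The sketch below shows the same pattern on a copy of the input, so the caller's array survives. `pairwise_network` is a hypothetical name, and the block is a trivial stand-in for complex_text_similitude.

```ruby
# Pair every item with each item that follows it, scoring pairs with the block.
def pairwise_network(items)
  remaining = items.dup  # work on a copy so the caller's array is untouched
  sims = {}
  while remaining.length > 1
    current = remaining.shift
    sims[current] = {}
    remaining.each { |item| sims[current][item] = yield(current, item) }
  end
  sims
end

items = %w[a b c]
# Stand-in scorer: 1.0 on exact match, else 0.0.
network = pairwise_network(items) { |x, y| x == y ? 1.0 : 0.0 }
p network  # {"a"=>{"b"=>0.0, "c"=>0.0}, "b"=>{"c"=>0.0}}
p items    # ["a", "b", "c"]
```

The last item never appears as a top-level key, since nothing follows it.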

#text_similitude(textA, textB) ⇒ `Object`

Applies the WhiteSimilarity from the ‘text’ package to two given texts.

Params:

textA: text to be compared with textB
textB: text to be compared with textA

Returns the similarity in [0,1], or -1.0 if either input is nil, not a String, or empty.

# File 'lib/semtools/sim_handler.rb', line 11

def text_similitude(textA, textB)
  # Check special cases
  return -1.0 if (textA.nil?) | (textB.nil?)
  return -1.0 if (!textA.is_a? String) | (!textB.is_a? String)
  return -1.0 if (textA.length <= 0) | (textB.length <= 0)
  # Calculate similitude
  require 'text'
  white = Text::WhiteSimilarity.new
  return white.similarity(textA.lstrip, textB.lstrip)
end
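
To give an intuition for the score, the sketch below implements a letter-pair similarity in plain Ruby, assuming WhiteSimilarity follows Simon White's "strike a match" measure (twice the number of shared adjacent letter pairs divided by the total number of pairs). It is an approximation for illustration, not the gem's code; `letter_pairs` and `pair_similarity` are hypothetical names.

```ruby
# Adjacent letter pairs of every whitespace-separated word, upper-cased.
def letter_pairs(str)
  str.upcase.split(/\s+/).flat_map do |word|
    (0...(word.length - 1)).map { |i| word[i, 2] }
  end
end

# Dice coefficient over the two multisets of letter pairs.
def pair_similarity(text_a, text_b)
  pairs_a = letter_pairs(text_a)
  pairs_b = letter_pairs(text_b)
  total = pairs_a.length + pairs_b.length
  return 0.0 if total.zero?  # assumption: no pairs at all counts as no similarity
  shared = 0
  pairs_a.each do |pair|
    idx = pairs_b.index(pair)
    next unless idx
    shared += 1
    pairs_b.delete_at(idx)  # each pair in B can be matched only once
  end
  2.0 * shared / total
end

# HE EA AL LE ED vs SE EA AL LE ED: 4 shared pairs out of 10, so 2 * 4 / 10
puts pair_similarity("Healed", "Sealed")  # 0.8
```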
