Class: Cass::Stats

Inherits:
Object
Defined in:
lib/cass/stats.rb

Overview

Collects miscellaneous descriptive-statistics methods. These are generally not hooked into the main processing stream and must be called on an ad hoc basis.


Class Method Details

.string_tokens(text, s) ⇒ Object

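Counts the number of times the token s occurs in text. Because s is interpolated into a regular expression without escaping, any regex metacharacters in it (such as . or +) are interpreted rather than matched literally, and occurrences embedded in longer words are counted too (searching for "the" also matches inside "theme").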



# File 'lib/cass/stats.rb', line 34

def self.string_tokens(text, s)
  text.scan(/#{s}/).size
end
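A stricter alternative that treats the search term literally and counts only whole-word matches might look like the sketch below. count_token is a hypothetical helper, not part of Cass. It escapes the term with Regexp.escape and wraps it in \b word-boundary anchors, which assumes the term begins and ends with a word character (a term such as "c++" would never match).

```ruby
# Hypothetical helper (not part of Cass): counts literal, whole-word
# occurrences of +token+ in +text+.
def count_token(text, token)
  # Regexp.escape makes characters such as "." or "+" match literally;
  # the \b anchors restrict matches to word boundaries.
  pattern = /\b#{Regexp.escape(token)}\b/
  text.scan(pattern).size
end

text = "the theme of the day"
puts text.scan(/the/).size     # substring matching, as in string_tokens; prints 3
puts count_token(text, "the")  # whole-word matching; prints 2
```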

.word_count(text, stopwords = nil, save = nil) ⇒ Object

Takes a string (or an array of strings, which is joined with spaces) and prints a list of all words encountered, sorted by frequency count in descending order. Words are separated by whitespace and no further processing is performed, so punctuation and case are preserved ("Word," and "word" are counted as different words); if that is not what you want, preprocess the string before calling this method. Arguments:

  • text: the string, or array of strings, whose words are counted.

  • stopwords: optional path to a stopword file with one word per line. Words in the file are excluded from the count.

  • save: optional filename to save the results to. If nil (the default), results are printed to standard output.
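For example, a minimal preprocessing step might lowercase the text and replace punctuation with spaces before the string is handed to word_count. The sketch below assumes that keeping only letters, digits, apostrophes, and whitespace suits your corpus; normalize is a hypothetical helper, not part of Cass.

```ruby
# Hypothetical preprocessing step (not part of Cass): lowercase the text and
# replace any character that is not a letter, digit, apostrophe, or
# whitespace with a space, so that "Word," and "word" become the same word.
def normalize(text)
  text.downcase.gsub(/[^[:alnum:]'\s]/, " ")
end

raw = "The cat sat. The cat, the hat!"
clean = normalize(raw)
p clean.split  # the words as a whitespace split now sees them

# With the Cass library loaded, the cleaned string can then be passed on:
# Cass::Stats.word_count(clean)
```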



# File 'lib/cass/stats.rb', line 18

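def self.word_count(text, stopwords=nil, save=nil)
  # Accept either a single string or an array of strings.
  text = text.join(" ") if text.is_a?(Array)
  # Load the stopword file (one word per line) into a hash for fast lookup.
  sw = {}
  File.readlines(stopwords).each { |l| sw[l.strip] = 1 } unless stopwords.nil?
  # Split on runs of whitespace and tally every word that is not a stopword.
  counts = Hash.new(0)
  text.split.each { |w| counts[w] += 1 unless sw.key?(w) }
  # Sort by descending frequency and format each entry as "word: count".
  lines = counts.sort { |a, b| b[1] <=> a[1] }.map { |w, c| "#{w}: #{c}" }
  if save.nil?
    puts lines
  else
    # The block form closes (and flushes) the file when it finishes.
    File.open(save, 'w') { |f| f.puts lines }
  end
end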
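word_count only prints or writes its results, which makes them awkward to reuse in further analysis. The sketch below applies the same counting idea as a function that returns the sorted [word, count] pairs instead. word_frequencies is a hypothetical name, not part of Cass, and taking stopwords as an array rather than a file path is an assumption made to keep the example self-contained. Ties are broken alphabetically so the order is deterministic, which word_count itself does not guarantee.

```ruby
require "set"

# Hypothetical variant of word_count (not part of Cass) that returns the
# counts instead of printing them. +text+ may be a String or an Array of
# Strings; +stopwords+ is any enumerable of words to exclude.
def word_frequencies(text, stopwords = [])
  text = text.join(" ") if text.is_a?(Array)
  excluded = stopwords.to_set
  counts = Hash.new(0)
  # String#split with no argument splits on runs of whitespace and
  # ignores leading whitespace, so no empty "word" is produced.
  text.split.each { |w| counts[w] += 1 unless excluded.include?(w) }
  # Sort by descending count, then alphabetically to make ties deterministic.
  counts.sort_by { |word, count| [-count, word] }
end

pairs = word_frequencies(["a b a", "c b a"], %w[c])
pairs.each { |word, count| puts "#{word}: #{count}" }
# prints:
# a: 3
# b: 2
```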