Class: Cass::Stats
- Inherits: Object
- Defined in: lib/cass/stats.rb
Overview
Collects miscellaneous descriptive-statistics methods that may be useful. These methods are generally not hooked into the primary processing stream and must be called on an ad-hoc basis.
Class Method Summary
- .string_tokens(text, s) ⇒ Object
  Count the number of times a given token s occurs in text.
- .word_count(text, stopwords = nil, save = nil) ⇒ Object
  Takes a string as input and prints out a list of all words encountered, sorted by their frequency count (in descending order).
Class Method Details
.string_tokens(text, s) ⇒ Object
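Count the number of non-overlapping matches of s in text. Because s is interpolated into a regular expression, it matches anywhere in the text (not only as a whole word), and any regex metacharacters in s keep their special meaning.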
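```ruby
# File 'lib/cass/stats.rb', line 34
def self.string_tokens(text, s)
  # s is interpolated into a regular expression without escaping, and
  # String#scan returns non-overlapping substring matches, not whole words.
  text.scan(/#{s}/).size
end
```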
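string_tokens treats s as a regular expression that can match anywhere in the text. As a result, "cat" is also counted inside "concatenate", and "1.5" also matches "105". If you need literal, whole-word counts instead, the minimal sketch below escapes s and wraps it in word boundaries. The helper name count_whole_tokens is hypothetical and not part of Cass::Stats. The \b anchors only behave as expected when s begins and ends with a word character.

```ruby
# Hypothetical helper (not part of Cass::Stats): counts literal,
# whole-word occurrences of s in text.
def count_whole_tokens(text, s)
  # Regexp.escape makes characters such as "." match literally;
  # \b restricts matches to word boundaries.
  text.scan(/\b#{Regexp.escape(s)}\b/).size
end

sample = "cat concatenate cat"

# Unescaped, unanchored matching, as string_tokens does it.
substring_count = sample.scan(/cat/).size           # 3 (includes "concatenate")
regex_dot_count = "1.5 and 105".scan(/1.5/).size    # 2 ("." matches the "0" in "105")

# Literal, whole-word matching with the hypothetical helper.
whole_count = count_whole_tokens(sample, "cat")            # 2
literal_count = count_whole_tokens("1.5 and 105", "1.5")   # 1

puts substring_count, regex_dot_count, whole_count, literal_count
```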
.word_count(text, stopwords = nil, save = nil) ⇒ Object
Takes a string (or an array of strings, which is first joined with spaces) and prints a list of all words encountered, sorted by frequency count in descending order. Words are separated by whitespace, and no other processing is performed. Matching is case-sensitive, and punctuation stays attached to words, so "dog" and "dog," are counted as different words. If that is not what you want, preprocess the string before calling this method. Arguments:
- text: the string (or array of strings) whose words are counted.
- stopwords: optional path to a stopword file containing one word per line. These words are excluded from the count.
- save: optional filename to write the results to. If nil (the default), the results are printed to standard output.
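Because word_count only splits on whitespace, any normalization has to happen beforehand. The minimal preprocessing sketch below assumes you want case-insensitive counts in which punctuation (other than apostrophes) does not attach to words. Both the normalization rule and the name normalize_for_count are assumptions for illustration and are not part of Cass. The counting loop mirrors the whitespace split and Hash.new(0) tally that word_count uses.

```ruby
# Hypothetical preprocessing step (not part of Cass::Stats): lowercase the
# text and replace every character that is not a letter, digit, apostrophe,
# or whitespace with a space, so punctuation no longer sticks to words.
def normalize_for_count(text)
  text.downcase.gsub(/[^[:alnum:]'\s]/, " ")
end

raw = "Dog, dog; DOG! The dog's bone."
clean = normalize_for_count(raw)

# Split on whitespace and tally, as word_count does internally.
counts = Hash.new(0)
clean.split.each { |w| counts[w] += 1 }

puts counts.inspect
```

The cleaned string could then be passed to word_count in place of the raw text.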
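```ruby
# File 'lib/cass/stats.rb', line 18
def self.word_count(text, stopwords = nil, save = nil)
  # Accept an array of strings by joining it into a single string.
  text = text.join(" ") if text.is_a?(Array)

  # Load stopwords (one per line) into a hash for constant-time lookups.
  sw = {}
  File.readlines(stopwords).each { |l| sw[l.strip] = 1 } unless stopwords.nil?

  # String#split with no argument splits on runs of whitespace and ignores
  # leading whitespace, so no empty "word" is counted.
  counts = Hash.new(0)
  text.split.each { |w| counts[w] += 1 unless sw.key?(w) }

  # Sort by count (descending) and format each entry as "word: count".
  # map is required here: each would discard the formatted strings.
  counts = counts.sort { |a, b| b[1] <=> a[1] }.map { |l| "#{l[0]}: #{l[1]}" }

  if save.nil?
    puts counts
  else
    # The block form closes (and flushes) the file when done.
    File.open(save, 'w') { |f| f.puts counts }
  end
end
```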
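word_count only prints or saves its results, which makes it awkward to test or feed into further processing. The sketch below reproduces its array joining, stopword filtering, whitespace splitting, and descending sort, but returns the "word: count" lines instead. The name word_count_lines is hypothetical and not part of Cass::Stats. This sketch also breaks ties between equal counts alphabetically, which the original method does not guarantee. The stopword file is created with Ruby's standard Tempfile library purely for the demonstration.

```ruby
require "set"
require "tempfile"

# Hypothetical variant of word_count (not part of Cass::Stats) that
# returns the formatted lines instead of printing or saving them.
def word_count_lines(text, stopwords = nil)
  text = text.join(" ") if text.is_a?(Array)
  # Stopword file: one word per line; surrounding whitespace is stripped.
  sw = stopwords.nil? ? Set.new : Set.new(File.readlines(stopwords).map(&:strip))
  counts = Hash.new(0)
  text.split.each { |w| counts[w] += 1 unless sw.include?(w) }
  # Descending by count; ties broken alphabetically for a stable order.
  counts.sort_by { |word, n| [-n, word] }.map { |word, n| "#{word}: #{n}" }
end

# Write a small stopword file, run the count, and return the lines.
# Tempfile.create deletes the file when the block finishes.
lines = Tempfile.create("stopwords") do |f|
  f.puts "the", "a"
  f.flush
  word_count_lines(["the cat sat on the mat", "a cat"], f.path)
end

puts lines
```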