Class: Cass::Context
Inherits: Object
- Object
- Cass::Context
Defined in: lib/cass/context.rb
Overview
Represents the context of a document: the list of words to analyze, together with an index that maps each word to its position in that list.
Instance Attribute Summary
- #index ⇒ Object
  Returns the value of attribute index.
- #words ⇒ Object
  Returns the value of attribute words.
Instance Method Summary
- #[](el) ⇒ Object
  Convenience accessor method for getting either words in the context, or their index in the array.
- #index_words ⇒ Object
  Index the context.
- #initialize(doc, opts) ⇒ Context (constructor)
  A new instance of Context.
- #key?(k) ⇒ Boolean
  Returns true if a word is in the context, false otherwise.
- #size ⇒ Object
  Number of words in the context.
Constructor Details
#initialize(doc, opts) ⇒ Context
Returns a new instance of Context.
# File 'lib/cass/context.rb', line 8
def initialize(doc, opts)
  min_prop = opts['min_prop'] || 0
  max_prop = opts['max_prop'] || 1
  puts "Creating new context..." if VERBOSE
  words = doc.lines.join(' ').split(/\s+/)
  nwords = words.size
  puts "Found #{nwords} words."
  if min_prop > 0 or max_prop < 1
    word_hash = Hash.new(0)
    words.each {|w| word_hash[w] += 1 }
    min_t, max_t = (min_prop * nwords).round, (max_prop * nwords).round
    words = word_hash.delete_if { |w,c| c < min_t or c > max_t }.keys
  else
    words.uniq!
  end
  # words = words - doc.targets
  words -= opts['stop_file'].read.split(/\s+/) if opts.key?('stop_file')
  @words = opts.key?('context_size') ? words.sort_by{rand}[0, opts['context_size']] : words
  index_words
  puts "Using #{@words.size} words as context." if VERBOSE
end
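The frequency filter in the constructor can be easier to follow in isolation. The following is a minimal sketch, not part of Cass: `filter_by_proportion` is a hypothetical helper name, and it takes a plain array of tokens rather than a document object. It assumes the same rule as the listing above: `min_prop` and `max_prop` are proportions of the total token count, rounded to absolute count thresholds, and a word is kept only if its count falls within both.

```ruby
# Standalone sketch of the min_prop / max_prop filter from Context#initialize.
# `filter_by_proportion` is a hypothetical name used for illustration only.
def filter_by_proportion(words, min_prop: 0, max_prop: 1)
  nwords = words.size

  # Count how often each token occurs.
  counts = Hash.new(0)
  words.each { |w| counts[w] += 1 }

  # Turn proportions of the total token count into absolute thresholds.
  min_t = (min_prop * nwords).round
  max_t = (max_prop * nwords).round

  # Keep only words whose count lies inside [min_t, max_t].
  counts.reject { |_word, c| c < min_t || c > max_t }.keys
end

tokens = %w[the cat sat on the mat the end]

# 8 tokens, min_prop 0.2 gives a threshold of 2 occurrences.
frequent = filter_by_proportion(tokens, min_prop: 0.2)

# max_prop 0.25 gives a ceiling of 2 occurrences, which drops "the" (3).
rare = filter_by_proportion(tokens, max_prop: 0.25)
```

Because the thresholds scale with the total number of tokens, the same `min_prop` removes more words from a long document than from a short one.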
Instance Attribute Details
#index ⇒ Object
Returns the value of attribute index.
# File 'lib/cass/context.rb', line 6
def index
  @index
end
#words ⇒ Object
Returns the value of attribute words.
# File 'lib/cass/context.rb', line 6
def words
  @words
end
Instance Method Details
#[](el) ⇒ Object
Convenience accessor for looking up either a word in the context or its index in the words array. If an integer is passed, returns the word at that position; if a string is passed, returns the index of that word in the array.
# File 'lib/cass/context.rb', line 39
def [](el)
  el.class == Integer ? @words[el] : @index[el]
end
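The two-way lookup can be illustrated with a small stand-in class. This is a sketch under stated assumptions: `MiniContext` is hypothetical and is built from a plain array of words rather than a document. It also uses `is_a?(Integer)` instead of the exact class comparison in the listing above, since on Ruby versions before 2.4 small integers report their class as `Fixnum`, so an exact comparison with `Integer` would not match there.

```ruby
# MiniContext is a hypothetical stand-in for Cass::Context, for illustration only.
class MiniContext
  attr_reader :words, :index

  def initialize(words)
    @words = words
    @index = {}
    @words.each_with_index { |w, i| @index[w] = i }
  end

  # Integer argument: return the word at that position.
  # Any other argument: return the position of that word, or nil if absent.
  def [](el)
    el.is_a?(Integer) ? @words[el] : @index[el]
  end

  # True if the word is part of the context.
  def key?(k)
    @index.key?(k)
  end
end

ctx = MiniContext.new(%w[apple banana cherry])

word     = ctx[1]         # "banana"
position = ctx["cherry"]  # 2
missing  = ctx["durian"]  # nil
```

Since a missing word yields `nil` rather than raising, `key?` is the clearer way to test membership before using the returned index.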
#index_words ⇒ Object
Index the context by rebuilding the hash that maps each word to its position. Necessary when words are updated manually.
# File 'lib/cass/context.rb', line 31
def index_words
  @index = {}
  @words.each_index { |i| @index[@words[i]] = i }
end
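A short sketch shows why the index has to be rebuilt after the word list changes. `build_index` is a hypothetical helper that mirrors the loop above on a plain array; it is not part of Cass.

```ruby
# build_index is a hypothetical helper mirroring what index_words does.
def build_index(words)
  index = {}
  words.each_index { |i| index[words[i]] = i }
  index
end

words = %w[apple banana cherry]
index = build_index(words)

# A manual update shifts every existing word one position to the right.
words.unshift("apricot")

# The old index is now stale: it still maps "apple" to position 0,
# although words[0] is "apricot".
stale = index["apple"]

# Rebuilding the index restores the word-to-position mapping.
index = build_index(words)
fresh = index["apple"]
```

The same applies to removals or reordering: any change to the array that is not followed by a rebuild leaves lookups by word pointing at the wrong position.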
#key?(k) ⇒ Boolean
Returns true if a word is in the context, false otherwise.
# File 'lib/cass/context.rb', line 44
def key?(k)
  @index.key?(k)
end
#size ⇒ Object
Number of words in the context.
# File 'lib/cass/context.rb', line 49
def size
  @words.size
end