Class: Lda::Document
- Inherits: Object
  - Object
  - Lda::Document
- Defined in: lib/lda-ruby/document/document.rb, ext/lda-ruby/lda-inference.c
Direct Known Subclasses
Instance Attribute Summary
- #corpus ⇒ Object (readonly): Returns the value of attribute corpus.
- #counts ⇒ Object (readonly): Returns the value of attribute counts.
- #length ⇒ Object (readonly): Returns the value of attribute length.
- #tokens ⇒ Object (readonly): Returns the value of attribute tokens.
- #total ⇒ Object (readonly): Returns the value of attribute total.
- #words ⇒ Object (readonly): Returns the value of attribute words.
Instance Method Summary
- #handle(tokens) ⇒ Object
- #has_text? ⇒ Boolean
- #initialize(corpus) ⇒ Document (constructor): A new instance of Document.
- #recompute ⇒ Object: Recompute the total and length values.
- #tokenize(text) ⇒ Object
Constructor Details
#initialize(corpus) ⇒ Document
Returns a new instance of Document.
# File 'lib/lda-ruby/document/document.rb', line 5
def initialize(corpus)
  @corpus = corpus
  @words = Array.new
  @counts = Array.new
  @tokens = Array.new
  @length = 0
  @total = 0
end
Instance Attribute Details
#corpus ⇒ Object (readonly)
Returns the value of attribute corpus.
# File 'lib/lda-ruby/document/document.rb', line 3
def corpus
  @corpus
end
#counts ⇒ Object (readonly)
Returns the value of attribute counts.
# File 'lib/lda-ruby/document/document.rb', line 3
def counts
  @counts
end
#length ⇒ Object (readonly)
Returns the value of attribute length.
# File 'lib/lda-ruby/document/document.rb', line 3
def length
  @length
end
#tokens ⇒ Object (readonly)
Returns the value of attribute tokens.
# File 'lib/lda-ruby/document/document.rb', line 3
def tokens
  @tokens
end
#total ⇒ Object (readonly)
Returns the value of attribute total.
# File 'lib/lda-ruby/document/document.rb', line 3
def total
  @total
end
#words ⇒ Object (readonly)
Returns the value of attribute words.
# File 'lib/lda-ruby/document/document.rb', line 3
def words
  @words
end
Instance Method Details
#handle(tokens) ⇒ Object
Returns the given tokens unchanged.
# File 'lib/lda-ruby/document/document.rb', line 27
def handle(tokens)
  tokens
end
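Because handle returns its argument as-is and tokenize (documented at the end of this page) passes its split tokens through it, handle reads like an override point for subclasses. The following is a minimal sketch under that assumption. `SketchDocument` is a stand-in that copies the two method bodies from this page so the example runs without the gem, and `DowncasingDocument` is a hypothetical subclass, not part of lda-ruby.

```ruby
# Stand-in copying handle and tokenize from this page, so the sketch runs
# without the lda-ruby gem. It is not the gem's class.
class SketchDocument
  attr_reader :tokens

  # Base behaviour: pass tokens through untouched.
  def handle(tokens)
    tokens
  end

  def tokenize(text)
    clean_text = text.gsub(/[^A-Za-z'\s]+/, ' ').gsub(/\s+/, ' ')
    @tokens = handle(clean_text.split(' '))
  end
end

# Hypothetical subclass (not part of lda-ruby): lowercases every token.
class DowncasingDocument < SketchDocument
  def handle(tokens)
    tokens.map(&:downcase)
  end
end

plain = SketchDocument.new
plain.tokenize("Don't PANIC")

lower = DowncasingDocument.new
lower.tokenize("Don't PANIC")

puts plain.tokens.inspect
puts lower.tokens.inspect
```

With the base handle the tokens keep their original case; with the overriding handle they come back lowercased.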
#has_text? ⇒ Boolean
Always returns false in this class.
# File 'lib/lda-ruby/document/document.rb', line 23
def has_text?
  false
end
#recompute ⇒ Object
Recomputes total as the sum of counts, and length as the number of entries in words.
# File 'lib/lda-ruby/document/document.rb', line 18
def recompute
  @total = @counts.inject(0) { |sum, i| sum + i }
  @length = @words.size
end
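To make the arithmetic concrete, here is a small sketch that reuses the recompute body on sample data. It assumes words holds term identifiers and counts holds the matching occurrence counts, which this page does not state. `SketchDocument` and its `load` helper are hypothetical stand-ins so the example runs without the gem.

```ruby
# Stand-in mirroring the recompute body shown above; not the gem's class.
class SketchDocument
  attr_reader :words, :counts, :length, :total

  def initialize
    @words = []
    @counts = []
    @length = 0
    @total = 0
  end

  # Hypothetical helper for this sketch only: fills the two arrays,
  # since the documented attributes are read-only.
  def load(words, counts)
    @words = words
    @counts = counts
    self
  end

  def recompute
    @total = @counts.inject(0) { |sum, i| sum + i }
    @length = @words.size
  end
end

doc = SketchDocument.new.load([0, 3, 7], [2, 1, 4])
doc.recompute
puts "total=#{doc.total} length=#{doc.length}"
```

For this input, total is 2 + 1 + 4 = 7 and length is 3, the number of distinct entries in words. Until recompute is called, both values stay at the zeros set in the constructor.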
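#tokenize(text) ⇒ Object
Replaces every run of characters other than ASCII letters, apostrophes, and whitespace with a space, collapses repeated whitespace, splits the result on spaces, and stores the value returned by handle in tokens.
# File 'lib/lda-ruby/document/document.rb', line 31
def tokenize(text)
  # remove everything but letters and ' and leave only single spaces
  clean_text = text.gsub(/[^A-Za-z'\s]+/, ' ').gsub(/\s+/, ' ')
  @tokens = handle(clean_text.split(' '))
end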
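The effect of the two gsub calls is easiest to see on a sample string. The sketch below lifts the cleaning expression into a standalone method. `clean_and_split` is a hypothetical name used only here, and the sample sentence is made up.

```ruby
# Hypothetical helper: the same cleaning and splitting steps as tokenize,
# without the handle call or the instance variable.
def clean_and_split(text)
  # Anything that is not an ASCII letter, apostrophe, or whitespace becomes
  # a space; runs of whitespace are then collapsed to one space.
  clean_text = text.gsub(/[^A-Za-z'\s]+/, ' ').gsub(/\s+/, ' ')
  clean_text.split(' ')
end

sample_tokens = clean_and_split("Don't panic: 42 towels, e-mail me!")
puts sample_tokens.inspect
```

This yields `["Don't", "panic", "towels", "e", "mail", "me"]`. Digits and punctuation are dropped, apostrophes survive, case is preserved, and a hyphenated word splits into two tokens. Because the character class covers only A-Z and a-z, accented letters are removed as well.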