Class: Ariel::Token

Inherits:
Object
Defined in:
lib/ariel/token.rb

Overview

Tokens populate a TokenStream. They know their position in the original document, can list the wildcards that match them and determine whether a given string or wildcard is a valid match. During the process of parsing a labeled document, some tokens may be marked as being a label_tag. These are filtered from the TokenStream before the rule learning phase.


Constructor Details

#initialize(text, start_loc, end_loc, label_tag = false) ⇒ Token

Each new Token must have a string representing its content, its start position in the original document (start_loc), and the point at which it ends (end_loc). For instance, in str=“This is an example”, if “is” were to be made a Token it would be given a start_loc of 5 and an end_loc of 7, which is str



# File 'lib/ariel/token.rb', line 16

def initialize(text, start_loc, end_loc, label_tag=false)
  @text=text.to_s
  @start_loc=start_loc
  @end_loc=end_loc
  @label_tag=label_tag
end
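
The positions in the example above can be checked with a short sketch. The Token class here is a minimal stand-in copied from the listings on this page so that the snippet runs without the Ariel gem installed. Reading end_loc as an exclusive bound is an assumption, though the values 5 and 7 for “is” point that way.

```ruby
# Minimal stand-in copied from the constructor and reader listings on this
# page; it is not the full Ariel::Token class.
module Ariel
  class Token
    attr_reader :text, :start_loc, :end_loc

    def initialize(text, start_loc, end_loc, label_tag = false)
      @text = text.to_s
      @start_loc = start_loc
      @end_loc = end_loc
      @label_tag = label_tag
    end
  end
end

str = "This is an example"
token = Ariel::Token.new("is", 5, 7)

# Assumption: end_loc is exclusive, so the token covers indices 5 and 6.
slice = str[token.start_loc...token.end_loc]
puts slice == token.text

# The constructor calls to_s, so non-string content is stored as a String.
symbol_token = Ariel::Token.new(:is, 5, 7)
puts symbol_token.text.class
```

A three-dot range is used because it excludes its upper bound, which matches a token of length 2 spanning positions 5 to 7.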

Instance Attribute Details

#end_loc ⇒ Object (readonly)

Returns the value of attribute end_loc.



# File 'lib/ariel/token.rb', line 9

def end_loc
  @end_loc
end

#start_loc ⇒ Object (readonly)

Returns the value of attribute start_loc.



# File 'lib/ariel/token.rb', line 9

def start_loc
  @start_loc
end

#text ⇒ Object (readonly)

Returns the value of attribute text.



# File 'lib/ariel/token.rb', line 9

def text
  @text
end

Instance Method Details

#<=>(t) ⇒ Object

Tokens are sorted based on their start_loc.



# File 'lib/ariel/token.rb', line 35

def <=>(t)
  @start_loc <=> t.start_loc
end
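
Because Array#sort compares elements with <=>, a list of tokens can be put back into document order directly. The sketch below uses a trimmed stand-in for Token built from the listings on this page; the sample tokens are illustrative.

```ruby
# Trimmed stand-in for Ariel::Token, limited to what the example needs.
module Ariel
  class Token
    attr_reader :text, :start_loc, :end_loc

    def initialize(text, start_loc, end_loc, label_tag = false)
      @text = text.to_s
      @start_loc = start_loc
      @end_loc = end_loc
      @label_tag = label_tag
    end

    # Copied from the listing above: ordering depends on start_loc only.
    def <=>(t)
      @start_loc <=> t.start_loc
    end
  end
end

# Tokens from "This is an example", deliberately out of order.
tokens = [
  Ariel::Token.new("example", 11, 18),
  Ariel::Token.new("This", 0, 4),
  Ariel::Token.new("is", 5, 7)
]

sorted_texts = tokens.sort.map(&:text)
puts sorted_texts.inspect
```

Only start_loc takes part in the comparison, so two tokens that start at the same position compare as equal for sorting purposes even if their text differs.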

#==(t) ⇒ Object

Tokens are only equal if they have an equal start_loc, end_loc and text.



# File 'lib/ariel/token.rb', line 30

def ==(t)
  return (@start_loc==t.start_loc && @end_loc==t.end_loc && @text==t.text)
end
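
A brief sketch of what this comparison does and does not take into account, using a trimmed stand-in for Token copied from the listings on this page. Note that the label_tag flag is not part of the comparison in the listing above.

```ruby
# Trimmed stand-in for Ariel::Token, limited to what the example needs.
module Ariel
  class Token
    attr_reader :text, :start_loc, :end_loc

    def initialize(text, start_loc, end_loc, label_tag = false)
      @text = text.to_s
      @start_loc = start_loc
      @end_loc = end_loc
      @label_tag = label_tag
    end

    # Copied from the listing above.
    def ==(t)
      return (@start_loc==t.start_loc && @end_loc==t.end_loc && @text==t.text)
    end
  end
end

plain     = Ariel::Token.new("is", 5, 7)
tagged    = Ariel::Token.new("is", 5, 7, true)  # same span, marked as a label tag
elsewhere = Ariel::Token.new("is", 2, 4)        # same text, different position

same_span_equal = (plain == tagged)     # label_tag is ignored by ==
same_text_equal = (plain == elsewhere)  # positions differ, so not equal

puts same_span_equal
puts same_text_equal
```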

#is_label_tag? ⇒ Boolean

Returns true or false depending on whether the token was marked as a label tag when it was initialized.

Returns:

  • (Boolean)


# File 'lib/ariel/token.rb', line 25

def is_label_tag?
  @label_tag
end
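
The overview mentions that label tags are filtered out before rule learning. How TokenStream does this is not shown on this page, so the sketch below only illustrates the idea with Array#reject on a plain array. The `<l:title>` tag text is a hypothetical example, and Token is a trimmed stand-in copied from the listings here.

```ruby
# Trimmed stand-in for Ariel::Token, limited to what the example needs.
module Ariel
  class Token
    attr_reader :text, :start_loc, :end_loc

    def initialize(text, start_loc, end_loc, label_tag = false)
      @text = text.to_s
      @start_loc = start_loc
      @end_loc = end_loc
      @label_tag = label_tag
    end

    # Copied from the listing above.
    def is_label_tag?
      @label_tag
    end
  end
end

# Hypothetical labeled fragment: "<l:title>Ariel</l:title>"
tokens = [
  Ariel::Token.new("<l:title>", 0, 9, true),
  Ariel::Token.new("Ariel", 9, 14),
  Ariel::Token.new("</l:title>", 14, 24, true)
]

# Keep only tokens that were not marked as label tags.
content_texts = tokens.reject(&:is_label_tag?).map(&:text)
puts content_texts.inspect
```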

#matches?(landmark) ⇒ Boolean

Accepts a string, a symbol representing a wildcard in Wildcards.list, or an arbitrary regex. Returns true if the whole Token is consumed by the wildcard or regex, or if the string is equal to Token#text, and false if the match fails. Raises an ArgumentError if the passed symbol is not a key in Wildcards.list.

Returns:

  • (Boolean)


# File 'lib/ariel/token.rb', line 44

def matches?(landmark)
  if landmark.kind_of? Symbol or landmark.kind_of? Regexp
    if landmark.kind_of? Symbol
      raise ArgumentError, "#{landmark} is not a valid wildcard." unless Wildcards.list.has_key? landmark
      regex = Wildcards.list[landmark]
    else
      regex = landmark
    end
    if self.text[regex] == self.text
      return true
    else
      return false
    end
  else
    return true if landmark==self.text
  end
  return false
end
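
The three landmark types can be exercised with the sketch below. The Wildcards module here is a hypothetical stub: its wildcard names and patterns are assumptions made for illustration and may not match Ariel's real list. The matches? body is copied from the listing above.

```ruby
module Ariel
  # Hypothetical stub; the real Wildcards names and patterns may differ.
  module Wildcards
    LIST = { numeric: /\d+/, alpha: /[[:alpha:]]+/ }.freeze

    def self.list
      LIST
    end
  end

  # Trimmed stand-in for Ariel::Token with matches? copied from the listing.
  class Token
    attr_reader :text

    def initialize(text, start_loc, end_loc, label_tag = false)
      @text = text.to_s
      @start_loc = start_loc
      @end_loc = end_loc
      @label_tag = label_tag
    end

    def matches?(landmark)
      if landmark.kind_of? Symbol or landmark.kind_of? Regexp
        if landmark.kind_of? Symbol
          raise ArgumentError, "#{landmark} is not a valid wildcard." unless Wildcards.list.has_key? landmark
          regex = Wildcards.list[landmark]
        else
          regex = landmark
        end
        # The first regex match must cover the entire token text.
        return self.text[regex] == self.text
      else
        return true if landmark == self.text
      end
      return false
    end
  end
end

token = Ariel::Token.new("42", 0, 2)

results = {
  string:        token.matches?("42"),     # exact string comparison
  partial_regex: token.matches?(/\d/),     # matches only "4", not the whole token
  full_regex:    token.matches?(/\d+/),    # consumes the whole token
  wildcard:      token.matches?(:numeric)  # looked up in the stub list
}
puts results.inspect

# An unknown wildcard symbol raises ArgumentError.
raised = begin
  token.matches?(:no_such_wildcard)
  false
rescue ArgumentError
  true
end
puts raised
```

The check compares the first regex match against the full text, so a pattern that matches only part of the token is treated as a failed match rather than a partial success.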

#matching_wildcards ⇒ Object

Returns an array of symbols corresponding to the Wildcards that match the Token.



# File 'lib/ariel/token.rb', line 65

def matching_wildcards
  return Wildcards.matching(self.text)
end
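
A possible usage sketch follows. Wildcards.matching is not shown on this page, so the stub below is an assumption: it returns the names of every wildcard whose pattern consumes the whole string, mirroring the whole-token rule used by matches?. The wildcard names and patterns are also hypothetical.

```ruby
module Ariel
  # Hypothetical stub; the real list and the body of .matching may differ.
  module Wildcards
    LIST = { anything: /.+/, numeric: /\d+/, alpha: /[[:alpha:]]+/ }.freeze

    def self.list
      LIST
    end

    # Assumed behavior: names of wildcards that consume the whole string.
    def self.matching(string)
      list.select { |_name, regex| string[regex] == string }.keys
    end
  end

  # Trimmed stand-in for Ariel::Token, limited to what the example needs.
  class Token
    attr_reader :text

    def initialize(text, start_loc, end_loc, label_tag = false)
      @text = text.to_s
      @start_loc = start_loc
      @end_loc = end_loc
      @label_tag = label_tag
    end

    # Copied from the listing above.
    def matching_wildcards
      return Wildcards.matching(self.text)
    end
  end
end

wildcards = Ariel::Token.new("2024", 0, 4).matching_wildcards
puts wildcards.inspect
```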