Class: Ariel::StructureNode

Inherits:
Object
Includes:
NodeLike
Defined in:
lib/ariel/structure_node.rb

Overview

Implements a node object used to represent the structure of the document tree. Each node stores the start and end rules that extract the desired content from its parent node, so it can be viewed as a rule-storing object.

Instance Attribute Summary

Attributes included from NodeLike

#children, #meta, #parent

Instance Method Summary

Methods included from NodeLike

#add_child, #each_descendant

Constructor Details

#initialize(name = :root, type = :not_list) {|_self| ... } ⇒ StructureNode

Returns a new instance of StructureNode.

Yields:

  • (_self)

Yield Parameters:

  • _self (Ariel::StructureNode): the object that the method was called on

# File 'lib/ariel/structure_node.rb', line 10

def initialize(name=:root, type=:not_list, &block)
  @children={}
  @meta = OpenStruct.new({:name=>name, :node_type=>type})
  yield self if block_given?
end
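
The block form of the constructor makes it possible to declare a whole structure tree in one expression. The following is a minimal, self-contained sketch of that pattern, not the library itself: `NodeLikeSketch`, `StructureNodeSketch`, and `MetaSketch` are hypothetical stand-ins, the behaviour of `add_child` (index the child by name, set its parent) is an assumption about `NodeLike`, and a plain `Struct` replaces `OpenStruct` to keep the example dependency-free.

```ruby
# Hypothetical stand-in for the meta object (the real class uses OpenStruct).
MetaSketch = Struct.new(:name, :node_type)

# Hypothetical stand-in for Ariel::NodeLike; the real module may differ.
module NodeLikeSketch
  attr_accessor :parent, :children, :meta

  # Assumed behaviour: index the child by its name and set its parent.
  def add_child(node)
    @children[node.meta.name] = node
    node.parent = self
  end
end

# Trimmed copy of the constructor, #item and #list_item from this page.
class StructureNodeSketch
  include NodeLikeSketch

  def initialize(name = :root, type = :not_list)
    @children = {}
    @meta = MetaSketch.new(name, type)
    yield self if block_given?
  end

  def item(name, &block)
    add_child(StructureNodeSketch.new(name, &block))
  end

  def list_item(name, &block)
    add_child(StructureNodeSketch.new(name, :list, &block))
  end
end

# Declare a root with one plain field and one list that has a nested field.
structure = StructureNodeSketch.new do |r|
  r.item :title
  r.list_item :comments do |c|
    c.item :author
  end
end

puts structure.children.keys.inspect                   # [:title, :comments]
puts structure.children[:comments].meta.node_type.inspect  # :list
```

Because each nested block receives the newly created child, the nesting of the blocks mirrors the nesting of the resulting tree.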

Dynamic Method Handling

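This class handles dynamic methods through the method_missing method: a call whose name matches a key in the node's children returns that child, and any other call falls through to super.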

#method_missing(method, *args, &block) ⇒ Object



# File 'lib/ariel/structure_node.rb', line 66

def method_missing(method, *args, &block)
  if @children.has_key? method
    @children[method]
  else
    super
  end
end
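
To illustrate the lookup above, here is a small sketch using a hypothetical `ChildLookupSketch` class that holds a children hash and nothing else. The `method_missing` body has the same shape as the listing; `add` and `respond_to_missing?` are additions made for this example and are not part of the source shown on this page.

```ruby
# Hypothetical class; only the method_missing shape is taken from the listing.
class ChildLookupSketch
  def initialize
    @children = {}
  end

  # Helper for the example only.
  def add(name, value)
    @children[name] = value
  end

  # A known child name returns the child; anything else goes to super,
  # which raises NoMethodError.
  def method_missing(method, *args, &block)
    if @children.key?(method)
      @children[method]
    else
      super
    end
  end

  # Not in the original source: keeps respond_to? consistent with
  # the dynamic lookup.
  def respond_to_missing?(method, include_private = false)
    @children.key?(method) || super
  end
end

node = ChildLookupSketch.new
node.add(:title, 'title node')

puts node.title                 # title node
puts node.respond_to?(:title)   # true

begin
  node.author
rescue NoMethodError => e
  puts "NoMethodError: #{e.name}"   # NoMethodError: author
end
```

One consequence of this design is that a mistyped child name surfaces as a NoMethodError rather than returning nil.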

Instance Attribute Details

#ruleset ⇒ Object

Returns the value of attribute ruleset.



# File 'lib/ariel/structure_node.rb', line 9

def ruleset
  @ruleset
end

Instance Method Details

#apply_extraction_tree_on(root_node, extract_labels = false) ⇒ Object

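Applies the extraction rules stored in the structure tree to root_node and all of its descendants. The tree is walked breadth-first: each extracted node is taken from a queue, every child of its meta.structure is applied to it, and each resulting node is queued in turn. When extract_labels is true, LabelUtils.extract_labeled_region is used in place of #extract_from. Returns root_node.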



# File 'lib/ariel/structure_node.rb', line 42

def apply_extraction_tree_on(root_node, extract_labels=false)
  extraction_queue = [root_node]
  until extraction_queue.empty? do
    new_parent = extraction_queue.shift
    new_parent.meta.structure.children.values.each do |child|
      if extract_labels
        extracted_node=LabelUtils.extract_labeled_region(child, new_parent)
      else
        extracted_node=child.extract_from(new_parent)
      end
      extraction_queue.push(extracted_node) if extracted_node
    end
  end
  return root_node
end
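
The queue-driven traversal in this listing can be tried in isolation. The sketch below is an assumption-laden miniature: `MetaStub`, `ExtractedStub`, `StructureStub`, and `apply_tree` are hypothetical names, the stand-in `extract_from` always succeeds, and it takes an extra `log` argument purely to record the visiting order.

```ruby
# Hypothetical stand-ins; none of these classes come from Ariel.
MetaStub = Struct.new(:name, :structure)

class ExtractedStub
  attr_reader :meta, :children

  def initialize(name, structure)
    @meta = MetaStub.new(name, structure)
    @children = {}
  end

  def add_child(node)
    @children[node.meta.name] = node
  end
end

class StructureStub
  attr_reader :name, :children

  def initialize(name)
    @name = name
    @children = {}
    yield self if block_given?
  end

  def item(name, &block)
    @children[name] = StructureStub.new(name, &block)
  end

  # Stand-in for #extract_from: always "succeeds" and logs the visit.
  def extract_from(parent, log)
    log << name
    extracted = ExtractedStub.new(name, self)
    parent.add_child(extracted)
    extracted
  end
end

# Same loop shape as the listing: shift a parent, apply its structure's
# children, and queue every node that was extracted.
def apply_tree(root_node, log)
  queue = [root_node]
  until queue.empty?
    parent = queue.shift
    parent.meta.structure.children.each_value do |child|
      extracted = child.extract_from(parent, log)
      queue.push(extracted) if extracted
    end
  end
  root_node
end

structure = StructureStub.new(:root) do |r|
  r.item(:header) { |h| h.item(:title) }
  r.item(:body) { |b| b.item(:paragraph) }
end

root = ExtractedStub.new(:root, structure)
visit_order = []
apply_tree(root, visit_order)

puts visit_order.inspect   # [:header, :body, :title, :paragraph]
```

Both top-level fields are extracted before either of their children, which is what distinguishes this first-in, first-out queue from a recursive depth-first walk.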

#extend_structure {|_self| ... } ⇒ Object

Extends an already created node by yielding it to the given block, for example:

node.extend_structure do |r|
  r.item :new_field1
  r.item :new_field2
end

Yields:

  • (_self)

Yield Parameters:

  • _self (Ariel::StructureNode): the object that the method was called on

# File 'lib/ariel/structure_node.rb', line 21

def extend_structure(&block)
  yield self if block_given?
end

#extract_from(node) ⇒ Object

Given a Node to apply its rules to, this method creates a new node and, if the ruleset produced a token stream, adds it as a child of the given node. For StructureNodes of :list type, the list is extracted and so is each of the list items; in this case, only the list items are yielded. In the source listed here, the :list branch is still a placeholder.



# File 'lib/ariel/structure_node.rb', line 29

def extract_from(node)
  # Will be reimplemented to return an array of extracted items
  newstream = @ruleset.apply_to(node.tokenstream)
  extracted_node = ExtractedNode.new(meta.name, newstream, self)
  node.add_child extracted_node if newstream
  if self.meta.node_type == :list
    #Do stuff
  end
  return extracted_node
end
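
The control flow of this method (apply the ruleset, wrap the result, attach it only when something was extracted) can be sketched without the rest of the library. Everything below is a hypothetical stand-in: `ExtractedRecord`, `ParentStub`, `BetweenRuleStub`, and `FieldStub` are invented for the example, and the regex-based "ruleset" working on a plain string is only an assumption standing in for Ariel's real rules and token streams.

```ruby
# Hypothetical stand-ins; the real Ariel classes are richer.
ExtractedRecord = Struct.new(:name, :tokenstream, :structure)

class ParentStub
  attr_reader :tokenstream, :children

  def initialize(tokenstream)
    @tokenstream = tokenstream
    @children = []
  end

  def add_child(node)
    @children << node
  end
end

# Stand-in ruleset: returns the text between two markers, or nil.
class BetweenRuleStub
  def initialize(start_marker, end_marker)
    @pattern = /#{Regexp.escape(start_marker)}(.*?)#{Regexp.escape(end_marker)}/m
  end

  def apply_to(text)
    match = @pattern.match(text)
    match && match[1]
  end
end

class FieldStub
  def initialize(name, ruleset)
    @name = name
    @ruleset = ruleset
  end

  # Mirrors the control flow of #extract_from, minus the :list branch.
  def extract_from(node)
    newstream = @ruleset.apply_to(node.tokenstream)
    extracted = ExtractedRecord.new(@name, newstream, self)
    node.add_child(extracted) if newstream
    extracted
  end
end

page = ParentStub.new('<h1>Ariel</h1><p>docs</p>')
title = FieldStub.new(:title, BetweenRuleStub.new('<h1>', '</h1>'))
footer = FieldStub.new(:footer, BetweenRuleStub.new('<footer>', '</footer>'))

hit = title.extract_from(page)
miss = footer.extract_from(page)

puts hit.tokenstream                       # Ariel
puts miss.tokenstream.inspect              # nil
puts page.children.map(&:name).inspect     # [:title]
```

Note that, as in the listing, a node object is returned even when nothing was extracted; only its attachment to the parent is conditional, so callers that need to detect a failed extraction would have to inspect the returned node's token stream.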

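#item(name, &block) ⇒ Object

Adds a child StructureNode called name with the default :not_list type. The optional block is passed to the child's constructor, so nested fields can be declared inside it.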



# File 'lib/ariel/structure_node.rb', line 58

def item(name, &block)
  self.add_child(StructureNode.new(name, &block))
end

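#list_item(name, &block) ⇒ Object

Adds a child StructureNode called name with the :list type. The optional block is passed to the child's constructor in the same way as for #item.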



# File 'lib/ariel/structure_node.rb', line 62

def list_item(name, &block)
  self.add_child(StructureNode.new(name, :list, &block))
end