Class: MarkovTwitter::MarkovBuilder
- Inherits: Object
  - Object
    - MarkovTwitter::MarkovBuilder
- Defined in: lib/markov_twitter/markov_builder.rb, lib/markov_twitter/markov_builder/node.rb
Overview
Builds a Markov chain from phrases passed as input. A “phrase” is defined here as a tweet.
Defined Under Namespace
Classes: Node
Constant Summary
- SeparatorCharacterRegex = /\s+/
Regex used to split the phrase into tokens. It splits on any run of whitespace, including newlines. Sequences of punctuation characters are treated like any other word.
Instance Attribute Summary
-
#end_nodes ⇒ Set<Node>
readonly
The nodes that were found at the end of phrases.
-
#nodes ⇒ Hash<String, Node>
readonly
The base dictionary for nodes.
-
#start_nodes ⇒ Set<Node>
readonly
The nodes that were found at the start of phrases.
Class Method Summary
-
.split_phrase(phrase) ⇒ Array<String>
Splits a phrase into tokens.
Instance Method Summary
-
#_evaluate(length:, probability_bounds: [0,100], root_node: nil, direction:, node_finder:) ⇒ Array<Node>
An “evaluation” of the Markov chain.
-
#add_linkages(node1, node2) ⇒ void
Adds bidirectional linkages between two nodes.
-
#add_nodes(node1, node2 = nil) ⇒ void
Adds a sequence of two tokens to @nodes and creates linkages.
-
#check_probability_bounds(bounds) ⇒ Boolean
Validates the given probability bounds.
-
#construct_node(value) ⇒ Node
Builds a single node which contains a reference to @nodes.
-
#evaluate(length:, probability_bounds: [0,100], root_node: nil) ⇒ String
The default evaluation method to produce a run case.
-
#evaluate_favoring_end(length:, probability_bounds: [0,100], root_node: nil) ⇒ String
See #_evaluate for the parameter spec.
-
#evaluate_favoring_start(length:, probability_bounds: [0,100], root_node: nil) ⇒ String
See #_evaluate for the parameter spec.
-
#get_new_start_point(node_finder) ⇒ Node
Gets a random node as a potential start point.
-
#initialize(phrases: []) ⇒ MarkovBuilder
constructor
Processes the phrases to populate @nodes.
-
#node_finders ⇒ Lambda<Node>
Lambdas that can be used during evaluation to find the first node, or the next node when “stuck” (meaning there is no :next/:prev node).
-
#pick_linkage(linkages, probability_bounds = [0,100]) ⇒ Node
Given “linkages”, which include all possible node traversals in a predetermined direction, pick one based on their probabilities.
-
#process_phrase(phrase) ⇒ void
Splits a phrase into tokens, adds them to @nodes, and creates linkages.
Constructor Details
#initialize(phrases: []) ⇒ MarkovBuilder
Processes the phrases to populate @nodes.
    # File 'lib/markov_twitter/markov_builder.rb', line 45
    def initialize(phrases: [])
      @nodes = {}
      @start_nodes = Set.new
      @end_nodes = Set.new
      phrases.each &method(:process_phrase)
    end
Instance Attribute Details
#end_nodes ⇒ Set<Node> (readonly)
The nodes that were found at the end of phrases. Judging by #process_phrase, the set holds node values (tokens) rather than Node objects.
    # File 'lib/markov_twitter/markov_builder.rb', line 22
    def end_nodes
      @end_nodes
    end
#nodes ⇒ Hash<String, Node> (readonly)
The base dictionary for nodes. There is only a single copy of each node created, although they are referenced in Node#linkages as well.
    # File 'lib/markov_twitter/markov_builder.rb', line 14
    def nodes
      @nodes
    end
#start_nodes ⇒ Set<Node> (readonly)
The nodes that were found at the start of phrases. Judging by #process_phrase, the set holds node values (tokens) rather than Node objects.
    # File 'lib/markov_twitter/markov_builder.rb', line 18
    def start_nodes
      @start_nodes
    end
Class Method Details
.split_phrase(phrase) ⇒ Array<String>
Splits a phrase into tokens.
    # File 'lib/markov_twitter/markov_builder.rb', line 39
    def self.split_phrase(phrase)
      phrase.split(SeparatorCharacterRegex)
    end
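The splitting behavior can be tried outside the class. The sketch below is a minimal stand-in, not the library's code: `SEPARATOR` and `split_phrase_sketch` are hypothetical names, and the regex is copied from SeparatorCharacterRegex.

```ruby
# Hypothetical stand-in for MarkovBuilder::SeparatorCharacterRegex.
SEPARATOR = /\s+/

# Mirrors the body of .split_phrase: split on runs of whitespace.
def split_phrase_sketch(phrase)
  phrase.split(SEPARATOR)
end

# Punctuation runs survive as ordinary tokens.
p split_phrase_sketch("hello   world\n!!! ok")
# => ["hello", "world", "!!!", "ok"]

# With a regex separator, leading whitespace yields an empty first token.
p split_phrase_sketch(" a b")
# => ["", "a", "b"]
```

The second call suggests that phrases with leading whitespace may be worth stripping before they are passed in, since the empty string would otherwise become a token.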
Instance Method Details
#_evaluate(length:, probability_bounds: [0,100], root_node: nil, direction:, node_finder:) ⇒ Array<Node>
An “evaluation” of the Markov chain, i.e. a run case. Passes random values through the probability sequences.
    # File 'lib/markov_twitter/markov_builder.rb', line 168
    def _evaluate(
      length:,
      probability_bounds: [0,100],
      root_node: nil,
      direction:,
      node_finder:
    )
      length.times.reduce([]) do |result_nodes|
        root_node ||= get_new_start_point(node_finder)
        result_nodes.push root_node
        root_node = pick_linkage(
          root_node.linkages[direction],
          probability_bounds,
        )
        result_nodes
      end
    end
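The control flow of the loop (walk, hit a dead end, restart) is easier to see with the randomness removed. The following is a simplified sketch under assumptions: `TOY_NEXT` and `walk_sketch` are hypothetical, each token has exactly one successor, and the restart lambda plays the role of #get_new_start_point.

```ruby
# Hypothetical toy chain: each token maps to its only successor (nil at a dead end).
TOY_NEXT = { "the" => "cat", "cat" => "sat", "sat" => nil }.freeze

# Walks `length` steps. Whenever the current token is nil (at the start or after
# a dead end), the restart lambda supplies a fresh one.
def walk_sketch(length, restart)
  current = nil
  length.times.reduce([]) do |result|
    current ||= restart.call
    result.push(current)
    current = TOY_NEXT[current]
    result
  end
end

puts walk_sketch(5, -> { "the" }).join(" ")
# prints: the cat sat the cat
```

As in the original method, the result always contains exactly `length` entries, because a dead end triggers a restart instead of ending the walk.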
#add_linkages(node1, node2) ⇒ void
This method returns an undefined value.
Adds bidirectional linkages between two nodes. The Node class recalculates the probabilities internally and mirrors the change on :prev.
    # File 'lib/markov_twitter/markov_builder.rb', line 99
    def add_linkages(node1, node2)
      node1.add_next_linkage(node2, mirror_change=true)
    end
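The Node class is not shown on this page, so the following is only a guess at what "recalculates the probabilities and mirrors the change on :prev" could look like. `SketchNode`, its count-based bookkeeping, and `linkage_demo` are all hypothetical and are not the library's implementation.

```ruby
# Hypothetical stand-in for Node; the real class is defined in markov_builder/node.rb.
class SketchNode
  attr_reader :value, :counts

  def initialize(value)
    @value = value
    @counts = { next: Hash.new(0), prev: Hash.new(0) }
  end

  # Records self -> other on :next and, when mirrored, other -> self on :prev.
  def add_next_linkage(other, mirror_change = true)
    counts[:next][other.value] += 1
    other.counts[:prev][value] += 1 if mirror_change
  end

  # Probabilities are derived from counts, so they sum to 1 per direction.
  def probabilities(direction)
    total = counts[direction].values.sum.to_f
    counts[direction].transform_values { |count| count / total }
  end
end

# Links a -> b twice and a -> c once, then reports both directions.
def linkage_demo
  a, b, c = %w[a b c].map { |v| SketchNode.new(v) }
  a.add_next_linkage(b)
  a.add_next_linkage(b)
  a.add_next_linkage(c)
  [a.probabilities(:next), b.probabilities(:prev)]
end

p linkage_demo
```

In the source above, `mirror_change=true` inside the call is a local assignment whose value is passed positionally. It is not a keyword argument.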
#add_nodes(node1, node2 = nil) ⇒ void
This method returns an undefined value.
Adds a sequence of two tokens to @nodes and creates linkages. If node2 is nil, it is not added and no linkages are created.
    # File 'lib/markov_twitter/markov_builder.rb', line 74
    def add_nodes(node1, node2=nil)
      unless node1.is_a?(Node)
        raise ArgumentError, "first arg passed to add_nodes is not a Node"
      end
      @nodes[node1.value] ||= node1
      if node2
        @nodes[node2.value] ||= node2
        add_linkages(*@nodes.values_at(*[node1,node2].map(&:value)))
      end
    end
#check_probability_bounds(bounds) ⇒ Boolean
Validates the given probability bounds. Raises ArgumentError unless 0 <= bounds1 <= bounds2 <= 100.
    # File 'lib/markov_twitter/markov_builder.rb', line 197
    def check_probability_bounds(bounds)
      bounds1, bounds2 = bounds
      bounds_diff = bounds2 - bounds1
      if (
        (bounds_diff < 0) || (bounds_diff > 100) ||
        (bounds1 < 0) || (bounds2 > 100)
      )
        raise ArgumentError, "wasn't given 0 <= bounds1 <= bounds2 <= 100"
      end
    end
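To see which inputs pass, the same condition can be restated as a predicate. This is a sketch for illustration: `valid_bounds_sketch?` is a hypothetical helper that returns true or false, whereas the method above raises on failure and, as written, returns nil on success.

```ruby
# Hypothetical predicate equivalent to the check above:
# valid when 0 <= low <= high <= 100.
def valid_bounds_sketch?(bounds)
  low, high = bounds
  low >= 0 && high <= 100 && low <= high
end

p valid_bounds_sketch?([0, 100])  # => true
p valid_bounds_sketch?([50, 50])  # => true  (a zero-width range is accepted)
p valid_bounds_sketch?([80, 20])  # => false (reversed)
p valid_bounds_sketch?([-1, 50])  # => false (below 0)
p valid_bounds_sketch?([0, 101])  # => false (above 100)
```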
#construct_node(value) ⇒ Node
Builds a single node which contains a reference to @nodes. Note that this does not do the inverse (it doesn’t add the node to @nodes).
    # File 'lib/markov_twitter/markov_builder.rb', line 89
    def construct_node(value)
      Node.new(value: value, nodes: @nodes)
    end
#evaluate(length:, probability_bounds: [0,100], root_node: nil) ⇒ String
The default evaluation method to produce a run case. Goes in the forward direction with random nodes as start points. See also #evaluate_favoring_start and #evaluate_favoring_end. See #_evaluate for the parameter spec. The passed node_finder lambda picks a totally random new node.
    # File 'lib/markov_twitter/markov_builder.rb', line 109
    def evaluate(length:, probability_bounds: [0,100], root_node: nil)
      _evaluate(
        length: length,
        probability_bounds: probability_bounds,
        root_node: root_node,
        direction: :next,
        node_finder: node_finders[:random]
      ).map(&:value).join(" ")
    end
#evaluate_favoring_end(length:, probability_bounds: [0,100], root_node: nil) ⇒ String
See #_evaluate for the parameter spec. The passed node_finder lambda picks a node contained in @end_nodes. An error is raised if no nodes match this condition. The chain is walked in the :prev direction and the result is reversed before joining.
    # File 'lib/markov_twitter/markov_builder.rb', line 142
    def evaluate_favoring_end(length:, probability_bounds: [0,100], root_node: nil)
      node_finder = node_finders[:favor_end]
      has_possible_end_node = nodes.values.any? &node_finder
      unless has_possible_end_node
        raise ArgumentError, "@end_nodes is empty; can't evaluate favoring end"
      end
      _evaluate(
        length: length,
        probability_bounds: probability_bounds,
        root_node: root_node,
        direction: :prev,
        node_finder: node_finder
      ).map(&:value).reverse.join(" ")
    end
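The "walk backward, then reverse" idea can be illustrated with a deterministic toy chain. This is a sketch under assumptions: `TOY_PREV` and `backward_walk_sketch` are hypothetical, and, unlike the real method, the sketch stops at a dead end instead of restarting from another end node.

```ruby
# Hypothetical toy chain: each token maps to its only predecessor.
TOY_PREV = { "sat" => "cat", "cat" => "the", "the" => nil }.freeze

# Collects tokens from the end of a phrase toward its start, then reverses
# them so the output reads in natural order.
def backward_walk_sketch(length, end_token)
  current = end_token
  tokens = []
  length.times do
    break if current.nil? # simplification: the real method restarts here
    tokens << current
    current = TOY_PREV[current]
  end
  tokens.reverse.join(" ")
end

puts backward_walk_sketch(3, "sat")
# prints: the cat sat
```

Starting from an end node means the generated text finishes on a token that actually ended a phrase, at the cost of a possibly abrupt beginning.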
#evaluate_favoring_start(length:, probability_bounds: [0,100], root_node: nil) ⇒ String
See #_evaluate for the parameter spec. The passed node_finder lambda picks a node contained in @start_nodes. An error is raised if no nodes match this condition.
    # File 'lib/markov_twitter/markov_builder.rb', line 123
    def evaluate_favoring_start(length:, probability_bounds: [0,100], root_node: nil)
      node_finder = node_finders[:favor_start]
      has_possible_start_node = nodes.values.any? &node_finder
      unless has_possible_start_node
        raise ArgumentError, "@start_nodes is empty; can't evaluate favoring start"
      end
      _evaluate(
        length: length,
        probability_bounds: probability_bounds,
        root_node: root_node,
        direction: :next,
        node_finder: node_finder
      ).map(&:value).join(" ")
    end
#get_new_start_point(node_finder) ⇒ Node
Gets a random node as a potential start point.
    # File 'lib/markov_twitter/markov_builder.rb', line 190
    def get_new_start_point(node_finder)
      nodes.values.shuffle.find(&node_finder)
    end
#node_finders ⇒ Lambda<Node>
Lambdas that can be used during evaluation to find the first node, or the next node when “stuck” (meaning there is no :next/:prev node). They are returned as a hash keyed by :random, :favor_start, and :favor_end.
    # File 'lib/markov_twitter/markov_builder.rb', line 28
    def node_finders
      @node_finders ||= {
        random: -> (node) { true },
        favor_start: -> (node) { start_nodes.include? node.value },
        favor_end: -> (node) { end_nodes.include? node.value },
      }
    end
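The finders are plain predicates, and #get_new_start_point combines one of them with `shuffle.find` to pick a random matching node. The sketch below illustrates that pairing with hypothetical stand-ins (`SketchToken`, `START_VALUES`, `END_VALUES`, `FINDERS`, `find_start_sketch`). It is not the library's code.

```ruby
require "set"

# Hypothetical stand-in for Node: only the value matters here.
SketchToken = Struct.new(:value)

TOKENS = %w[the cat sat].map { |v| SketchToken.new(v) }.freeze
START_VALUES = Set["the"].freeze
END_VALUES = Set["sat"].freeze

# Predicates in the same shape as the hash returned by #node_finders.
FINDERS = {
  random: ->(_node) { true },
  favor_start: ->(node) { START_VALUES.include?(node.value) },
  favor_end: ->(node) { END_VALUES.include?(node.value) },
}.freeze

# Shuffle first, then take the first node the predicate accepts.
def find_start_sketch(nodes, finder)
  nodes.shuffle.find(&finder)
end

puts find_start_sketch(TOKENS, FINDERS[:favor_start]).value # always "the"
puts find_start_sketch(TOKENS, FINDERS[:random]).value      # any of the three
```

Shuffling before `find` gives each matching node an equal chance, and `find` returns nil when no node satisfies the predicate.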
#pick_linkage(linkages, probability_bounds = [0,100]) ⇒ Node
Given “linkages”, which include all possible node traversals in a predetermined direction, pick one based on their probabilities.
    # File 'lib/markov_twitter/markov_builder.rb', line 217
    def pick_linkage(linkages, probability_bounds=[0,100])
      check_probability_bounds(probability_bounds)
      bounds1, bounds2 = probability_bounds
      # pick a random number between the bounds.
      random_num = (rand(bounds2 - bounds1) + bounds1) * 0.01
      # offset is the accumulation of probabilities seen during iteration.
      offset = 0
      # sort to lowest first
      sorted = linkages.sort_by { |name, prob| prob }
      # find the first linkage value that satisfies offset < N(rand) < val.
      new_key = sorted.find do |(key, probability)|
        # increment the offset each time.
        random_num.between?(offset, probability + offset).tap do
          offset += probability
        end
      end
      nodes[new_key&.first]
    end
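The selection is a cumulative-probability lookup: linkages are sorted from least to most likely, and the first one whose cumulative interval contains the random number wins. The sketch below separates the two steps so the lookup can be run with fixed numbers. `pick_sketch`, `random_in_bounds`, and `LINKS` are hypothetical, and the probabilities are assumed to sum to 1.

```ruby
# Returns the first key whose cumulative interval contains random_num.
def pick_sketch(linkages, random_num)
  offset = 0.0
  linkages.sort_by { |_key, prob| prob }.each do |key, prob|
    return key if random_num.between?(offset, offset + prob)
    offset += prob
  end
  nil
end

# Same arithmetic as the original: an integer percentage scaled to 0.0...1.0.
def random_in_bounds(bounds)
  low, high = bounds
  (rand(high - low) + low) * 0.01
end

LINKS = { "cat" => 0.5, "dog" => 0.3, "eel" => 0.2 }.freeze

# Sorted intervals: eel 0.0..0.2, dog 0.2..0.5, cat 0.5..1.0
p pick_sketch(LINKS, 0.1)  # => "eel"
p pick_sketch(LINKS, 0.35) # => "dog"
p pick_sketch(LINKS, 0.9)  # => "cat"
```

Because the sort puts the most probable linkage last, narrowing the bounds toward 100 (for example `[50, 100]`) restricts the draw to the upper intervals and so favors the most likely continuations.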
#process_phrase(phrase) ⇒ void
This method returns an undefined value.
Splits a phrase into tokens, adds them to @nodes, and creates linkages.
    # File 'lib/markov_twitter/markov_builder.rb', line 55
    def process_phrase(phrase)
      node_vals = self.class.split_phrase(phrase)
      last_node = nil
      node_vals.length.times do |i|
        nodes = node_vals[i..(i+1)].compact.map do |node_val|
          construct_node(node_val)
        end
        @start_nodes.add(nodes[0].value) if i == 0
        last_node = nodes.last
        add_nodes(*nodes)
      end
      @end_nodes.add last_node.value
    end
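The sliding window in the loop turns a phrase into overlapping token pairs, plus a final single-token slice. The sketch below reproduces only that bookkeeping, without Node objects. `phrase_pairs_sketch` is a hypothetical helper, not part of the library.

```ruby
# Returns the start token, the end token, and the slices that the loop above
# would hand to add_nodes (each slice is tokens[i..i+1]).
def phrase_pairs_sketch(phrase)
  tokens = phrase.split(/\s+/)
  pairs = tokens.each_index.map { |i| tokens[i..(i + 1)] }
  { start: tokens.first, end: tokens.last, pairs: pairs }
end

p phrase_pairs_sketch("the cat sat")
# start is "the", end is "sat",
# pairs are [["the", "cat"], ["cat", "sat"], ["sat"]]
```

The last slice holds a single token, which corresponds to the `node2 = nil` case of #add_nodes. An empty phrase produces no tokens, so in the method above `last_node` would still be nil when `last_node.value` is called. Filtering out empty phrases beforehand avoids that.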