Class: MarkovTwitter::MarkovBuilder

Inherits:
Object
  • Object
Defined in:
lib/markov_twitter/markov_builder.rb,
lib/markov_twitter/markov_builder/node.rb

Overview

Builds a Markov chain from phrases passed as input. A “phrase” is defined here as a tweet.

Defined Under Namespace

Classes: Node

Constant Summary

SeparatorCharacterRegex =

Regex used to split the phrase into tokens. It splits on any number of whitespace/newline in sequence. Sequences of punctuation characters are treated like any other word.

/\s+/
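
To illustrate the splitting rule described above, here is a minimal sketch using plain String#split. SKETCH_SEPARATOR is a hypothetical stand-in for SeparatorCharacterRegex, defined here only so the example is self-contained.

```ruby
# Hypothetical stand-in for MarkovTwitter::MarkovBuilder::SeparatorCharacterRegex.
SKETCH_SEPARATOR = /\s+/

# Runs of spaces and newlines collapse into one split point; punctuation
# stays attached to its word, and a standalone "!!" becomes its own token.
tokens = "hello,   world !!\nbye".split(SKETCH_SEPARATOR)
puts tokens.inspect

# With a Regexp separator, leading whitespace produces an empty first token.
leading = "  hi there".split(SKETCH_SEPARATOR)
puts leading.inspect
```

The second call suggests that phrases with leading whitespace may be worth stripping before they are split, since the empty string would otherwise be treated as a token.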


Constructor Details

#initialize(phrases: []) ⇒ MarkovBuilder

Processes the phrases to populate @nodes.

Parameters:

  • phrases (Array<String>) (defaults to: [])

    e.g. sentences or tweets.



# File 'lib/markov_twitter/markov_builder.rb', line 45

def initialize(phrases: [])
  @nodes = {}
  @start_nodes = Set.new
  @end_nodes = Set.new
  phrases.each &method(:process_phrase)
end

Instance Attribute Details

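#end_nodes ⇒ Set<Node> (readonly)

The nodes that were found at the end of phrases. Although declared as Set<Node>, the set as populated by #process_phrase holds the value of each node (the token String), and #node_finders relies on this.

Returns:

  • (Set<Node>)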

# File 'lib/markov_twitter/markov_builder.rb', line 22

def end_nodes
  @end_nodes
end

#nodes ⇒ Hash<String, Node> (readonly)

The base dictionary for nodes. There is only a single copy of each node created, although they are referenced in Node#linkages as well.

Returns:

  • (Hash<String, Node>)


# File 'lib/markov_twitter/markov_builder.rb', line 14

def nodes
  @nodes
end

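#start_nodes ⇒ Set<Node> (readonly)

The nodes that were found at the start of phrases. Although declared as Set<Node>, the set as populated by #process_phrase holds the value of each node (the token String), and #node_finders relies on this.

Returns:

  • (Set<Node>)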

# File 'lib/markov_twitter/markov_builder.rb', line 18

def start_nodes
  @start_nodes
end

Class Method Details

.split_phrase(phrase) ⇒ Array<String>

Splits a phrase into tokens.

Parameters:

  • phrase (String)

Returns:

  • (Array<String>)


# File 'lib/markov_twitter/markov_builder.rb', line 39

def self.split_phrase(phrase)
  phrase.split(SeparatorCharacterRegex)
end

Instance Method Details

#_evaluate(length:, probability_bounds: [0,100], root_node: nil, direction:, node_finder:) ⇒ Array<Node>

An “evaluation” of the Markov chain, i.e. a single run. Passes random values through the probability sequences.

Parameters:

  • length (Integer)

    the number of tokens in the result.

  • probability_bounds (Array<Integer, Integer>) (defaults to: [0,100])

    optional, can limit the probability to a range where 0 <= min <= result <= max <= 100.

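  • root_node (Node) (defaults to: nil)

    optional, the node to start from. If nil, a start point is chosen with node_finder.

  • direction (Symbol)

    the linkage direction to follow, either :next or :prev.

  • node_finder (Lambda<Node>)

    during iteration, if the current node has no linkages in the given direction, a new node is selected from the nodes dict. The first randomly picked node for which this lambda returns a truthy value is selected.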

Returns:

  • (Array<Node>)

    the result tokens in order.



# File 'lib/markov_twitter/markov_builder.rb', line 168

def _evaluate(
  length:,
  probability_bounds: [0,100],
  root_node: nil,
  direction:,
  node_finder:
)
  length.times.reduce([]) do |result_nodes|
    root_node ||= get_new_start_point(node_finder)
    result_nodes.push root_node
    root_node = pick_linkage(
      root_node.linkages[direction],
      probability_bounds,
    )
    result_nodes
  end
end
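
The loop above can be hard to follow because root_node is reassigned on every pass. The following is a minimal sketch of the same walk under simplifying assumptions: each token has at most one successor, so no probabilities are involved, and sketch_walk, transitions, and fallback are hypothetical names that are not part of the library.

```ruby
# Hypothetical, simplified model of the walk: transitions maps a token to the
# single token that follows it (or nil), and fallback plays the role of
# get_new_start_point by supplying a fresh token when the walk is stuck.
def sketch_walk(transitions, length, fallback)
  current = nil
  Array.new(length) do
    current ||= fallback.call     # first pass, or no successor last time
    token = current
    current = transitions[token]  # becomes nil when the walk is stuck
    token
  end
end

transitions = { "a" => "b", "b" => nil }
puts sketch_walk(transitions, 4, -> { "a" }).inspect
```

In this sketch the walk restarts from "a" whenever "b" has no successor, which corresponds to the node_finder fallback described in the parameter list.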

#add_linkages(node1, node2) ⇒ void

This method returns an undefined value.

Adds bidirectional linkages between two nodes. The Node class recalculates the probabilities internally and mirrors the change on :prev.

Parameters:

  • node1 (Node)

    the parent.

  • node2 (Node)

    the child.



# File 'lib/markov_twitter/markov_builder.rb', line 99

def add_linkages(node1, node2)
  node1.add_next_linkage(node2, mirror_change=true)
end

#add_nodes(node1, node2 = nil) ⇒ void

This method returns an undefined value.

Adds a sequence of two tokens to @nodes and creates linkages. If node2 is nil, it won’t be added and linkages won’t be created.

Parameters:

  • node1 (Node)
  • node2 (Node) (defaults to: nil)


# File 'lib/markov_twitter/markov_builder.rb', line 74

def add_nodes(node1, node2=nil)
  unless node1.is_a?(Node)
    raise ArgumentError, "first arg passed to add_nodes is not a Node"
  end
  @nodes[node1.value] ||= node1
  if node2
    @nodes[node2.value] ||= node2
    add_linkages(*@nodes.values_at(*[node1,node2].map(&:value)))
  end
end
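
The `||=` assignments above are what keep a single copy of each node in the dictionary. A minimal sketch of that idea follows; SketchNode and sketch_add are hypothetical stand-ins, not the library's Node class or API.

```ruby
# Hypothetical stand-in for Node: only the value matters for this sketch.
SketchNode = Struct.new(:value)

# Registers the node under its value unless one is already stored there, and
# returns whichever node ends up in the registry.
def sketch_add(registry, node)
  registry[node.value] ||= node
end

registry = {}
first  = sketch_add(registry, SketchNode.new("cat"))
second = sketch_add(registry, SketchNode.new("cat"))

# equal? compares object identity: the second, freshly built node was dropped.
puts first.equal?(second)
puts registry.size
```

This is also why add_nodes looks the nodes up again with values_at before linking them: the linkage should be made between the registered copies, not the freshly constructed arguments.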

#check_probability_bounds(bounds) ⇒ Boolean

Validates the given probability bounds. As implemented, it raises ArgumentError unless 0 <= bounds1 <= bounds2 <= 100.

Parameters:

  • bounds (Array<Integer, Integer>)

Returns:

  • (Boolean)

    indicating whether it is valid



# File 'lib/markov_twitter/markov_builder.rb', line 197

def check_probability_bounds(bounds)
  bounds1, bounds2 = bounds
  bounds_diff = bounds2 - bounds1 
  if (
    (bounds_diff < 0) || (bounds_diff > 100) ||
    (bounds1 < 0) || (bounds2 > 100)
  )
    raise ArgumentError, "wasn't given 0 <= bounds1 <= bounds2 <= 100"
  end
end
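
For comparison, the same condition can be written as a predicate. This is only a sketch: sketch_bounds_valid? is a hypothetical helper, and the documented method raises ArgumentError rather than returning false.

```ruby
# Hypothetical predicate form of the check: true when
# 0 <= low <= high <= 100, false otherwise.
def sketch_bounds_valid?(bounds)
  low, high = bounds
  low >= 0 && high <= 100 && low <= high
end

puts sketch_bounds_valid?([0, 100])   # full range
puts sketch_bounds_valid?([80, 20])   # reversed bounds
puts sketch_bounds_valid?([0, 101])   # upper bound too large
```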

#construct_node(value) ⇒ Node

Builds a single node which contains a reference to @nodes. Note that this does not do the inverse (it doesn’t add the node to @nodes).

Parameters:

  • value (String)

Returns:

  • (Node)

# File 'lib/markov_twitter/markov_builder.rb', line 89

def construct_node(value)
  Node.new(value: value, nodes: @nodes)
end

#evaluate(length:, probability_bounds: [0,100], root_node: nil) ⇒ String

The default evaluation method to produce a run case. Goes in the forward direction with random nodes as start points. See also #evaluate_favoring_start and #evaluate_favoring_end. See #_evaluate for param specs. The passed node_finder lambda picks a totally random new node.

Returns:

  • (String)

    the result of #_evaluate joined by whitespace.



# File 'lib/markov_twitter/markov_builder.rb', line 109

def evaluate(length:, probability_bounds: [0,100], root_node: nil)
  _evaluate(
    length: length,
    probability_bounds: probability_bounds,
    root_node: root_node,
    direction: :next,
    node_finder: node_finders[:random]
  ).map(&:value).join(" ")
end

#evaluate_favoring_end(length:, probability_bounds: [0,100], root_node: nil) ⇒ String

See #_evaluate for param specs. The passed node_finder lambda picks a node contained in @end_nodes. An error is raised if no nodes match this condition.

Returns:

  • (String)

    the result of #_evaluate reversed and joined by whitespace.



# File 'lib/markov_twitter/markov_builder.rb', line 142

def evaluate_favoring_end(length:, probability_bounds: [0,100], root_node: nil)
  node_finder = node_finders[:favor_end]
  has_possible_end_node = nodes.values.any? &node_finder
  unless has_possible_end_node
    raise ArgumentError, "@end_nodes is empty; can't evaluate favoring end"
  end
  _evaluate(
    length: length,
    probability_bounds: probability_bounds,
    root_node: root_node,
    direction: :prev,
    node_finder: node_finder
  ).map(&:value).reverse.join(" ")
end
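
The method above walks the chain in the :prev direction and then reverses the result so that it reads forward. A minimal sketch of that idea follows, assuming each token has at most one predecessor. sketch_backward and prev_of are hypothetical names, and unlike the library this sketch simply stops when it runs out of predecessors instead of picking a new end node.

```ruby
# Hypothetical, simplified backward walk: prev_of maps a token to the single
# token that precedes it (or nil).
def sketch_backward(prev_of, end_token, length)
  tokens = []
  current = end_token
  length.times do
    break if current.nil?     # this sketch stops when stuck
    tokens << current
    current = prev_of[current]
  end
  tokens.reverse.join(" ")    # collected end-first, so flip before joining
end

prev_of = { "sat" => "cat", "cat" => "the", "the" => nil }
puts sketch_backward(prev_of, "sat", 3)
```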

#evaluate_favoring_start(length:, probability_bounds: [0,100], root_node: nil) ⇒ String

See #_evaluate for param specs. The passed node_finder lambda picks a node contained in @start_nodes. An error is raised if no nodes match this condition.

Returns:

  • (String)

    the result of #_evaluate joined by whitespace.



# File 'lib/markov_twitter/markov_builder.rb', line 123

def evaluate_favoring_start(length:, probability_bounds: [0,100], root_node: nil)
  node_finder = node_finders[:favor_start]
  has_possible_start_node = nodes.values.any? &node_finder
  unless has_possible_start_node
    raise ArgumentError, "@start_nodes is empty; can't evaluate favoring start"
  end
  _evaluate(
    length: length,
    probability_bounds: probability_bounds,
    root_node: root_node,
    direction: :next,
    node_finder: node_finder
  ).map(&:value).join(" ")
end

#get_new_start_point(node_finder) ⇒ Node

Gets a random node as a potential start point.

Parameters:

  • node_finder (lambda<Node>)

    the lambda returns a truthy value for any node returned here.

Returns:

  • (Node)

    or nil if one couldn’t be found.



# File 'lib/markov_twitter/markov_builder.rb', line 190

def get_new_start_point(node_finder)
  nodes.values.shuffle.find(&node_finder)
end

#node_finders ⇒ Lambda<Node>

Lambdas, keyed by :random, :favor_start, and :favor_end, which can be used during evaluation to find the first node, or the next node when “stuck” (meaning there is no :next/:prev node).

Returns:

  • (Lambda<Node>)

    the lambda should return true for a node that is suitable.



# File 'lib/markov_twitter/markov_builder.rb', line 28

def node_finders
  @node_finders ||= {
    random:      -> (node) { true },
    favor_start: -> (node) { start_nodes.include? node.value },
    favor_end:   -> (node) { end_nodes.include? node.value },
  }
end
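
The finders above are used together with #get_new_start_point, which shuffles the candidates and takes the first one that a finder accepts. A minimal sketch of that combination follows, using plain strings in place of Node objects. sketch_finders and sketch_start_point are hypothetical names.

```ruby
require "set"

# Hypothetical stand-in for #node_finders, working on token strings.
def sketch_finders(start_tokens, end_tokens)
  {
    random:      ->(_token) { true },
    favor_start: ->(token) { start_tokens.include?(token) },
    favor_end:   ->(token) { end_tokens.include?(token) },
  }
end

# Hypothetical stand-in for #get_new_start_point: a random candidate that the
# finder accepts, or nil if none qualifies.
def sketch_start_point(tokens, finder)
  tokens.shuffle.find(&finder)
end

finders = sketch_finders(Set["the"], Set["sat"])
puts sketch_start_point(%w[the cat sat], finders[:favor_start])
puts sketch_start_point(%w[the cat sat], finders[:favor_end])
```

Because only one token qualifies for each favoring finder in this example, the shuffle does not affect the result. With :random, any of the three tokens may be returned.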

#pick_linkage(linkages, probability_bounds = [0,100]) ⇒ Node

Given “linkages”, which include all possible node traversals in a predetermined direction, picks one based on their probabilities.

Parameters:

  • linkages (Hash<String, Float>)

    key=token, val=probability

  • probability_bounds (Array<Integer,Integer>) (defaults to: [0,100])

    Optional, can limit the probability to a range where 0 <= min <= result <= max <= 100. This gets divided by 100 before being compared to the linkage values.

Returns:

  • (Node)

    or nil if one couldn’t be found.



# File 'lib/markov_twitter/markov_builder.rb', line 217

def pick_linkage(linkages, probability_bounds=[0,100])
  check_probability_bounds(probability_bounds)
  bounds1, bounds2 = probability_bounds
  # pick a random number between the bounds.
  random_num = (rand(bounds2 - bounds1) + bounds1) * 0.01
  # offset is the accumulation of probabilities seen during iteration.
  offset = 0
  # sort to lowest first
  sorted = linkages.sort_by { |name, prob| prob }
  # find the first linkage value that satisfies offset < N(rand) < val.
  new_key = sorted.find do |(key, probability)|
    # increment the offset each time.
    random_num.between?(offset, probability + offset).tap do
      offset += probability
    end
  end
  nodes[new_key&.first]
end
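
The selection above sorts the linkages by probability and walks through cumulative ranges until the random number falls inside one. A minimal sketch of that step follows. sketch_pick is a hypothetical helper that takes the random number as an argument (already scaled to 0.0..1.0) instead of calling rand, so the outcome is reproducible, and it returns the token rather than a Node.

```ruby
# Hypothetical helper mirroring the cumulative-range search in pick_linkage.
# linkages is a { token => probability } hash.
def sketch_pick(linkages, random_num)
  offset = 0.0
  sorted = linkages.sort_by { |_token, prob| prob }  # lowest probability first
  found = sorted.find do |_token, prob|
    hit = random_num.between?(offset, offset + prob)
    offset += prob                                   # advance to the next range
    hit
  end
  found&.first                                       # nil when nothing matched
end

linkages = { "cat" => 0.5, "dog" => 0.3, "eel" => 0.2 }
# Sorted ascending, the ranges are eel 0.0..0.2, dog 0.2..0.5, cat 0.5..1.0.
puts sketch_pick(linkages, 0.1)
puts sketch_pick(linkages, 0.35)
puts sketch_pick(linkages, 0.9)
```

Since the lowest probabilities come first, narrowing probability_bounds toward 0 would appear to favor rarer linkages, and narrowing it toward 100 the more common ones.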

#process_phrase(phrase) ⇒ void

This method returns an undefined value.

Splits a phrase into tokens, adds them to @nodes, and creates linkages.

Parameters:

  • phrase (String)

    e.g. a sentence or tweet.



# File 'lib/markov_twitter/markov_builder.rb', line 55

def process_phrase(phrase)
  node_vals = self.class.split_phrase(phrase)
  last_node = nil
  node_vals.length.times do |i|
    nodes = node_vals[i..(i+1)].compact.map do |node_val|
      construct_node(node_val)
    end
    @start_nodes.add(nodes[0].value) if i == 0
    last_node = nodes.last
    add_nodes(*nodes)
  end
  @end_nodes.add last_node.value
end
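
The bookkeeping done by #process_phrase can be summarised as: record the first token as a start, record the last token as an end, and link each adjacent pair. A minimal sketch follows, using plain strings and an array of pairs instead of Node objects and linkages. sketch_process_phrase is a hypothetical helper, and the guard for empty phrases is an addition of this sketch, not something the documented method does.

```ruby
require "set"

# Hypothetical helper: mirrors the start/end/pair bookkeeping with strings.
def sketch_process_phrase(phrase, start_tokens, end_tokens, pairs)
  tokens = phrase.split(/\s+/)
  return if tokens.empty?                 # guard added by this sketch
  start_tokens.add(tokens.first)
  end_tokens.add(tokens.last)
  tokens.each_cons(2) { |a, b| pairs << [a, b] }  # adjacent token pairs
end

start_tokens = Set.new
end_tokens = Set.new
pairs = []
sketch_process_phrase("the cat sat", start_tokens, end_tokens, pairs)
sketch_process_phrase("the dog", start_tokens, end_tokens, pairs)

puts start_tokens.to_a.inspect
puts end_tokens.to_a.inspect
puts pairs.inspect
```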