Class: Rsel::StudyHtml

Inherits:
Object
Includes:
Support
Defined in:
lib/rsel/study_html.rb

Overview

Studies a web page: parses it with Nokogiri, then allows searching the parsed page and simplifying Selenium-like locator expressions against it.

Constant Summary

NO_PAGE_LOADED = 1000000

A large sentinel value for @dirties.

Instance Method Summary

Methods included from Support

#apply_scope, #csspath, #escape_for_hash, #failed_within, #globify, #loc, #normalize_ids, #result_within, #selenium_compare, #string_is_true?, #strip_tags, #xpath, #xpath_expressions, #xpath_row_containing, #xpath_sanitize

Constructor Details

#initialize(first_page = nil) ⇒ StudyHtml



# File 'lib/rsel/study_html.rb', line 14

def initialize(first_page=nil)
  @sections_kept_clean = []
  if first_page
    study(first_page)
  else
    @studied_page = nil
    # Invariant: @dirties == 0 while @keep_clean is true.
    @keep_clean = false
    # No page is loaded.  Set a large sentinel value so it's never tested.
    @dirties = NO_PAGE_LOADED
  end
end

Instance Method Details

#begin_section ⇒ Object

Saves the current keep_clean status, then forces use of the studied page until the next #end_section.
The block is semi-optional: it must return the page to pass as the first argument to #study. It is not called if the page is already #clean?; otherwise it is required, and omitting it raises an exception (a LocalJumpError from the bare yield).



# File 'lib/rsel/study_html.rb', line 105

def begin_section
  last_keep_clean = @keep_clean
  if clean?
    @keep_clean = true
  else
    # This will erase all prior sections.
    study(yield, true)
  end
  @sections_kept_clean.push(last_keep_clean)
end

#clean? ⇒ Boolean

Returns whether the studied page is clean and ready for analysis. This is a query, not a command: it does not call #undo_all_dirties.



# File 'lib/rsel/study_html.rb', line 72

def clean?
  return @dirties == 0
end

#dirty ⇒ Object

Dirties the studied page, recording one (potential) change since the page was studied. This can be undone; see #undo_last_dirty. Does nothing while keep_clean is on, which happens after #keep_clean(true), after #begin_section, or after #study is called with keep set to true.



# File 'lib/rsel/study_html.rb', line 50

def dirty
  @dirties += 1 unless @keep_clean
end
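
The counting behaviour of #dirty, #undo_last_dirty and #clean? can be sketched without loading a page. The class below is a hypothetical stand-in (DirtyCounter and its keep_clean! method are not part of Rsel); it copies only the counter logic from the listings on this page and assumes a page has just been studied.

```ruby
# Hypothetical stand-in that mirrors only the counter logic of
# Rsel::StudyHtml. It does not parse or hold any page.
class DirtyCounter
  attr_reader :dirties

  def initialize
    @dirties = 0         # assumption: a page was just studied
    @keep_clean = false
  end

  # Assumed helper: clears the count and ignores later dirty calls.
  def keep_clean!
    @dirties = 0
    @keep_clean = true
  end

  def dirty
    @dirties += 1 unless @keep_clean
  end

  def undo_last_dirty
    @dirties -= 1 unless @dirties <= 0
  end

  def clean?
    @dirties == 0
  end
end

counter = DirtyCounter.new
counter.dirty
counter.dirty
counter.undo_last_dirty
puts counter.clean?      # prints false: one change is still outstanding
counter.undo_last_dirty
puts counter.clean?      # prints true: both changes were undone
counter.undo_last_dirty  # no effect: undos cannot be banked in advance
puts counter.dirties     # prints 0
```

Because the count never drops below zero, an extra undo cannot cancel a change that has not happened yet.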

#end_section ⇒ Object

Restores the keep_clean status from before the last #begin_section, and marks the page dirty unless that saved status was true. It is fine to call this more often than #begin_section: once the stack of saved statuses is empty, it acts just like keep_clean(false).



# File 'lib/rsel/study_html.rb', line 120

def end_section
  # Can't just assign - what if nil is popped?
  if @sections_kept_clean.pop
    @keep_clean = true
  else
    @keep_clean = false
    dirty
  end
  return true
end
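
The nesting of sections can be sketched as a small stack. SectionStack below is a hypothetical stand-in, not Rsel code; it assumes the page is always clean when a section begins, so the block passed to #begin_section is never needed, and it leaves out the dirty counter entirely.

```ruby
# Hypothetical stand-in for the stack of saved keep_clean statuses.
# Assumption: the page is clean whenever begin_section is called.
class SectionStack
  attr_reader :keep_clean

  def initialize
    @saved = []
    @keep_clean = false
  end

  def begin_section
    @saved.push(@keep_clean)
    @keep_clean = true
  end

  def end_section
    # Array#pop returns nil on an empty array, so the popped value is
    # tested rather than assigned, keeping @keep_clean a strict boolean.
    @keep_clean = @saved.pop ? true : false
  end
end

stack = SectionStack.new
stack.begin_section      # outer section: saves false
stack.begin_section      # inner section: saves true
stack.end_section
puts stack.keep_clean    # prints true: still inside the outer section
stack.end_section
puts stack.keep_clean    # prints false: back to the original status
stack.end_section        # one call too many is harmless
puts stack.keep_clean    # prints false
```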

#get_node(locator) ⇒ Object

Finds a studied node by almost any type of Selenium locator. Returns a Nokogiri::XML::Node, or nil if the node is not found, the page is dirty, or the locator is a dom= expression.



# File 'lib/rsel/study_html.rb', line 172

def get_node(locator)
  return nil if @dirties > 0
  case locator
  when /^id=/, /^name=/
    locator = locator.gsub("'","\\\\'").gsub(/([a-z]+)=([^ ]*) */, "[@\\1='\\2']")
    locator = locator.sub(/\]([^ ]+) */, "][@value='\\1']")
    return @studied_page.at_xpath("//*#{locator}")
  when /^link=/
    # Parse the link through loc (which may simplify it to an id or something).
    # Then try get_node again.  It should not return to this spot.
    return get_node(loc(locator[5,locator.length], 'link'))
  when /^css=/
    return @studied_page.at_css(locator[4,locator.length])
  when /^xpath=/, /^\/\//
    return @studied_page.at_xpath(locator.sub(/^xpath=/,''))
  when /^dom=/, /^document\./
    # Can't parse dom=
    return nil
  else
    locator = locator.sub(/^id(entifier)?=/,'')
    retval = @studied_page.at_xpath("//*[@id='#{locator}']")
    retval = @studied_page.at_xpath("//*[@name='#{locator}']") unless retval
    return retval
  end
end
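
The id=/name= branch rewrites the locator into XPath attribute predicates using plain string substitution. The sketch below copies those two substitutions into a hypothetical helper (attribute_xpath is not part of Rsel) so the resulting XPath can be inspected without a parsed page. Only two simple inputs are shown; other input shapes are not covered here.

```ruby
# Hypothetical helper: applies the same substitutions as the id=/name=
# branch of get_node, but returns the XPath string instead of querying.
def attribute_xpath(locator)
  # Escape single quotes, then turn each key=value pair into [@key='value'].
  predicate = locator.gsub("'", "\\\\'")
                     .gsub(/([a-z]+)=([^ ]*) */, "[@\\1='\\2']")
  # A bare word left after the first predicate becomes a value attribute.
  predicate = predicate.sub(/\]([^ ]+) */, "][@value='\\1']")
  "//*#{predicate}"
end

puts attribute_xpath('id=login')        # prints //*[@id='login']
puts attribute_xpath('name=color red')  # prints //*[@name='color'][@value='red']
```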

#keep_clean(switch) ⇒ Object

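Turns maintenance of clean status on or off. Passing true first clears all dirties (with #undo_all_dirties) and then prevents #dirty from having any effect; if no page is loaded, it returns false and leaves keep_clean off. Passing false turns keep_clean off, dirties the page (with #dirty), and returns true.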



# File 'lib/rsel/study_html.rb', line 79

def keep_clean(switch)
  if switch
    if undo_all_dirties
      @keep_clean = true
      return true
    else
      return false
    end
  else
    @keep_clean = false
    dirty
    return true
  end
end

#keeping_clean? ⇒ Boolean

Returns whether keep_clean is on. Useful if you want to start keeping clean and later return to your previous state. Invariant: if keeping_clean? is true, then clean? is true.



# File 'lib/rsel/study_html.rb', line 96

def keeping_clean?
  return @keep_clean
end

#simplify_locator(locator, tocss = true) ⇒ Object

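Simplifies a Selenium-like locator (an XPath or a CSS path) based on the studied page. Tries, in order, an id=, name=, or link= locator that resolves to the same node; failing those, returns the node's full XPath if tocss is true. Returns the locator unchanged if the page is dirty, the locator is of some other kind, or the node is not found.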



# File 'lib/rsel/study_html.rb', line 136

def simplify_locator(locator, tocss=true)
  return locator if @dirties > 0

  # We need a locator using either a css= or xpath= expression.
  if locator[0,4] == 'css='
    studied_node = @studied_page.at_css(locator[4,locator.length])
    # If we're already using a css path, don't bother simplifying it to another css path.
    tocss = false
  elsif locator[0,6] == 'xpath=' || locator[0,2] == '//'
    locator = 'xpath='+locator if locator[0,2] == '//'
    studied_node = @studied_page.at_xpath(locator[6,locator.length])
  else
    # Some other kind of locator.  Just return it.
    return locator
  end
  # If the path wasn't found, just return the locator; maybe the browser will
  # have better luck.  (Or return a better error message!)
  return locator if studied_node == nil

  # Now let's try simplified locators.  First, id.
  return "id=#{studied_node['id']}" if(studied_node['id'] &&
                                       @studied_page.at_xpath("//*[@id='#{studied_node['id']}']") == studied_node)
  # Next, name.  Same pattern.
  return "name=#{studied_node['name']}" if(studied_node['name'] &&
                                           @studied_page.at_xpath("//*[@name='#{studied_node['name']}']") == studied_node)

  # Link, perhaps?
  return "link=#{studied_node.inner_text}" if(studied_node.node_name.downcase == 'a' && 
                                           @studied_page.at_xpath("//a[text()='#{studied_node.inner_text}']") == studied_node)

  # Finally, try a CSS path.  Make that a simple xpath, since nth-of-type doesn't work.  But give up if we were told not to convert to CSS.
  return locator unless tocss
  return "xpath=#{studied_node.path}"
end
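
The id and name steps share one rule: a shorter locator is accepted only if the first element it matches is the very node that was found. A minimal sketch of that rule follows, under a loud assumption: the "page" here is an ordered array of hashes rather than a Nokogiri document, and simplest_locator is a hypothetical helper, not an Rsel method. The link= step and the final XPath fallback are omitted.

```ruby
# Hypothetical model of the id/name preference order.
# Assumption: page is an ordered Array of Hashes standing in for elements.
def simplest_locator(page, node, fallback)
  %w[id name].each do |attr|
    value = node[attr]
    next unless value
    # Accept only if the first element carrying this value is the node
    # itself (object identity, not mere equality of contents).
    first_match = page.find { |element| element[attr] == value }
    return "#{attr}=#{value}" if first_match.equal?(node)
  end
  fallback
end

page = [
  { 'name' => 'q' },
  { 'id' => 'go', 'name' => 'q' },
  { 'name' => 'q' }
]

puts simplest_locator(page, page[1], 'css=form input')  # prints id=go
puts simplest_locator(page, page[0], 'css=form input')  # prints name=q
puts simplest_locator(page, page[2], 'css=form input')  # prints css=form input
```

The third element keeps its original locator because name=q would resolve to the first element instead.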

#study(page, keep = false) ⇒ Object

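Loads a page to study by parsing it with Nokogiri::HTML, discarding any saved section statuses. If keep is true, keep_clean is turned on for the new page. If parsing raises, the object is reset to the no-page-loaded state and the exception is re-raised.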



# File 'lib/rsel/study_html.rb', line 33

def study(page, keep=false)
  @sections_kept_clean = []
  begin
    @studied_page = Nokogiri::HTML(page)
    @dirties = 0
  rescue => e
    @keep_clean = false
    @dirties = NO_PAGE_LOADED
    @studied_page = nil
    raise e
  end
  @keep_clean = keep
end

#undo_all_dirties ⇒ Object

Tries to un-#dirty the studied page completely. Returns true on success, or false if there is no page to clean.



# File 'lib/rsel/study_html.rb', line 61

def undo_all_dirties
  if @studied_page != nil
    @dirties = 0
  else
    @keep_clean = false
    @dirties = NO_PAGE_LOADED
  end
  return @dirties == 0
end

#undo_last_dirty ⇒ Object

Tries to un-#dirty the studied page by declaring that one (potential) change since the page was studied was not actually a change. This may or may not be enough to resume using the studied page. It cannot be used preemptively: the count never drops below zero.



# File 'lib/rsel/study_html.rb', line 56

def undo_last_dirty
  @dirties -= 1 unless @dirties <= 0
end