Class: ScraperUtils::MechanizeActions

Inherits:
Object
Defined in:
lib/scraper_utils/mechanize_actions.rb

Overview

Executes a sequence of Mechanize actions (link clicks and custom blocks) against a page, with optional text replacements applied to action parameters.

Examples:

Basic usage

agent = ScraperUtils::MechanizeUtils.mechanize_agent
page = agent.get("https://example.com")

actions = [
  [:click, "Next Page"],
  [:click, ["Option A", "xpath://div[@id='results']/a", "css:.some-button"]] # Will select one randomly
]

processor = ScraperUtils::MechanizeActions.new(agent)
result_page = processor.process(page, actions)

With replacements

replacements = { FROM_DATE: "2022-01-01", TO_DATE: "2022-03-01" }
processor = ScraperUtils::MechanizeActions.new(agent, replacements)

# Use replacements in actions
actions = [
  [:click, "Search between {FROM_DATE} and {TO_DATE}"]
]
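
The source does not show how `{FROM_DATE}`-style placeholders are expanded, so the following is only a minimal sketch of one plausible approach. The helper name `apply_replacements` is hypothetical and is not part of ScraperUtils; it assumes placeholders are upper-case words in braces and that the replacements hash is keyed by symbols, as in the example above.

```ruby
# Hypothetical helper (not part of ScraperUtils): substitutes {KEY}
# placeholders in a string using a hash keyed by symbols.
def apply_replacements(text, replacements)
  text.gsub(/\{(\w+)\}/) do |placeholder|
    key = Regexp.last_match(1).to_sym
    # Assumption: unknown placeholders are left untouched rather than raising
    replacements.fetch(key, placeholder).to_s
  end
end

replacements = { FROM_DATE: "2022-01-01", TO_DATE: "2022-03-01" }
label = apply_replacements("Search between {FROM_DATE} and {TO_DATE}", replacements)
puts label
```

Leaving unknown placeholders in place is a design choice made for this sketch; the real class may behave differently.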

Constructor Details

#initialize(agent, replacements = {}) ⇒ MechanizeActions

Initialize a new MechanizeActions processor

Parameters:

  • agent (Mechanize)

    The mechanize agent to use for actions

  • replacements (Hash) (defaults to: {})

    Optional text replacements to apply to action parameters



# File 'lib/scraper_utils/mechanize_actions.rb', line 37

def initialize(agent, replacements = {})
  @agent = agent
  @replacements = replacements || {}
  @results = []
end

Instance Attribute Details

#agent ⇒ Mechanize (readonly)

Returns the Mechanize agent used for actions.

Returns:

  • (Mechanize)

    The mechanize agent used for actions



# File 'lib/scraper_utils/mechanize_actions.rb', line 28

def agent
  @agent
end

#results ⇒ Array (readonly)

Returns the results of each action performed.

Returns:

  • (Array)

    The results of each action performed



# File 'lib/scraper_utils/mechanize_actions.rb', line 31

def results
  @results
end

Instance Method Details

#process(page, actions) ⇒ Mechanize::Page

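Processes a sequence of actions on a page. Actions run in order, each receiving the page returned by the previous one. The results array is reset at the start of every call, and one result is appended per action.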

Examples:

Action format

actions = [
  [:click, "Link Text"],                     # Click on link with this text
  [:click, ["Option A", "text:Option B"]],   # Click on one of these options (randomly selected)
  [:click, "css:.some-button"],              # Use CSS selector
  [:click, "xpath://div[@id='results']/a"],  # Use XPath selector
  [:block, ->(page, args, agent, results) { [page, { custom_results: 'data' }] }] # Custom block
]
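
The selector strings above use `text:`, `css:` and `xpath:` prefixes, with bare strings treated as link text. The parsing itself is not shown in this page, so the sketch below is an assumption about how such strings could be split; `parse_selector` and `KNOWN_PREFIXES` are hypothetical names and are not part of ScraperUtils.

```ruby
# Hypothetical helper (not part of ScraperUtils): splits a selector string
# into [type, value]. Strings without a known prefix are treated as link text.
KNOWN_PREFIXES = %w[text css xpath].freeze

def parse_selector(selector)
  prefix, rest = selector.split(":", 2)
  if rest && KNOWN_PREFIXES.include?(prefix)
    [prefix.to_sym, rest]
  else
    [:text, selector]
  end
end

# When an array of options is given, one is picked at random
options = ["Option A", "text:Option B"]
chosen = Array(options).sample
type, value = parse_selector(chosen)
puts "#{type}: #{value}"
```

Splitting on the first colon only keeps XPath expressions such as `//div[@id='results']/a` intact, and checking the prefix against a fixed list avoids misreading link text that happens to contain a colon.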

Parameters:

  • page (Mechanize::Page)

    The starting page

  • actions (Array<Array>)

    The sequence of actions to perform

Returns:

  • (Mechanize::Page)

    The resulting page after all actions

Raises:

  • (ArgumentError)

    If an unknown action type is provided



# File 'lib/scraper_utils/mechanize_actions.rb', line 58

def process(page, actions)
  @results = []
  current_page = page

  actions.each do |action|
    args = action.dup
    action_type = args.shift
    current_page, result =
      case action_type
      when :click
        handle_click(current_page, args)
      when :block
        handle_block(current_page, args)
      else
        raise ArgumentError, "Unknown action type: #{action_type}"
      end

    @results << result
  end

  current_page
end
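
To illustrate how the page is threaded through the actions and how `results` accumulates, here is a minimal stand-in that copies the dispatch loop above but supports only `:block`. `MiniActions` is a hypothetical class written for this sketch; the private `handle_block` is not shown in this page, so the way the callable is invoked (first remaining element, called with page, args, agent and results) is an assumption based on the lambda signature in the action format example. Plain strings stand in for Mechanize pages so the sketch runs without a network connection.

```ruby
# Illustration only: a stand-in for ScraperUtils::MechanizeActions that
# mirrors the dispatch loop shown above, with an assumed :block handler.
class MiniActions
  attr_reader :results

  def initialize(agent)
    @agent = agent
    @results = []
  end

  def process(page, actions)
    @results = [] # reset on every call, as in the real method
    current_page = page
    actions.each do |action|
      args = action.dup
      action_type = args.shift
      current_page, result =
        case action_type
        when :block
          # Assumption: the first remaining element is the callable
          block = args.shift
          block.call(current_page, args, @agent, @results.dup)
        else
          raise ArgumentError, "Unknown action type: #{action_type}"
        end
      @results << result
    end
    current_page
  end
end

processor = MiniActions.new(:fake_agent)
final_page = processor.process("page-0", [
  [:block, ->(page, _args, _agent, _results) { ["#{page}+1", { step: 1 }] }],
  [:block, ->(page, _args, _agent, results) { ["#{page}+2", { seen: results.size }] }]
])
puts final_page
puts processor.results.inspect
```

Each block returns a two-element array of the next page and a result, which matches the `current_page, result =` destructuring in `process`.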