Class: ScraperUtils::MechanizeUtils::AdaptiveDelay
- Inherits: Object (Object → ScraperUtils::MechanizeUtils::AdaptiveDelay)
- Defined in: lib/scraper_utils/mechanize_utils/adaptive_delay.rb
Overview
Adapts the delay between requests to each domain based on that server's response times. The target delay is proportional to the response time, scaled by the max_load setting: target_delay = response_time × (100 − max_load) / max_load. An exponential moving average smooths out variations in response times.
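The proportional rule means that, if requests are issued one after another, the server spends about max_load% of wall-clock time answering them. The sketch below checks this with the constructor's formula (100 − max_load) / max_load. `response_multiplier` and `server_load_percent` are hypothetical helpers written for illustration; they are not part of ScraperUtils.

```ruby
# Hypothetical helper: the multiplier AdaptiveDelay#initialize derives from max_load.
def response_multiplier(max_load)
  (100.0 - max_load) / max_load
end

# Hypothetical helper: percentage of wall-clock time the server is busy with our
# requests, ASSUMING each request of +response_time+ seconds is followed by
# +delay+ seconds of waiting before the next one starts.
def server_load_percent(response_time, delay)
  100.0 * response_time / (response_time + delay)
end

[20.0, 33.3, 50.0].each do |max_load|
  m = response_multiplier(max_load)
  delay = 0.5 * m # target delay after a 0.5 s response
  puts format("max_load %5.1f%% -> multiplier %.2fx, delay %.2fs, load %.1f%%",
              max_load, m, delay, server_load_percent(0.5, delay))
end
```

For example, max_load 20 gives a 4x multiplier: a 0.5 s response is followed by a 2.0 s wait, so the server is busy for 0.5 s out of every 2.5 s.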
Constant Summary
- DEFAULT_MIN_DELAY = 0.0
- DEFAULT_MAX_DELAY = 30.0
  Presumed default timeout for Mechanize.
Instance Attribute Summary
- #max_delay ⇒ Object (readonly): Maximum delay between requests, in seconds.
- #max_load ⇒ Object (readonly): Target maximum server load, as a percentage.
- #min_delay ⇒ Object (readonly): Minimum delay between requests, in seconds.
Instance Method Summary
- #delay(uri) ⇒ Float: Current delay for the URI's domain, or min_delay if no delay has been set yet.
- #initialize(min_delay: DEFAULT_MIN_DELAY, max_delay: DEFAULT_MAX_DELAY, max_load: AgentConfig::DEFAULT_MAX_LOAD) ⇒ AdaptiveDelay (constructor): Creates a new adaptive delay calculator.
- #next_delay(uri, response_time) ⇒ Float: Returns the next delay, calculated from a smoothed average of response times so that requests occupy less than max_load% of the server's time.
Constructor Details
#initialize(min_delay: DEFAULT_MIN_DELAY, max_delay: DEFAULT_MAX_DELAY, max_load: AgentConfig::DEFAULT_MAX_LOAD) ⇒ AdaptiveDelay
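Creates a new adaptive delay calculator. The delays are converted to Floats, max_load is clamped to 1.0..AgentConfig::MAX_LOAD_CAP, and the response multiplier (100 − max_load) / max_load is precomputed.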
    # File 'lib/scraper_utils/mechanize_utils/adaptive_delay.rb', line 23
    def initialize(min_delay: DEFAULT_MIN_DELAY, max_delay: DEFAULT_MAX_DELAY, max_load: AgentConfig::DEFAULT_MAX_LOAD)
      @delays = {} # domain -> last delay used
      @min_delay = min_delay.to_f
      @max_delay = max_delay.to_f
      @max_load = max_load.to_f.clamp(1.0, AgentConfig::MAX_LOAD_CAP)
      @response_multiplier = (100.0 - @max_load) / @max_load
      return unless DebugUtils.basic?

      ScraperUtils::LogUtils.log(
        "AdaptiveDelay initialized with delays between #{@min_delay} and #{@max_delay} seconds, " \
        "Max_load #{@max_load}% thus response multiplier: #{@response_multiplier.round(2)}x"
      )
    end
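The constructor's normalisation can be tried in isolation. The sketch below copies the arithmetic of `initialize` into a hypothetical `AdaptiveDelaySketch` class with logging removed. The values of `AgentConfig::MAX_LOAD_CAP` and `AgentConfig::DEFAULT_MAX_LOAD` are not shown in this documentation, so the cap of 50.0 and default of 20.0 used here are assumptions.

```ruby
# ASSUMPTION: stand-in for AgentConfig::MAX_LOAD_CAP, whose real value is not shown here.
MAX_LOAD_CAP_ASSUMED = 50.0

# Hypothetical sketch of AdaptiveDelay#initialize without the DebugUtils/LogUtils calls.
class AdaptiveDelaySketch
  attr_reader :min_delay, :max_delay, :max_load, :response_multiplier

  # ASSUMPTION: max_load default of 20.0 stands in for AgentConfig::DEFAULT_MAX_LOAD.
  def initialize(min_delay: 0.0, max_delay: 30.0, max_load: 20.0)
    @delays = {} # domain -> last delay used
    @min_delay = min_delay.to_f
    @max_delay = max_delay.to_f
    # Out-of-range percentages are clamped rather than rejected.
    @max_load = max_load.to_f.clamp(1.0, MAX_LOAD_CAP_ASSUMED)
    @response_multiplier = (100.0 - @max_load) / @max_load
  end
end

low = AdaptiveDelaySketch.new(max_load: 0)   # clamped up to 1.0, so a 99x multiplier
high = AdaptiveDelaySketch.new(max_load: 90) # clamped down to the assumed cap
puts low.response_multiplier, high.max_load
```

Clamping the lower bound at 1.0 also keeps the division in the multiplier away from zero.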
Instance Attribute Details
#max_delay ⇒ Object (readonly)
Returns the maximum delay between requests, in seconds (stored as a Float).
    # File 'lib/scraper_utils/mechanize_utils/adaptive_delay.rb', line 15
    def max_delay
      @max_delay
    end
#max_load ⇒ Object (readonly)
Returns the target maximum server load as a percentage, clamped to 1.0..AgentConfig::MAX_LOAD_CAP.
    # File 'lib/scraper_utils/mechanize_utils/adaptive_delay.rb', line 15
    def max_load
      @max_load
    end
#min_delay ⇒ Object (readonly)
Returns the minimum delay between requests, in seconds (stored as a Float).
    # File 'lib/scraper_utils/mechanize_utils/adaptive_delay.rb', line 15
    def min_delay
      @min_delay
    end
Instance Method Details
#delay(uri) ⇒ Float
Returns the current delay for the URI's domain, or min_delay if no delay has been set for that domain yet.
    # File 'lib/scraper_utils/mechanize_utils/adaptive_delay.rb', line 40
    def delay(uri)
      @delays[domain(uri)] || @min_delay
    end
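Delays are stored per domain, so a slow site does not slow down requests to other sites. The private `domain(uri)` helper is not documented here; the hypothetical `DelayTableSketch` below ASSUMES it keys on the lower-cased host name, and adds a hypothetical `record` method in place of the bookkeeping that `next_delay` does.

```ruby
require "uri"

# Hypothetical sketch of the per-domain lookup behind #delay.
class DelayTableSketch
  def initialize(min_delay)
    @min_delay = min_delay.to_f
    @delays = {} # domain -> last delay used
  end

  # Same fallback as AdaptiveDelay#delay: stored value, else min_delay.
  def delay(uri)
    @delays[domain(uri)] || @min_delay
  end

  # Hypothetical: stores a delay the way next_delay does at its end.
  def record(uri, value)
    @delays[domain(uri)] = value
  end

  private

  # ASSUMPTION: the real domain(uri) may differ; this keys on the lower-cased host.
  def domain(uri)
    URI(uri).host.to_s.downcase
  end
end

table = DelayTableSketch.new(0.25)
table.record("https://example.com/page/1", 1.5)
table.delay("https://EXAMPLE.com/other") # same host, so the stored 1.5
table.delay("https://example.org/")      # unseen host, so min_delay (0.25)
```

Because only nil and false are falsy in Ruby, the `||` fallback applies only to domains with no stored entry.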
#next_delay(uri, response_time) ⇒ Float
Returns the next delay, calculated from a smoothed average of response times so that requests occupy less than max_load% of the server's time.
    # File 'lib/scraper_utils/mechanize_utils/adaptive_delay.rb', line 49
    def next_delay(uri, response_time)
      uris_domain = domain(uri)
      # Calculate target_delay to achieve the desired max_load% using the pre-calculated multiplier
      target_delay = (response_time * @response_multiplier).clamp(0.0, @max_delay)

      # Initialise the average from the first target_delay rather than zero, to start with a reasonable approximation
      current_delay = @delays[uris_domain] || target_delay

      # Exponentially smooth the delay to damp wild swings (equivalent to an RC low-pass filter)
      delay = ((3.0 * current_delay) + target_delay) / 4.0
      delay = delay.clamp(@min_delay, @max_delay)

      if DebugUtils.basic?
        ScraperUtils::LogUtils.log(
          "Adaptive delay for #{uris_domain} updated to #{delay.round(2)}s (target: " \
          "#{@response_multiplier.round(1)}x response_time of #{response_time.round(2)}s)"
        )
      end

      @delays[uris_domain] = delay
      delay
    end
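The smoothing step weights the previous delay 3:1 against the new target, which is an exponential moving average with a smoothing factor of 0.25. The hypothetical `smoothed_delay` function below reproduces that arithmetic with logging removed and the per-domain hash replaced by an explicit `current_delay` argument (nil for a domain not seen yet). It assumes max_load = 20, so the multiplier is 4.0.

```ruby
# Hypothetical sketch of the arithmetic in AdaptiveDelay#next_delay.
def smoothed_delay(current_delay, response_time, multiplier:, min_delay:, max_delay:)
  target = (response_time * multiplier).clamp(0.0, max_delay)
  current = current_delay || target # the first response seeds the average
  ((3.0 * current + target) / 4.0).clamp(min_delay, max_delay)
end

delay = nil
[0.5, 0.5, 2.0, 2.0, 2.0].each do |rt|
  delay = smoothed_delay(delay, rt, multiplier: 4.0, min_delay: 0.0, max_delay: 30.0)
  puts format("response %.2fs -> delay %.3fs", rt, delay)
end
```

Each update closes a quarter of the gap to the target. When response times jump from 0.5 s to 2.0 s, the target jumps from 2.0 s to 8.0 s, but the delay moves 2.0 → 3.5 → 4.625 → 5.469 s, so a single slow response cannot cause a sudden spike.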