Class: ScraperUtils::DataQualityMonitor
- Inherits: Object
  - Object
    - ScraperUtils::DataQualityMonitor
- Defined in: lib/scraper_utils/data_quality_monitor.rb
Overview
Monitors data quality during scraping by tracking successful vs. failed record processing. Automatically raises an exception if the error rate exceeds a threshold.
Class Attribute Summary
- .stats ⇒ Object (readonly)
  Returns the value of attribute stats.
Class Method Summary
- .extract_authority(record) ⇒ Object
  Extracts the authority label and ensures stats are set up for the record.
- .log_saved_record(record) ⇒ void
  Logs a successfully saved record.
- .log_unprocessable_record(exception, record) ⇒ void
  Logs an unprocessable record and raises an exception if the error threshold is exceeded. The threshold is 5 + 10% of saved records.
- .start_authority(authority_label) ⇒ Object
  Notes the start of processing an authority and clears any previous stats.
- .threshold(authority_label) ⇒ Object
Class Attribute Details
.stats ⇒ Object (readonly)
Returns the value of attribute stats.
    # File 'lib/scraper_utils/data_quality_monitor.rb', line 10
    def stats
      @stats
    end
Class Method Details
.extract_authority(record) ⇒ Object
Extracts the authority label and ensures stats are set up for the record.
    # File 'lib/scraper_utils/data_quality_monitor.rb', line 22
    def self.extract_authority(record)
      authority_label = (record&.key?("authority_label") ? record["authority_label"] : "").to_sym
      @stats ||= {}
      @stats[authority_label] ||= { saved: 0, unprocessed: 0 }
      authority_label
    end
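To illustrate how the label is derived, the sketch below copies the lookup expression from the listing into a stand-alone helper. The name `label_for` and the sample records are hypothetical and not part of ScraperUtils, and the stats bookkeeping is left out.

```ruby
# Hypothetical stand-alone helper (not part of ScraperUtils) that reuses
# the label lookup expression shown for extract_authority.
def label_for(record)
  (record&.key?("authority_label") ? record["authority_label"] : "").to_sym
end

# Made-up sample records, for illustration only.
with_label    = label_for({ "authority_label" => "example_council" })
without_label = label_for({ "address" => "1 Example St" })
nil_record    = label_for(nil)
symbol_keyed  = label_for({ authority_label: "example_council" })

p with_label     # a Symbol built from the record's label
p without_label  # the empty symbol, :""
p nil_record     # also :"", because &. short-circuits on nil
p symbol_keyed   # also :"", since only the String key is checked
```

Going by that expression, records without a String "authority_label" key all map to the empty symbol, so their counts would be pooled under one stats entry.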
.log_saved_record(record) ⇒ void
This method returns an undefined value.
Logs a successfully saved record.
    # File 'lib/scraper_utils/data_quality_monitor.rb', line 57
    def self.log_saved_record(record)
      authority_label = extract_authority(record)
      @stats[authority_label][:saved] += 1
      ScraperUtils::LogUtils.log "Saving record #{authority_label} - #{record['address']}"
    end
.log_unprocessable_record(exception, record) ⇒ void
This method returns an undefined value.
Logs an unprocessable record and raises an exception if the error threshold is exceeded. The threshold is 5 + 10% of saved records (computed as 5.01 + 0.1 * saved).
    # File 'lib/scraper_utils/data_quality_monitor.rb', line 40
    def self.log_unprocessable_record(exception, record)
      authority_label = extract_authority(record)
      @stats[authority_label][:unprocessed] += 1
      ScraperUtils::LogUtils.log "Erroneous record #{authority_label} - #{record&.fetch(
        'address', nil
      ) || record.inspect}: #{exception}"
      return unless @stats[authority_label][:unprocessed] > threshold(authority_label)

      raise ScraperUtils::UnprocessableSite,
            "Too many unprocessable_records for #{authority_label}: " \
            "#{@stats[authority_label].inspect} - aborting processing of site!"
    end
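The counting and abort behaviour can be sketched with a small stand-in. Everything named `Sketch...` below is hypothetical: it takes a label directly instead of a record, does no logging, and only mimics the rule in the listing so the example runs without the scraper_utils gem.

```ruby
# Hypothetical stand-in that mimics the documented counting rule; the
# real module lives in the scraper_utils gem and also writes log lines.
class SketchUnprocessableSite < StandardError; end

module SketchMonitor
  @stats = {}

  def self.start_authority(label)
    @stats[label] = { saved: 0, unprocessed: 0 }
  end

  def self.log_saved_record(label)
    @stats[label][:saved] += 1
  end

  def self.log_unprocessable_record(label)
    stats = @stats[label]
    stats[:unprocessed] += 1
    # Same comparison as the listing: abort once failures exceed the threshold.
    return unless stats[:unprocessed] > 5.01 + (stats[:saved] * 0.1)

    raise SketchUnprocessableSite, "Too many unprocessable records: #{stats.inspect}"
  end
end

# Made-up run: ten records save cleanly, then every later record fails.
SketchMonitor.start_authority(:example_council)
10.times { SketchMonitor.log_saved_record(:example_council) }

failures_before_abort = 0
aborted = false
begin
  20.times do
    SketchMonitor.log_unprocessable_record(:example_council)
    failures_before_abort += 1
  end
rescue SketchUnprocessableSite => e
  aborted = true
  puts "Aborted after #{failures_before_abort} tolerated failures: #{e.message}"
end
```

With ten saved records the threshold is 6.01, so six failures are tolerated and the seventh raises. In a real scraper the rescue would typically wrap the whole authority so that one bad site does not stop the others, though that is a design assumption rather than something this page states.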
.start_authority(authority_label) ⇒ Object
Notes the start of processing an authority and clears any previous stats.
    # File 'lib/scraper_utils/data_quality_monitor.rb', line 16
    def self.start_authority(authority_label)
      @stats ||= {}
      @stats[authority_label] = { saved: 0, unprocessed: 0 }
    end
.threshold(authority_label) ⇒ Object
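Returns the error threshold for the given authority: 5.01 + 10% of its saved record count, or nil if no stats have been recorded for that authority.
    # File 'lib/scraper_utils/data_quality_monitor.rb', line 29
    def self.threshold(authority_label)
      5.01 + (@stats[authority_label][:saved] * 0.1) if @stats&.fetch(authority_label, nil)
    end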
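As a worked example of the arithmetic, the sketch below evaluates the same formula for a few saved-record counts. The helper names `threshold_for` and `tolerated_failures` are hypothetical, introduced only for illustration.

```ruby
# Hypothetical helper mirroring the formula used by .threshold.
def threshold_for(saved)
  5.01 + (saved * 0.1)
end

# Largest unprocessed count that does NOT exceed the threshold,
# i.e. the number of failures tolerated before an exception is raised.
def tolerated_failures(saved)
  threshold_for(saved).floor
end

[0, 10, 100, 1000].each do |saved|
  puts "saved=#{saved} tolerated=#{tolerated_failures(saved)}"
end
```

Under this formula a site with no saved records tolerates 5 failures, and each further 10 saved records raises the allowance by one.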