Module: ScraperUtils::SpecSupport
- Defined in:
- lib/scraper_utils/spec_support.rb
Overview
Methods to support specs
Constant Summary collapse
- AUSTRALIAN_STATES =
%w[ACT NSW NT QLD SA TAS VIC WA].freeze
- STREET_TYPE_PATTERNS =
[ /\bAv(e(nue)?)?\b/i, /\bB(oulevard|lvd)\b/i, /\b(Circuit|Cct)\b/i, /\bCl(ose)?\b/i, /\bC(our|r)?t\b/i, /\bCircle\b/i, /\bChase\b/i, /\bCr(es(cent)?)?\b/i, /\bDr((ive)?|v)\b/i, /\bEnt(rance)?\b/i, /\bGr(ove)?\b/i, /\bH(ighwa|w)y\b/i, /\bLane\b/i, /\bLoop\b/i, /\bParkway\b/i, /\bPl(ace)?\b/i, /\bPriv(ate)?\b/i, /\bParade\b/i, /\bR(oa)?d\b/i, /\bRise\b/i, /\bSt(reet)?\b/i, /\bSquare\b/i, /\bTerrace\b/i, /\bWay\b/i ].freeze
- AUSTRALIAN_POSTCODES =
/\b\d{4}\b/.freeze
- PLACEHOLDERS =
[ /no description/i, /not available/i, /to be confirmed/i, /\btbc\b/i, %r{\bn/a\b}i ].freeze
Class Method Summary collapse
-
.geocodable?(address, ignore_case: false) ⇒ Boolean
Check if an address is likely to be geocodable by analyzing its format.
- .placeholder?(text) ⇒ Boolean
-
.reasonable_description?(text) ⇒ Boolean
Check if this looks like a “reasonable” description This is a bit stricter than needed - typically assert >= 75% match.
Class Method Details
.geocodable?(address, ignore_case: false) ⇒ Boolean
Check if an address is likely to be geocodable by analyzing its format. This is a bit stricter than needed - typically assert >= 75% match
42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 |
# File 'lib/scraper_utils/spec_support.rb', line 42 def self.geocodable?(address, ignore_case: false) return false if address.nil? || address.empty? check_address = ignore_case ? address.upcase : address # Basic structure check - must have a street name, suburb, state and postcode has_state = AUSTRALIAN_STATES.any? { |state| check_address.end_with?(" #{state}") || check_address.include?(" #{state} ") } has_postcode = address.match?(AUSTRALIAN_POSTCODES) # Using the pre-compiled patterns has_street_type = STREET_TYPE_PATTERNS.any? { |pattern| check_address.match?(pattern) } has_unit_or_lot = address.match?(/\b(Unit|Lot:?)\s+\d+/i) has_suburb_stats = check_address.match?(/(\b[A-Z]{2,}(\s+[A-Z]+)*,?|,\s+[A-Z][A-Za-z ]+)(\s+\d{4})?\s+(#{AUSTRALIAN_STATES.join('|')})\b/) if ENV["DEBUG"] missing = [] unless has_street_type || has_unit_or_lot missing << "street type / unit / lot" end missing << "state" unless has_state missing << "postcode" unless has_postcode missing << "suburb state" unless has_suburb_stats puts " address: #{address} is not geocodable, missing #{missing.join(', ')}" if missing.any? end (has_street_type || has_unit_or_lot) && has_state && has_postcode && has_suburb_stats end |
.placeholder?(text) ⇒ Boolean
79 80 81 |
# File 'lib/scraper_utils/spec_support.rb', line 79 def self.placeholder?(text) PLACEHOLDERS.any? { |placeholder| text.to_s.match?(placeholder) } end |
.reasonable_description?(text) ⇒ Boolean
Check if this looks like a “reasonable” description This is a bit stricter than needed - typically assert >= 75% match
85 86 87 |
# File 'lib/scraper_utils/spec_support.rb', line 85 def self.reasonable_description?(text) !placeholder?(text) && text.to_s.split.size >= 3 end |