Class: Audumbla::Enrichments::CoarseGeocode
- Inherits:
- Object
- Audumbla::Enrichments::CoarseGeocode
- Includes:
- FieldEnrichment
- Defined in:
- lib/audumbla/enrichments/coarse_geocode.rb
Overview
Enriches a `DPLA::MAP::Place` node by running its data through external geocoders, using heuristics to determine a matching feature from GeoNames, and repopulating the `Place` with related data.
If the existing `Place` contains data other than a `providedLabel`, that data is used as context for evaluating interpretations. For example, when a `Place` already has a latitude and longitude, the enrichment verifies that the point falls within the bounding box of a candidate match.
`skos:exactMatch` is reserved for the GeoNames features returned by the geocoder. Other matching URIs (currently: LC authorities) are included as `skos:closeMatch`.
Configuration is handled through a YAML file passed into the initializer (default: `geocode.yml`). The options are:
- 'twofishes_host': the hostname for the twofishes server (default: 'localhost')
- 'twofishes_port': the port of the twofishes geocode endpoint (default: 8080)
- 'twofishes_timeout': request timeout in seconds (default: 10)
- 'twofishes_retries': request retry maximum for twofishes (default: 2)
- 'distance_threshold': the maximum distance, in kilometers, between a set of coordinates in the input object and a candidate match before we judge it a false positive (default: 100)
- 'max_interpretations': the number of geocoded "interpretations" to request from the server; these are the places that will be considered by the internal heuristics (default: 5)
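As an illustration of how these options resolve, here is a minimal sketch that parses a hypothetical `geocode.yml` body and fills in any missing keys with `Hash#fetch`, the same lookup the constructor uses. The `SKETCH_DEFAULTS` hash, the example hostname, and the overridden values are assumptions made for this example; the default values are copied from the constants in the Constant Summary.

```ruby
require 'yaml'

# Illustrative only: defaults copied from the class constants.
SKETCH_DEFAULTS = {
  'twofishes_host'      => 'localhost',
  'twofishes_port'      => 8080,
  'twofishes_timeout'   => 10,
  'twofishes_retries'   => 2,
  'distance_threshold'  => 100,
  'max_interpretations' => 5
}.freeze

# Hypothetical contents of a geocode.yml that overrides two options.
yaml_text = <<~YAML
  twofishes_host: geocoder.example.org
  distance_threshold: 25
YAML

config = YAML.safe_load(yaml_text)

# Resolve each option with Hash#fetch, falling back to the default
# whenever the key is absent from the file.
settings = SKETCH_DEFAULTS.each_with_object({}) do |(key, default), resolved|
  resolved[key] = config.fetch(key, default)
end

settings.each { |key, value| puts "#{key}: #{value}" }
```

Options omitted from the file keep their defaults, so a configuration file only needs to list the values that differ.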
Constant Summary
- DEFAULT_DISTANCE_THRESHOLD_KMS = 100
- DEFAULT_MAX_INTERPRETATIONS = 5
- DEFAULT_TWOFISHES_HOST = 'localhost'
- DEFAULT_TWOFISHES_PORT = 8080
- DEFAULT_TWOFISHES_TIMEOUT = 10
- DEFAULT_TWOFISHES_RETRIES = 2
Instance Method Summary
- #enrich_value(value) ⇒ DPLA::MAP::Place
  Enriches the given value against the TwoFishes coarse geocoder.
- #initialize(config_file = 'geocode.yml') ⇒ CoarseGeocode (constructor)
  A new instance of CoarseGeocode.
- #match?(interpretation, place) ⇒ Boolean
  Checks that we are satisfied with the geocoder’s best matches prior to acceptance.
Methods included from FieldEnrichment
#enrich, #enrich_all, #enrich_field
Methods included from Audumbla::Enrichment
#enrich, #enrich!, #list_fields
Constructor Details
#initialize(config_file = 'geocode.yml') ⇒ CoarseGeocode
Returns a new instance of CoarseGeocode.
# File 'lib/audumbla/enrichments/coarse_geocode.rb', line 82
def initialize(config_file = 'geocode.yml')
  config = YAML.load_file(config_file)
  @distance_threshold = config.fetch('distance_threshold',
                                     DEFAULT_DISTANCE_THRESHOLD_KMS)
  @max_interpretations = config.fetch('max_interpretations',
                                      DEFAULT_MAX_INTERPRETATIONS)

  Twofishes.configure do |twofish|
    twofish.host = config.fetch('twofishes_host', DEFAULT_TWOFISHES_HOST)
    twofish.port = config.fetch('twofishes_port', DEFAULT_TWOFISHES_PORT)
    twofish.timeout = config.fetch('twofishes_timeout',
                                   DEFAULT_TWOFISHES_TIMEOUT)
    twofish.retries = config.fetch('twofishes_retries',
                                   DEFAULT_TWOFISHES_RETRIES)
  end
end
Instance Method Details
#enrich_value(value) ⇒ DPLA::MAP::Place
Enriches the given value against the TwoFishes coarse geocoder. This process adds a `skos:exactMatch` for a matching GeoNames URI, if any, and populates the remaining place data to the degree possible from the matched feature.
Considers up to `@max_interpretations` interpretations returned by Twofishes and accepts the first one for which `#match?` returns true. If none is accepted, or the value is not a `DPLA::MAP::Place`, the value is returned unchanged.
# File 'lib/audumbla/enrichments/coarse_geocode.rb', line 112
def enrich_value(value)
  return value unless value.is_a? DPLA::MAP::Place
  interpretations = geocode(value.providedLabel.first,
                            [],
                            maxInterpretations: @max_interpretations)
  match = interpretations.find { |interp| match?(interp, value) }
  match.nil? ? value : enrich_place(value, match.feature)
end
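The selection step can be sketched in isolation: candidates are examined in the order the geocoder ranked them, the first one that passes the check wins, and the original value is kept when nothing passes. `Candidate`, `first_accepted`, and the sample place names are hypothetical stand-ins for this example, not part of Audumbla or Twofishes.

```ruby
# Hypothetical stand-in for a geocoder interpretation.
Candidate = Struct.new(:name, :acceptable)

# Returns the first candidate for which the block is truthy,
# or the fallback when no candidate passes.
def first_accepted(candidates, fallback)
  match = candidates.find { |candidate| yield(candidate) }
  match.nil? ? fallback : match
end

ranked = [
  Candidate.new('Springfield, IL', false),
  Candidate.new('Springfield, MA', true),
  Candidate.new('Springfield, MO', true)
]

chosen = first_accepted(ranked, :unchanged) { |c| c.acceptable }
puts chosen.name # prints "Springfield, MA"

# With no candidates at all, the fallback comes back untouched.
none = first_accepted([], :unchanged) { |c| c.acceptable }
puts none.inspect # prints ":unchanged"
```

Because the first acceptable candidate is taken, the geocoder's ranking decides between several plausible matches; the check only vetoes candidates.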
#match?(interpretation, place) ⇒ Boolean
Checks that we are satisfied with the geocoder’s best matches prior to acceptance. Most tweaks to the geocoding process should be taken care of at the geocoder itself, but a simple accept/reject of the points offered is possible here. This allows existing data about the place to be used as context.
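For example, this method returns false if `place` contains latitude and longitude, but the candidate match has a geometry far away from that point. When the candidate has bounds, the point must fall within them. Otherwise, "far away" means that the distance from the center of the candidate feature to the point given by `#lat` and `#long` in `place` is not less than `@distance_threshold` kilometers. If `place` has no latitude or longitude, the candidate is accepted.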
# File 'lib/audumbla/enrichments/coarse_geocode.rb', line 137
def match?(interpretation, place)
  return true if place.lat.empty? || place.long.empty?

  point = Geokit::LatLng.new(place.lat.first, place.long.first)
  if interpretation.geometry.bounds.nil?
    # measure distance between point centers
    distance = twofishes_point_to_geokit(interpretation.geometry.center)
               .distance_to(point, unit: :kms)
    return distance < @distance_threshold
  end

  twofishes_bounds_to_geokit(interpretation.geometry.bounds)
    .contains?(point)
end
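To make the two acceptance rules concrete without a geocoder or Geokit, the following sketch reimplements them with plain Ruby. It is an approximation under stated assumptions: `Point`, `Bounds`, `distance_km`, and `plausible_match?` are hypothetical names, the distance uses the haversine formula on a spherical Earth, and the bounding-box test ignores boxes that cross the antimeridian. The real method delegates both calculations to Geokit.

```ruby
# Hypothetical stand-ins for the geocoder's geometry objects.
Point = Struct.new(:lat, :lng)

Bounds = Struct.new(:sw, :ne) do
  # True when the point lies inside the box (no antimeridian handling).
  def contains?(point)
    point.lat.between?(sw.lat, ne.lat) && point.lng.between?(sw.lng, ne.lng)
  end
end

EARTH_RADIUS_KM = 6371.0

# Great-circle distance in kilometers, using the haversine formula.
def distance_km(a, b)
  to_rad = Math::PI / 180
  dlat = (b.lat - a.lat) * to_rad
  dlng = (b.lng - a.lng) * to_rad
  h = Math.sin(dlat / 2)**2 +
      Math.cos(a.lat * to_rad) * Math.cos(b.lat * to_rad) *
      Math.sin(dlng / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h))
end

# Bounds, when present, decide the outcome; otherwise fall back to
# comparing the center distance against the threshold.
def plausible_match?(point, center:, bounds: nil, threshold_km: 100)
  return bounds.contains?(point) unless bounds.nil?

  distance_km(center, point) < threshold_km
end

boston    = Point.new(42.3601, -71.0589)
cambridge = Point.new(42.3736, -71.1097)
new_york  = Point.new(40.7128, -74.0060)
box       = Bounds.new(Point.new(42.2, -71.2), Point.new(42.4, -70.9))

puts plausible_match?(boston, center: cambridge)             # a few km apart
puts plausible_match?(boston, center: new_york)              # hundreds of km apart
puts plausible_match?(boston, center: new_york, bounds: box) # bounds take priority
```

Note that when bounds are present the center is never consulted, which is why the last call accepts a point whose center distance alone would have been rejected.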