Class: Audumbla::Enrichments::CoarseGeocode

Inherits:
Object
Includes:
FieldEnrichment
Defined in:
lib/audumbla/enrichments/coarse_geocode.rb

Overview

Enriches a `DPLA::MAP::Place` node by running its data through external geocoders, using heuristics to determine a matching feature from GeoNames, and repopulating the `Place` with related data.

If the existing `Place` contains data other than a `providedLabel`, that data is used as context for evaluating interpretations. For example, if a `Place` already has a latitude and longitude, the enrichment verifies that the point falls within the bounding box of a candidate match.

`skos:exactMatch` is reserved for the GeoNames features returned by the geocoder. Other matching URIs (currently: LC authorities) are included as `skos:closeMatch`.
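A minimal sketch of that split, assuming candidate matches arrive as plain URI strings and that GeoNames features can be recognized by the `sws.geonames.org` host. Both are assumptions, and `partition_matches` is a hypothetical helper, not part of Audumbla.

```ruby
require 'uri'

# Hypothetical helper: GeoNames URIs become exact matches, and every other
# URI (e.g. an LC authority) becomes a close match.
def partition_matches(uris)
  exact, close = uris.partition { |uri| URI(uri).host == 'sws.geonames.org' }
  { exact_match: exact, close_match: close }
end

partition_matches(['http://sws.geonames.org/4197000/',
                   'http://id.loc.gov/authorities/names/n79023113'])
```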

Configuration is handled through a YAML file passed into the initializer (default: `geocode.yml`). The options are:

- 'twofishes_host': the hostname of the Twofishes server (default: 'localhost')
- 'twofishes_port': the port of the Twofishes geocode endpoint (default: 8080)
- 'twofishes_timeout': request timeout in seconds (default: 10)
- 'twofishes_retries': maximum number of request retries for Twofishes (default: 2)
- 'distance_threshold': the maximum distance, in kilometers, between a set of
    coordinates in the input object and a candidate match before the match
    is judged a false positive (default: 100)
- 'max_interpretations': the number of geocoded "interpretations" to
    request from the server; these are the candidate places considered
    by the internal heuristics (default: 5)
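For illustration, the sketch below resolves a hypothetical `geocode.yml` against the default values listed under Constant Summary, falling back per key the same way `#initialize` does with `Hash#fetch`. The file contents and the `DEFAULTS` hash are assumptions made for this example.

```ruby
require 'yaml'

# Defaults copied from the constants documented on this page.
DEFAULTS = {
  'twofishes_host' => 'localhost',
  'twofishes_port' => 8080,
  'twofishes_timeout' => 10,
  'twofishes_retries' => 2,
  'distance_threshold' => 100,
  'max_interpretations' => 5
}.freeze

# A hypothetical geocode.yml that overrides only some options.
yaml = <<~YAML
  twofishes_host: geocoder.example.org
  distance_threshold: 50
YAML

config = YAML.safe_load(yaml)
# Keys missing from the file fall back to their defaults.
resolved = DEFAULTS.map { |key, default| [key, config.fetch(key, default)] }.to_h
```

Only the overridden keys change; `resolved['twofishes_timeout']` keeps its default of 10.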

Examples:

enriching from a `#providedLabel`


place = DPLA::MAP::Place.new.tap { |p| p.providedLabel = 'Georgia' }
CoarseGeocode.new.enrich_value(place).dump :ttl
# [
#    a <http://www.europeana.eu/schemas/edm/Place>;
#    <http://dp.la/about/map/providedLabel> "Georgia";
#    <http://www.geonames.org/ontology#countryCode> "US";
#    <http://www.w3.org/2003/01/geo/wgs84_pos#lat> 3.275042e1;
#    <http://www.w3.org/2003/01/geo/wgs84_pos#long> -8.350018e1;
#    <http://www.w3.org/2004/02/skos/core#closeMatch> <http://id.loc.gov/authorities/names/n79023113>;
#    <http://www.w3.org/2004/02/skos/core#exactMatch> <http://sws.geonames.org/4197000/>;
#    <http://www.w3.org/2004/02/skos/core#prefLabel> "Georgia, United States"
# ] .

enriching from a `#providedLabel` with lat/lng guidance


place = DPLA::MAP::Place.new.tap do |p| 
  p.providedLabel = 'Georgia'
  p.lat = 41.9997
  p.long = 43.4998
end

CoarseGeocode.new.enrich_value(place).dump :ttl
# [
#    a <http://www.europeana.eu/schemas/edm/Place>;
#    <http://dp.la/about/map/providedLabel> "Georgia";
#    <http://www.geonames.org/ontology#countryCode> "GE";
#    <http://www.w3.org/2003/01/geo/wgs84_pos#lat> 4.199998e1;
#    <http://www.w3.org/2003/01/geo/wgs84_pos#long> 4.34999e1;
#    <http://www.w3.org/2004/02/skos/core#exactMatch> <http://sws.geonames.org/614540/>;
#    <http://www.w3.org/2004/02/skos/core#prefLabel> "Georgia"
# ] .

Constant Summary

DEFAULT_DISTANCE_THRESHOLD_KMS = 100
DEFAULT_MAX_INTERPRETATIONS = 5
DEFAULT_TWOFISHES_HOST = 'localhost'
DEFAULT_TWOFISHES_PORT = 8080
DEFAULT_TWOFISHES_TIMEOUT = 10
DEFAULT_TWOFISHES_RETRIES = 2

Instance Method Summary

Methods included from FieldEnrichment

#enrich, #enrich_all, #enrich_field

Methods included from Audumbla::Enrichment

#enrich, #enrich!, #list_fields

Constructor Details

#initialize(config_file = 'geocode.yml') ⇒ CoarseGeocode

Returns a new instance of CoarseGeocode.

Parameters:

  • config_file (String) (defaults to: 'geocode.yml')

    a path to the geocoder's YAML config file; default: `geocode.yml`



# File 'lib/audumbla/enrichments/coarse_geocode.rb', line 82

def initialize(config_file = 'geocode.yml')
  config = YAML.load_file(config_file)

  @distance_threshold = config.fetch('distance_threshold', 
                                     DEFAULT_DISTANCE_THRESHOLD_KMS)
  @max_interpretations = config.fetch('max_interpretations', 
                                      DEFAULT_MAX_INTERPRETATIONS)

  Twofishes.configure do |twofish|
    twofish.host = config.fetch('twofishes_host', DEFAULT_TWOFISHES_HOST)
    twofish.port = config.fetch('twofishes_port', DEFAULT_TWOFISHES_PORT)
    twofish.timeout = config.fetch('twofishes_timeout', 
                                   DEFAULT_TWOFISHES_TIMEOUT)
    twofish.retries = config.fetch('twofishes_retries', 
                                   DEFAULT_TWOFISHES_RETRIES)
  end
end

Instance Method Details

#enrich_value(value) ⇒ DPLA::MAP::Place

Enriches the given value against the Twofishes coarse geocoder. This process adds a `skos:exactMatch` for a matching GeoNames URI, if any, and populates the remaining place data as far as the matched feature allows.

Requests up to `@max_interpretations` candidate interpretations from Twofishes and accepts the first one for which `#match?` returns true.

Parameters:

  • value (DPLA::MAP::Place)

    the place to geocode

Returns:

  • (DPLA::MAP::Place)

    the initial place, enriched via coarse geocoding; returned unchanged if it is not a `DPLA::MAP::Place` or if no candidate matches
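The selection step can be sketched in plain Ruby. `Candidate`, `SketchPlace`, and `enrich_sketch` are hypothetical stand-ins, not Twofishes or DPLA::MAP classes, and copying only a country code and a GeoNames URI is a simplification of whatever `enrich_place` actually populates.

```ruby
# Hypothetical stand-ins for a Twofishes interpretation and a DPLA place.
Candidate = Struct.new(:geonames_uri, :country_code)
SketchPlace = Struct.new(:provided_label, :country_code, :exact_match)

# Mirrors the control flow of #enrich_value: accept the first candidate the
# predicate approves; if none is approved, return the original place untouched.
def enrich_sketch(place, candidates, match)
  chosen = candidates.find { |candidate| match.call(candidate, place) }
  return place if chosen.nil?

  # Enrich the same object in place, as the docs describe for the real method.
  place.country_code = chosen.country_code
  place.exact_match = chosen.geonames_uri
  place
end

candidates = [Candidate.new('http://sws.geonames.org/4197000/', 'US'),
              Candidate.new('http://sws.geonames.org/614540/', 'GE')]
place = SketchPlace.new('Georgia')
enrich_sketch(place, candidates, ->(candidate, _place) { candidate.country_code == 'GE' })
```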



# File 'lib/audumbla/enrichments/coarse_geocode.rb', line 112

def enrich_value(value)
  return value unless value.is_a? DPLA::MAP::Place
  interpretations = geocode(value.providedLabel.first, 
                            [],
                            maxInterpretations: @max_interpretations)
  match = interpretations.find { |interp| match?(interp, value) }
  match.nil? ? value : enrich_place(value, match.feature)
end

#match?(interpretation, place) ⇒ Boolean

Decides whether to accept a candidate match from the geocoder. Most tuning of the geocoding process belongs in the geocoder itself, but this method provides a simple accept/reject step for each candidate, which lets existing data about the place be used as context.

If `place` lacks a latitude or longitude, the candidate is accepted. Otherwise, this method returns false when the candidate lies far from the point given by `#lat` and `#long` in `place`. If the candidate has a bounding box, the point must fall inside it; if it has none, the distance from the center of the candidate feature to the point must be less than `@distance_threshold` kilometers.

Parameters:

  • interpretation (GeocodeInterpretation)

    a twofishes interpretation

  • place (#lat, #long)

    a place to verify a match against

Returns:

  • (Boolean)
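A self-contained sketch of the same decision, for illustration only. It uses the haversine formula in place of Geokit's `distance_to` (Geokit's default formula may give slightly different figures), and `Bounds`, `haversine_km`, and `match_sketch` are hypothetical helpers, not Audumbla or Geokit APIs. The bounding-box check also ignores boxes that cross the antimeridian.

```ruby
EARTH_RADIUS_KM = 6371.0

def deg_to_rad(deg)
  deg * Math::PI / 180
end

# Great-circle distance via the haversine formula (a stand-in for Geokit).
def haversine_km(lat1, lng1, lat2, lng2)
  dlat = deg_to_rad(lat2 - lat1)
  dlng = deg_to_rad(lng2 - lng1)
  a = Math.sin(dlat / 2)**2 +
      Math.cos(deg_to_rad(lat1)) * Math.cos(deg_to_rad(lat2)) * Math.sin(dlng / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt([a, 1.0].min))
end

# Hypothetical bounding box; naive check with no antimeridian handling.
Bounds = Struct.new(:sw_lat, :sw_lng, :ne_lat, :ne_lng) do
  def contains?(lat, lng)
    lat.between?(sw_lat, ne_lat) && lng.between?(sw_lng, ne_lng)
  end
end

# Mirrors #match?: accept when the place has no coordinates; otherwise prefer
# the bounding box, falling back to a center-to-point distance threshold.
def match_sketch(lat, lng, center:, bounds: nil, threshold_km: 100)
  return true if lat.nil? || lng.nil?
  return bounds.contains?(lat, lng) unless bounds.nil?

  haversine_km(center[0], center[1], lat, lng) < threshold_km
end
```

With the coordinates from the second example above, a candidate centered on the country of Georgia passes, while one centered on the US state is thousands of kilometers away and is rejected.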


# File 'lib/audumbla/enrichments/coarse_geocode.rb', line 137

def match?(interpretation, place)
  return true if place.lat.empty? || place.long.empty?

  point = Geokit::LatLng.new(place.lat.first, place.long.first)
  if interpretation.geometry.bounds.nil?
    # measure distance between point centers
    distance = twofishes_point_to_geokit(interpretation.geometry.center)
               .distance_to(point, unit: :kms)
    return distance < @distance_threshold
  end
    
  twofishes_bounds_to_geokit(interpretation.geometry.bounds)
    .contains?(point)
end