Class: Arwen
- Inherits: Object
- Defined in: lib/arwen.rb, lib/arwen/url.rb, lib/arwen/version.rb
Overview
Parses a sitemap URL and returns every link listed in the sitemap or sitemap_index. It uses Typhoeus for network requests, issuing them concurrently when parsing a sitemap_index. Ox is the XML parser used to parse the sitemap. Sitemaps are assumed to follow the sitemaps.org protocol.
Defined Under Namespace
Classes: Url
Constant Summary
- VERSION = "0.1.1"
Instance Method Summary
- #initialize(url, opts = {}) ⇒ Arwen (constructor)
  Create a new Arwen instance.
- #sitemap ⇒ Ox::Document
  Parses the sitemap URL into an Ox::Document instance.
- #to_a ⇒ Array<String>
  Returns an array of URL strings for all URLs in the sitemap.
- #urls ⇒ Array<SitemapParser::Url>
  Fetches and returns all URLs in the sitemap, each with its corresponding <url> sitemap schema metadata.
Constructor Details
#initialize(url, opts = {}) ⇒ Arwen
Create a new Arwen instance
# File 'lib/arwen.rb', line 22
def initialize(url, opts = {})
  @url = url
  max_concurrency = opts.delete(:max_concurrency) { 200 }
  @opts = { followlocation: true }.merge(opts)
  @hydra = Typhoeus::Hydra.new(max_concurrency: max_concurrency)
end
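The option handling in the constructor can be looked at in isolation. The following is a minimal standalone sketch in plain Ruby, not code from the gem: `split_options` is a hypothetical helper name, and `timeout` is only an example of an extra option that gets passed through.

```ruby
# Hypothetical helper (not part of Arwen) that mirrors the option handling
# shown in Arwen#initialize.
def split_options(opts = {})
  # Hash#delete removes the key and returns its value; when the key is
  # absent, it returns the block's result instead, here the default of 200.
  max_concurrency = opts.delete(:max_concurrency) { 200 }

  # Remaining options override the followlocation default on key collision.
  request_opts = { followlocation: true }.merge(opts)

  [max_concurrency, request_opts]
end

default_max, default_opts = split_options
custom_max, custom_opts = split_options(max_concurrency: 10, timeout: 5)

puts default_max                    # 200
puts custom_max                     # 10
puts custom_opts.key?(:timeout)     # true
puts default_opts[:followlocation]  # true
```

Because `Hash#delete` mutates the hash it is called on, a caller who reuses the same options hash will find `:max_concurrency` removed after the call. Passing a copy (`opts.dup`) avoids that.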
Instance Method Details
#sitemap ⇒ Ox::Document
Parses the sitemap URL into an Ox::Document instance.
# File 'lib/arwen.rb', line 47
def sitemap
  @sitemap ||= raw_sitemap
end
#to_a ⇒ Array<String>
Returns an array of URL strings for all URLs in the sitemap.
# File 'lib/arwen.rb', line 39
def to_a
  urls.map(&:url)
end
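The mapping from URL objects to plain strings can be sketched without the gem. In the example below, `Url` is a stand-in Struct rather than the real `Arwen::Url` class. Only the `url` reader is implied by the source; the `lastmod` field is an assumption borrowed from the sitemaps.org `<url>` schema, and `link_strings` is a hypothetical name.

```ruby
# Stand-in for Arwen::Url. Only #url is implied by the source (to_a calls it);
# lastmod is an assumed field taken from the sitemaps.org <url> schema.
Url = Struct.new(:url, :lastmod)

# Hypothetical equivalent of Arwen#to_a, taking the entries as an argument.
def link_strings(entries)
  # &:url is shorthand for { |entry| entry.url }
  entries.map(&:url)
end

entries = [
  Url.new("https://example.com/", "2024-01-01"),
  Url.new("https://example.com/about", nil)
]

puts link_strings(entries).inspect
# ["https://example.com/", "https://example.com/about"]
```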
#urls ⇒ Array<SitemapParser::Url>
Fetches and returns all URLs in the sitemap, each with its corresponding <url> sitemap schema metadata.
# File 'lib/arwen.rb', line 32
def urls
  @urls ||= all_urls(sitemap)
end
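Both `#sitemap` and `#urls` cache their result with the `||=` idiom. The sketch below shows that idiom on its own, assuming nothing from the gem: `CachedFetcher` and `expensive_fetch` are hypothetical names, and the fetch is simulated rather than performed over the network.

```ruby
# Hypothetical class (not part of Arwen) showing the ||= memoization idiom
# used by #sitemap and #urls.
class CachedFetcher
  attr_reader :fetch_count

  def initialize
    @fetch_count = 0
  end

  def urls
    # The right-hand side runs only while @urls is nil or false.
    @urls ||= expensive_fetch
  end

  private

  # Stand-in for the network request and XML parsing.
  def expensive_fetch
    @fetch_count += 1
    ["https://example.com/"]
  end
end

fetcher = CachedFetcher.new
fetcher.urls
fetcher.urls
puts fetcher.fetch_count # 1
```

With this idiom, an instance keeps returning the first result it computed, so picking up a changed sitemap would mean creating a new instance.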