Class: ScraperWiki::API
- Inherits: Object
- Includes: HTTParty
- Defined in: lib/scraperwiki-api.rb,
  lib/scraperwiki-api/version.rb,
  lib/scraperwiki-api/matchers.rb
Overview
A Ruby wrapper for the ScraperWiki API.
Defined Under Namespace
Modules: Matchers
Constant Summary
- RUN_INTERVALS =
{ never: -1, monthly: 2678400, weekly: 604800, daily: 86400, hourly: 3600 }
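Since #scraper_getinfo returns run_interval as raw seconds, inverting this constant recovers the label. A minimal sketch in plain Ruby, mirroring the constant rather than loading the gem:

```ruby
# Mirror of the gem's RUN_INTERVALS constant, mapping labels to seconds.
RUN_INTERVALS = { never: -1, monthly: 2678400, weekly: 604800, daily: 86400, hourly: 3600 }

# Invert it to translate a run_interval value from the API back to a label.
INTERVAL_NAMES = RUN_INTERVALS.invert

INTERVAL_NAMES[604800] # => :weekly
```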
- VERSION =
"0.0.6"
Class Method Summary
- .edit_scraper_url(shortname) ⇒ String
  Returns the URL to edit the scraper.
- .scraper_url(shortname) ⇒ String
  Returns the URL to the scraper’s overview.
Instance Method Summary
- #datastore_sqlite(shortname, query, opts = {}) ⇒ Array, ...
  Queries and extracts data via a general purpose SQL interface.
- #initialize(apikey = nil) ⇒ API (constructor)
  Initializes a ScraperWiki API object.
- #scraper_getinfo(shortname, opts = {}) ⇒ Array
  Extracts data about a scraper’s code, owner, history, etc.
- #scraper_getruninfo(shortname, opts = {}) ⇒ Array
  See what the scraper did during each run.
- #scraper_getuserinfo(username) ⇒ Array
  Find out information about a user.
- #scraper_search(opts = {}) ⇒ Array
  Search the titles and descriptions of all the scrapers.
- #scraper_usersearch(opts = {}) ⇒ Array
  Search for a user by name.
Constructor Details
#initialize(apikey = nil) ⇒ API
Initializes a ScraperWiki API object.
# File 'lib/scraperwiki-api.rb', line 37

def initialize(apikey = nil)
  @apikey = apikey
end
Class Method Details
.edit_scraper_url(shortname) ⇒ String
Returns the URL to edit the scraper.
# File 'lib/scraperwiki-api.rb', line 31

def edit_scraper_url(shortname)
  "https://scraperwiki.com/scrapers/#{shortname}/edit/"
end
.scraper_url(shortname) ⇒ String
Returns the URL to the scraper’s overview.
# File 'lib/scraperwiki-api.rb', line 23

def scraper_url(shortname)
  "https://scraperwiki.com/scrapers/#{shortname}/"
end
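Both class methods are plain string builders, so they can be reproduced without the gem. A standalone sketch of the two helpers shown above:

```ruby
# Standalone re-implementations of the two URL helpers.
def scraper_url(shortname)
  "https://scraperwiki.com/scrapers/#{shortname}/"
end

def edit_scraper_url(shortname)
  "https://scraperwiki.com/scrapers/#{shortname}/edit/"
end

scraper_url('example-scraper')      # => "https://scraperwiki.com/scrapers/example-scraper/"
edit_scraper_url('example-scraper') # => "https://scraperwiki.com/scrapers/example-scraper/edit/"
```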
Instance Method Details
#datastore_sqlite(shortname, query, opts = {}) ⇒ Array, ...
The query string parameter is name, not shortname as in the ScraperWiki docs.
Queries and extracts data via a general purpose SQL interface.
To make an RSS feed you need to use SQL’s AS keyword (e.g. “SELECT name AS description”) to make columns called title, link, description, guid (optional, uses link if not available) and pubDate or date.
jsondict example output:
[
  {
    "fieldA": "valueA",
    "fieldB": "valueB",
    "fieldC": "valueC"
  },
  ...
]
jsonlist example output:
{
"keys": ["fieldA", "fieldB", "fieldC"],
"data": [
["valueA", "valueB", "valueC"],
...
]
}
csv example output:
fieldA,fieldB,fieldC
valueA,valueB,valueC
...
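The jsondict and jsonlist formats carry the same rows in different shapes. A sketch in plain Ruby (the payload is made up from the examples above) converting a jsonlist response into the jsondict shape:

```ruby
require 'json'

# A jsonlist-shaped payload, as documented above (values are placeholders).
jsonlist = JSON.parse('{"keys": ["fieldA", "fieldB", "fieldC"],
                        "data": [["valueA", "valueB", "valueC"]]}')

# Zip each row with the shared key list to get jsondict-style hashes.
jsondict = jsonlist['data'].map { |row| jsonlist['keys'].zip(row).to_h }

jsondict.first['fieldB'] # => "valueB"
```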
# File 'lib/scraperwiki-api.rb', line 86

def datastore_sqlite(shortname, query, opts = {})
  if Array === opts[:attach]
    opts[:attach] = opts[:attach].join ';'
  end
  request_with_apikey '/datastore/sqlite', {name: shortname, query: query}.merge(opts)
end
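As the source shows, an :attach array is flattened to a semicolon-separated string before the request. A standalone sketch of that normalization (the option values are made up):

```ruby
# The :attach option may be an array of scraper shortnames; the method
# joins them with semicolons before sending the request.
opts = { attach: ['other-scraper', 'third-scraper'] }
opts[:attach] = opts[:attach].join ';' if Array === opts[:attach]

opts[:attach] # => "other-scraper;third-scraper"
```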
#scraper_getinfo(shortname, opts = {}) ⇒ Array
Returns an array, though it seems to always contain only one item.
The tags field seems to always be an empty array.
Fields like last_run seem to follow British Summer Time.
The query string parameter is name, not shortname as in the ScraperWiki docs.
Extracts data about a scraper’s code, owner, history, etc.
- runid is a Unix timestamp with microseconds and a UUID.
- The value of records is the same as that of total_rows under datasummary.
- run_interval is the number of seconds between runs. It is one of:
  - -1 (never)
  - 2678400 (monthly)
  - 604800 (weekly)
  - 86400 (daily)
  - 3600 (hourly)
- privacy_status is one of:
  - "public" (everyone can see and edit the scraper and its data)
  - "visible" (everyone can see the scraper, but only contributors can edit it)
  - "private" (only contributors can see and edit the scraper and its data)
- An individual runevents hash will have an exception_message key if there was an error during that run.
Example output:
[
{
"code": "require 'nokogiri'\n...",
"datasummary": {
"tables": {
"swdata": {
"keys": [
"fieldA",
...
],
"count": 42,
"sql": "CREATE TABLE `swdata` (...)"
},
"swvariables": {
"keys": [
"value_blob",
"type",
"name"
],
"count": 2,
"sql": "CREATE TABLE `swvariables` (`value_blob` blob, `type` text, `name` text)"
},
...
},
"total_rows": 44,
"filesize": 1000000
},
"description": "Scrapes websites for data.",
"language": "ruby",
"title": "Example scraper",
"tags": [],
"short_name": "example-scraper",
"userroles": {
"owner": [
"johndoe"
],
"editor": [
"janedoe",
...
]
},
"last_run": "1970-01-01T00:00:00",
"created": "1970-01-01T00:00:00",
"runevents": [
{
"still_running": false,
"pages_scraped": 5,
"run_started": "1970-01-01T00:00:00",
"last_update": "1970-01-01T00:00:00",
"runid": "1325394000.000000_xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx",
"records_produced": 42
},
...
],
"records": 44,
"wiki_type": "scraper",
"privacy_status": "visible",
"run_interval": 604800,
"attachable_here": [],
"attachables": [],
"history": [
...,
{
"date": "1970-01-01T00:00:00",
"version": 0,
"user": "johndoe",
"session": "Thu, 1 Jan 1970 00:00:08 GMT"
}
]
}
]
# File 'lib/scraperwiki-api.rb', line 198

def scraper_getinfo(shortname, opts = {})
  if Array === opts[:quietfields]
    opts[:quietfields] = opts[:quietfields].join '|'
  end
  request_with_apikey '/scraper/getinfo', {name: shortname}.merge(opts)
end
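Since a runevents hash gains an exception_message key only when a run failed, failed runs can be picked out of the response. A sketch using a trimmed, made-up payload in the documented shape:

```ruby
require 'json'

# A trimmed scraper_getinfo-style response; all values are placeholders.
response = JSON.parse(<<~JSON)
  [{
    "short_name": "example-scraper",
    "runevents": [
      {"runid": "1325394000.000000_aaaa", "records_produced": 42},
      {"runid": "1325307600.000000_bbbb", "exception_message": "Exception"}
    ]
  }]
JSON

info = response.first
failed_runs = info['runevents'].select { |run| run.key?('exception_message') }

failed_runs.map { |run| run['runid'] } # => ["1325307600.000000_bbbb"]
```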
#scraper_getruninfo(shortname, opts = {}) ⇒ Array
Returns an array, though it seems to always contain only one item.
The query string parameter is name, not shortname as in the ScraperWiki docs.
See what the scraper did during each run.
Example output:
[
{
"run_ended": "1970-01-01T00:00:00",
"first_url_scraped": "http://www.iana.org/domains/example/",
"pages_scraped": 5,
"run_started": "1970-01-01T00:00:00",
"runid": "1325394000.000000_xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx",
"domainsscraped": [
{
"domain": "http://example.com",
"bytes": 1000000,
"pages": 5
},
...
],
"output": "...",
"records_produced": 42
}
]
# File 'lib/scraperwiki-api.rb', line 237

def scraper_getruninfo(shortname, opts = {})
  request_with_apikey '/scraper/getruninfo', {name: shortname}.merge(opts)
end
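The domainsscraped entries can be aggregated per run, for example to total the bytes fetched. A sketch over a made-up run hash in the documented shape:

```ruby
# A trimmed scraper_getruninfo-style run hash; values are placeholders.
run = {
  'pages_scraped' => 5,
  'domainsscraped' => [
    { 'domain' => 'http://example.com',  'bytes' => 1_000_000, 'pages' => 3 },
    { 'domain' => 'http://www.iana.org', 'bytes' => 500_000,   'pages' => 2 }
  ]
}

# Sum the bytes fetched across every scraped domain.
total_bytes = run['domainsscraped'].sum { |d| d['bytes'] }

total_bytes # => 1500000
```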
#scraper_getuserinfo(username) ⇒ Array
Returns an array, though it seems to always contain only one item.
The date joined field is date_joined (with underscore) in #scraper_usersearch.
Find out information about a user.
Example output:
[
{
"username": "johndoe",
"profilename": "John Doe",
"coderoles": {
"owner": [
"johndoe.emailer",
"example-scraper",
...
],
"email": [
"johndoe.emailer"
],
"editor": [
"yet-another-scraper",
...
]
},
"datejoined": "1970-01-01T00:00:00"
}
]
# File 'lib/scraperwiki-api.rb', line 273

def scraper_getuserinfo(username)
  request_with_apikey '/scraper/getuserinfo', username: username
end
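Because this endpoint spells the join date "datejoined" while #scraper_usersearch spells it "date_joined", client code that handles both may want to normalize the key. A sketch using a hypothetical join_date helper:

```ruby
# Hypothetical helper smoothing over the inconsistent join-date key:
# getuserinfo returns "datejoined", usersearch returns "date_joined".
def join_date(user)
  user['datejoined'] || user['date_joined']
end

join_date({ 'username' => 'johndoe', 'datejoined' => '1970-01-01T00:00:00' })
# => "1970-01-01T00:00:00"
```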
#scraper_search(opts = {}) ⇒ Array
Search the titles and descriptions of all the scrapers.
Example output:
[
{
"description": "Scrapes websites for data.",
"language": "ruby",
"created": "1970-01-01T00:00:00",
"title": "Example scraper",
"short_name": "example-scraper",
"privacy_status": "public"
},
...
]
# File 'lib/scraperwiki-api.rb', line 299

def scraper_search(opts = {})
  request_with_apikey '/scraper/search', opts
end
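Each result carries privacy_status, so a client can keep only public scrapers. A sketch over made-up results in the documented shape:

```ruby
# Made-up scraper_search-style results.
results = [
  { 'short_name' => 'example-scraper', 'privacy_status' => 'public' },
  { 'short_name' => 'another-scraper', 'privacy_status' => 'visible' }
]

public_scrapers = results.select { |r| r['privacy_status'] == 'public' }

public_scrapers.map { |r| r['short_name'] } # => ["example-scraper"]
```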
#scraper_usersearch(opts = {}) ⇒ Array
The date joined field is datejoined (without underscore) in #scraper_getuserinfo.
Search for a user by name.
Example output:
[
{
"username": "johndoe",
"profilename": "John Doe",
"date_joined": "1970-01-01T00:00:00"
},
...
]
# File 'lib/scraperwiki-api.rb', line 327

def scraper_usersearch(opts = {})
  if Array === opts[:nolist]
    opts[:nolist] = opts[:nolist].join ' '
  end
  request '/scraper/usersearch', opts
end
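Note the different separator from datastore_sqlite's :attach option: here a :nolist array of usernames to exclude is joined with spaces, not semicolons. A standalone sketch (the usernames are made up):

```ruby
# :nolist excludes usernames from the search; arrays are joined with spaces.
opts = { nolist: ['johndoe', 'janedoe'] }
opts[:nolist] = opts[:nolist].join ' ' if Array === opts[:nolist]

opts[:nolist] # => "johndoe janedoe"
```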