Class: ScraperWiki::API
- Inherits: Object
- Includes: HTTParty
- Defined in: lib/scraperwiki-api.rb,
  lib/scraperwiki-api/version.rb,
  lib/scraperwiki-api/matchers.rb
Overview
A Ruby wrapper for the ScraperWiki API.
Defined Under Namespace
Modules: Matchers
Constant Summary
- RUN_INTERVALS =
{ never: -1, monthly: 2678400, weekly: 604800, daily: 86400, hourly: 3600 }
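Since #scraper_getinfo returns run_interval as raw seconds, inverting this constant recovers the label. A minimal sketch in plain Ruby, mirroring the constant rather than loading the gem:

```ruby
# Mirror of the gem's RUN_INTERVALS constant, mapping labels to seconds.
RUN_INTERVALS = { never: -1, monthly: 2678400, weekly: 604800, daily: 86400, hourly: 3600 }

# Invert it to translate a run_interval value from the API back to a label.
INTERVAL_NAMES = RUN_INTERVALS.invert

INTERVAL_NAMES[604800] # => :weekly
```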
- VERSION =
"0.0.6"
Class Method Summary
- .edit_scraper_url(shortname) ⇒ String
  Returns the URL to edit the scraper.
- .scraper_url(shortname) ⇒ String
  Returns the URL to the scraper’s overview.
Instance Method Summary
- #datastore_sqlite(shortname, query, opts = {}) ⇒ Array, ...
  Queries and extracts data via a general purpose SQL interface.
- #initialize(apikey = nil) ⇒ API (constructor)
  Initializes a ScraperWiki API object.
- #scraper_getinfo(shortname, opts = {}) ⇒ Array
  Extracts data about a scraper’s code, owner, history, etc.
- #scraper_getruninfo(shortname, opts = {}) ⇒ Array
  See what the scraper did during each run.
- #scraper_getuserinfo(username) ⇒ Array
  Find out information about a user.
- #scraper_search(opts = {}) ⇒ Array
  Search the titles and descriptions of all the scrapers.
- #scraper_usersearch(opts = {}) ⇒ Array
  Search for a user by name.
Constructor Details
#initialize(apikey = nil) ⇒ API
Initializes a ScraperWiki API object.
# File 'lib/scraperwiki-api.rb', line 37

def initialize(apikey = nil)
  @apikey = apikey
end
Class Method Details
.edit_scraper_url(shortname) ⇒ String
Returns the URL to edit the scraper.
# File 'lib/scraperwiki-api.rb', line 31

def edit_scraper_url(shortname)
  "https://scraperwiki.com/scrapers/#{shortname}/edit/"
end
.scraper_url(shortname) ⇒ String
Returns the URL to the scraper’s overview.
# File 'lib/scraperwiki-api.rb', line 23

def scraper_url(shortname)
  "https://scraperwiki.com/scrapers/#{shortname}/"
end
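Both class methods are plain string builders, so they can be reproduced without the gem. A standalone sketch of the two helpers shown above:

```ruby
# Standalone re-implementations of the two URL helpers.
def scraper_url(shortname)
  "https://scraperwiki.com/scrapers/#{shortname}/"
end

def edit_scraper_url(shortname)
  "https://scraperwiki.com/scrapers/#{shortname}/edit/"
end

scraper_url('example-scraper')      # => "https://scraperwiki.com/scrapers/example-scraper/"
edit_scraper_url('example-scraper') # => "https://scraperwiki.com/scrapers/example-scraper/edit/"
```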
Instance Method Details
#datastore_sqlite(shortname, query, opts = {}) ⇒ Array, ...
The query string parameter is name, not shortname as in the ScraperWiki docs.
Queries and extracts data via a general purpose SQL interface.
To make an RSS feed you need to use SQL’s AS keyword (e.g. “SELECT name AS description”) to make columns called title, link, description, guid (optional, uses link if not available) and pubDate or date.
jsondict example output:
[
  {
    "fieldA": "valueA",
    "fieldB": "valueB",
    "fieldC": "valueC"
  },
  ...
]
jsonlist example output:
{
"keys": ["fieldA", "fieldB", "fieldC"],
"data": [
["valueA", "valueB", "valueC"],
...
]
}
csv example output:
fieldA,fieldB,fieldC
valueA,valueB,valueC
...
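The jsondict and jsonlist formats carry the same rows in different shapes. A sketch in plain Ruby (the payload is made up from the examples above) converting a jsonlist response into the jsondict shape:

```ruby
require 'json'

# A jsonlist-shaped payload, as documented above (values are placeholders).
jsonlist = JSON.parse('{"keys": ["fieldA", "fieldB", "fieldC"],
                        "data": [["valueA", "valueB", "valueC"]]}')

# Zip each row with the shared key list to get jsondict-style hashes.
jsondict = jsonlist['data'].map { |row| jsonlist['keys'].zip(row).to_h }

jsondict.first['fieldB'] # => "valueB"
```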
# File 'lib/scraperwiki-api.rb', line 86

def datastore_sqlite(shortname, query, opts = {})
  if Array === opts[:attach]
    opts[:attach] = opts[:attach].join ';'
  end
  request_with_apikey '/datastore/sqlite', {name: shortname, query: query}.merge(opts)
end
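As the source shows, an :attach array is flattened to a semicolon-separated string before the request. A standalone sketch of that normalization (the option values are made up):

```ruby
# The :attach option may be an array of scraper shortnames; the method
# joins them with semicolons before sending the request.
opts = { attach: ['other-scraper', 'third-scraper'] }
opts[:attach] = opts[:attach].join ';' if Array === opts[:attach]

opts[:attach] # => "other-scraper;third-scraper"
```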
#scraper_getinfo(shortname, opts = {}) ⇒ Array
Returns an array, though it seems to always contain only one item.
The tags field seems to always be an empty array.
Fields like last_run seem to follow British Summer Time.
The query string parameter is name, not shortname as in the ScraperWiki docs.
Extracts data about a scraper’s code, owner, history, etc.
- runid is a Unix timestamp with microseconds and a UUID.
- The value of records is the same as that of total_rows under datasummary.
- run_interval is the number of seconds between runs. It is one of:
  - -1 (never)
  - 2678400 (monthly)
  - 604800 (weekly)
  - 86400 (daily)
  - 3600 (hourly)
- privacy_status is one of:
  - "public" (everyone can see and edit the scraper and its data)
  - "visible" (everyone can see the scraper, but only contributors can edit it)
  - "private" (only contributors can see and edit the scraper and its data)
- An individual runevents hash will have an exception_message key if there was an error during that run.
Example output:
[
{
"code": "require 'nokogiri'\n...",
"datasummary": {
"tables": {
"swdata": {
"keys": [
"fieldA",
...
],
"count": 42,
"sql": "CREATE TABLE `swdata` (...)"
},
"swvariables": {
"keys": [
"value_blob",
"type",
"name"
],
"count": 2,
"sql": "CREATE TABLE `swvariables` (`value_blob` blob, `type` text, `name` text)"
},
...
},
"total_rows": 44,
"filesize": 1000000
},
"description": "Scrapes websites for data.",
"language": "ruby",
"title": "Example scraper",
"tags": [],
"short_name": "example-scraper",
"userroles": {
"owner": [
"johndoe"
],
"editor": [
"janedoe",
...
]
},
"last_run": "1970-01-01T00:00:00",
"created": "1970-01-01T00:00:00",
"runevents": [
{
"still_running": false,
"pages_scraped": 5,
"run_started": "1970-01-01T00:00:00",
"last_update": "1970-01-01T00:00:00",
"runid": "1325394000.000000_xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx",
"records_produced": 42
},
...
],
"records": 44,
"wiki_type": "scraper",
"privacy_status": "visible",
"run_interval": 604800,
"attachable_here": [],
"attachables": [],
"history": [
...,
{
"date": "1970-01-01T00:00:00",
"version": 0,
"user": "johndoe",
"session": "Thu, 1 Jan 1970 00:00:08 GMT"
}
]
}
]
# File 'lib/scraperwiki-api.rb', line 198

def scraper_getinfo(shortname, opts = {})
  if Array === opts[:quietfields]
    opts[:quietfields] = opts[:quietfields].join '|'
  end
  request_with_apikey '/scraper/getinfo', {name: shortname}.merge(opts)
end
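Since a runevents hash gains an exception_message key only when a run failed, failed runs can be picked out of the response. A sketch using a trimmed, made-up payload in the documented shape:

```ruby
require 'json'

# A trimmed scraper_getinfo-style response; all values are placeholders.
response = JSON.parse(<<~JSON)
  [{
    "short_name": "example-scraper",
    "runevents": [
      {"runid": "1325394000.000000_aaaa", "records_produced": 42},
      {"runid": "1325307600.000000_bbbb", "exception_message": "Exception"}
    ]
  }]
JSON

info = response.first
failed_runs = info['runevents'].select { |run| run.key?('exception_message') }

failed_runs.map { |run| run['runid'] } # => ["1325307600.000000_bbbb"]
```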
#scraper_getruninfo(shortname, opts = {}) ⇒ Array
Returns an array, though it seems to always contain only one item.
The query string parameter is name, not shortname as in the ScraperWiki docs.
See what the scraper did during each run.
Example output:
[
{
"run_ended": "1970-01-01T00:00:00",
"first_url_scraped": "http://www.iana.org/domains/example/",
"pages_scraped": 5,
"run_started": "1970-01-01T00:00:00",
"runid": "1325394000.000000_xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx",
"domainsscraped": [
{
"domain": "http://example.com",
"bytes": 1000000,
"pages": 5
},
...
],
"output": "...",
"records_produced": 42
}
]
# File 'lib/scraperwiki-api.rb', line 237

def scraper_getruninfo(shortname, opts = {})
  request_with_apikey '/scraper/getruninfo', {name: shortname}.merge(opts)
end
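The domainsscraped entries can be aggregated per run, for example to total the bytes fetched. A sketch over a made-up run hash in the documented shape:

```ruby
# A trimmed scraper_getruninfo-style run hash; values are placeholders.
run = {
  'pages_scraped' => 5,
  'domainsscraped' => [
    { 'domain' => 'http://example.com',  'bytes' => 1_000_000, 'pages' => 3 },
    { 'domain' => 'http://www.iana.org', 'bytes' => 500_000,   'pages' => 2 }
  ]
}

# Sum the bytes fetched across every scraped domain.
total_bytes = run['domainsscraped'].sum { |d| d['bytes'] }

total_bytes # => 1500000
```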
#scraper_getuserinfo(username) ⇒ Array
Returns an array, though it seems to always contain only one item.
The date joined field is date_joined (with underscore) in #scraper_usersearch.
Find out information about a user.
Example output:
[
{
"username": "johndoe",
"profilename": "John Doe",
"coderoles": {
"owner": [
"johndoe.emailer",
"example-scraper",
...
],
"email": [
"johndoe.emailer"
],
"editor": [
"yet-another-scraper",
...
]
},
"datejoined": "1970-01-01T00:00:00"
}
]
# File 'lib/scraperwiki-api.rb', line 273

def scraper_getuserinfo(username)
  request_with_apikey '/scraper/getuserinfo', username: username
end
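Because this endpoint spells the join date "datejoined" while #scraper_usersearch spells it "date_joined", client code that handles both may want to normalize the key. A sketch using a hypothetical join_date helper:

```ruby
# Hypothetical helper smoothing over the inconsistent join-date key:
# getuserinfo returns "datejoined", usersearch returns "date_joined".
def join_date(user)
  user['datejoined'] || user['date_joined']
end

join_date({ 'username' => 'johndoe', 'datejoined' => '1970-01-01T00:00:00' })
# => "1970-01-01T00:00:00"
```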
#scraper_search(opts = {}) ⇒ Array
Search the titles and descriptions of all the scrapers.
Example output:
[
{
"description": "Scrapes websites for data.",
"language": "ruby",
"created": "1970-01-01T00:00:00",
"title": "Example scraper",
"short_name": "example-scraper",
"privacy_status": "public"
},
...
]
# File 'lib/scraperwiki-api.rb', line 299

def scraper_search(opts = {})
  request_with_apikey '/scraper/search', opts
end
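Each result carries privacy_status, so a client can keep only public scrapers. A sketch over made-up results in the documented shape:

```ruby
# Made-up scraper_search-style results.
results = [
  { 'short_name' => 'example-scraper', 'privacy_status' => 'public' },
  { 'short_name' => 'another-scraper', 'privacy_status' => 'visible' }
]

public_scrapers = results.select { |r| r['privacy_status'] == 'public' }

public_scrapers.map { |r| r['short_name'] } # => ["example-scraper"]
```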
#scraper_usersearch(opts = {}) ⇒ Array
The date joined field is datejoined (without underscore) in #scraper_getuserinfo.
Search for a user by name.
Example output:
[
{
"username": "johndoe",
"profilename": "John Doe",
"date_joined": "1970-01-01T00:00:00"
},
...
]
# File 'lib/scraperwiki-api.rb', line 327

def scraper_usersearch(opts = {})
  if Array === opts[:nolist]
    opts[:nolist] = opts[:nolist].join ' '
  end
  request '/scraper/usersearch', opts
end
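Note the different separator from datastore_sqlite's :attach option: here a :nolist array of usernames to exclude is joined with spaces, not semicolons. A standalone sketch (the usernames are made up):

```ruby
# :nolist excludes usernames from the search; arrays are joined with spaces.
opts = { nolist: ['johndoe', 'janedoe'] }
opts[:nolist] = opts[:nolist].join ' ' if Array === opts[:nolist]

opts[:nolist] # => "johndoe janedoe"
```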