Wiki::Api
Wiki API is a Ruby gem (usable from Ruby on Rails) that interfaces with the MediaWiki API (https://www.mediawiki.org/wiki/API:Main_page). The gem is more than an interface: it provides abstraction classes such as Page, from which you can request page elements (like headlines, and the text blocks beneath those headlines).
NOTE: Nokogiri is used in the background to parse HTML. Because I see no point in wrapping its internals (by composition) for this purpose, Nokogiri nodes, elements, etc. are exposed directly through wiki-api (http://nokogiri.org/Nokogiri.html).
Requests to the MediaWiki API use the following URI structure:
http(s)://somemediawiki.org/w/api.php?action=parse&format=json&page=Any_page
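For illustration, a request URI of that shape can be assembled with Ruby's standard `uri` library. This is a minimal sketch, not wiki-api's own code: the helper names `mediawiki_parse_uri` and `fetch_parse` are hypothetical, and the `/w/api.php` path is assumed from the structure above (some MediaWiki installations use a different script path).

```ruby
require "uri"
require "net/http"
require "json"

# Hypothetical helper (not part of wiki-api): builds the action=parse
# request URI for a page on a MediaWiki host. Assumes the /w/api.php path.
def mediawiki_parse_uri(base, page)
  query = URI.encode_www_form(action: "parse", format: "json", page: page)
  URI("#{base}/w/api.php?#{query}")
end

# Hypothetical helper: performs the GET request and parses the JSON body.
# Defined but not called here, since it needs network access.
def fetch_parse(base, page)
  JSON.parse(Net::HTTP.get(mediawiki_parse_uri(base, page)))
end

puts mediawiki_parse_uri("https://en.wikipedia.org", "Ruby_on_rails")
```

`URI.encode_www_form` percent-encodes characters such as `:` and `,`, so a page name like `Wiktionary:Welcome,_newcomers` is sent as `Wiktionary%3AWelcome%2C_newcomers`.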
Dependencies (production)
- json
- nokogiri
Roadmap
- Version (0.0.2) (current)
  - Index important words per block, page, and list item
  - Parse objects for more elements within a Page
Changelog
- Version (0.0.1) -> (0.0.2)
  - Nested ListItems and Links (within a Page)
  - Search on Page headline (ignoring case and underscores)
Known Issues
None discovered thus far.
Installation
Add this line to your application's Gemfile (bundler):
gem 'wiki-api', git: "https://github.com/dblommesteijn/wiki-api.git"
And then execute:
$ bundle
Or install it yourself (RubyGems):
$ gem install wiki-api
Setup
Define a configuration for your connection (e.g. in an initializer script). This example uses wiktionary.org. NOTE: both HTTP and HTTPS MediaWikis are supported.
CONFIG = { uri: "http://en.wiktionary.org" }
Set it as the default configuration (initializer script):
Wiki::Api::Connect.config = CONFIG
Usage
Query a Page
Requesting headlines from a given page.
page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
page.headlines.each do |headline|
  # printing headline name (PageHeadline)
  puts headline.name
end
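wiki-api extracts headlines from the returned HTML via Nokogiri. As a library-free illustration of where headline names can come from, the sketch below reads them from the `sections` array that `action=parse` can include in its JSON response (request it with `prop=sections`); each entry carries the headline text in `line`. The JSON string is a trimmed, made-up sample, and `headline_names` is a hypothetical helper, not part of wiki-api.

```ruby
require "json"

# Made-up sample shaped like an action=parse response (format=json),
# trimmed to the "sections" property.
SAMPLE = <<~JSON
  {"parse": {"title": "Ruby on Rails",
             "sections": [
               {"toclevel": 1, "level": "2", "line": "History", "anchor": "History"},
               {"toclevel": 1, "level": "2", "line": "References", "anchor": "References"}
             ]}}
JSON

# Hypothetical helper (not part of wiki-api): returns the headline text of
# every section listed in a parse response.
def headline_names(json)
  JSON.parse(json).fetch("parse").fetch("sections", []).map { |s| s["line"] }
end

headline_names(SAMPLE).each { |name| puts name }
```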
Getting headlines for a given name.
page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
page.headline("Wiktionary:Welcome,_newcomers").each do |headline|
  # printing headline name (PageHeadline)
  puts headline.name
end
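The changelog states that headline search ignores case and underscores. A minimal sketch of such matching, assuming that "ignoring underscores" means treating `_` as a space; `normalize_headline` and `headline_matches?` are hypothetical names, and the gem's actual comparison may differ.

```ruby
# Hypothetical normalisation (not wiki-api's code): lower-case the name and
# treat underscores as spaces, so "Welcome,_newcomers" matches "welcome, newcomers".
def normalize_headline(name)
  name.downcase.tr("_", " ").strip
end

# True when both names are equal after normalisation.
def headline_matches?(candidate, query)
  normalize_headline(candidate) == normalize_headline(query)
end

puts headline_matches?("References", "references")          # prints true
puts headline_matches?("See_also", "see also")               # prints true
puts headline_matches?("External links", "Further reading")  # prints false
```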
Basic Page structure
page = Wiki::Api::Page.new name: "Wiktionary:Welcome,_newcomers"
# iterate PageHeadline objects
page.headlines.each do |headline|
  # exposing Nokogiri internal elements
  elements = headline.elements.flatten
  elements.each do |element|
    # access Nokogiri::XML::* nodes
  end
  # string representation of all nested text
  headline.block.to_texts
  # iterate PageListItem objects
  headline.block.list_items.each do |list_item|
    # string representation of nested text
    list_item.to_text
    # iterate PageLink objects
    list_item.links.each do |link|
      # same PageLink interface as below (uri, html, title, to_text)
    end
  end
  # iterate PageLink objects
  headline.block.links.each do |link|
    # absolute URI object
    link.uri
    # HTML link
    link.html
    # link name
    link.title
    # string representation of nested text
    link.to_text
  end
end
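The structure above is a tree: a headline has a block, a block has list items, and list items can nest (see the 0.0.2 changelog) and hold links. To show how such a tree can be walked, the sketch below uses plain `Struct` stand-ins. `Link`, `ListItem`, and `all_link_titles` are hypothetical and only mimic the accessor names used above (`links`, `title`); the `children` accessor for nested items is an assumption, not a documented wiki-api method.

```ruby
# Hypothetical stand-ins for PageLink and PageListItem (not wiki-api classes).
Link = Struct.new(:title, :uri)
ListItem = Struct.new(:text, :links, :children)

# Depth-first walk that collects link titles from list items and all of
# their nested sub-items.
def all_link_titles(list_items)
  list_items.flat_map do |item|
    item.links.map(&:title) + all_link_titles(item.children)
  end
end

items = [
  ListItem.new("Ruby", [Link.new("Ruby", "https://www.ruby-lang.org/")], [
    ListItem.new("Rails", [Link.new("Rails", "https://rubyonrails.org/")], [])
  ]),
  ListItem.new("Plain text", [], [])
]

p all_link_titles(items)  # => ["Ruby", "Rails"]
```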
Example using Global config (https://en.wikipedia.org/wiki/Ruby_on_rails)
This is an example of querying wikipedia.org for the page "Ruby_on_rails" and printing the links of each list item under the References headline.
# setting a target config
CONFIG = { uri: "https://en.wikipedia.org" }
Wiki::Api::Connect.config = CONFIG
# querying the page
page = Wiki::Api::Page.new name: "Ruby_on_rails"
# get headlines named References (there can be multiple headlines with the same name!)
headlines = page.headline "References"
# iterate headlines
headlines.each do |headline|
  # iterate list items on the given headline
  headline.block.list_items.each do |list_item|
    # print the URI of all links
    puts list_item.links.map(&:uri)
  end
end
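`link.uri` is described above as an absolute URI object, while hrefs in MediaWiki HTML are often relative (for example `/wiki/...`). One standard way to resolve them is `URI.join` from Ruby's `uri` library. This is a sketch of that technique, not necessarily how wiki-api computes `link.uri`; `absolute_link` is a hypothetical helper.

```ruby
require "uri"

# Hypothetical helper: resolve an href against the wiki's base URI.
# Relative paths are joined onto the base; absolute URLs pass through unchanged.
def absolute_link(base, href)
  URI.join(base, href)
end

base = "https://en.wikipedia.org"
puts absolute_link(base, "/wiki/David_Heinemeier_Hansson")
puts absolute_link(base, "https://rubyonrails.org/")
```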
Example passing URI (https://en.wikipedia.org/wiki/Ruby_on_rails)
This is the same example as above, except that instead of setting a global config, the URI is passed directly to the Page, which directs its requests there.
# querying the page
page = Wiki::Api::Page.new name: "Ruby_on_rails", uri: "https://en.wikipedia.org"
# get headlines named References (there can be multiple headlines with the same name!)
headlines = page.headline "References"
# iterate headlines
headlines.each do |headline|
  # iterate list items on the given headline
  headline.block.list_items.each do |list_item|
    # print the URI of all links
    puts list_item.links.map(&:uri)
  end
end
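The two examples differ only in where the URI comes from: a global `Wiki::Api::Connect.config`, or a `uri:` option on `Page.new`. A minimal sketch of how such a lookup could be resolved, assuming the per-page option takes precedence over the global config; `DemoConnect` and `resolve_uri` are hypothetical and do not reproduce wiki-api's internals.

```ruby
# Hypothetical module mimicking a global config holder (not wiki-api's code).
module DemoConnect
  class << self
    attr_accessor :config
  end
end

# Assumed precedence: an explicit uri: option wins, otherwise fall back to
# the global config; raise if neither is set.
def resolve_uri(options = {})
  options[:uri] || (DemoConnect.config || {})[:uri] ||
    raise(ArgumentError, "no MediaWiki URI configured")
end

DemoConnect.config = { uri: "http://en.wiktionary.org" }
puts resolve_uri                                    # falls back to the global config
puts resolve_uri(uri: "https://en.wikipedia.org")   # per-page override
```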