Module: Scrapifier::Methods
Includes: Support
Defined in: lib/scrapifier/methods.rb
Overview
Methods which will be included into the String class.
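Because these methods are mixed into String, their bodies run with `self` bound to the string. This is why `find_uri` can call `scan(...)` directly. The sketch below shows the mixin mechanism using a hypothetical `Shouty` module. It is not Scrapifier's own wiring, which this page does not show.

```ruby
# Hypothetical module standing in for Scrapifier::Methods.
module Shouty
  # Inside an included module, `self` is the String instance,
  # so String methods such as #upcase can be called without a receiver.
  def shout
    "#{upcase}!"
  end
end

# Reopen String and mix the module in; every String now responds to #shout.
class String
  include Shouty
end

puts 'hello'.shout # prints "HELLO!"
```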
Constant Summary
Constants included from XPath
XPath::AUTHOR, XPath::DESC, XPath::ENCODE, XPath::IMG, XPath::KEYWORDS, XPath::LANG, XPath::REPLY_TO, XPath::TITLE
Instance Method Summary
- #find_uri(which = 0) ⇒ Object
  Find URIs in the String.
- #scrapify(options = {}) ⇒ Object
  Get metadata from a URI using the screen-scraping technique.
Methods included from Support
sf_check_img_ext, sf_domain, sf_eval_uri, sf_fix_imgs, sf_fix_protocol, sf_img_regex, sf_regex, sf_uri_regex, sf_xpaths
Instance Method Details
#find_uri(which = 0) ⇒ Object
Find URIs in the String.
Example:
>> 'Wow! What an awesome site: http://adtangerine.com!'.find_uri
=> 'http://adtangerine.com'
>> 'Very cool: http://adtangerine.com and www.twitflink.com'.find_uri 1
=> 'http://www.twitflink.com'
Arguments:
which: (Integer)
- Which URI in the String: first (0), second (1) and so on.
# File 'lib/scrapifier/methods.rb', line 53
def find_uri(which = 0)
  which = scan(sf_regex(:uri))[which.to_i][0]
  which =~ sf_regex(:protocol) ? which : "http://#{which}"
rescue NoMethodError
  nil
end
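The following standalone sketch isolates how `find_uri` picks the Nth match and when it returns `nil`. `URI_REGEX` and `PROTOCOL_REGEX` are simplified, hypothetical stand-ins for `sf_regex(:uri)` and `sf_regex(:protocol)`, because the real patterns are not shown on this page. When the pattern contains a capture group, `scan` returns one array per match, which is why the code takes `[0]` of the selected match.

```ruby
# Hypothetical, deliberately simple patterns; Scrapifier's real ones
# come from Support#sf_regex and are more thorough.
URI_REGEX      = %r{((?:https?://|www\.)[^\s!,]+)}i
PROTOCOL_REGEX = %r{\A[a-z][a-z0-9+.\-]*://}i

def find_uri(text, which = 0)
  # With a capture group, scan returns [["match1"], ["match2"], ...].
  uri = text.scan(URI_REGEX)[which.to_i][0]
  # Prepend a default scheme when the match has none (e.g. "www.example.com").
  uri =~ PROTOCOL_REGEX ? uri : "http://#{uri}"
rescue NoMethodError
  # An out-of-range index (or no match at all) yields nil, and nil[0]
  # raises NoMethodError, which is turned into a nil result here.
  nil
end

text = 'Very cool: http://adtangerine.com and www.twitflink.com'
puts find_uri(text)       # prints "http://adtangerine.com"
puts find_uri(text, 1)    # prints "http://www.twitflink.com"
p find_uri(text, 5)       # prints "nil"
```

A method-level `rescue NoMethodError` is compact, but it also hides any unrelated `NoMethodError` raised inside the method. An explicit `nil` check on the selected match would be the narrower choice.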
#scrapify(options = {}) ⇒ Object
Get metadata from a URI using the screen-scraping technique.
Example:
>> 'Wow! What an awesome site: http://adtangerine.com!'.scrapify
=> {
:title => "AdTangerine | Advertising Platform for Social Media",
:description => "AdTangerine is an advertising platform that...",
:images => [
"http://adtangerine.com/assets/logo_adt_og.png",
"http://adtangerine.com/assets/logo_adt_og.png"
],
:uri => "http://adtangerine.com"
}
Arguments:
options: (Hash)
- which: (Integer)
Which URI in the String to use, counting from 0 (the first URI).
- images: (Symbol or Array)
Image extensions allowed in the result.
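The `images` option accepts either a single Symbol or an Array. The sketch below shows one way such an option can be normalized and checked against a URI's extension. `allowed_image?` and `DEFAULT_IMAGE_EXTS` are hypothetical names invented for this illustration. This is not Scrapifier's `sf_check_img_ext`, whose exact behaviour and defaults are not documented here.

```ruby
# Hypothetical defaults; Scrapifier's own list may differ.
DEFAULT_IMAGE_EXTS = %i[jpg jpeg png gif].freeze

# Hypothetical helper illustrating the `images:` option.
def allowed_image?(uri, images = nil)
  # Accept a single Symbol, an Array of Symbols, or nil (use the defaults).
  exts = images.nil? ? DEFAULT_IMAGE_EXTS : Array(images).map(&:to_sym)
  # Compare the path's extension case-insensitively, ignoring any query string.
  ext = File.extname(uri.split('?').first).delete('.').downcase
  exts.include?(ext.to_sym)
end

p allowed_image?('http://example.com/logo.PNG?v=2', :png)        # prints "true"
p allowed_image?('http://example.com/logo.gif', [:png, :jpg])    # prints "false"
```

With the gem loaded, the documented options are passed as a Hash, e.g. `text.scrapify(which: 1, images: [:png, :jpg])`.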
# File 'lib/scrapifier/methods.rb', line 30
def scrapify(options = {})
  uri, meta = find_uri(options[:which]), {}
  return meta if uri.nil?
  if !(uri =~ sf_regex(:image))
    meta = sf_eval_uri(uri, options[:images])
  elsif !sf_check_img_ext(uri, options[:images]).empty?
    [:title, :description, :uri, :images].each { |k| meta[k] = uri }
  end
  meta
end
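`scrapify` has three outcomes. It returns an empty Hash when no URI is found, scrapes an ordinary page, or copies an image URI into every field when the URI points directly at an allowed image. The sketch below reproduces that branching offline. `scrape` and `image_ok` are hypothetical lambdas standing in for `sf_eval_uri` and `sf_check_img_ext`, and `IMAGE_URI` is a simplified, hypothetical substitute for `sf_regex(:image)`.

```ruby
# Hypothetical, simplified image pattern standing in for sf_regex(:image).
IMAGE_URI = /\.(?:jpe?g|png|gif)(?:\?.*)?\z/i

def scrapify_sketch(uri, scrape:, image_ok:)
  meta = {}
  return meta if uri.nil?                 # no URI found: empty Hash

  if uri !~ IMAGE_URI
    meta = scrape.call(uri)               # ordinary page: delegate to the scraper
  elsif image_ok.call(uri)
    # Direct image link: every field is simply the image URI itself.
    %i[title description uri images].each { |k| meta[k] = uri }
  end

  meta                                    # disallowed image types also yield {}
end

fake_scraper = ->(u) { { title: 'Example', uri: u } }
p scrapify_sketch('http://example.com', scrape: fake_scraper, image_ok: ->(_) { true })
# prints {:title=>"Example", :uri=>"http://example.com"} (Ruby 3.4+ prints {title: "Example", uri: "http://example.com"})
```

Injecting the scraper as a parameter keeps the branching testable without network access. In the real method, that role is filled by the included Support helpers.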