Module: Scrapifier::Methods

Includes: Support
Defined in: lib/scrapifier/methods.rb

Overview

Methods that are mixed into the String class.

Constant Summary

Constants included from XPath

XPath::AUTHOR, XPath::DESC, XPath::ENCODE, XPath::IMG, XPath::KEYWORDS, XPath::LANG, XPath::REPLY_TO, XPath::TITLE

Instance Method Summary

Methods included from Support

sf_check_img_ext, sf_domain, sf_eval_uri, sf_fix_imgs, sf_fix_protocol, sf_img_regex, sf_regex, sf_uri_regex, sf_xpaths

Instance Method Details

#find_uri(which = 0) ⇒ Object

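Find a URI in the String. A URI that has no protocol gets "http://" prepended, and nil is returned when there is no URI at the requested position.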

Example:

>> 'Wow! What an awesome site: http://adtangerine.com!'.find_uri
=> 'http://adtangerine.com'
>> 'Very cool: http://adtangerine.com and www.twitflink.com'.find_uri 1
=> 'http://www.twitflink.com'

Arguments:

which: (Integer)
  - Which URI in the String: first (0), second (1) and so on.


# File 'lib/scrapifier/methods.rb', line 53

def find_uri(which = 0)
  which = scan(sf_regex(:uri))[which.to_i][0]
  which =~ sf_regex(:protocol) ? which : "http://#{which}"
rescue NoMethodError
  nil
end
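
The lookup above can be tried without the gem using the minimal sketch below. The method name `sketch_find_uri` and both patterns are assumptions made for illustration; they are much simpler than whatever `sf_regex(:uri)` and `sf_regex(:protocol)` actually return. The sketch also replaces the `rescue NoMethodError` with an explicit nil check, which is a different way of reaching the same result.

```ruby
# Hypothetical stand-in for String#find_uri; not the gem's implementation.
def sketch_find_uri(text, which = 0)
  # Assumed patterns, far simpler than the gem's own regular expressions.
  uri_pattern = %r{((?:https?://|www\.)[^\s,;!]+)}i
  protocol_pattern = %r{\Ahttps?://}i

  # With a capture group, scan returns one array per match.
  # nil.to_i is 0, so a missing index selects the first URI.
  match = text.scan(uri_pattern)[which.to_i]
  return nil if match.nil?

  uri = match[0]
  # Prepend a protocol when the matched URI lacks one.
  uri.match?(protocol_pattern) ? uri : "http://#{uri}"
end

sketch_find_uri('Very cool: http://adtangerine.com and www.twitflink.com', 1)
# => "http://www.twitflink.com"
sketch_find_uri('No links here')
# => nil
```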

#scrapify(options = {}) ⇒ Object

Get metadata from a URI by screen scraping. Returns an empty Hash when the String contains no URI.

Example:

>> 'Wow! What an awesome site: http://adtangerine.com!'.scrapify
=> {
     :title => "AdTangerine | Advertising Platform for Social Media",
     :description => "AdTangerine is an advertising platform that...",
     :images => [
       "http://adtangerine.com/assets/logo_adt_og.png",
       "http://adtangerine.com/assets/logo_adt_og.png"
     ],
     :uri => "http://adtangerine.com"
   }

Arguments:

options: (Hash)
  - which: (Integer)
      Which URI in the String to use, counting from 0.
  - images: (Symbol or Array)
      Image extensions allowed in the returned result.

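Because `images` accepts either a single Symbol or an Array, a filter built on it has to normalize both shapes. The sketch below shows one way this might be done. The helper `sketch_allowed_image?` and its fallback extension list are assumptions for illustration, not the behavior of `sf_check_img_ext`.

```ruby
# Hypothetical extension filter; the gem's sf_check_img_ext may differ.
def sketch_allowed_image?(uri, exts = nil)
  # Array() wraps a lone Symbol, keeps an Array as-is, and turns nil into [].
  exts = Array(exts).map(&:to_s)
  # Assumption: with no option given, fall back to a few common extensions.
  exts = %w[jpg jpeg png gif] if exts.empty?

  pattern = /\.(?:#{exts.map { |e| Regexp.escape(e) }.join('|')})\z/i
  uri.match?(pattern)
end

sketch_allowed_image?('http://example.com/logo.png', :png)         # => true
sketch_allowed_image?('http://example.com/logo.png', [:jpg, :gif]) # => false
```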

# File 'lib/scrapifier/methods.rb', line 30

def scrapify(options = {})
  uri, meta = find_uri(options[:which]), {}
  return meta if uri.nil?

  if !(uri =~ sf_regex(:image))
    meta = sf_eval_uri(uri, options[:images])
  elsif !sf_check_img_ext(uri, options[:images]).empty?
    [:title, :description, :uri, :images].each { |k| meta[k] = uri }
  end

  meta
end
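
The method above can return three shapes: an empty Hash, scraped metadata, or a Hash in which every key holds the image URI itself. The sketch below reproduces that branching without network access. `sketch_scrapify`, its `fetch_page:` callable, and the image pattern are all hypothetical, and the extension filter from the original `elsif` is left out for brevity.

```ruby
# Hypothetical re-creation of the branching in scrapify; not the gem's code.
def sketch_scrapify(uri, fetch_page:)
  # Assumed pattern, simpler than sf_regex(:image).
  image_pattern = /\.(?:jpe?g|png|gif)\z/i

  return {} if uri.nil?

  if !uri.match?(image_pattern)
    # Regular page: delegate to the scraper (stubbed here as a callable).
    fetch_page.call(uri)
  else
    # Direct image link: the URI stands in for every field,
    # so :images is a String here rather than an Array.
    %i[title description uri images].each_with_object({}) do |key, meta|
      meta[key] = uri
    end
  end
end

stub = ->(u) { { title: 'Stub title', uri: u } }
sketch_scrapify('http://example.com/logo.png', fetch_page: stub)[:images]
# => "http://example.com/logo.png"
```

Callers that expect `:images` to always be an Array may want to wrap the value with `Array()` before iterating.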