Class: SearchYJ::Searcher

Inherits:
Object
Defined in:
lib/searchyj/searcher.rb

Overview

Searches via the search engine, parses the returned HTML, and digs through the surrounding result pages.

Author:

  • indeep-xyz

Constant Summary

ENCODING =
'UTF-8'
USER_AGENT =
'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0)' \
'Gecko/20100101 Firefox/38.0'
OpenUriError =
Class.new(StandardError)
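
`OpenUriError` is defined with `Class.new(StandardError)`, which builds an anonymous exception class that takes its name from the constant it is assigned to. The following is a minimal sketch of how such an error class can be raised and rescued. The `ErrorSketch` module and its `fetch` helper are hypothetical and are not part of SearchYJ; how the library itself raises `OpenUriError` is not shown on this page.

```ruby
# Hypothetical namespace; only the OpenUriError definition mirrors the page.
module ErrorSketch
  # Class.new(StandardError) creates an anonymous subclass; assigning it
  # to a constant names it ErrorSketch::OpenUriError.
  OpenUriError = Class.new(StandardError)

  # Hypothetical helper: wraps any failure raised by the block
  # in OpenUriError so callers can rescue a single error class.
  def self.fetch
    yield
  rescue StandardError => e
    raise OpenUriError, e.message
  end
end

begin
  ErrorSketch.fetch { raise IOError, 'connection refused' }
rescue ErrorSketch::OpenUriError => e
  puts "#{e.class}: #{e.message}"
end
```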

Constructor Details

#initialize(encoding: ENCODING, from: 1, sleep_time: 1, limit_loop: 50, user_agent: USER_AGENT) ⇒ Searcher

Initializes a new Searcher.

Parameters:

  • encoding (String) (defaults to: ENCODING)

    The character encoding used to parse the HTML

  • from (Integer) (defaults to: 1)

    The position in the search ranking at which to start searching

  • sleep_time (Integer) (defaults to: 1)

    The time to sleep, in seconds, after each fetch from the internet

  • limit_loop (Integer) (defaults to: 50)

    The maximum number of fetch iterations in a single run

  • user_agent (String) (defaults to: USER_AGENT)

    The User-Agent string sent when opening a URI



# File 'lib/searchyj/searcher.rb', line 39

def initialize(
    encoding:   ENCODING,
    from:       1,
    sleep_time: 1,
    limit_loop: 50,
    user_agent: USER_AGENT)
  @pager      = PageSizeAdjuster.new
  @uri        = UriManager.new
  @uri.index  = from
  @encoding   = encoding
  @limit_loop = limit_loop
  @sleep_time = sleep_time
  @user_agent = user_agent
end
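
Because every constructor argument is a keyword with a default, a caller only needs to pass the options that differ. The sketch below illustrates this with a hypothetical stand-in class, `OptionsSketch`, that copies only the keyword signature; the real constructor also builds a `PageSizeAdjuster` and a `UriManager`, which are omitted here, and the `USER_AGENT` value is a placeholder rather than the library's.

```ruby
# Hypothetical stand-in mirroring the keyword signature of
# SearchYJ::Searcher#initialize. It is not the library class.
class OptionsSketch
  ENCODING   = 'UTF-8'
  USER_AGENT = 'ExampleAgent/1.0' # placeholder, not the library's value

  attr_reader :encoding, :from, :sleep_time, :limit_loop, :user_agent

  def initialize(
      encoding:   ENCODING,
      from:       1,
      sleep_time: 1,
      limit_loop: 50,
      user_agent: USER_AGENT)
    @encoding   = encoding
    @from       = from
    @sleep_time = sleep_time
    @limit_loop = limit_loop
    @user_agent = user_agent
  end
end

# All defaults.
defaults = OptionsSketch.new

# Start at rank 11, wait 2 seconds between fetches, stop after 5 loops.
custom = OptionsSketch.new(from: 11, sleep_time: 2, limit_loop: 5)

puts defaults.limit_loop
puts custom.from
```

With the gem installed, the equivalent call would presumably be `SearchYJ::Searcher.new(from: 11, sleep_time: 2, limit_loop: 5)`.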

Instance Attribute Details

#limit_loop ⇒ Object

Returns the value of attribute limit_loop.



# File 'lib/searchyj/searcher.rb', line 17

def limit_loop
  @limit_loop
end

#pager ⇒ Object

Returns the value of attribute pager.



# File 'lib/searchyj/searcher.rb', line 17

def pager
  @pager
end

#results ⇒ Object (readonly)

Returns the value of attribute results.



# File 'lib/searchyj/searcher.rb', line 16

def results
  @results
end

#sleep_time ⇒ Object

Returns the value of attribute sleep_time.



# File 'lib/searchyj/searcher.rb', line 17

def sleep_time
  @sleep_time
end

#uri ⇒ Object

Returns the value of attribute uri.



# File 'lib/searchyj/searcher.rb', line 17

def uri
  @uri
end

#user_agent ⇒ Object

Returns the value of attribute user_agent.



# File 'lib/searchyj/searcher.rb', line 17

def user_agent
  @user_agent
end
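
The reader methods listed above are what YARD typically renders for attribute macros. A plausible reading, not confirmed by the source shown here, is that line 16 of the file declares `attr_reader :results` and line 17 declares `attr_accessor` for the remaining five attributes. The sketch below shows that pattern on a hypothetical `AccessorSketch` class with made-up initial values.

```ruby
# Hypothetical class illustrating the assumed attribute declarations.
class AccessorSketch
  # Read-only: only a getter is generated.
  attr_reader :results
  # Read-write: both a getter and a setter are generated.
  attr_accessor :limit_loop, :pager, :sleep_time, :uri, :user_agent

  def initialize
    @results    = []
    @limit_loop = 50
    @sleep_time = 1
  end
end

s = AccessorSketch.new
s.sleep_time = 3                 # setter generated by attr_accessor
puts s.sleep_time
puts s.respond_to?(:results=)    # no setter exists for a read-only attribute
```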

Instance Method Details

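#run(&block) ⇒ Object

Fetches result pages one at a time and passes each page's records to the sorter together with the given block. The loop stops when a page yields no records, when the final page is reached, or after limit_loop iterations; it sleeps for sleep_time seconds between fetches.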



# File 'lib/searchyj/searcher.rb', line 54

def run(&block)
  loop_count = 0
  sorter = RecordSorter.new(@uri.index, @pager.size)

  # Fetch at most @limit_loop pages.
  while loop_count < @limit_loop
    fetch_html
    records = extract_records

    # Hand this page's records and the caller's block to the sorter.
    sorter.run(records, &block)

    # Stop on an empty page or on the last page.
    break if records.empty? || final_page?

    next_page(records.size + sorter.page_gap)
    sleep @sleep_time
    loop_count += 1
  end
end
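
The control flow of `run` can be sketched without network access. The example below is a simplified simulation under stated assumptions: `PAGES` is a hypothetical in-memory stand-in for the search engine, `run_sketch` is not a library method, and the `RecordSorter` and `page_gap` handling of the real method are left out. It only illustrates the three exit conditions: an empty page, the final page, and the `limit_loop` cap.

```ruby
# Hypothetical data: each inner array stands for one page of records.
PAGES = [%w[a b c], %w[d e f], %w[g]].freeze

# Simplified stand-in for Searcher#run; yields every record to the block.
def run_sketch(pages, limit_loop: 50, sleep_time: 0)
  loop_count = 0
  page_index = 0

  while loop_count < limit_loop
    # Stands in for fetch_html followed by extract_records.
    records = pages.fetch(page_index, [])
    records.each { |record| yield record }

    # Stop on an empty page or on the last page.
    break if records.empty? || page_index >= pages.size - 1

    page_index += 1
    sleep sleep_time
    loop_count += 1
  end
end

seen = []
run_sketch(PAGES) { |record| seen << record }

capped = []
run_sketch(PAGES, limit_loop: 1) { |record| capped << record }

puts seen.size
puts capped.size
```

In this sketch, as in the listing above, `limit_loop` bounds the number of pages fetched in one call, so a small value keeps a run short even when more result pages exist.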