Class: Wikiwhat::Text
Overview
Extracts portions of text from a Wiki article.
Instance Method Summary
- #find_header(header) ⇒ Object
  Find all paragraphs under a given heading.
- #initialize(api_return, prop = 'extract') ⇒ Text (constructor)
  A new instance of Text.
- #only_text(string) ⇒ Object
  Removes HTML tags from a String.
- #paragraph(quantity) ⇒ Object
  Returns the requested number of paragraphs of a Wiki article.
- #refs ⇒ Object
  Find all references on a page.
- #sidebar_image ⇒ Object
  Find the image from the sidebar, if one exists.
Methods inherited from Results
#content_split, #pull_from_hash
Constructor Details
#initialize(api_return, prop = 'extract') ⇒ Text
# File 'lib/wikiwhat/parse.rb', line 46

def initialize(api_return, prop='extract')
  @request = self.pull_from_hash(api_return, prop)
  if @request.class == Array
    @request = self.pull_from_hash(@request[0], "*")
  end
end
Instance Method Details
#find_header(header) ⇒ Object
Find all paragraphs under a given heading.
header - the name of the header as a String.
Returns a String.
# File 'lib/wikiwhat/parse.rb', line 87

def find_header(header)
  # Find the requested header
  start = @request.index(header)
  if start
    # Find next instance of the tag.
    end_first_tag = start + @request[start..-1].index("h2") + 3
    # Find the start of the next h2 tag.
    start_next_tag = @request[end_first_tag..-1].index("h2") + end_first_tag - 2
    # Select substring of requested text.
    @request[end_first_tag..start_next_tag]
  else
    raise Wikiwhat::WikiwhatError.new("Sorry, that header isn't on this page.")
  end
end
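The index arithmetic in find_header can be hard to follow, so here is a minimal standalone sketch of the same idea. The method name `section_under` and the sample HTML are hypothetical and not part of Wikiwhat. The fallback for the last section is an assumption added for the sketch: in the code above, `index("h2")` appears to return nil when no later heading exists, and the addition would then fail.

```ruby
# Invented sample HTML, for illustration only.
html = '<p>Intro.</p>' \
       '<h2>History</h2><p>Founded long ago.</p>' \
       '<h2>Geography</h2><p>Near the coast.</p>'

# Hypothetical helper mirroring the slicing logic of find_header.
def section_under(html, header)
  start = html.index(header)
  return nil unless start

  # The first "h2" after the header text is inside its closing </h2> tag;
  # adding 3 skips past "h2>" so the slice begins right after the tag.
  end_first_tag = start + html[start..-1].index('h2') + 3

  # The next "h2" belongs to the following <h2> tag; subtracting 2 ends
  # the slice just before the "<" of that tag.
  next_h2 = html[end_first_tag..-1].index('h2')

  # Assumption: when there is no later heading, read to the end of the text.
  stop = next_h2 ? next_h2 + end_first_tag - 2 : html.length - 1

  html[end_first_tag..stop]
end

puts section_under(html, 'History')   # prints <p>Founded long ago.</p>
puts section_under(html, 'Geography') # prints <p>Near the coast.</p>
```

Returning nil for a missing header is a choice made for the sketch; the documented method raises Wikiwhat::WikiwhatError instead.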
#only_text(string) ⇒ Object
Removes HTML tags from a String
string - a String that contains HTML tags.
Returns the string without HTML tags.
# File 'lib/wikiwhat/parse.rb', line 107

def only_text(string)
  string.gsub(/<\/?.*?>/,'')
end
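As a rough illustration of what this pattern does, the sketch below applies the same regular expression outside the class. The name `strip_tags` and the sample strings are hypothetical. The second call shows a limitation worth knowing: the pattern removes anything between a `<` and the next `>` on the same line, whether or not it is a real tag.

```ruby
# Hypothetical standalone equivalent of only_text.
def strip_tags(string)
  # "<", an optional "/", then as few characters as possible up to ">".
  string.gsub(/<\/?.*?>/, '')
end

puts strip_tags('<p>The <b>quick</b> fox</p>') # prints The quick fox

# Caveat: comparison operators in plain text are treated like a tag.
puts strip_tags('a < b and c > d').inspect     # prints "a  d"
```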
#paragraph(quantity) ⇒ Object
Returns the requested number of paragraphs of a Wiki article
quantity - the Number of paragraphs to return, starting from the top
of the article.
Returns an Array of Strings.
# File 'lib/wikiwhat/parse.rb', line 59

def paragraph(quantity)
  # Break the article into individual paragraphs and store in an array.
  start = @request.split("</p>")

  # Re-add the closing paragraph HTML tags.
  start.each do |string|
    string << "</p>"
  end

  # Check to make sure the quantity being requested is not more paragraphs
  # than exist.
  #
  # Return the correct number of paragraphs assigned to new_arr
  if start.length < quantity
    quantity = start.length - 1
    new_arr = start[0..quantity]
  else
    quantity = quantity - 1
    new_arr = start[0..quantity]
  end
end
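The split-and-restore approach can be sketched on its own as follows. This is not the library's code: `first_paragraphs` and the sample HTML are hypothetical, and the sketch swaps the range slicing for `Array#first`, which already caps the result at the number of available elements. One difference to be aware of: with a quantity of 0 the range form above evaluates `start[0..-1]`, which selects every paragraph, whereas `first(0)` returns an empty array.

```ruby
# Invented sample HTML, for illustration only.
html = '<p>One.</p><p>Two.</p><p>Three.</p>'

# Hypothetical helper: split on the closing tag, restore it, take the first n.
def first_paragraphs(html, quantity)
  # String#split drops the "</p>" delimiter, so add it back to each chunk.
  paragraphs = html.split('</p>').map { |chunk| chunk + '</p>' }

  # Array#first(n) returns at most n elements and never reads past the end.
  paragraphs.first(quantity)
end

p first_paragraphs(html, 2)         # prints ["<p>One.</p>", "<p>Two.</p>"]
p first_paragraphs(html, 10).length # prints 3
```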
#refs ⇒ Object
Find all references on a page.
Returns all references as an Array of Arrays.
TODO: Currently nested array, want to return as array of strings.
# File 'lib/wikiwhat/parse.rb', line 145

def refs
  @content = content_split(1, 2)

  #add all references to an array. still in wiki markup
  @content.scan(/<ref>(.*?)<\/ref>/)
end
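The nested result mentioned in the TODO comes from `String#scan`: when the pattern contains a capture group, each match is returned as an array of its groups. The sketch below uses an invented markup string to show the nested shape and one possible way to get an array of strings with `Array#flatten`. It is only an illustration and not a change made in the library.

```ruby
# Invented wiki markup, for illustration only.
markup = 'Water is wet.<ref>Smith 2001</ref> Fire is hot.<ref>Jones 1999</ref>'

# One capture group, so each match is a one-element array.
nested = markup.scan(/<ref>(.*?)<\/ref>/)
p nested # prints [["Smith 2001"], ["Jones 1999"]]

# Flattening gives a plain array of strings.
flat = nested.flatten
p flat   # prints ["Smith 2001", "Jones 1999"]
```

Note that the pattern only matches a bare `<ref>` opening tag, so a reference written as `<ref name="...">` would not be captured by it.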
#sidebar_image ⇒ Object
Find the image from the sidebar, if one exists
Return the url of the image as a String.
# File 'lib/wikiwhat/parse.rb', line 119

def sidebar_image
  # Check to see if a sidebar image exists
  if self.content_split(0)[/(image).*?(\.\w\w(g|G|f|F))/]
    # Grab the sidebar image title
    image_name = self.content_split(0)[/(image).*?(\.\w\w(g|G|f|F))/]
    # Remove the 'image = ' part of the string
    image_name = image_name.split("=")[1].strip
    # Call Wikipedia for image url
    get_url = Wikiwhat::Call.call_api(('File:'+ image_name), :prop => "imageinfo", :iiprop => true)
    # Pull url from hash
    img_name_2 = pull_from_hash(get_url, "pages")
    img_array = pull_from_hash(img_name_2, "imageinfo")
    img_array[0]["url"]
  else
    # If no sidebar image exists, raise error.
    raise Wikiwhat::WikiwhatError.new("Sorry, it looks like there is no sidebar image on this page.")
  end
end
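To see what the image pattern picks up without calling the Wikipedia API, the sketch below runs the same regular expression against an invented infobox. The constant name `IMAGE_PATTERN` and the sample markup are hypothetical. The pattern expects a dot, two word characters, and then one of g, G, f, or F, so three-letter extensions such as .jpg, .png, .svg, and .gif match, while .jpeg does not.

```ruby
# Same pattern as in sidebar_image, given a name for the sketch.
IMAGE_PATTERN = /(image).*?(\.\w\w(g|G|f|F))/

# Invented infobox markup, for illustration only.
infobox = "{{Infobox person\n" \
          "| name = Ada Lovelace\n" \
          "| image = Ada_Lovelace_portrait.jpg\n" \
          "| caption = Portrait\n" \
          "}}"

# String#[] with a regex returns the matched text, or nil if nothing matches.
match = infobox[IMAGE_PATTERN]
puts match      # prints image = Ada_Lovelace_portrait.jpg

# Drop the "image = " prefix, as the method above does.
image_name = match.split('=')[1].strip
puts image_name # prints Ada_Lovelace_portrait.jpg

# A four-letter extension is not recognised by this pattern.
p '| image = Foo.jpeg'[IMAGE_PATTERN] # prints nil
```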