Class: Bcat::HeadParser
- Inherits: Object
  - Object
    - Bcat::HeadParser
- Defined in: lib/bcat/html.rb
Overview
Parses HTML until the first displayable body character and provides methods for accessing head and body contents.
Constant Summary
- HEAD_TOKS =
    [
      /\A(<!DOCTYPE.*?>)/m,
      /\A(<title.*?>.*?<\/title>)/mi,
      /\A(<script.*?>.*?<\/script>)/mi,
      /\A(<style.*?>.*?<\/style>)/mi,
      /\A(<(?:html|head|meta|link|base).*?>)/mi,
      /\A(<\/(?:html|head|meta|link|base|script|style|title)>)/mi,
      /\A(<!--(.*?)-->)/m
    ]
- BODY_TOKS =
    [
      /\A[^<]/,
      /\A<(?!html|head|meta|link|base|script|style|title).*?>/
    ]
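The two lists can be read as a classifier for whatever sits at the start of the buffer. The sketch below copies the patterns from this page and adds a hypothetical helper, `classify`, which is not part of bcat; it only illustrates how a fragment falls into the head list, the body list, or neither (for example, a tag that has not fully arrived yet).

```ruby
# Patterns copied from the Constant Summary on this page.
HEAD_TOKS = [
  /\A(<!DOCTYPE.*?>)/m,
  /\A(<title.*?>.*?<\/title>)/mi,
  /\A(<script.*?>.*?<\/script>)/mi,
  /\A(<style.*?>.*?<\/style>)/mi,
  /\A(<(?:html|head|meta|link|base).*?>)/mi,
  /\A(<\/(?:html|head|meta|link|base|script|style|title)>)/mi,
  /\A(<!--(.*?)-->)/m
]

BODY_TOKS = [
  /\A[^<]/,
  /\A<(?!html|head|meta|link|base|script|style|title).*?>/
]

# Hypothetical helper (not part of bcat): reports which list, if any,
# matches the start of a fragment.
def classify(fragment)
  return :head if HEAD_TOKS.any? { |tok| fragment =~ tok }
  return :body if BODY_TOKS.any? { |tok| fragment =~ tok }
  :undecided
end

classify('<!DOCTYPE html>')         # => :head
classify('<meta charset="utf-8">')  # => :head
classify('<p>Hello</p>')            # => :body
classify('plain text')              # => :body
classify('<title>Unfinished')       # => :undecided (closing tag not seen yet)
```

An `:undecided` result corresponds to the case where the parser has to wait for more input before it can tell head from body.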
Instance Attribute Summary
- #buf ⇒ Object
  Returns the value of attribute buf.
Instance Method Summary
- #body(inject = nil) ⇒ Object
  The current body contents.
- #complete? ⇒ Boolean
  Truthy once the first displayed character of the body has arrived.
- #feed(data) ⇒ Object
  Called to parse new data as it arrives.
- #head ⇒ Object
  The head contents without any DOCTYPE, <html>, or <head> tags.
- #html? ⇒ Boolean
  Determine if the input is HTML.
- #initialize ⇒ HeadParser (constructor)
  A new instance of HeadParser.
- #parse(buf = @buf) ⇒ Object
  Parses buf into head and body parts.
Constructor Details
#initialize ⇒ HeadParser
Returns a new instance of HeadParser.
    # File 'lib/bcat/html.rb', line 8
    def initialize
      @buf = ''
      @head = []
      @body = nil
      @html = nil
    end
Instance Attribute Details
#buf ⇒ Object
Returns the value of attribute buf.
    # File 'lib/bcat/html.rb', line 6
    def buf
      @buf
    end
Instance Method Details
#body(inject = nil) ⇒ Object
The current body contents. A <body> tag is guaranteed to be present: if the input included one, it is preserved with its original attributes; otherwise, a bare <body> tag is inserted. The inject argument can be used to insert a string immediately after the opening <body> tag, ahead of the rest of the body contents.
    # File 'lib/bcat/html.rb', line 49
    def body(inject=nil)
      if @body =~ /\A\s*(<body.*?>)(.*)/mi
        [$1, inject, $2].compact.join("\n")
      else
        ["<body>", inject, @body].compact.join("\n")
      end
    end
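To make the two branches concrete, here is a minimal stand-alone sketch of the same logic. The function name `body_with_inject` is hypothetical and takes the buffered body text as an explicit argument instead of reading `@body`; the regex and the join are taken from the source above.

```ruby
# Hypothetical stand-alone version of the logic in #body (not part of bcat).
def body_with_inject(body, inject = nil)
  if body =~ /\A\s*(<body.*?>)(.*)/mi
    # An explicit <body ...> tag exists: keep it, then the injected
    # string, then everything that followed the tag.
    [$1, inject, $2].compact.join("\n")
  else
    # No <body> tag in the input: supply a bare one.
    ["<body>", inject, body].compact.join("\n")
  end
end

body_with_inject('<body class="x"><p>hi</p>', '<script src="a.js"></script>')
# => "<body class=\"x\">\n<script src=\"a.js\"></script>\n<p>hi</p>"

body_with_inject('<p>hi</p>')
# => "<body>\n<p>hi</p>"
```

Because of `compact`, a nil inject simply drops out of the joined result.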
#complete? ⇒ Boolean
Truthy once the first displayed character of the body has arrived.
    # File 'lib/bcat/html.rb', line 27
    def complete?
      !@body.nil?
    end
#feed(data) ⇒ Object
Called to parse new data as it arrives.
    # File 'lib/bcat/html.rb', line 16
    def feed(data)
      if complete?
        @body << data
      else
        @buf << data
        parse(@buf)
      end
      complete?
    end
#head ⇒ Object
The head contents without any DOCTYPE, <html>, or <head> tags. This should consist of only <style>, <script>, <link>, <meta>, <base>, and <title> tags.
    # File 'lib/bcat/html.rb', line 41
    def head
      @head.join.gsub(/<\/?(?:html|head|!DOCTYPE).*?>/mi, '')
    end
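A small sketch of what the `gsub` removes, assuming the collected head tokens are passed in as an array (the role `@head` plays in the source). The name `strip_wrappers` is hypothetical and not part of bcat.

```ruby
# Hypothetical stand-alone version of #head (not part of bcat).
def strip_wrappers(head_tokens)
  # Join the tokens, then delete DOCTYPE, <html>, and <head> tags
  # (opening or closing), leaving the remaining head content as-is.
  head_tokens.join.gsub(/<\/?(?:html|head|!DOCTYPE).*?>/mi, '')
end

tokens = ['<!DOCTYPE html>', "\n", '<html>', '<head>', '<title>Demo</title>', '</head>']
strip_wrappers(tokens)  # => "\n<title>Demo</title>"
```

Whitespace tokens are kept, so the result can begin or end with newlines from the original markup.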
#html? ⇒ Boolean
Determine if the input is HTML. This is nil before the first non-whitespace character is received, true if the first non-whitespace character is a ‘<’, and false if the first non-whitespace character is something other than ‘<’.
    # File 'lib/bcat/html.rb', line 35
    def html?
      @html
    end
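The value returned here is set by the two regex checks at the top of #parse. The sketch below isolates those checks in a hypothetical function, `detect_html` (not part of bcat), so the three possible results can be seen directly.

```ruby
# Hypothetical stand-alone version of the detection step in #parse.
# Returns true, false, or nil (when neither pattern matches).
def detect_html(buf)
  if buf =~ /\A\s*[<]/m
    true
  elsif buf =~ /\A\s*[^<]/m
    false
  end
end

detect_html('')        # => nil   (nothing received yet)
detect_html('  <p>')   # => true  (first non-whitespace character is '<')
detect_html('hello')   # => false (first non-whitespace character is not '<')
detect_html('   ')     # => false (see note below)
```

One caveat, based only on a reading of the patterns as shown on this page: because `\s*` can backtrack and `[^<]` also matches whitespace, a buffer containing only whitespace appears to match the second pattern and yield false rather than nil. Treat that as an observation about this sketch rather than a tested statement about bcat's behavior.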
#parse(buf = @buf) ⇒ Object
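Parses buf into head and body parts. The basic approach is to consume leading whitespace and anything that is possibly head related (the HEAD_TOKS patterns) until we hit text or a body element (the BODY_TOKS patterns). Consumed tokens are removed from buf and appended to the head; once a body token is found, the remaining buf becomes the body.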
    # File 'lib/bcat/html.rb', line 74
    def parse(buf=@buf)
      if @html.nil?
        if buf =~ /\A\s*[<]/m
          @html = true
        elsif buf =~ /\A\s*[^<]/m
          @html = false
        end
      end

      while !buf.empty?
        buf.sub!(/\A(\s+)/m) { @head << $1 ; '' }
        matched = HEAD_TOKS.any? do |tok|
          buf.sub!(tok) do
            @head << $1
            ''
          end
        end
        break unless matched
      end

      if buf.empty?
        buf
      elsif BODY_TOKS.any? { |tok| buf =~ tok }
        @body = buf
        nil
      else
        buf
      end
    end
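To show how #feed and #parse cooperate on chunked input, here is a condensed stand-in that runs without the bcat gem installed. The class name `MiniHeadParser` is hypothetical; its methods are trimmed copies of the ones listed on this page (HTML detection, the inject argument, and parse's return value are left out), so it is a sketch of the flow rather than the library itself.

```ruby
# Condensed, hypothetical stand-in for Bcat::HeadParser.
class MiniHeadParser
  HEAD_TOKS = [
    /\A(<!DOCTYPE.*?>)/m,
    /\A(<title.*?>.*?<\/title>)/mi,
    /\A(<script.*?>.*?<\/script>)/mi,
    /\A(<style.*?>.*?<\/style>)/mi,
    /\A(<(?:html|head|meta|link|base).*?>)/mi,
    /\A(<\/(?:html|head|meta|link|base|script|style|title)>)/mi,
    /\A(<!--(.*?)-->)/m
  ]
  BODY_TOKS = [
    /\A[^<]/,
    /\A<(?!html|head|meta|link|base|script|style|title).*?>/
  ]

  attr_reader :body

  def initialize
    @buf  = String.new # mutable buffer for unparsed input
    @head = []
    @body = nil
  end

  def complete?
    !@body.nil?
  end

  # Before the body starts, buffer and parse; afterwards, append to the body.
  def feed(data)
    if complete?
      @body << data
    else
      @buf << data
      parse(@buf)
    end
    complete?
  end

  def head
    @head.join.gsub(/<\/?(?:html|head|!DOCTYPE).*?>/mi, '')
  end

  private

  # Move whitespace and head tokens from buf into @head until nothing
  # matches, then mark the remainder as body if it starts with a body token.
  def parse(buf)
    until buf.empty?
      buf.sub!(/\A(\s+)/m) { @head << $1; '' }
      matched = HEAD_TOKS.any? { |tok| buf.sub!(tok) { @head << $1; '' } }
      break unless matched
    end
    @body = buf if !buf.empty? && BODY_TOKS.any? { |tok| buf =~ tok }
  end
end

parser = MiniHeadParser.new
parser.feed("<!DOCTYPE html>\n<html><head><title>De")  # => false (title incomplete)
parser.feed("mo</title></head>\n")                     # => false (no body token yet)
parser.feed("<body><p>Hello")                          # => true
parser.feed(", world</p>")                             # => true
parser.head  # => "\n<title>Demo</title>\n"
parser.body  # => "<body><p>Hello, world</p>"
```

The first chunk ends in the middle of a `<title>` element, which matches neither list, so the parser keeps the fragment in the buffer and waits for the next chunk before deciding.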