Class: Bcat::HeadParser

Inherits: Object

Defined in: lib/bcat/html.rb

Overview

Parses HTML until the first displayable body character and provides methods for accessing head and body contents.

Constant Summary

HEAD_TOKS =

[
  /\A(<!DOCTYPE.*?>)/m,
  /\A(<title.*?>.*?<\/title>)/mi,
  /\A(<script.*?>.*?<\/script>)/mi,
  /\A(<style.*?>.*?<\/style>)/mi,
  /\A(<(?:html|head|meta|link|base).*?>)/mi,
  /\A(<\/(?:html|head|meta|link|base|script|style|title)>)/mi,
  /\A(<!--(.*?)-->)/m
]

BODY_TOKS =

[
  /\A[^<]/,
  /\A<(?!html|head|meta|link|base|script|style|title).*?>/
]
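
As a rough illustration of how these two pattern lists divide up the front of a buffer, the sketch below copies a subset of the patterns and applies them in the same head-first order that #parse uses. The `classify` helper and its `:head`, `:body`, and `:incomplete` labels are hypothetical names introduced only for this example; they are not part of Bcat.

```ruby
# Subset of the patterns listed above, copied verbatim.
HEAD_TOKS = [
  /\A(<!DOCTYPE.*?>)/m,
  /\A(<title.*?>.*?<\/title>)/mi,
  /\A(<(?:html|head|meta|link|base).*?>)/mi
]
BODY_TOKS = [
  /\A[^<]/,
  /\A<(?!html|head|meta|link|base|script|style|title).*?>/
]

# Hypothetical helper: label whatever sits at the front of +str+.
def classify(str)
  return :head if HEAD_TOKS.any? { |tok| str =~ tok }
  return :body if BODY_TOKS.any? { |tok| str =~ tok }
  :incomplete # neither list matches yet, e.g. a tag that is still arriving
end

classify("<!DOCTYPE html><html>")  # => :head
classify("<p>hello</p>")           # => :body
classify("plain text")             # => :body
classify("<title>Unfinished")      # => :incomplete
```

The last call shows why the head patterns require a complete element: an unterminated `<title>` matches neither list, so a caller would have to wait for more input.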

Instance Attribute Summary

#buf ⇒ Object

Returns the value of attribute buf.

Instance Method Summary

#body(inject = nil) ⇒ Object

The current body contents.

#complete? ⇒ Boolean

Truthy once the first displayed character of the body has arrived.

#feed(data) ⇒ Object

Called to parse new data as it arrives.

#head ⇒ Object

The head contents without any DOCTYPE, <html>, or <head> tags.

#html? ⇒ Boolean

Determine if the input is HTML.

#initialize ⇒ HeadParser

A new instance of HeadParser.

#parse(buf = @buf) ⇒ Object

Parses buf into head and body parts.

Constructor Details

#initialize ⇒ `HeadParser`

Returns a new instance of HeadParser.

# File 'lib/bcat/html.rb', line 8

def initialize
  @buf = ''
  @head = []
  @body = nil
  @html = nil
end

Instance Attribute Details

#buf ⇒ `Object`

Returns the value of attribute buf.


# File 'lib/bcat/html.rb', line 6

def buf
  @buf
end

Instance Method Details

#body(inject = nil) ⇒ `Object`

The current body contents. The <body> tag is guaranteed to be present. If a <body> was included in the input, it’s preserved with its original attributes; otherwise, a <body> tag is inserted. The inject argument can be used to insert a string immediately after the opening <body> tag.

# File 'lib/bcat/html.rb', line 49

def body(inject=nil)
  if @body =~ /\A\s*(<body.*?>)(.*)/mi
    [$1, inject, $2].compact.join("\n")
  else
    ["<body>", inject, @body].compact.join("\n")
  end
end
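
To make the two branches concrete, here is a minimal standalone sketch of the same logic. The `body_with_inject` function is a hypothetical stand-in written for this example, with its `raw` argument playing the role of `@body`; it is not a Bcat method.

```ruby
# Hypothetical standalone copy of the #body logic; +raw+ stands in for @body.
def body_with_inject(raw, inject = nil)
  if raw =~ /\A\s*(<body.*?>)(.*)/mi
    # An existing <body ...> tag is kept, attributes included.
    [$1, inject, $2].compact.join("\n")
  else
    # No <body> tag in the input, so one is supplied.
    ["<body>", inject, raw].compact.join("\n")
  end
end

body_with_inject("<body class='x'><p>hi</p>", "<script>go()</script>")
# => "<body class='x'>\n<script>go()</script>\n<p>hi</p>"

body_with_inject("<p>hi</p>")
# => "<body>\n<p>hi</p>"
```

Because the parts are joined with newlines after `compact`, omitting `inject` simply drops that element rather than leaving an empty line.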

#complete? ⇒ `Boolean`

Truthy once the first displayed character of the body has arrived.

Returns:

(Boolean)

# File 'lib/bcat/html.rb', line 27

def complete?
  !@body.nil?
end

#feed(data) ⇒ `Object`

Called to parse new data as it arrives.

# File 'lib/bcat/html.rb', line 16

def feed(data)
  if complete?
    @body << data
  else
    @buf << data
    parse(@buf)
  end
  complete?
end

#head ⇒ `Object`

The head contents without any DOCTYPE, <html>, or <head> tags. This should consist of only <style>, <script>, <link>, <meta>, <base>, and <title> tags.

# File 'lib/bcat/html.rb', line 41

def head
  @head.join.gsub(/<\/?(?:html|head|!DOCTYPE).*?>/mi, '')
end
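
The effect of that substitution can be seen on a small sample. The `tokens` array below is assumed input, standing in for what the parser might have collected in `@head`; only the `gsub` pattern is taken from the source above.

```ruby
# Assumed sample of collected head tokens (stand-in for @head).
tokens = ["<!DOCTYPE html>", "\n", "<html>", "<head>",
          "<title>Demo</title>", "<meta charset='utf-8'>", "</head>"]

# Same substitution as in #head: drop DOCTYPE and <html>/<head> open and close tags.
stripped = tokens.join.gsub(/<\/?(?:html|head|!DOCTYPE).*?>/mi, '')
# stripped => "\n<title>Demo</title><meta charset='utf-8'>"
```

Whitespace between tokens is preserved, which is why the newline after the DOCTYPE survives in the result.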

#html? ⇒ `Boolean`

Determine if the input is HTML. This is nil before the first non-whitespace character is received, true if the first non-whitespace character is a ‘<’, and false if the first non-whitespace character is something other than ‘<’.

Returns:

(Boolean)

# File 'lib/bcat/html.rb', line 35

def html?
  @html
end
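
The value is set by the two sniffing patterns at the top of #parse. As a sketch of the three states, the hypothetical `sniff_html` helper below applies those same two patterns to a string; it is written for this example and is not part of Bcat.

```ruby
# Hypothetical helper applying the same two patterns #parse uses to set @html.
def sniff_html(buf)
  if buf =~ /\A\s*[<]/m
    true
  elsif buf =~ /\A\s*[^<]/m
    false
  end # falls through to nil when neither pattern matches
end

sniff_html("")          # => nil
sniff_html("  <html>")  # => true
sniff_html("hello")     # => false
sniff_html("   ")       # => false
```

The last call is worth noting: because `\s*` can backtrack and `[^<]` also accepts whitespace, these patterns classify a whitespace-only buffer as not HTML instead of leaving the result undecided.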

#parse(buf = @buf) ⇒ `Object`

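Parses buf into head and body parts. The basic approach is to consume leading whitespace and anything that looks head related (the HEAD_TOKS patterns) until we hit text or a body element. If the remaining input does not yet match a body token, for example because a tag is still incomplete, it is left in the buffer for the next call.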

# File 'lib/bcat/html.rb', line 74

def parse(buf=@buf)
  if @html.nil?
    if buf =~ /\A\s*[<]/m
      @html = true
    elsif buf =~ /\A\s*[^<]/m
      @html = false
    end
  end

  while !buf.empty?
    buf.sub!(/\A(\s+)/m) { @head << $1 ; '' }
    matched =
      HEAD_TOKS.any? do |tok|
        buf.sub!(tok) do
          @head << $1
          ''
        end
      end
    break unless matched
  end


  if buf.empty?
    buf
  elsif BODY_TOKS.any? { |tok| buf =~ tok }
    @body = buf
    nil
  else
    buf
  end
end
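
To show how #feed and #parse cooperate when input arrives in arbitrary chunks, here is a minimal sketch. `MiniHeadParser` is a hypothetical, trimmed-down stand-in that copies the constants and the feed/complete?/parse logic from the listings in this document (the HTML sniffing step is omitted); the `head_parts` and `body_text` readers are names invented for this example. It is not the Bcat class itself.

```ruby
# Hypothetical stand-in copying the documented feed/complete?/parse logic.
class MiniHeadParser
  HEAD_TOKS = [
    /\A(<!DOCTYPE.*?>)/m,
    /\A(<title.*?>.*?<\/title>)/mi,
    /\A(<script.*?>.*?<\/script>)/mi,
    /\A(<style.*?>.*?<\/style>)/mi,
    /\A(<(?:html|head|meta|link|base).*?>)/mi,
    /\A(<\/(?:html|head|meta|link|base|script|style|title)>)/mi,
    /\A(<!--(.*?)-->)/m
  ]
  BODY_TOKS = [
    /\A[^<]/,
    /\A<(?!html|head|meta|link|base|script|style|title).*?>/
  ]

  attr_reader :head_parts, :body_text

  def initialize
    @buf = +''         # unfrozen buffer; feed appends to it in place
    @head_parts = []
    @body_text = nil
  end

  def complete?
    !@body_text.nil?
  end

  def feed(data)
    if complete?
      @body_text << data   # head is done; everything else is body
    else
      @buf << data
      parse(@buf)
    end
    complete?
  end

  def parse(buf)
    until buf.empty?
      # Move leading whitespace and complete head tokens out of the buffer.
      buf.sub!(/\A(\s+)/m) { @head_parts << $1; '' }
      break unless HEAD_TOKS.any? { |tok| buf.sub!(tok) { @head_parts << $1; '' } }
    end
    # Whatever remains becomes the body once it starts with a body token.
    @body_text = buf if !buf.empty? && BODY_TOKS.any? { |tok| buf =~ tok }
  end
end

parser = MiniHeadParser.new
parser.feed("<html><head><tit")      # => false (the title tag is still incomplete)
parser.feed("le>Hi</title></head>")  # => false (only head tokens so far)
parser.feed("<body><p>first")        # => true  (<body> matches a body token)
parser.feed(" words</p>")            # => true  (appended straight to the body)
parser.body_text                     # => "<body><p>first words</p>"
```

The first call illustrates the buffering behaviour: the partial `<tit` matches neither token list, so it stays in the buffer until the next chunk completes the element.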
