Class: String

Inherits:
Object
Defined in:
lib/taskjuggler/UTF8String.rb

Overview

This is an extension and modification of the standard String class. The parser does a lot of UTF-8 character processing. Ruby 1.8 does not have good enough UTF-8 support, and Ruby 1.9 only handles UTF-8 characters as Strings, which is very inefficient compared to representing them as Integer objects. Some of these hacks can be removed once Ruby 1.8 support is dropped.

Instance Method Details

#<<(obj) ⇒ Object

Replacement for the built-in << operator that also accepts Integer values above 255. Such values hold the bytes of a multi-byte UTF-8 character packed into one Integer.



# File 'lib/taskjuggler/UTF8String.rb', line 62

def <<(obj)
  if obj.is_a?(String) || (obj < 256)
    # In this case we can use the built-in concat.
    concat(obj)
  else
    # UTF-8 characters have a maximum length of 4 bytes and no byte is 0.
    mask = 0xFF000000
    pos = 3
    while pos >= 0
      # Use the built-in concat operator for each byte.
      concat((obj & mask) >> (8 * pos)) if (obj & mask) != 0
      # Move mask and position to the next byte.
      mask = mask >> 8
      pos -= 1
    end
  end
end
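The byte-splitting loop above can be tried in isolation. The following is a minimal sketch, not TaskJuggler code: `append_packed_utf8` is a hypothetical helper, and it assumes the Integer holds the raw UTF-8 bytes of one character (for example 0xC3A9 for "é"), not a Unicode code point. It also assumes a binary target buffer, because on Ruby 1.9 and later appending an Integer to a UTF-8 String is interpreted as a code point rather than a byte.

```ruby
# Hypothetical helper (not part of TaskJuggler): append the non-zero bytes of
# a packed UTF-8 character to a binary buffer, most significant byte first.
def append_packed_utf8(buffer, packed)
  3.downto(0) do |pos|
    byte = (packed >> (8 * pos)) & 0xFF
    # No byte of a UTF-8 character is 0, so zero bytes are just padding.
    buffer << byte unless byte.zero?
  end
  buffer
end

# A binary buffer, so that each Integer is appended as a single byte.
buffer = String.new.force_encoding(Encoding::BINARY)
append_packed_utf8(buffer, 0xC3A9)    # bytes of U+00E9
append_packed_utf8(buffer, 0xE282AC)  # bytes of U+20AC
text = buffer.force_encoding('UTF-8')
```

Unlike the else branch of the method above, this helper returns the buffer so that calls can be chained.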

#forceUTF8Encoding ⇒ Object

Ensure the String is really UTF-8 encoded and that line ends are only \n (LF). If that is not possible, an Encoding::UndefinedConversionError is raised.



# File 'lib/taskjuggler/UTF8String.rb', line 122

def forceUTF8Encoding
  if RUBY_VERSION < '1.9.0'
    # Ruby 1.8 really only supports 7 bit ASCII well. Only do the line-end
    # clean-up.
    gsub(/\r\n/, "\n")
  else
    begin
      # Ensure that the text has LF line ends and is UTF-8 encoded.
      encode('UTF-8', :universal_newline => true)
    rescue
      # The encoding of the String is broken. Find the first broken line and
      # report it.
      lineCtr = 1
      each_line do |line|
        begin
          line.encode('UTF-8')
        rescue
          line = line.encode('UTF-8', :invalid => :replace,
                                      :undef => :replace, :replace => '<?>')
          raise Encoding::UndefinedConversionError,
                "UTF-8 encoding error in line #{lineCtr}: #{line}"
        end
        lineCtr += 1
      end
    end
  end
end
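The two String#encode calls used above can be illustrated on small inputs. This is a sketch under assumptions: the sample strings and their source encodings (ISO-8859-1 and binary) are invented for illustration and are not taken from TaskJuggler.

```ruby
# Transcode Latin-1 text to UTF-8 and normalize CRLF line ends to LF.
latin1 = "caf\xE9\r\n".dup.force_encoding(Encoding::ISO_8859_1)
utf8 = latin1.encode('UTF-8', :universal_newline => true)

# A byte with no UTF-8 equivalent makes the strict conversion fail.
binary = "abc\xFF".b
readable = nil
begin
  binary.encode('UTF-8')
rescue Encoding::UndefinedConversionError
  # Same fallback as in the method above: substitute a visible marker so the
  # offending text can still be shown in an error message.
  readable = binary.encode('UTF-8', :invalid => :replace,
                                    :undef => :replace, :replace => '<?>')
end
```

The fallback conversion only serves to build a printable message; the method above still raises afterwards.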

#ljust(len, pad = ' ') ⇒ Object



# File 'lib/taskjuggler/UTF8String.rb', line 89

def ljust(len, pad = ' ')
  return self + pad * (len - length_utf8) if length_utf8 < len
  self
end
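The reason for padding by character count rather than byte count shows up as soon as the text contains a multi-byte character. The sketch below is an assumption-laden stand-in: `ljust_by_chars` is a hypothetical helper, and it uses the built-in String#length (which counts characters on Ruby 1.9 and later) in place of `length_utf8`, whose definition is not shown on this page.

```ruby
# Hypothetical stand-in for the method above, using String#length instead of
# length_utf8 (an assumption; length_utf8 is defined elsewhere in the file).
def ljust_by_chars(str, len, pad = ' ')
  chars = str.length
  chars < len ? str + pad * (len - chars) : str
end

name = "Zo\u00EB"                  # 3 characters, 4 bytes
padded = ljust_by_chars(name, 5, '.')
byte_count = name.bytesize
char_count = padded.length
```

Padding to a width of 5 adds two pad characters here; padding by byte count would add only one.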

#old_double_left_angle ⇒ Object



# File 'lib/taskjuggler/UTF8String.rb', line 58

alias old_double_left_angle <<

#old_reverse ⇒ Object



# File 'lib/taskjuggler/UTF8String.rb', line 94

alias old_reverse reverse

#reverse ⇒ Object

UTF-8 aware version of reverse that replaces the built-in one.



# File 'lib/taskjuggler/UTF8String.rb', line 97

def reverse
  a = []
  each_utf8_char { |c| a << c }
  a.reverse.join
end
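Why reversal has to work on whole characters can be seen by comparing it with a byte-wise reversal. This sketch uses the built-in String#each_char as a stand-in for `each_utf8_char`, which is defined elsewhere in the file and not shown here; that substitution is an assumption.

```ruby
text = "a\u00F1b"   # "a", n with tilde (2 bytes in UTF-8), "b"

# Character-wise reversal keeps each multi-byte sequence intact.
by_char = text.each_char.to_a.reverse.join

# Byte-wise reversal swaps the two bytes of the middle character, which
# leaves a byte sequence that is no longer valid UTF-8.
by_byte = text.bytes.reverse.pack('C*').force_encoding('UTF-8')
still_valid = by_byte.valid_encoding?
```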

#to_base64 ⇒ Object



# File 'lib/taskjuggler/UTF8String.rb', line 112

def to_base64
  Base64.encode64(self)
end

#to_quoted_printable ⇒ Object



# File 'lib/taskjuggler/UTF8String.rb', line 108

def to_quoted_printable
  [self].pack('M').gsub(/\n/, "\r\n")
end
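The same expression can be run outside the String extension to see what it produces. This is a small sketch with an invented sample string; the decoding step is only there to check the round trip and is not part of TaskJuggler.

```ruby
text = "caf\u00E9 = 1"

# Quoted-printable encoding via Array#pack, with LF line ends turned into
# CRLF as in the method above.
encoded = [text].pack('M').gsub(/\n/, "\r\n")

# Round trip: restore LF line ends, decode, and relabel the bytes as UTF-8.
decoded = encoded.gsub(/\r\n/, "\n").unpack('M').first.force_encoding('UTF-8')
```

Non-ASCII bytes and the literal "=" are escaped as "=XX" hex pairs, so the encoded form is pure 7-bit ASCII.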

#unix2dos ⇒ Object



# File 'lib/taskjuggler/UTF8String.rb', line 116

def unix2dos
  gsub(/\n/, "\r\n")
end
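One property of this substitution is worth checking before calling it on arbitrary input: it assumes the text has LF-only line ends. The sketch below applies the same gsub to sample strings; the guarded pattern at the end is a hypothetical alternative, not something TaskJuggler uses.

```ruby
unix = "a\nb\n"

# Same substitution as in the method above.
dos = unix.gsub(/\n/, "\r\n")

# Applying it to text that already has CRLF line ends doubles the CR.
twice = dos.gsub(/\n/, "\r\n")

# Hypothetical guarded variant that leaves existing CRLF pairs unchanged.
guarded = dos.gsub(/\r?\n/, "\r\n")
```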