Module: Unisec::Utils::String

Defined in:
lib/unisec/utils.rb

Overview

About string conversion and manipulation.

Class Method Summary

Class Method Details

.autodetect(str) ⇒ Symbol

Internal method used for convert.

Autodetect the representation type of the string input.

Examples:

# Hexadecimal
Unisec::Utils::String.autodetect('0x1f4a9') # => :hexadecimal
# Decimal
Unisec::Utils::String.autodetect('0d128169') # => :decimal
# Binary
Unisec::Utils::String.autodetect('0b11111010010101001') # => :binary
# Unicode string
Unisec::Utils::String.autodetect('💩') # => :string
# Standardized format of hexadecimal code point
Unisec::Utils::String.autodetect('U+1F4A9') # => :stdcp

Parameters:

  • str (String)

    the string whose representation type should be detected

Returns:

  • (Symbol)

    the detected type: :hexadecimal, :decimal, :binary, :string, :stdcp.



# File 'lib/unisec/utils.rb', line 155

def self.autodetect(str)
  case str
  when /0x[0-9a-fA-F]+/
    :hexadecimal
  when /U\+[0-9A-F]+/
    :stdcp
  when /0d[0-9]+/
    :decimal
  when /0b[0-1]+/
    :binary
  else
    :string
  end
end
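
The patterns in the listing above are unanchored, so a branch matches when its prefix appears anywhere in the input, and branches are tried in order. The following is a minimal sketch of that behavior, not part of unisec: `detect` is a hypothetical standalone copy of the same `case`, and `detect_strict` is an assumed variant that anchors each pattern to the whole input with `\A` and `\z`.

```ruby
# Hypothetical standalone copy of the case/when above, for illustration only.
def detect(str)
  case str
  when /0x[0-9a-fA-F]+/ then :hexadecimal
  when /U\+[0-9A-F]+/   then :stdcp
  when /0d[0-9]+/       then :decimal
  when /0b[0-1]+/       then :binary
  else :string
  end
end

# Assumed stricter variant: each pattern must match the entire input.
def detect_strict(str)
  case str
  when /\A0x[0-9a-fA-F]+\z/ then :hexadecimal
  when /\AU\+[0-9A-F]+\z/   then :stdcp
  when /\A0d[0-9]+\z/       then :decimal
  when /\A0b[0-1]+\z/       then :binary
  else :string
  end
end

detect('price: 0x10')        # => :hexadecimal (prefix found inside the text)
detect_strict('price: 0x10') # => :string
detect_strict('U+1F4A9')     # => :stdcp
```

Whether the looser matching is intended is not stated in the source, so treat the strict variant as one possible design rather than a fix.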

.char2codepoint(chr) ⇒ String

TODO:

Replace this method by target type :stdcp in String.convert()

Display the code point in Unicode format for a given character (code point as string)

Examples:

Unisec::Utils::String.char2codepoint('💎') # => "U+1F48E"

Parameters:

  • chr (String)

    Unicode code point (as character / string)

Returns:

  • (String)

    code point in Unicode format



# File 'lib/unisec/utils.rb', line 186

def self.char2codepoint(chr)
  Integer.deccp2stdhexcp(chr.codepoints.first)
end
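
`Integer.deccp2stdhexcp` is not shown in this section. As a rough sketch, assuming it renders `U+` followed by uppercase hexadecimal padded to at least four digits (which is consistent with the documented outputs `"U+1F48E"` and `"U+0079"`), the same result can be obtained with core Ruby. The helper name `std_codepoint` is hypothetical.

```ruby
# Hypothetical helper; an assumption about the format deccp2stdhexcp produces.
def std_codepoint(chr)
  # Take the first code point and print it as U+XXXX (uppercase hex, min. 4 digits).
  format('U+%04X', chr.codepoints.first)
end

std_codepoint("\u{1F48E}") # => "U+1F48E"
std_codepoint('y')         # => "U+0079"
```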

.chars2codepoints(chrs) ⇒ String

Display the code points in Unicode format for the given characters (code points as string)

Examples:

Unisec::Utils::String.chars2codepoints("ỳ́") # => "U+0079 U+0300 U+0301"
Unisec::Utils::String.chars2codepoints("🧑‍🌾") # => "U+1F9D1 U+200D U+1F33E"

Parameters:

  • chrs (String)

    Unicode code points (as characters / string)

Returns:

  • (String)

    code points in Unicode format



# File 'lib/unisec/utils.rb', line 196

def self.chars2codepoints(chrs)
  out = []
  chrs.each_char do |chr|
    out << char2codepoint(chr)
  end
  out.join(' ')
end
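
Because `each_char` iterates code point by code point, a single user-perceived character can produce several entries. A small sketch of that effect using only core Ruby follows; `codepoints_list` is a hypothetical stand-in, and the `U+%04X` format is an assumption based on the documented outputs.

```ruby
# Hypothetical stand-in for the method above, built on core Ruby only.
def codepoints_list(chrs)
  chrs.each_char.map { |chr| format('U+%04X', chr.ord) }.join(' ')
end

y_accents = "y\u0300\u0301" # "y" followed by two combining accents

y_accents.grapheme_clusters.length # => 1 (one visible character)
y_accents.each_char.to_a.length    # => 3 (three code points)
codepoints_list(y_accents)         # => "U+0079 U+0300 U+0301"
```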

.chars2intcodepoints(chrs) ⇒ String

Display the code points in integer format for the given characters (code points as string)

Examples:

Unisec::Utils::String.chars2intcodepoints('I 💕 Ruby 💎') # => "73 32 128149 32 82 117 98 121 32 128142"

Parameters:

  • chrs (String)

    Unicode code points (as characters / string)

Returns:

  • (String)

    code points in integer format



# File 'lib/unisec/utils.rb', line 209

def self.chars2intcodepoints(chrs)
  chrs.codepoints.map(&:to_s).join(' ')
end
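
The integer form is lossless, so it can be turned back into the original string. The sketch below shows one way to do the round trip with `Array#pack`; the helper `intcodepoints_to_chars` is hypothetical and not part of unisec.

```ruby
# Hypothetical inverse: rebuild a UTF-8 string from space-separated integers.
def intcodepoints_to_chars(ints)
  ints.split.map(&:to_i).pack('U*')
end

text = "I \u{1F495} Ruby \u{1F48E}"
ints = text.codepoints.map(&:to_s).join(' ')
# ints => "73 32 128149 32 82 117 98 121 32 128142"

intcodepoints_to_chars(ints) == text # => true
```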

.convert(input, target_type) ⇒ Variable

Convert a string input into the chosen type.

Examples:

Unisec::Utils::String.convert('0x1f4a9', :integer) # => 128169
Unisec::Utils::String.convert('0x1f4a9', :char) # => "💩"

Parameters:

  • input (String)

    If the input is a Unicode string, only the first code point will be taken into account. The input must represent a character encoded in hexadecimal, decimal, binary or standard code point format. See convert_to_integer and convert_to_char for detailed examples.

  • target_type (Symbol)

    Convert to the chosen type. Currently only supports :integer and :char.

Returns:

  • (Variable)

    The type of the output depends on the chosen target_type.



# File 'lib/unisec/utils.rb', line 66

def self.convert(input, target_type)
  case target_type
  when :integer
    convert_to_integer(input)
  when :char
    convert_to_char(input)
  else
    raise TypeError, "Target type \"#{target_type}\" not available"
  end
end

.convert_to_char(input) ⇒ String

Internal method used for convert.

Convert a string input into a character.

Examples:

# Hexadecimal
Unisec::Utils::String.convert_to_char('0x1f4a9') # => "💩"
# Decimal
Unisec::Utils::String.convert_to_char('0d128169') # => "💩"
# Binary
Unisec::Utils::String.convert_to_char('0b11111010010101001') # => "💩"
# Unicode string
Unisec::Utils::String.convert_to_char('💩') # => "💩"
# Standardized format of hexadecimal code point
Unisec::Utils::String.convert_to_char('U+1F4A9') # => "💩"

Parameters:

  • input (String)

    If the input is a Unicode string, only the first code point will be taken into account. The input must represent a character encoded in hexadecimal, decimal, binary or standard code point format. The input type is determined automatically based on the prefix.

Returns:

  • (String)

    the character for the given input



# File 'lib/unisec/utils.rb', line 130

def self.convert_to_char(input)
  case autodetect(input)
  when :hexadecimal, :stdcp, :decimal, :binary, :string
    [convert(input, :integer)].pack('U')
  else
    raise TypeError, "Input \"#{input}\" is not of the expected type"
  end
end
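
The conversion itself rests on `Array#pack` with the `U` directive, which encodes an integer code point as a UTF-8 character. A minimal illustration in core Ruby follows; `codepoint_to_char` is a hypothetical helper name.

```ruby
# Hypothetical helper isolating the pack('U') step used above.
def codepoint_to_char(codepoint)
  [codepoint].pack('U')
end

char = codepoint_to_char(0x1F4A9)
char.encoding                         # => #<Encoding:UTF-8>
char.codepoints.first                 # => 128169
char == 0x1F4A9.chr(Encoding::UTF_8)  # => true (Integer#chr is an alternative)
```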

.convert_to_integer(input) ⇒ Integer

Internal method used for convert.

Convert a string input into integer.

Examples:

# Hexadecimal
Unisec::Utils::String.convert_to_integer('0x1f4a9') # => 128169
# Decimal
Unisec::Utils::String.convert_to_integer('0d128169') # => 128169
# Binary
Unisec::Utils::String.convert_to_integer('0b11111010010101001') # => 128169
# Unicode string
Unisec::Utils::String.convert_to_integer('💩') # => 128169
# Standardized format of hexadecimal code point
Unisec::Utils::String.convert_to_integer('U+1F4A9') # => 128169

Parameters:

  • input (String)

    If the input is a Unicode string, only the first code point will be taken into account. The input must represent a character encoded in hexadecimal, decimal, binary or standard code point format. The input type is determined automatically based on the prefix.

Returns:

  • (Integer)

    the code point as an integer



# File 'lib/unisec/utils.rb', line 95

def self.convert_to_integer(input)
  case autodetect(input)
  when :hexadecimal
    input.hex2dec(prefix: '0x').to_i
  when :stdcp
    input.hex2dec(prefix: 'U+').to_i
  when :decimal
    input.to_i
  when :binary
    input.bin2hex.hex2dec.to_i
  when :string
    input.codepoints.first
  else
    raise TypeError, "Input \"#{input}\" is not of the expected type"
  end
end
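
`hex2dec` and `bin2hex` are String extensions that are not defined in this listing. For readers without them, here is a rough core-Ruby equivalent of the same five branches. It is a sketch, not the unisec implementation: the name `to_codepoint` is hypothetical, and the patterns are anchored to the whole input, which is an assumption that differs from `autodetect`.

```ruby
# Hypothetical core-Ruby equivalent; strips the prefix, then parses in the right base.
def to_codepoint(input)
  case input
  when /\A0x\h+\z/   then Integer(input.delete_prefix('0x'), 16)
  when /\AU\+\h+\z/  then Integer(input.delete_prefix('U+'), 16)
  when /\A0d\d+\z/   then Integer(input.delete_prefix('0d'), 10)
  when /\A0b[01]+\z/ then Integer(input.delete_prefix('0b'), 2)
  else input.codepoints.first # plain string: first code point only
  end
end

to_codepoint('0x1f4a9')             # => 128169
to_codepoint('U+1F4A9')             # => 128169
to_codepoint('0d128169')            # => 128169
to_codepoint('0b11111010010101001') # => 128169
to_codepoint("\u{1F4A9}")           # => 128169
```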

.grapheme_reverse(str) ⇒ String

Reverse a string by graphemes (not by code points)

Examples:

b = "\u{1f1eb}\u{1f1f7}\u{1F413}" # => "🇫🇷🐓"
b.reverse # => "🐓🇷🇫"
Unisec::Utils::String.grapheme_reverse(b) # => "🐓🇫🇷"

Returns:

  • (String)

    the reversed string



# File 'lib/unisec/utils.rb', line 176

def self.grapheme_reverse(str)
  str.grapheme_clusters.reverse.join
end
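
The same difference shows up with combining marks, not only with flag emoji. The sketch below uses a hypothetical standalone copy of the method, `reverse_by_grapheme`, so that it runs without unisec.

```ruby
# Hypothetical standalone copy of the one-liner above.
def reverse_by_grapheme(str)
  str.grapheme_clusters.reverse.join
end

word = "noe\u0308l" # "noël" written with a combining diaeresis (U+0308)

word.reverse              # => "l\u0308eon" (the diaeresis moves onto the "l")
reverse_by_grapheme(word) # => "le\u0308on" (the diaeresis stays on the "e")
```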

.to_range(range_str) ⇒ Range

Convert a string describing a range of hex-encoded Unicode code points into a Ruby Range of integers.

Examples:

Unisec::Utils::String.to_range('0080..00FF') # => 128..255

Parameters:

  • range_str (String)

    Unicode code points range as in data/Blocks.txt

Returns:

  • (Range)

    the range of integer code points



# File 'lib/unisec/utils.rb', line 219

def self.to_range(range_str)
  ::Range.new(*range_str.split('..').map { |x| x.hex2dec.to_i })
end
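
A comparable result can be had with core Ruby alone, since `String#hex` parses a hexadecimal string into an integer. The helper `hex_range` below is a hypothetical sketch, not the library's method, and it assumes well-formed input of the form `START..END`.

```ruby
# Hypothetical core-Ruby sketch; assumes input shaped like "0080..00FF".
def hex_range(range_str)
  first, last = range_str.split('..').map(&:hex)
  first..last
end

latin1_supplement = hex_range('0080..00FF') # => 128..255
latin1_supplement.cover?(0xE9)              # => true (U+00E9 is inside the block)
latin1_supplement.cover?(0x41)              # => false
```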