Class: Encoding

Inherits:
Object
Defined in:
encoding.c

Overview

An Encoding instance represents a character encoding usable in Ruby. It is defined as a constant under the Encoding namespace. It has a name and, optionally, aliases:

Encoding::US_ASCII.name  # => "US-ASCII"
Encoding::US_ASCII.names # => ["US-ASCII", "ASCII", "ANSI_X3.4-1968", "646"]

A Ruby method that accepts an encoding as an argument will accept:

  • An Encoding object.

  • The name of an encoding.

  • An alias for an encoding name.

These are equivalent:

'foo'.encode(Encoding::US_ASCII) # Encoding object.
'foo'.encode('US-ASCII')         # Encoding name.
'foo'.encode('ASCII')            # Encoding alias.

For a full discussion of encodings and their uses, see the Encodings document.

Encoding::ASCII_8BIT is a special-purpose encoding that is usually used for a string of bytes rather than a string of characters. As the name indicates, however, its bytes in the ASCII range are treated as ASCII characters, which is useful when you combine it with other ASCII-compatible encodings.
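
The following is a minimal sketch of that behaviour, not an exhaustive description. The sample bytes and variable names are illustrative assumptions; String#b is used to obtain a copy of a string tagged with the binary encoding.

```ruby
# Illustrative byte string; String#b returns a copy tagged as binary.
payload = "\xFF\xD8abc".b
payload.encoding == Encoding::ASCII_8BIT   # true
payload.valid_encoding?                    # true: any byte sequence is valid here

# Bytes in the ASCII range still behave like ASCII characters.
payload.include?("abc")                    # true

# ASCII-only binary data can be combined with non-ASCII UTF-8 text.
ascii_only = "abc".b
joined = ascii_only + "\u00E9"
joined.encoding                            # UTF-8

# Non-ASCII bytes cannot be combined with non-ASCII UTF-8 text.
mixed_error =
  begin
    payload + "\u00E9"
    nil
  rescue Encoding::CompatibilityError => e
    e
  end
```

The last step raises because neither string is limited to the ASCII range, so there is no single encoding that could describe the result.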

Defined Under Namespace

Classes: CompatibilityError, Converter, ConverterNotFoundError, InvalidByteSequenceError, UndefinedConversionError

Class Method Details

._load(str) ⇒ Object

:nodoc:



# File 'encoding.c', line 1449

static VALUE
enc_load(VALUE klass, VALUE str)
{
    return str;
}

.aliases ⇒ Object

Returns a hash that maps each available encoding alias to its original encoding name.

Encoding.aliases
#=> {"BINARY"=>"ASCII-8BIT", "ASCII"=>"US-ASCII", "ANSI_X3.4-1968"=>"US-ASCII",
      "SJIS"=>"Windows-31J", "eucJP"=>"EUC-JP", "CP932"=>"Windows-31J"}
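
As a small sketch of how the returned hash might be used, the snippet below looks up one alias and inverts the mapping. The variable names are illustrative, and the exact set of aliases depends on the Ruby build.

```ruby
aliases = Encoding.aliases

# Each key is an alias; each value is the name of the encoding it refers to.
original = aliases["ASCII"]               # "US-ASCII"

# An alias and its original name resolve to the same Encoding object.
same = Encoding.find("ASCII") == Encoding.find(original)

# Invert the mapping: original encoding name => list of its aliases.
by_encoding = aliases.group_by { |_alias_name, name| name }
                     .transform_values { |pairs| pairs.map(&:first) }
by_encoding["US-ASCII"]                   # includes "ASCII"
```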


# File 'encoding.c', line 1870

static VALUE
rb_enc_aliases(VALUE klass)
{
    VALUE aliases[2];
    aliases[0] = rb_hash_new();
    aliases[1] = rb_ary_new();

    st_foreach(global_enc_table.names, rb_enc_aliases_enc_i, (st_data_t)aliases);

    return aliases[0];
}

.compatible?(obj1, obj2) ⇒ Encoding or nil

Checks the compatibility of two objects.

If both objects are strings, they are compatible when they can be concatenated. If they are compatible, the encoding of the concatenated string is returned; otherwise nil is returned.

Encoding.compatible?("\xa1".force_encoding("iso-8859-1"), "b")
#=> #<Encoding:ISO-8859-1>

Encoding.compatible?(
  "\xa1".force_encoding("iso-8859-1"),
  "\xa1\xa1".force_encoding("euc-jp"))
#=> nil

If the objects are not strings, their encodings are compatible when both objects have an encoding and:

  • Either encoding is US-ASCII compatible

  • One of the encodings is a 7-bit encoding

Returns:

  • (Encoding, nil)
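
One way to use this check is to guard a concatenation. The sketch below assumes a hypothetical helper named concat_if_compatible, which is not part of Ruby, and reuses the strings from the examples above.

```ruby
# Hypothetical helper: concatenate only when Ruby reports the pair as compatible.
def concat_if_compatible(left, right)
  return nil unless Encoding.compatible?(left, right)

  left + right
end

# dup keeps the example working when string literals are frozen.
latin1 = "\xA1".dup.force_encoding("ISO-8859-1")
euc_jp = "\xA1\xA1".dup.force_encoding("EUC-JP")

ok  = concat_if_compatible(latin1, "b")      # ISO-8859-1 string
bad = concat_if_compatible(latin1, euc_jp)   # nil; latin1 + euc_jp would raise
```

Without the guard, the second concatenation would raise Encoding::CompatibilityError.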


# File 'encoding.c', line 1419

static VALUE
enc_compatible_p(VALUE klass, VALUE str1, VALUE str2)
{
    rb_encoding *enc;

    if (!enc_capable(str1)) return Qnil;
    if (!enc_capable(str2)) return Qnil;
    enc = rb_enc_compatible(str1, str2);
    if (!enc) return Qnil;
    return rb_enc_from_encoding(enc);
}

.default_external ⇒ Object

Returns the default external encoding.

The default external encoding is used by default for strings created from the following locations:

  • CSV

  • File data read from disk

  • SDBM

  • StringIO

  • Zlib::GzipReader

  • Zlib::GzipWriter

  • String#inspect

  • Regexp#inspect

While strings created from these locations will have this encoding, the encoding may not be valid. Be sure to check String#valid_encoding?.

File data written to disk will be transcoded to the default external encoding when written, if default_internal is not nil.

The default external encoding is initialized by the -E option. If -E isn’t set, it is initialized to UTF-8 on Windows and the locale on other operating systems.
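
The advice to check String#valid_encoding? can be sketched as follows. This assumes default_internal is nil (the usual case), writes illustrative bytes to a temporary file, and uses String#scrub as one possible way to repair invalid input; other strategies may suit a given application better.

```ruby
require "tempfile"

text = nil
Tempfile.create("encoding-demo") do |file|
  file.binmode
  file.write("caf\xE9".b)   # ISO-8859-1 bytes; 0xE9 alone is not valid UTF-8
  file.flush
  text = File.read(file.path)
end

# With default_internal unset, the string is tagged with the default
# external encoding, whether or not the bytes are valid in it.
text.encoding
text.valid_encoding?

# Replace invalid byte sequences before further processing.
clean = text.valid_encoding? ? text : text.scrub("?")
```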



# File 'encoding.c', line 1636

static VALUE
get_default_external(VALUE klass)
{
    return rb_enc_default_external();
}

.default_external=(enc) ⇒ Object

Sets the default external encoding. You should not set Encoding::default_external in Ruby code, as strings created before changing the value may have a different encoding from strings created after the value was changed. Instead, you should use ruby -E to invoke Ruby with the correct default_external.

See Encoding::default_external for information on how the default external encoding is used.
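
To illustrate the recommended approach, the sketch below starts a child interpreter with -E and asks it to report the defaults it sees. It assumes that spawning a subprocess is acceptable in your environment; the encodings chosen are arbitrary examples.

```ruby
require "rbconfig"

# -E takes "external:internal". RbConfig.ruby is the path of the running
# interpreter, so the child uses the same Ruby as this script.
script = 'puts "#{Encoding.default_external.name} #{Encoding.default_internal.name}"'
report = IO.popen([RbConfig.ruby, "-E", "ISO-8859-1:UTF-8", "-e", script], &:read).strip

report   # "ISO-8859-1 UTF-8"
```

Because the defaults are fixed before any code runs, every string the child creates sees the same settings.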



# File 'encoding.c', line 1665

static VALUE
set_default_external(VALUE klass, VALUE encoding)
{
    rb_warning("setting Encoding.default_external");
    rb_enc_set_default_external(encoding);
    return encoding;
}

.default_internal ⇒ Object

Returns the default internal encoding. Strings are transcoded to the default internal encoding in the following places if it is not nil:

  • CSV

  • Etc.sysconfdir and Etc.systmpdir

  • File data read from disk

  • File names from Dir

  • Integer#chr

  • String#inspect and Regexp#inspect

  • Strings returned from Readline

  • Strings returned from SDBM

  • Time#zone

  • Values from ENV

  • Values in ARGV including $PROGRAM_NAME

Additionally String#encode and String#encode! use the default internal encoding if no encoding is given.

The script encoding (__ENCODING__), not default_internal, is used as the encoding of created strings.

Encoding::default_internal is initialized by the -E option, or to nil otherwise.
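
The two points about String#encode and the script encoding can be checked with a short sketch. The sample string is an illustrative assumption, and the code handles both a nil and a non-nil default_internal.

```ruby
# dup keeps the example working when string literals are frozen.
latin1 = "caf\xE9".dup.force_encoding("ISO-8859-1")

# String literals take the script encoding, regardless of default_internal.
literal_uses_script_encoding = "abc".encoding == __ENCODING__

# String#encode with no argument targets default_internal; when that is nil,
# the string keeps its current encoding.
target  = Encoding.default_internal || latin1.encoding
encoded = latin1.encode
encoded.encoding == target   # true
```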



# File 'encoding.c', line 1719

static VALUE
get_default_internal(VALUE klass)
{
    return rb_enc_default_internal();
}

.default_internal=(enc) ⇒ Object

Sets the default internal encoding, or removes it when passed nil. You should not set Encoding::default_internal in Ruby code, as strings created before changing the value may have a different encoding from strings created after the change. Instead, you should use ruby -E to invoke Ruby with the correct default_internal.

See Encoding::default_internal for information on how the default internal encoding is used.



# File 'encoding.c', line 1745

static VALUE
set_default_internal(VALUE klass, VALUE encoding)
{
    rb_warning("setting Encoding.default_internal");
    rb_enc_set_default_internal(encoding);
    return encoding;
}

.find(string) ⇒ Object

Searches for the encoding with the specified name. The name should be a string.

Encoding.find("US-ASCII")  #=> #<Encoding:US-ASCII>

This method accepts encoding names and aliases, including the following special aliases:

  • "external": the default external encoding

  • "internal": the default internal encoding

  • "locale": the locale encoding

  • "filesystem": the filesystem encoding

An ArgumentError is raised when no encoding has the given name. The one exception is Encoding.find("internal"), which returns nil when there is no encoding named "internal", in other words, when Ruby has no default internal encoding.
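
The lookup rules above can be sketched as follows. find_encoding_or_nil is a hypothetical helper, not part of Ruby, that turns the ArgumentError into nil so that all "not found" cases look alike.

```ruby
# Hypothetical helper: return nil instead of raising for unknown names.
def find_encoding_or_nil(name)
  Encoding.find(name)
rescue ArgumentError
  nil
end

by_name  = find_encoding_or_nil("US-ASCII")           # Encoding::US_ASCII
by_alias = find_encoding_or_nil("ASCII")              # same object, via the alias
unknown  = find_encoding_or_nil("no-such-encoding")   # nil

# The special aliases follow the current defaults.
external = find_encoding_or_nil("external")           # Encoding.default_external
internal = find_encoding_or_nil("internal")           # nil unless default_internal is set
```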



# File 'encoding.c', line 1384

static VALUE
enc_find(VALUE klass, VALUE enc)
{
    int idx;
    if (is_obj_encoding(enc))
        return enc;
    idx = str_to_encindex(enc);
    if (idx == UNSPECIFIED_ENCODING) return Qnil;
    return rb_enc_from_encoding_index(idx);
}

.list ⇒ Array

Returns the list of loaded encodings.

Encoding.list
#=> [#<Encoding:ASCII-8BIT>, #<Encoding:UTF-8>,
      #<Encoding:ISO-2022-JP (dummy)>]

Encoding.find("US-ASCII")
#=> #<Encoding:US-ASCII>

Encoding.list
#=> [#<Encoding:ASCII-8BIT>, #<Encoding:UTF-8>,
      #<Encoding:US-ASCII>, #<Encoding:ISO-2022-JP (dummy)>]

Returns:

  • (Array)



# File 'encoding.c', line 1354

static VALUE
enc_list(VALUE klass)
{
    VALUE ary = rb_ary_new2(0);
    rb_ary_replace(ary, rb_encoding_list);
    return ary;
}

.locale_charmap ⇒ String

Returns the locale charmap name, or nil if no appropriate information is available.

Debian GNU/Linux
  LANG=C
    Encoding.locale_charmap  #=> "ANSI_X3.4-1968"
  LANG=ja_JP.EUC-JP
    Encoding.locale_charmap  #=> "EUC-JP"

SunOS 5
  LANG=C
    Encoding.locale_charmap  #=> "646"
  LANG=ja
    Encoding.locale_charmap  #=> "eucJP"

The result is highly platform dependent, so Encoding.find(Encoding.locale_charmap) may raise an error. If you need an encoding object even for an unknown locale, use Encoding.find("locale").
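
A defensive lookup along these lines might look like the sketch below. locale_encoding is a hypothetical helper name; it tries the charmap first and falls back to the "locale" alias.

```ruby
# Hypothetical helper: always return an Encoding for the current locale.
def locale_encoding
  charmap = Encoding.locale_charmap
  return Encoding.find("locale") if charmap.nil?

  Encoding.find(charmap)
rescue ArgumentError
  # The platform reported a charmap name that Ruby does not know.
  Encoding.find("locale")
end

enc = locale_encoding
```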

Returns:

  • (String)



# File 'localeinit.c', line 90

VALUE
rb_locale_charmap(VALUE klass)
{
#if NO_LOCALE_CHARMAP
    return rb_usascii_str_new_cstr("US-ASCII");
#else
    return locale_charmap(rb_usascii_str_new_cstr);
#endif
}

.name_list ⇒ Array

Returns the list of available encoding names.

Encoding.name_list
#=> ["US-ASCII", "ASCII-8BIT", "UTF-8",
      "ISO-8859-1", "Shift_JIS", "EUC-JP",
      "Windows-31J",
      "BINARY", "CP932", "eucJP"]
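
As the sample output suggests, the list contains both primary names and aliases. The sketch below checks that, and shows one possible use: validating a user-supplied name. known_encoding_name? is a hypothetical helper, and its case-insensitive comparison is a design choice of this example.

```ruby
names   = Encoding.name_list
aliases = Encoding.aliases

names.include?("UTF-8")                  # true: primary names are listed
names.include?("ASCII")                  # true: aliases are listed too
missing_aliases = aliases.keys - names   # expected to be empty

# Hypothetical check before accepting an encoding name from user input.
def known_encoding_name?(name)
  Encoding.name_list.any? { |known| known.casecmp?(name) }
end

known_encoding_name?("utf-8")
known_encoding_name?("no-such-encoding")
```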

Returns:

  • (Array)



# File 'encoding.c', line 1827

static VALUE
rb_enc_name_list(VALUE klass)
{
    VALUE ary = rb_ary_new2(global_enc_table.names->num_entries);
    st_foreach(global_enc_table.names, rb_enc_name_list_i, (st_data_t)ary);
    return ary;
}

Instance Method Details

#_dump(*args) ⇒ Object

:nodoc:



# File 'encoding.c', line 1441

static VALUE
enc_dump(int argc, VALUE *argv, VALUE self)
{
    rb_check_arity(argc, 0, 1);
    return enc_name(self);
}

#ascii_compatible? ⇒ Boolean

Returns whether the encoding is ASCII-compatible.

Encoding::UTF_8.ascii_compatible?     #=> true
Encoding::UTF_16BE.ascii_compatible?  #=> false

Returns:

  • (Boolean)
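
A short sketch of why the distinction matters in practice, assuming a UTF-8 script encoding: a string in an ASCII-incompatible encoding cannot be combined directly with an ordinary literal, so it is transcoded first.

```ruby
utf8  = "abc"
utf16 = "abc".encode(Encoding::UTF_16BE)

utf8.encoding.ascii_compatible?     # true
utf16.encoding.ascii_compatible?    # false
utf16.bytes                         # [0, 97, 0, 98, 0, 99]

# An ASCII-incompatible string cannot be combined with an ASCII literal.
compatible = Encoding.compatible?(utf16, "x")   # nil

# Transcode first, then combine.
combined = utf16.encode(Encoding::UTF_8) + "x"  # "abcx"
```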


# File 'encoding.c', line 628

static VALUE
enc_ascii_compatible_p(VALUE enc)
{
    return RBOOL(rb_enc_asciicompat(must_encoding(enc)));
}

#dummy? ⇒ Boolean

Returns true for dummy encodings. A dummy encoding is an encoding for which character handling is not properly implemented. It is used for stateful encodings.

Encoding::ISO_2022_JP.dummy?       #=> true
Encoding::UTF_8.dummy?             #=> false

Returns:

  • (Boolean)
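
Since character handling is not properly implemented for dummy encodings, code that works character by character may want to reject them up front. The sketch below lists the dummy encodings and defines a hypothetical guard, character_safe?, which is not part of Ruby.

```ruby
dummies = Encoding.list.select(&:dummy?)
dummies.include?(Encoding::ISO_2022_JP)   # true
dummies.include?(Encoding::UTF_8)         # false

# Hypothetical guard: accepts an Encoding object or an encoding name.
def character_safe?(encoding)
  !Encoding.find(encoding).dummy?
end

character_safe?("UTF-8")
character_safe?(Encoding::ISO_2022_JP)
```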


# File 'encoding.c', line 612

static VALUE
enc_dummy_p(VALUE enc)
{
    return RBOOL(ENC_DUMMY_P(must_encoding(enc)));
}

#inspect ⇒ String

Returns a string which represents the encoding for programmers.

Encoding::UTF_8.inspect       #=> "#<Encoding:UTF-8>"
Encoding::ISO_2022_JP.inspect #=> "#<Encoding:ISO-2022-JP (dummy)>"

Returns:

  • (String)



# File 'encoding.c', line 1271

static VALUE
enc_inspect(VALUE self)
{
    rb_encoding *enc;

    if (!is_data_encoding(self)) {
        not_encoding(self);
    }
    if (!(enc = DATA_PTR(self)) || rb_enc_from_index(rb_enc_to_index(enc)) != enc) {
        rb_raise(rb_eTypeError, "broken Encoding");
    }

    return rb_enc_sprintf(rb_usascii_encoding(),
                          "#<%"PRIsVALUE":%s%s%s>", rb_obj_class(self),
                          rb_enc_inspect_name(enc),
                          (ENC_DUMMY_P(enc) ? " (dummy)" : ""),
                          rb_enc_autoload_p(enc) ? " (autoload)" : "");
}

#name ⇒ String
#to_s ⇒ String

Returns the name of the encoding.

Encoding::UTF_8.name      #=> "UTF-8"

Overloads:

  • #name ⇒ String

  • #to_s ⇒ String



# File 'encoding.c', line 1299

static VALUE
enc_name(VALUE self)
{
    return rb_fstring_cstr(rb_enc_name((rb_encoding*)DATA_PTR(self)));
}

#names ⇒ Array

Returns the list of the name and aliases of the encoding.

Encoding::WINDOWS_31J.names  #=> ["Windows-31J", "CP932", "csWindows31J", "SJIS", "PCK"]
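
As a quick sketch of how this relates to Encoding.find, every entry returned for an encoding should resolve back to that same encoding. US-ASCII is used here only as a convenient example.

```ruby
enc = Encoding::US_ASCII

enc.names.include?(enc.name)   # true: the primary name is part of the list
enc.names.include?("ASCII")    # true: so is each alias

# Every listed name resolves back to the same Encoding object.
all_resolve = enc.names.all? { |name| Encoding.find(name) == enc }
```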

Returns:

  • (Array)



# File 'encoding.c', line 1325

static VALUE
enc_names(VALUE self)
{
    VALUE args[2];

    args[0] = (VALUE)rb_to_encoding_index(self);
    args[1] = rb_ary_new2(0);
    st_foreach(global_enc_table.names, enc_names_i, (st_data_t)args);
    return args[1];
}
