Class: Encoding
Overview
An Encoding instance represents a character encoding usable in Ruby. It is defined as a constant under the Encoding namespace. It has a name and, optionally, aliases:
Encoding::US_ASCII.name # => "US-ASCII"
Encoding::US_ASCII.names # => ["US-ASCII", "ASCII", "ANSI_X3.4-1968", "646"]
A Ruby method that accepts an encoding as an argument will accept:
-
An Encoding object.
-
The name of an encoding.
-
An alias for an encoding name.
These are equivalent:
'foo'.encode(Encoding::US_ASCII) # Encoding object.
'foo'.encode('US-ASCII') # Encoding name.
'foo'.encode('ASCII') # Encoding alias.
For a full discussion of encodings and their uses, see the Encodings document.
Encoding::ASCII_8BIT is a special-purpose encoding that is usually used for a string of bytes, not a string of characters. As the name indicates, however, its bytes in the ASCII range are treated as ASCII characters. This is useful when you use it with other ASCII-compatible encodings.
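As a minimal sketch of this behavior, the snippet below tags strings as ASCII-8BIT with String#b and concatenates them with a UTF-8 string. The variable names are illustrative only.

```ruby
# String#b returns a copy of the string tagged as ASCII-8BIT.
bytes      = "abc\xFF".b   # contains a byte outside the ASCII range
ascii_only = "abc".b       # only bytes in the ASCII range

# An ASCII-only byte string can be concatenated with a UTF-8 string.
combined = ascii_only + "\u00E9"
combined.encoding # the encoding of the non-ASCII operand (UTF-8)

# A byte string with non-ASCII bytes cannot.
incompatible =
  begin
    bytes + "\u00E9"
    false
  rescue Encoding::CompatibilityError
    true
  end
```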
Defined Under Namespace
Classes: CompatibilityError, Converter, ConverterNotFoundError, InvalidByteSequenceError, UndefinedConversionError
Class Method Summary
-
._load(str) ⇒ Object
:nodoc:.
-
.aliases ⇒ Object
Returns a hash that maps each available encoding alias to its original encoding name.
-
.compatible?(obj1, obj2) ⇒ Encoding or nil
Checks the compatibility of two objects.
-
.default_external ⇒ Object
Returns default external encoding.
-
.default_external=(enc) ⇒ Object
Sets default external encoding.
-
.default_internal ⇒ Object
Returns default internal encoding.
-
.default_internal=(enc) ⇒ Object
Sets default internal encoding or removes default internal encoding when passed nil.
-
.find(string) ⇒ Object
Searches for the encoding with the specified name.
-
.list ⇒ Array
Returns the list of loaded encodings.
-
.locale_charmap ⇒ String
Returns the locale charmap name.
-
.name_list ⇒ Array
Returns the list of available encoding names.
Instance Method Summary
-
#_dump(*args) ⇒ Object
:nodoc:.
-
#ascii_compatible? ⇒ Boolean
Returns whether the encoding is ASCII-compatible.
-
#dummy? ⇒ Boolean
Returns true for dummy encodings.
-
#inspect ⇒ String
Returns a string which represents the encoding for programmers.
-
#name ⇒ Object
Returns the name of the encoding.
-
#names ⇒ Array
Returns the list of name and aliases of the encoding.
-
#to_s ⇒ Object
Returns the name of the encoding.
Class Method Details
._load(str) ⇒ Object
:nodoc:
# File 'encoding.c', line 1449
static VALUE
enc_load(VALUE klass, VALUE str)
{
return str;
}
.aliases ⇒ Object
Returns a hash that maps each available encoding alias to its original encoding name.
Encoding.aliases
#=> {"BINARY"=>"ASCII-8BIT", "ASCII"=>"US-ASCII", "ANSI_X3.4-1968"=>"US-ASCII",
"SJIS"=>"Windows-31J", "eucJP"=>"EUC-JP", "CP932"=>"Windows-31J"}
# File 'encoding.c', line 1870
static VALUE
rb_enc_aliases(VALUE klass)
{
VALUE aliases[2];
aliases[0] = rb_hash_new();
aliases[1] = rb_ary_new();
st_foreach(global_enc_table.names, rb_enc_aliases_enc_i, (st_data_t)aliases);
return aliases[0];
}
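The following sketch shows one way the alias hash might be used: looking up the original name behind an alias and confirming that Encoding.find resolves both to the same object. It relies on the "ASCII" alias listed for US-ASCII earlier in this document.

```ruby
aliases = Encoding.aliases

# Look up the original name registered for an alias.
original_name = aliases.fetch("ASCII") # "US-ASCII", per the names list above

# Encoding.find accepts either the alias or the original name.
by_alias = Encoding.find("ASCII")
by_name  = Encoding.find(original_name)
same_object = by_alias.equal?(by_name)
```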
.compatible?(obj1, obj2) ⇒ Encoding or nil
Checks the compatibility of two objects.
If both objects are strings, they are compatible when they can be concatenated. The method returns the encoding of the concatenated string if they are compatible, or nil if they are not.
Encoding.compatible?("\xa1".force_encoding("iso-8859-1"), "b")
#=> #<Encoding:ISO-8859-1>
Encoding.compatible?(
"\xa1".force_encoding("iso-8859-1"),
"\xa1\xa1".force_encoding("euc-jp"))
#=> nil
If the objects are not strings, their encodings are compatible when both objects have an encoding and:
-
Either encoding is US-ASCII compatible
-
One of the encodings is a 7-bit encoding
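A short sketch of these rules, with illustrative variable names. The results in the comments follow from the rules above and from the C source of this method; an object that has no encoding, such as an Integer, yields nil.

```ruby
utf8_ascii = "abc".encode("UTF-8")                     # ASCII-only content
utf8_wide  = "\u00E9"                                  # non-ASCII UTF-8 content
latin1     = "\xE9".dup.force_encoding("ISO-8859-1")   # non-ASCII ISO-8859-1 content

# ASCII-only content is compatible with another ASCII-compatible encoding.
ascii_with_latin1 = Encoding.compatible?(utf8_ascii, latin1) #=> #<Encoding:ISO-8859-1>

# Two strings with non-ASCII content in different encodings are not.
wide_with_latin1 = Encoding.compatible?(utf8_wide, latin1)   #=> nil

# Non-string objects: here, two Encoding objects, one of them US-ASCII.
encodings = Encoding.compatible?(Encoding::UTF_8, Encoding::US_ASCII) #=> #<Encoding:UTF-8>

# An Integer carries no encoding at all.
no_encoding = Encoding.compatible?(1, "a") #=> nil
```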
# File 'encoding.c', line 1419
static VALUE
enc_compatible_p(VALUE klass, VALUE str1, VALUE str2)
{
rb_encoding *enc;
if (!enc_capable(str1)) return Qnil;
if (!enc_capable(str2)) return Qnil;
enc = rb_enc_compatible(str1, str2);
if (!enc) return Qnil;
return rb_enc_from_encoding(enc);
}
.default_external ⇒ Object
Returns default external encoding.
The default external encoding is used by default for strings created from the following locations:
-
CSV
-
File data read from disk
-
SDBM
-
StringIO
-
Zlib::GzipReader
-
Zlib::GzipWriter
-
String#inspect
-
Regexp#inspect
While strings created from these locations will have this encoding, the encoding may not be valid. Be sure to check String#valid_encoding?.
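The check mentioned above could look like the following sketch. The sample bytes and variable names are made up for illustration; String#scrub and String#encode are shown as two possible ways to recover, not as the only ones.

```ruby
# Bytes that are valid ISO-8859-1 but not valid UTF-8.
raw = "caf\xE9".b

# Simulate a string that was tagged with the wrong encoding.
mislabeled = raw.dup.force_encoding("UTF-8")
mislabeled.valid_encoding? #=> false

# Option 1: replace the invalid bytes.
repaired = mislabeled.scrub("?") #=> "caf?"

# Option 2: tag the bytes with their real encoding, then transcode.
transcoded = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")
```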
File data written to disk will be transcoded to the default external encoding when written, if default_internal is not nil.
The default external encoding is initialized by the -E option. If -E isn’t set, it is initialized to UTF-8 on Windows and the locale on other operating systems.
# File 'encoding.c', line 1636
static VALUE
get_default_external(VALUE klass)
{
return rb_enc_default_external();
}
.default_external=(enc) ⇒ Object
Sets default external encoding. You should not set Encoding::default_external in Ruby code, as strings created before changing the value may have a different encoding from strings created after the value was changed. Instead, you should use ruby -E to invoke Ruby with the correct default_external.
See Encoding::default_external for information on how the default external encoding is used.
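When only a particular file needs a different encoding, one alternative to changing the global default is to pass the encoding to the read call itself. The sketch below assumes a temporary file containing ISO-8859-1 bytes; the file name prefix and sample data are hypothetical.

```ruby
require "tempfile"

text = Tempfile.create("latin1-sample") do |file|
  # Write raw ISO-8859-1 bytes without any transcoding.
  file.binmode
  file.write("caf\xE9".b)
  file.flush

  # "external:internal" treats the file as ISO-8859-1 and transcodes to UTF-8.
  File.read(file.path, encoding: "ISO-8859-1:UTF-8")
end

text.encoding # the internal encoding requested above
```

This leaves Encoding.default_external untouched, so strings created elsewhere in the program are not affected.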
# File 'encoding.c', line 1665
static VALUE
set_default_external(VALUE klass, VALUE encoding)
{
rb_warning("setting Encoding.default_external");
rb_enc_set_default_external(encoding);
return encoding;
}
.default_internal ⇒ Object
Returns default internal encoding. Strings will be transcoded to the default internal encoding in the following places if the default internal encoding is not nil:
-
CSV
-
Etc.sysconfdir and Etc.systmpdir
-
File data read from disk
-
File names from Dir
-
Integer#chr
-
String#inspect and Regexp#inspect
-
Strings returned from Readline
-
Strings returned from SDBM
-
Time#zone
-
Values from ENV
-
Values in ARGV including $PROGRAM_NAME
Additionally String#encode and String#encode! use the default internal encoding if no encoding is given.
The script encoding (__ENCODING__), not default_internal, is used as the encoding of created strings.
Encoding::default_internal is initialized by the -E option, or to nil if the option is not given.
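A small sketch of the point about the script encoding, with illustrative variable names: a plain string literal takes the encoding reported by __ENCODING__, whatever the default internal encoding is, and passing an explicit target to String#encode avoids depending on the default.

```ruby
literal = "abc"

# The literal's encoding comes from the script encoding.
matches_script = (literal.encoding == __ENCODING__)

# An explicit target makes the conversion independent of default_internal.
utf16 = literal.encode(Encoding::UTF_16LE)
utf16.bytesize # two bytes per character for this ASCII-only text
```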
# File 'encoding.c', line 1719
static VALUE
get_default_internal(VALUE klass)
{
return rb_enc_default_internal();
}
.default_internal=(enc) ⇒ Object
Sets default internal encoding or removes default internal encoding when passed nil. You should not set Encoding::default_internal in Ruby code, as strings created before changing the value may have a different encoding from strings created after the change. Instead, you should use ruby -E to invoke Ruby with the correct default_internal.
See Encoding::default_internal for information on how the default internal encoding is used.
# File 'encoding.c', line 1745
static VALUE
set_default_internal(VALUE klass, VALUE encoding)
{
rb_warning("setting Encoding.default_internal");
rb_enc_set_default_internal(encoding);
return encoding;
}
.find(string) ⇒ Object
Searches for the encoding with the specified name. The name should be a string.
Encoding.find("US-ASCII") #=> #<Encoding:US-ASCII>
This method accepts encoding names and aliases, including the following special aliases:
- “external”: default external encoding
- “internal”: default internal encoding
- “locale”: locale encoding
- “filesystem”: filesystem encoding
An ArgumentError is raised when there is no encoding with the given name. The one exception is Encoding.find("internal"), which returns nil when there is no encoding named “internal”, in other words, when Ruby has no default internal encoding.
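The lookup rules above might be exercised as in the sketch below. The helper find_encoding_or_nil is hypothetical and not part of Ruby; it only wraps the ArgumentError case.

```ruby
# Hypothetical helper: returns nil instead of raising for unknown names.
def find_encoding_or_nil(name)
  Encoding.find(name)
rescue ArgumentError
  nil
end

utf8     = Encoding.find("utf-8")     # names are matched case-insensitively
external = Encoding.find("external")  # same object as Encoding.default_external
internal = Encoding.find("internal")  # nil when no default internal encoding is set
unknown  = find_encoding_or_nil("no-such-encoding")
```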
# File 'encoding.c', line 1384
static VALUE
enc_find(VALUE klass, VALUE enc)
{
int idx;
if (is_obj_encoding(enc))
return enc;
idx = str_to_encindex(enc);
if (idx == UNSPECIFIED_ENCODING) return Qnil;
return rb_enc_from_encoding_index(idx);
}
.list ⇒ Array
Returns the list of loaded encodings.
Encoding.list
#=> [#<Encoding:ASCII-8BIT>, #<Encoding:UTF-8>,
#<Encoding:ISO-2022-JP (dummy)>]
Encoding.find("US-ASCII")
#=> #<Encoding:US-ASCII>
Encoding.list
#=> [#<Encoding:ASCII-8BIT>, #<Encoding:UTF-8>,
#<Encoding:US-ASCII>, #<Encoding:ISO-2022-JP (dummy)>]
# File 'encoding.c', line 1354
static VALUE
enc_list(VALUE klass)
{
VALUE ary = rb_ary_new2(0);
rb_ary_replace(ary, rb_encoding_list);
return ary;
}
.locale_charmap ⇒ String
Returns the locale charmap name. It returns nil if no appropriate information is available.
Debian GNU/Linux
LANG=C
Encoding.locale_charmap #=> "ANSI_X3.4-1968"
LANG=ja_JP.EUC-JP
Encoding.locale_charmap #=> "EUC-JP"
SunOS 5
LANG=C
Encoding.locale_charmap #=> "646"
LANG=ja
Encoding.locale_charmap #=> "eucJP"
The result is highly platform dependent, so Encoding.find(Encoding.locale_charmap) may raise an error. If you need an encoding object even for an unknown locale, use Encoding.find("locale").
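One way to combine the two lookups is sketched below. The method name locale_encoding is hypothetical; it falls back to the "locale" alias when the charmap name is unknown to Ruby or is nil.

```ruby
# Hypothetical helper: always returns an Encoding object for the locale.
def locale_encoding
  Encoding.find(Encoding.locale_charmap)
rescue ArgumentError, TypeError
  # ArgumentError: charmap name unknown to Ruby.
  # TypeError: locale_charmap returned nil.
  Encoding.find("locale")
end

enc = locale_encoding
```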
# File 'localeinit.c', line 90
VALUE
rb_locale_charmap(VALUE klass)
{
#if NO_LOCALE_CHARMAP
return rb_usascii_str_new_cstr("US-ASCII");
#else
return locale_charmap(rb_usascii_str_new_cstr);
#endif
}
.name_list ⇒ Array
Returns the list of available encoding names.
Encoding.name_list
#=> ["US-ASCII", "ASCII-8BIT", "UTF-8",
"ISO-8859-1", "Shift_JIS", "EUC-JP",
"Windows-31J",
"BINARY", "CP932", "eucJP"]
# File 'encoding.c', line 1827
static VALUE
rb_enc_name_list(VALUE klass)
{
VALUE ary = rb_ary_new2(global_enc_table.names->num_entries);
st_foreach(global_enc_table.names, rb_enc_name_list_i, (st_data_t)ary);
return ary;
}
Instance Method Details
#_dump(*args) ⇒ Object
:nodoc:
# File 'encoding.c', line 1441
static VALUE
enc_dump(int argc, VALUE *argv, VALUE self)
{
rb_check_arity(argc, 0, 1);
return enc_name(self);
}
#ascii_compatible? ⇒ Boolean
# File 'encoding.c', line 628
static VALUE
enc_ascii_compatible_p(VALUE enc)
{
return RBOOL(rb_enc_asciicompat(must_encoding(enc)));
}
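As a brief illustration of #ascii_compatible?, the sketch below compares UTF-8 with a UTF-16 encoding, which encodes ASCII characters in two bytes and is therefore not ASCII-compatible. The variable names are illustrative.

```ruby
utf8_compatible  = Encoding::UTF_8.ascii_compatible?    #=> true
utf16_compatible = Encoding::UTF_16BE.ascii_compatible? #=> false
```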
#dummy? ⇒ Boolean
# File 'encoding.c', line 612
static VALUE
enc_dummy_p(VALUE enc)
{
return RBOOL(ENC_DUMMY_P(must_encoding(enc)));
}
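As a brief illustration of #dummy?, the sketch below uses ISO-2022-JP, which is marked as a dummy encoding in the Encoding.list output shown earlier, alongside UTF-8, which is not. The variable names are illustrative.

```ruby
iso2022jp_dummy = Encoding::ISO_2022_JP.dummy? #=> true
utf8_dummy      = Encoding::UTF_8.dummy?       #=> false
```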
#inspect ⇒ String
# File 'encoding.c', line 1271
static VALUE
enc_inspect(VALUE self)
{
rb_encoding *enc;
if (!is_data_encoding(self)) {
not_encoding(self);
}
if (!(enc = DATA_PTR(self)) || rb_enc_from_index(rb_enc_to_index(enc)) != enc) {
rb_raise(rb_eTypeError, "broken Encoding");
}
return rb_enc_sprintf(rb_usascii_encoding(),
"#<%"PRIsVALUE":%s%s%s>", rb_obj_class(self),
rb_enc_inspect_name(enc),
(ENC_DUMMY_P(enc) ? " (dummy)" : ""),
rb_enc_autoload_p(enc) ? " (autoload)" : "");
}
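A short sketch of what #inspect produces, based on the format string in the source above. Further suffixes such as " (autoload)" may appear depending on whether an encoding has been loaded, so the dummy case is checked only for the " (dummy)" marker.

```ruby
utf8_text = Encoding::UTF_8.inspect #=> "#<Encoding:UTF-8>"

# Dummy encodings carry a " (dummy)" marker after the name.
dummy_text   = Encoding::ISO_2022_JP.inspect
marked_dummy = dummy_text.include?("(dummy)")
```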
#name ⇒ String
#to_s ⇒ String
Returns the name of the encoding.
Encoding::UTF_8.name #=> "UTF-8"
# File 'encoding.c', line 1299
static VALUE
enc_name(VALUE self)
{
return rb_fstring_cstr(rb_enc_name((rb_encoding*)DATA_PTR(self)));
}
#names ⇒ Array
Returns the list of the name and aliases of the encoding.
Encoding::WINDOWS_31J.names #=> ["Windows-31J", "CP932", "csWindows31J", "SJIS", "PCK"]
# File 'encoding.c', line 1325
static VALUE
enc_names(VALUE self)
{
VALUE args[2];
args[0] = (VALUE)rb_to_encoding_index(self);
args[1] = rb_ary_new2(0);
st_foreach(global_enc_table.names, enc_names_i, (st_data_t)args);
return args[1];
}