Module: ACSV::Detect
Defined in:
- lib/acsv/detect/encoding.rb
- lib/acsv/detect/separator.rb
- lib/acsv/detect/encoding_holmes.rb
- lib/acsv/detect/encoding_rchardet.rb
- lib/acsv/detect/encoding_uchardet.rb
Defined Under Namespace
Modules: EncodingHolmes, EncodingRChardet, EncodingUChardet
Constant Summary
- CONFIDENCE = 0.6
  Default confidence level for encoding detection to succeed.
- PREVIEW_BYTES = 8 * 4096
  Number of bytes to test encoding on.
- SEPARATORS = [",", ";", "\t", "|", "#"]
  Possible CSV separators to check.
Class Method Summary
- .encoding(file_or_data, options = {}) ⇒ String
  Tries to detect the file encoding.
- .encoding_methods ⇒ Array<String>
  List of available methods for encoding.
- .encoding_methods_all ⇒ Array<String>
  List of possible methods for encoding (even if the required gem is missing).
- .separator(file_or_data) ⇒ String
  Most probable column separator character from first line, or nil when none found.
Class Method Details
.encoding(file_or_data, options = {}) ⇒ String
Tries to detect the file encoding.
# File 'lib/acsv/detect/encoding.rb', line 20
def encoding(file_or_data, options={})
  if file_or_data.is_a? File
    position = file_or_data.tell
    data = file_or_data.read(PREVIEW_BYTES)
    file_or_data.seek(position)
  else
    data = file_or_data
  end
  detector_do(options) do |detector|
    if enc = detector.encoding(data, options)
      return enc
    end
  end
  nil
end
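The File branch above reads a preview and then restores the read position, so the caller can still parse the file from wherever it was. A minimal standalone sketch of that pattern follows; the helper name `preview` is hypothetical and not part of ACSV, and `PREVIEW_BYTES` is redefined locally with the value from the constant summary.

```ruby
require 'tempfile'

# Same value as ACSV::Detect::PREVIEW_BYTES in the constant summary.
PREVIEW_BYTES = 8 * 4096

# Hypothetical helper (not part of ACSV): returns the data that would be
# handed to a detector, leaving a File's read position unchanged.
def preview(file_or_data)
  return file_or_data unless file_or_data.is_a?(File)

  position = file_or_data.tell             # remember where the caller was
  data = file_or_data.read(PREVIEW_BYTES)  # read at most PREVIEW_BYTES
  file_or_data.seek(position)              # put the position back
  data
end

# Tempfile.create yields a real File object, so the File branch is taken.
Tempfile.create('preview') do |f|
  f.write("a;b\n1;2\n")
  f.rewind
  f.read(2)                 # simulate a caller that already consumed "a;"
  puts preview(f).inspect   # the remaining data, starting at offset 2
  puts f.tell               # still 2
end
```

Strings are passed through untouched, which mirrors the else branch of the listing.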
.encoding_methods ⇒ Array<String>
Returns the list of available methods for encoding.
# File 'lib/acsv/detect/encoding.rb', line 38
def encoding_methods
  ENCODING_DETECTORS_AVAIL.map(&:require_name)
end
.encoding_methods_all ⇒ Array<String>
Returns the list of possible methods for encoding (even if the required gem is missing).
# File 'lib/acsv/detect/encoding.rb', line 43
def encoding_methods_all
  ENCODING_DETECTORS_ALL.map(&:require_name)
end
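The difference between the two lists is whether the backing gem can be loaded. How ACSV builds ENCODING_DETECTORS_AVAIL is not shown here; the sketch below is only one plausible way to make that distinction, and assumes that "available" means `require` succeeds. The helper `gem_available?` and the library names in the demo are hypothetical.

```ruby
# Hypothetical helper (not part of ACSV): true when the named library loads.
def gem_available?(require_name)
  require require_name
  true
rescue LoadError
  false
end

# Illustrative names only: 'json' ships with Ruby, the second does not exist.
all_names = ['json', 'no_such_detector_gem_xyz']
available = all_names.select { |name| gem_available?(name) }

puts "all:       #{all_names.inspect}"
puts "available: #{available.inspect}"
```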
.separator(file_or_data) ⇒ String
Returns the most probable column separator character from the first line, or nil when none is found.
Only the first line is inspected. Todo: return whichever character yields the same number of columns over multiple lines.
# File 'lib/acsv/detect/separator.rb', line 10
def self.separator(file_or_data)
  if file_or_data.is_a? File
    position = file_or_data.tell
    firstline = file_or_data.readline
    file_or_data.seek(position)
  else
    firstline = file_or_data.split("\n", 2)[0]
  end
  separators = SEPARATORS.map{|s| s.encode(firstline.encoding)}
  sep = separators.map {|x| [firstline.count(x),x]}.sort_by {|x| x[0]}.last
  sep[0] == 0 ? nil : sep[1].encode('ascii')
end
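The counting step in the listing can be tried without the gem. The following is a simplified sketch for String input only; `guess_separator` is a hypothetical name, the encoding conversion of the original is left out, and `SEPARATORS` is redefined locally with the value from the constant summary.

```ruby
# Same candidates as ACSV::Detect::SEPARATORS in the constant summary.
SEPARATORS = [",", ";", "\t", "|", "#"]

# Hypothetical standalone version (String input only): picks the candidate
# that occurs most often in the first line, or nil if none occurs at all.
def guess_separator(data)
  firstline = data.split("\n", 2)[0].to_s
  count, sep = SEPARATORS.map { |s| [firstline.count(s), s] }.max_by { |c, _| c }
  count.zero? ? nil : sep
end

puts guess_separator("name;age;city\nAda,Lovelace;36;London").inspect
puts guess_separator("no separators here").inspect
```

Because only the first line is counted, a separator character that appears inside a quoted field of the header can skew the result; comparing column counts across several lines would be more robust.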