Module: Metanorma::Utils
- Defined in:
- lib/utils/image.rb,
lib/utils/cjk.rb,
lib/utils/log.rb,
lib/utils/xml.rb,
lib/utils/main.rb,
lib/utils/version.rb,
lib/utils/namespace.rb,
lib/utils/linestatus.rb,
lib/utils/hash_transform_keys.rb
Overview
Image methods were moved to the Vectory gem
Defined Under Namespace
Modules: Array, Hash
Classes: LineStatus, Log, Namespace
Constant Summary
- HAN =
Basic CJK scripts
"\\p{Han}".freeze
- BOPOMOFO =
"\\p{Bopomofo}".freeze
- HANGUL =
"\\p{Hangul}".freeze
- HIRAGANA =
"\\p{Hiragana}".freeze
- KATAKANA =
"\\p{Katakana}".freeze
- CJK_SYMBOLS =
CJK Symbols and Punctuation (U+3000–U+303F). Used across all CJK scripts.
"[\\u3000-\\u303F]".freeze
- CJK_PUNCTUATION =
CJK Punctuation (subset of CJK Symbols commonly used)
"[\\u3001-\\u3003\\u3008-\\u3011\\u3014-\\u301F]".freeze
- CJK_HALFWIDTH_FULLWIDTH =
Halfwidth and Fullwidth Forms (U+FF00–U+FFEF). Used in all CJK contexts.
"[\\uFF00-\\uFFEF]".freeze
- CJK_COMPAT =
CJK Compatibility Forms (U+FE30–U+FE4F). Primarily used with Han but relevant for all CJK.
"[\\uFE30-\\uFE4F]".freeze
- CJK_VERTICAL =
Vertical Forms (U+FE10–U+FE1F). Used in vertical text layout for all CJK.
"[\\uFE10-\\uFE1F]".freeze
- CJK_SMALL_FORMS =
Small Form Variants (U+FE50–U+FE6F). Used in all CJK contexts.
"[\\uFE50-\\uFE6F]".freeze
- HAN_IDC =
Ideographic Description Characters (U+2FF0–U+2FFF). Used with Han script.
"[\\u2FF0-\\u2FFF]".freeze
- KANBUN =
Kanbun (U+3190–U+319F). Used with Han script for Japanese.
"[\\u3190-\\u319F]".freeze
- CJK_COMPAT_IDEOGRAPHS =
CJK Compatibility (U+3300–U+33FF). Used with Han script.
"[\\u3300-\\u33FF]".freeze
- HAN_COMPAT_IDEOGRAPHS =
CJK Compatibility Ideographs (U+F900–U+FAFF)
"[\\uF900-\\uFAFF]".freeze
- HAN_EXTENSIONS =
Script extensions by primary script
[ HAN, CJK_SYMBOLS, CJK_PUNCTUATION, CJK_HALFWIDTH_FULLWIDTH, CJK_COMPAT, CJK_VERTICAL, CJK_SMALL_FORMS, HAN_IDC, KANBUN, CJK_COMPAT_IDEOGRAPHS, HAN_COMPAT_IDEOGRAPHS ].join("|").freeze
- HANGUL_EXTENSIONS =
[ HANGUL, CJK_SYMBOLS, CJK_PUNCTUATION, CJK_HALFWIDTH_FULLWIDTH, CJK_VERTICAL, CJK_SMALL_FORMS ].join("|").freeze
- HIRAGANA_EXTENSIONS =
[ HIRAGANA, CJK_SYMBOLS, CJK_PUNCTUATION, CJK_HALFWIDTH_FULLWIDTH, CJK_VERTICAL, CJK_SMALL_FORMS ].join("|").freeze
- KATAKANA_EXTENSIONS =
[ KATAKANA, CJK_SYMBOLS, CJK_PUNCTUATION, CJK_HALFWIDTH_FULLWIDTH, CJK_VERTICAL, CJK_SMALL_FORMS ].join("|").freeze
- BOPOMOFO_EXTENSIONS =
[ BOPOMOFO, CJK_SYMBOLS, CJK_PUNCTUATION, CJK_HALFWIDTH_FULLWIDTH ].join("|").freeze
- CJK =
Combined CJK pattern including all script extensions
[ HAN_EXTENSIONS, HANGUL_EXTENSIONS, HIRAGANA_EXTENSIONS, KATAKANA_EXTENSIONS, BOPOMOFO_EXTENSIONS ].join("|").freeze
- NAMECHAR =
"\u0000-\u002c\u002f\u003a-\u0040\\u005b-\u005e" \
  "\u0060\u007b-\u00b6\u00b8-\u00bf\u00d7\u00f7\u037e" \
  "\u2000-\u200b" \
  "\u200e-\u203e\u2041-\u206f\u2190-\u2bff\u2ff0-\u3000".freeze
- NAMESTARTCHAR =
"\\u002d\u002e\u0030-\u0039\u00b7\u0300-\u036f" \
  "\u203f-\u2040".freeze
- NOKOHEAD =
<<~HERE.freeze
  <!DOCTYPE html SYSTEM
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml">
  <head> <title></title> <meta charset="UTF-8" /> </head>
  <body> </body> </html>
HERE
- LONGSTR_THRESHOLD =
10
- LONGSTR_NOPUNCT =
2
- STR_BREAKUP_RE =
%r{
  (?<=[=_—–\u2009→?+;]) | # break after any of these
  (?<=[,.:])(?!\d) | # break on punct only if not preceding digit
  (?<=[>])(?![>]) | # > not >->
  (?<=[\]])(?![\]]) | # ] not ]-]
  (?<=//) | # //
  (?<=[/])(?![/]) | # / not /-/
  (?<![<])(?=[<]) | # < not <-<
  (?<=\p{L})(?=[(\{\[]\p{L}) # letter and bracket, followed by letter
}x.freeze
- CAMEL_CASE_RE =
%r{
  (?<=\p{Ll}\p{Ll})(?=\p{Lu}\p{Ll}\p{Ll}) # 2 lowerc / upperc, 2 lowerc
}x.freeze
- VERSION =
"1.11.2".freeze
Class Method Summary
- .anchor_attributes(presxml: false) ⇒ Object
all element/attribute pairs that are ID anchors in Metanorma.
- .anchor_or_uuid(node = nil) ⇒ Object
- .asciidoc_sub(text, flavour = :standoc) ⇒ Object
- .attr_code(attributes) ⇒ Object
- .break_up_long_str(text, threshold = LONGSTR_THRESHOLD, nopunct = LONGSTR_NOPUNCT) ⇒ Object
Breaks on punctuation every LONGSTR_THRESHOLD characters, with a zero-width space. If punctuation fails, tries to break on camel case, with a soft hyphen. Breaks regardless every LONGSTR_THRESHOLD * LONGSTR_NOPUNCT characters, with a soft hyphen.
- .break_up_long_str1(text, iteration, nopunct) ⇒ Object
- .break_up_long_str2(text) ⇒ Object
- .case_transform_xml(xml, kase) ⇒ Object
- .create_namespace(xmldoc) ⇒ Object
- .csv_split(text, delim = ";") ⇒ Object
Rewrites a delimiter followed by space and quote as delimiter and quote (, " => ,"): the CSV definition does not deal with a space followed by a quote at the start of a field.
- .default_script(lang) ⇒ Object
- .dl_to_attrs(elem, dlist, name) ⇒ Object
convert definition list term/value pair into Nokogiri XML attribute.
- .dl_to_elems(ins, elem, dlist, name) ⇒ Object
convert definition list term/value pairs into Nokogiri XML elements.
- .dl_to_elems1(term, name, ins) ⇒ Object
- .endash_date(elem) ⇒ Object
- .external_path(path) ⇒ Object
- .firstchar_xml(line) ⇒ Object
need to deal with both <em> and its reverse string, >me<.
- .guid_anchor?(id) ⇒ Boolean
- .line_sanitise(ret) ⇒ Object
By default, carriage return in source translates to whitespace; but in CJK, it does not.
- .localdir(node) ⇒ Object
- .noko(_script = "Latn", &block) ⇒ Object
Block for processing XML document fragments as XHTML, to allow for HTML entities. Unescapes special characters used in Asciidoctor substitution processing.
- .noko_html(&block) ⇒ Object
- .ns(xpath) ⇒ Object
- .numeric_escapes(xml) ⇒ Object
- .rtl_script?(script) ⇒ Boolean
- .set_nested_value(hash, keys, new_val) ⇒ Object
Sets a hash value using a path of keys. Modified from stackoverflow.com/a/42425884.
- .smartformat(text) ⇒ Object
TODO: needs internationalisation of quotation marks.
- .strict_capitalize_first(str) ⇒ Object
- .strict_capitalize_phrase(str) ⇒ Object
- .to_ncname(tag, asciionly: true) ⇒ Object
- .to_xhtml_fragment(xml) ⇒ Object
- .wrap_in_para(node, out) ⇒ Object
if the contents of node are blocks, output them to out; else, wrap them in <p>.
Class Method Details
.anchor_attributes(presxml: false) ⇒ Object
all element/attribute pairs that are ID anchors in Metanorma
# File 'lib/utils/xml.rb', line 128
def anchor_attributes(presxml: false)
  ret = [%w(review from), %w(review to), %w(callout target), %w(xref to),
         %w(eref bibitemid), %w(citation bibitemid), %w(xref target),
         %w(label for), %w(location target), %w(index to),
         %w(termsource bibitemid), %w(admonition target)]
  ret1 = [%w(fn target), %w(semx source), %w(fmt-title source),
          %w(fmt-xref to), %w(fmt-xref target), %w(fmt-eref bibitemid),
          %w(fmt-xref-label container), %w(fmt-fn-body target),
          %w(fmt-review-start source), %w(fmt-review-start end),
          %w(fmt-review-start target), %w(fmt-review-end source),
          %w(fmt-review-end start), %w(fmt-review-end target)]
  presxml ? ret + ret1 : ret
end
.anchor_or_uuid(node = nil) ⇒ Object
# File 'lib/utils/xml.rb', line 43
def anchor_or_uuid(node = nil)
  uuid = UUIDTools::UUID.random_create
  node.nil? || node.id.nil? || node.id.empty? ? "_#{uuid}" : node.id
end
.asciidoc_sub(text, flavour = :standoc) ⇒ Object
# File 'lib/utils/main.rb', line 22
def asciidoc_sub(text, flavour = :standoc)
  return nil if text.nil?
  return "" if text.empty?

  d = Asciidoctor::Document.new(
    text.lines.entries,
    { header_footer: false, backend: flavour },
  )
  b = d.parse.blocks.first
  b.apply_subs(b.source)
end
.attr_code(attributes) ⇒ Object
# File 'lib/utils/xml.rb', line 24
def attr_code(attributes)
  attributes.compact.transform_values do |v|
    v.is_a?(String) ? HTMLEntities.new.decode(v) : v
  end
end
.break_up_long_str(text, threshold = LONGSTR_THRESHOLD, nopunct = LONGSTR_NOPUNCT) ⇒ Object
Breaks on punctuation every LONGSTR_THRESHOLD characters, with a zero-width space. If punctuation fails, tries to break on camel case, with a soft hyphen. Breaks regardless every LONGSTR_THRESHOLD * LONGSTR_NOPUNCT characters, with a soft hyphen.
# File 'lib/utils/main.rb', line 140
def break_up_long_str(text, threshold = LONGSTR_THRESHOLD,
nopunct = LONGSTR_NOPUNCT)
  /^\s*$/.match?(text) and return text
  text.split(/(?=(?:\s|-))/).map do |w|
    if /^\s*$/.match(w) || (w.size < threshold) then w
    else
      w.scan(/.{,#{threshold}}/o).map.with_index do |w1, i|
        w1.size < threshold ? w1 : break_up_long_str1(w1, i + 1, nopunct)
      end.join
    end
  end.join
end
.break_up_long_str1(text, iteration, nopunct) ⇒ Object
# File 'lib/utils/main.rb', line 168
def break_up_long_str1(text, iteration, nopunct)
  s, separator = break_up_long_str2(text)
  if s.size == 1 # could not break up
    (iteration % nopunct).zero? and
      text += "\u00ad" # force soft hyphen
    text
  else
    s[-1] = "#{separator}#{s[-1]}"
    s.join
  end
end
.break_up_long_str2(text) ⇒ Object
# File 'lib/utils/main.rb', line 180
def break_up_long_str2(text)
  s = text.split(STR_BREAKUP_RE, -1)
  separator = "\u200b"
  if s.size == 1
    s = text.split(CAMEL_CASE_RE)
    separator = "\u00ad"
  end
  [s, separator]
end
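The two-stage fallback (punctuation first, camel case second) can be tried in isolation. The following is a minimal standalone sketch, assuming local copies of STR_BREAKUP_RE, CAMEL_CASE_RE and break_up_long_str2 so that it runs without the gem; the non-ASCII punctuation in the first character class is written here as \u escapes, which should be equivalent to the literal characters in the listing.

```ruby
# Local copies of the constants and the method listed above (assumption:
# copied here only so that the sketch runs without metanorma-utils).
STR_BREAKUP_RE = %r{
  (?<=[=_\u2014\u2013\u2009\u2192?+;]) | # break after any of these
  (?<=[,.:])(?!\d) | # break on punct only if not preceding digit
  (?<=[>])(?![>]) | # > not >->
  (?<=[\]])(?![\]]) | # ] not ]-]
  (?<=//) | # //
  (?<=[/])(?![/]) | # / not /-/
  (?<![<])(?=[<]) | # < not <-<
  (?<=\p{L})(?=[(\{\[]\p{L}) # letter and bracket, followed by letter
}x.freeze

CAMEL_CASE_RE = %r{
  (?<=\p{Ll}\p{Ll})(?=\p{Lu}\p{Ll}\p{Ll}) # 2 lowerc / upperc, 2 lowerc
}x.freeze

def break_up_long_str2(text)
  s = text.split(STR_BREAKUP_RE, -1)
  separator = "\u200b" # zero-width space for punctuation breaks
  if s.size == 1 # no punctuation break found: fall back to camel case
    s = text.split(CAMEL_CASE_RE)
    separator = "\u00ad" # soft hyphen for camel-case breaks
  end
  [s, separator]
end

# Splits after the full stop; separator is the zero-width space.
p break_up_long_str2("aaa.bbb")
# The full stop precedes a digit, so no break is found at all.
p break_up_long_str2("version1.2")
# No punctuation, so the camel-case fallback applies.
p break_up_long_str2("camelCaseWord")
```

In this sketch a single-element array means the text could not be broken, which is the case break_up_long_str1 checks for before forcing a soft hyphen.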
.case_transform_xml(xml, kase) ⇒ Object
# File 'lib/utils/xml.rb', line 167
def case_transform_xml(xml, kase)
  x = Nokogiri::XML("<root>#{xml}</root>")
  x.traverse do |e|
    e.text? or next
    e.replace(e.text.send(kase))
  end
  x.root.children.to_xml
end
.create_namespace(xmldoc) ⇒ Object
# File 'lib/utils/namespace.rb', line 21
def create_namespace(xmldoc)
  Namespace.new(xmldoc)
end
.csv_split(text, delim = ";") ⇒ Object
Rewrites a delimiter followed by space and quote as delimiter and quote (, " => ,"): the CSV definition does not deal with a space followed by a quote at the start of a field.
# File 'lib/utils/main.rb', line 15
def csv_split(text, delim = ";")
  text.nil? || text.empty? and return []
  CSV.parse_line(text.gsub(/#{delim} "(?!")/, "#{delim}\""),
                 liberal_parsing: true,
                 col_sep: delim)&.compact&.map(&:strip)
end
.default_script(lang) ⇒ Object
# File 'lib/utils/main.rb', line 113
def default_script(lang)
  case lang
  when "ar", "fa" then "Arab"
  when "ur" then "Aran"
  when "ru", "bg" then "Cyrl"
  when "hi" then "Deva"
  when "el" then "Grek"
  when "zh" then "Hans"
  when "ko" then "Kore"
  when "he" then "Hebr"
  when "ja" then "Jpan"
  else
    "Latn"
  end
end
.dl_to_attrs(elem, dlist, name) ⇒ Object
convert definition list term/value pair into Nokogiri XML attribute
# File 'lib/utils/xml.rb', line 143
def dl_to_attrs(elem, dlist, name)
  e = dlist.at("./dt[text()='#{name}']") or return
  val = e.at("./following::dd/p") || e.at("./following::dd") or return
  elem[name] = val.text
end
.dl_to_elems(ins, elem, dlist, name) ⇒ Object
convert definition list term/value pairs into Nokogiri XML elements
# File 'lib/utils/xml.rb', line 150
def dl_to_elems(ins, elem, dlist, name)
  a = elem.at("./#{name}[last()]")
  ins = a if a
  dlist.xpath("./dt[text()='#{name}']").each do |e|
    ins = dl_to_elems1(e, name, ins)
  end
  ins
end
.dl_to_elems1(term, name, ins) ⇒ Object
# File 'lib/utils/xml.rb', line 159
def dl_to_elems1(term, name, ins)
  v = term.at("./following::dd")
  e = v.elements and e.size == 1 && e.first.name == "p" and v = e.first
  v.name = name
  ins.next = v
  ins.next
end
.endash_date(elem) ⇒ Object
# File 'lib/utils/main.rb', line 53
def endash_date(elem)
  elem.traverse do |n|
    n.text? or next
    n.replace(n.text.gsub(/\s+--?\s+/, "–").gsub("--", "–"))
  end
end
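The effect of the two substitutions is easier to see on a plain string. The following is a minimal sketch, assuming a hypothetical helper named endash that applies the same two gsub calls to a String rather than to the text nodes of a Nokogiri element:

```ruby
# Hypothetical helper (not part of the gem): the same two substitutions
# as endash_date, applied directly to a string.
def endash(text)
  text.gsub(/\s+--?\s+/, "\u2013").gsub("--", "\u2013")
end

puts endash("2019 -- 2020") # spaced double hyphen becomes an en dash
puts endash("2019 - 2020")  # spaced single hyphen becomes an en dash
puts endash("2019--2020")   # unspaced double hyphen becomes an en dash
puts endash("2019-2020")    # unspaced single hyphen is left untouched
```

Note that the surrounding whitespace is consumed by the first pattern, so the en dash ends up unspaced in every case.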
.external_path(path) ⇒ Object
# File 'lib/utils/main.rb', line 102
def external_path(path)
  win = !!((RUBY_PLATFORM =~ /(win|w)(32|64)$/) ||
           (RUBY_PLATFORM =~ /mswin|mingw/))
  if win
    path.gsub!(%{/}, "\\")
    path[/\s/] ? "\"#{path}\"" : path
  else
    path
  end
end
.firstchar_xml(line) ⇒ Object
need to deal with both <em> and its reverse string, >me<
# File 'lib/utils/xml.rb', line 80
def firstchar_xml(line)
  m = /^([<>][^<>]+[<>])*(.)/.match(line) or return ""
  m[2]
end
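The remark about the reversed string makes more sense with a concrete input. The following is a minimal standalone sketch, with the method copied from the listing above so that it runs without the gem; the sample markup is illustrative only.

```ruby
# Local copy of the method listed above.
def firstchar_xml(line)
  m = /^([<>][^<>]+[<>])*(.)/.match(line) or return ""
  m[2]
end

line = "<em>Hello</em>"

# Leading tags are skipped, so the first text character is returned.
puts firstchar_xml(line)          # H

# Reversing the line turns </em> into >me/<, which the same pattern
# skips, so the last text character of the original line is returned.
puts firstchar_xml(line.reverse)  # o

# No character at all: the method falls back to an empty string.
puts firstchar_xml("").inspect    # ""
```

This is how line_sanitise obtains the last visible character of one line and the first visible character of the next.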
.guid_anchor?(id) ⇒ Boolean
# File 'lib/utils/xml.rb', line 176
def guid_anchor?(id)
  /^_[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}$/i
    .match?(id)
end
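The pattern matches the anchors that anchor_or_uuid generates (an underscore followed by a UUID). The following is a minimal sketch, assuming SecureRandom.uuid from the standard library as a stand-in for UUIDTools::UUID.random_create, and a constant name (GUID_ANCHOR_RE) that is introduced here only for illustration:

```ruby
require "securerandom"

# Same pattern as in guid_anchor? above; the constant name is hypothetical.
GUID_ANCHOR_RE =
  /^_[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}$/i

# Assumption: SecureRandom.uuid stands in for UUIDTools::UUID.random_create.
generated = "_#{SecureRandom.uuid}"

puts GUID_ANCHOR_RE.match?(generated)        # true: generated anchor
puts GUID_ANCHOR_RE.match?("_introduction")  # false: hand-written anchor
```

A check of this kind lets calling code tell generated anchors apart from anchors that an author supplied explicitly.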
.line_sanitise(ret) ⇒ Object
By default, carriage return in source translates to whitespace; but in CJK, it does not. (Non-CJK text \n CJK)
# File 'lib/utils/xml.rb', line 63
def line_sanitise(ret)
  ret.size == 1 and return ret
  (0...(ret.size - 1)).each do |i|
    last = firstchar_xml(ret[i].reverse)
    nextfirst = firstchar_xml(ret[i + 1])
    cjk1 = /#{CJK}/o.match?(last)
    cjk2 = /#{CJK}/o.match?(nextfirst)
    text1 = /[^\p{Z}\p{C}]/.match?(last)
    text2 = /[^\p{Z}\p{C}]/.match?(nextfirst)
    cjk1 && (cjk2 || !text2) and next
    !text1 && cjk2 and next
    ret[i] += " "
  end
  ret
end
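The joining rule can be exercised on a few line pairs. The following is a minimal standalone sketch, with firstchar_xml and line_sanitise copied from the listings above; the CJK constant here is a deliberately reduced stand-in (Han and kana only), which is an assumption made to keep the example short. As in the gem, the constant is a pattern string that is interpolated into a Regexp.

```ruby
# Assumption: reduced stand-in for the CJK constant (Han and kana only).
CJK = "\\p{Han}|\\p{Hiragana}|\\p{Katakana}".freeze

def firstchar_xml(line)
  m = /^([<>][^<>]+[<>])*(.)/.match(line) or return ""
  m[2]
end

def line_sanitise(ret)
  ret.size == 1 and return ret
  (0...(ret.size - 1)).each do |i|
    last = firstchar_xml(ret[i].reverse)      # last visible char of line i
    nextfirst = firstchar_xml(ret[i + 1])     # first visible char of line i+1
    cjk1 = /#{CJK}/o.match?(last)
    cjk2 = /#{CJK}/o.match?(nextfirst)
    text1 = /[^\p{Z}\p{C}]/.match?(last)
    text2 = /[^\p{Z}\p{C}]/.match?(nextfirst)
    cjk1 && (cjk2 || !text2) and next         # CJK then CJK or nothing: no space
    !text1 && cjk2 and next                   # nothing then CJK: no space
    ret[i] += " "                             # otherwise join with a space
  end
  ret
end

p line_sanitise(["Hello", "world"])  # returns ["Hello ", "world"]
p line_sanitise(["漢字", "仮名"])     # returns the lines unchanged
p line_sanitise(["漢字", "abc"])      # returns ["漢字 ", "abc"]
```

Only the CJK-to-CJK boundary is left without a space; a boundary with non-CJK text on either side still receives one.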
.localdir(node) ⇒ Object
# File 'lib/utils/main.rb', line 34
def localdir(node)
  docfile = node.attr("docfile")
  docfile.nil? ? "./" : "#{Pathname.new(docfile).parent}/"
end
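The method only needs an object that responds to attr, so it can be tried without Asciidoctor. The following is a minimal sketch, assuming a hypothetical StubNode struct in place of a real Asciidoctor node and an illustrative file path:

```ruby
require "pathname"

# Hypothetical stand-in for an Asciidoctor node: only #attr is needed here.
StubNode = Struct.new(:attributes) do
  def attr(name)
    attributes[name]
  end
end

# Local copy of the method listed above.
def localdir(node)
  docfile = node.attr("docfile")
  docfile.nil? ? "./" : "#{Pathname.new(docfile).parent}/"
end

# With a docfile attribute, the parent directory is returned with a
# trailing slash.
puts localdir(StubNode.new({ "docfile" => "/docs/iso/document.adoc" }))

# Without one, the current directory is used.
puts localdir(StubNode.new({}))
```

On a POSIX-style path the first call prints /docs/iso/ and the second prints ./ in this sketch.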
.noko(_script = "Latn", &block) ⇒ Object
Block for processing XML document fragments as XHTML, to allow for HTML entities. Unescapes special characters used in Asciidoctor substitution processing.
# File 'lib/utils/xml.rb', line 51
def noko(_script = "Latn", &block)
  fragment = ::Nokogiri::XML.parse(NOKOHEAD).fragment("")
  ::Nokogiri::XML::Builder.with fragment, &block
  fragment
    .to_xml(encoding: "UTF-8", indent: 0,
            save_with: Nokogiri::XML::Node::SaveOptions::AS_XML)
    .gsub("&#150;", "\u0096").gsub("&#151;", "\u0097")
    .gsub("&#x96;", "\u0096").gsub("&#x97;", "\u0097")
end
.noko_html(&block) ⇒ Object
# File 'lib/utils/xml.rb', line 85
def noko_html(&block)
  doc = ::Nokogiri::XML.parse(NOKOHEAD)
  fragment = doc.fragment("")
  ::Nokogiri::XML::Builder.with fragment, &block
  fragment.to_xml(encoding: "UTF-8", indent: 0,
                  save_with: Nokogiri::XML::Node::SaveOptions::AS_XML)
    .lines.map do |l|
    l.gsub(/\s*\n/, "")
  end
end
.ns(xpath) ⇒ Object
# File 'lib/utils/xml.rb', line 101
def ns(xpath)
  xpath.gsub(%r{/([a-zA-Z])}, "/xmlns:\\1")
    .gsub(%r{::([a-zA-Z])}, "::xmlns:\\1")
    .gsub(%r{\[([a-zA-Z][a-z0-9A-Z@/-]* ?=)}, "[xmlns:\\1")
    .gsub(%r{\[([a-zA-Z][a-z0-9A-Z@/-]*[/\[\]])}, "[xmlns:\\1")
end
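Because the method is pure string rewriting, its effect can be checked without an XML document. The following is a minimal standalone sketch, with the method copied from the listing above; the XPath expressions are illustrative only.

```ruby
# Local copy of the method listed above.
def ns(xpath)
  xpath.gsub(%r{/([a-zA-Z])}, "/xmlns:\\1")
    .gsub(%r{::([a-zA-Z])}, "::xmlns:\\1")
    .gsub(%r{\[([a-zA-Z][a-z0-9A-Z@/-]* ?=)}, "[xmlns:\\1")
    .gsub(%r{\[([a-zA-Z][a-z0-9A-Z@/-]*[/\[\]])}, "[xmlns:\\1")
end

# Every element step gets the xmlns: prefix.
puts ns("//clause/title")
# prints //xmlns:clause/xmlns:title

# Element names inside a predicate are prefixed too.
puts ns("//sections/clause[title = 'Scope']")
# prints //xmlns:sections/xmlns:clause[xmlns:title = 'Scope']

# Attribute tests start with @ and are left alone.
puts ns("//xref[@target]")
# prints //xmlns:xref[@target]
```

The prefixed expressions are presumably meant for documents whose elements sit in a default namespace, where Nokogiri exposes that namespace under the xmlns prefix.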
.numeric_escapes(xml) ⇒ Object
# File 'lib/utils/xml.rb', line 108
def numeric_escapes(xml)
  c = HTMLEntities.new
  xml.split(/(&[^ \r\n\t#&;]+;)/).map do |t|
    if /^(&[^ \t\r\n#;]+;)/.match?(t)
      c.encode(c.decode(t), :hexadecimal)
    else t
    end
  end.join
end
.rtl_script?(script) ⇒ Boolean
# File 'lib/utils/main.rb', line 129
def rtl_script?(script)
  %w(Arab Aran Hebr).include? script
end
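default_script and rtl_script? combine naturally: a language code is mapped to a default script, and the script determines the text direction. The following is a minimal standalone sketch, with both methods copied from the listings above so that it runs without the gem:

```ruby
# Local copies of the two methods listed above.
def default_script(lang)
  case lang
  when "ar", "fa" then "Arab"
  when "ur" then "Aran"
  when "ru", "bg" then "Cyrl"
  when "hi" then "Deva"
  when "el" then "Grek"
  when "zh" then "Hans"
  when "ko" then "Kore"
  when "he" then "Hebr"
  when "ja" then "Jpan"
  else
    "Latn"
  end
end

def rtl_script?(script)
  %w(Arab Aran Hebr).include? script
end

# Report the default script and direction for a few language codes.
%w(en ar ur he ja).each do |lang|
  script = default_script(lang)
  direction = rtl_script?(script) ? "rtl" : "ltr"
  puts "#{lang}: #{script} (#{direction})"
end
```

Any language code that is not listed falls through to Latn, which is treated as left-to-right.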
.set_nested_value(hash, keys, new_val) ⇒ Object
Sets a hash value using a path of keys. Modified from stackoverflow.com/a/42425884.
# File 'lib/utils/main.rb', line 62
def set_nested_value(hash, keys, new_val)
  key = keys[0]
  if keys.length == 1
    hash[key] = if hash[key].is_a?(::Array) then (hash[key] << new_val)
                else hash[key].nil? ? new_val : [hash[key], new_val]
                end
  elsif hash[key].is_a?(::Array)
    hash[key][-1] = {} if !hash[key].empty? && hash[key][-1].nil?
    hash[key] << {} if hash[key].empty? || !hash[key][-1].is_a?(::Hash)
    set_nested_value(hash[key][-1], keys[1..-1], new_val)
  elsif hash[key].nil? || hash[key].empty?
    hash[key] = {}
    set_nested_value(hash[key], keys[1..-1], new_val)
  elsif hash[key].is_a?(::Hash) && !hash[key][keys[1]]
    set_nested_value(hash[key], keys[1..-1], new_val)
  elsif !hash[key][keys[1]]
    hash[key] = [hash[key], {}]
    set_nested_value(hash[key][-1], keys[1..-1], new_val)
  else
    set_nested_value(hash[key], keys[1..-1], new_val)
  end
  hash
end
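The branching is easier to follow with a short run. The following is a minimal standalone sketch, with the method copied from the listing above; the keys and values are illustrative only. It shows that setting the same key path twice accumulates the values into an array rather than overwriting.

```ruby
# Local copy of the method listed above.
def set_nested_value(hash, keys, new_val)
  key = keys[0]
  if keys.length == 1
    hash[key] = if hash[key].is_a?(::Array) then (hash[key] << new_val)
                else hash[key].nil? ? new_val : [hash[key], new_val]
                end
  elsif hash[key].is_a?(::Array)
    hash[key][-1] = {} if !hash[key].empty? && hash[key][-1].nil?
    hash[key] << {} if hash[key].empty? || !hash[key][-1].is_a?(::Hash)
    set_nested_value(hash[key][-1], keys[1..-1], new_val)
  elsif hash[key].nil? || hash[key].empty?
    hash[key] = {}
    set_nested_value(hash[key], keys[1..-1], new_val)
  elsif hash[key].is_a?(::Hash) && !hash[key][keys[1]]
    set_nested_value(hash[key], keys[1..-1], new_val)
  elsif !hash[key][keys[1]]
    hash[key] = [hash[key], {}]
    set_nested_value(hash[key][-1], keys[1..-1], new_val)
  else
    set_nested_value(hash[key], keys[1..-1], new_val)
  end
  hash
end

h = {}
set_nested_value(h, %w(a b), 1) # creates the intermediate hash
set_nested_value(h, %w(a b), 2) # same path again: values are collected
set_nested_value(h, %w(a c), 3) # sibling key under the same parent
p h # {"a"=>{"b"=>[1, 2], "c"=>3}} (inspect formatting varies by Ruby version)
```

The hash is modified in place and also returned, so the calls can be chained if that is convenient.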
.smartformat(text) ⇒ Object
TODO: needs internationalisation of quotation marks.
# File 'lib/utils/main.rb', line 40
def smartformat(text)
  ret = HTMLEntities.new.decode(
    text.gsub(/ --? /, "&#8201;&#8212;&#8201;")
        .gsub("--", "&#8212;"),
  )
  ret = ret.gsub(%r{(#{CJK})(["'])}o, "\\1\u200a\\2")
    .gsub(%r{(["'])(#{CJK})}o, "\\1\u200a\\2")
  ret = ret.smart_format
  ret = ret.gsub(%r{(#{CJK})\u200a}o, "\\1")
    .gsub(%r{\u200a(#{CJK})}o, "\\1")
  HTMLEntities.new.encode(ret, :basic)
end
.strict_capitalize_first(str) ⇒ Object
# File 'lib/utils/main.rb', line 94
def strict_capitalize_first(str)
  str.split(/ /).each_with_index.map do |w, i|
    letters = w.chars
    letters.first.upcase! if i.zero?
    letters.join
  end.join(" ")
end
.strict_capitalize_phrase(str) ⇒ Object
# File 'lib/utils/main.rb', line 86
def strict_capitalize_phrase(str)
  str.split(/ /).map do |w|
    letters = w.chars
    letters.first.upcase!
    letters.join
  end.join(" ")
end
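The "strict" in both names appears to refer to leaving every character other than the first untouched, unlike String#capitalize, which also lowercases the rest. The following is a minimal standalone sketch, with both methods copied from the listings above; the sample phrases are illustrative only.

```ruby
# Local copies of the two methods listed above.
def strict_capitalize_phrase(str)
  str.split(/ /).map do |w|
    letters = w.chars
    letters.first.upcase!
    letters.join
  end.join(" ")
end

def strict_capitalize_first(str)
  str.split(/ /).each_with_index.map do |w, i|
    letters = w.chars
    letters.first.upcase! if i.zero?
    letters.join
  end.join(" ")
end

puts strict_capitalize_phrase("ISO standard")  # ISO Standard
puts strict_capitalize_first("data from ISO")  # Data from ISO

# For comparison, the built-in method lowercases the remaining letters.
puts "data from ISO".capitalize                # Data from iso
```

Acronyms and other internal capitals therefore survive, which the built-in method would not preserve.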
.to_ncname(tag, asciionly: true) ⇒ Object
# File 'lib/utils/xml.rb', line 30
def to_ncname(tag, asciionly: true)
  asciionly and tag = HTMLEntities.new.encode(tag, :basic, :hexadecimal)
  start = tag[0]
  ret1 = if %r([#{NAMECHAR}#])o.match?(start)
           "_"
         else
           (%r([#{NAMESTARTCHAR}#])o.match?(start) ? "_#{start}" : start)
         end
  ret2 = tag[1..-1] || ""
  (ret1 || "") + ret2.gsub(%r([#{NAMECHAR}#])o, "_")
end
.to_xhtml_fragment(xml) ⇒ Object
# File 'lib/utils/xml.rb', line 96
def to_xhtml_fragment(xml)
  doc = ::Nokogiri::XML.parse(NOKOHEAD)
  doc.fragment(xml)
end
.wrap_in_para(node, out) ⇒ Object
if the contents of node are blocks, output them to out; else, wrap them in <p>
# File 'lib/utils/xml.rb', line 120
def wrap_in_para(node, out)
  if node.blocks? then out << node.content
  else
    out.p { |p| p << node.content }
  end
end