Module: TexterraNLP
Constant Summary
Constants included from TexterraNLPSpecs
Instance Method Summary
-
#disambiguation_annotate(text) ⇒ Array
Detects the most appropriate meanings (concepts) for terms occurring in a given text.
-
#domain_detection_annotate(text) ⇒ Array
Detects the most appropriate domain for the given text.
-
#domain_polarity_detection_annotate(text, domain = '') ⇒ Array
Detects whether the given text has positive, negative, or no sentiment, with respect to domain.
-
#key_concepts_annotate(text) ⇒ Array
Key concepts are concepts that give a short, informative description of a text.
-
#language_detection_annotate(text) ⇒ Array
Detects the language of a given text.
-
#lemmatization_annotate(text) ⇒ Array
Detects the lemma of each word in a given text.
-
#named_entities_annotate(text) ⇒ Array
Finds all named entity occurrences in a given text.
-
#polarity_detection_annotate(text) ⇒ Array
Detects whether the given text has positive, negative or no sentiment.
-
#pos_tagging_annotate(text) ⇒ Array
Detects the part-of-speech tag for each word in a given text.
-
#sentence_detection_annotate(text) ⇒ Array
Detects boundaries of sentences in a given text.
-
#spelling_correction_annotate(text) ⇒ Array
Tries to correct misprints and other spelling errors in a given text.
-
#subjectivity_detection_annotate(text) ⇒ Array
Detects whether the given text is subjective or not.
-
#syntax_detection(text) ⇒ Array
Detects syntactic relations in a given text.
-
#term_detection_annotate(text) ⇒ Array
Extracts non-overlapping terms from a given text; a term is a textual representation of some real-world concept.
-
#tokenization_annotate(text) ⇒ Array
Detects all tokens (minimal significant text parts) in a given text.
-
#tweet_normalization(text) ⇒ Array
Detects Twitter-specific entities: hashtags, user names, emoticons, and URLs.
Instance Method Details
#disambiguation_annotate(text) ⇒ Array
Detects the most appropriate meanings (concepts) for terms occurring in a given text.
# File 'lib/ispras-api/texterra/nlp.rb', line 73
def disambiguation_annotate(text)
  preset_nlp(:disambiguation, text)
end
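Most methods in this module follow the same shape as this one: a one-line wrapper that passes a spec key and the text to `preset_nlp`. The sketch below illustrates that delegation pattern only. It is not the gem's implementation: `FakeTexterra`, its canned `preset_nlp`, and the layout of the returned hash are all assumptions made for illustration, and no HTTP request is made.

```ruby
# Hypothetical stand-in for the real client; it performs no HTTP request.
class FakeTexterra
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Same one-line shape as the wrappers documented on this page.
  def disambiguation_annotate(text)
    preset_nlp(:disambiguation, text)
  end

  def language_detection_annotate(text)
    preset_nlp(:languageDetection, text)
  end

  private

  # Assumed behaviour: look up a spec by key, call the service, return an Array.
  # This stub only records the key and returns a made-up annotation.
  def preset_nlp(key, text)
    @calls << key
    [{ start: 0, end: text.length, spec: key }]
  end
end

client = FakeTexterra.new
annotations = client.language_detection_annotate('Hello world')
```

Because every wrapper returns an Array, callers can iterate over the result without first checking its type.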
#domain_detection_annotate(text) ⇒ Array
Detects the most appropriate domain for the given text. Currently only two specific domains are supported: ‘movie’ and ‘politics’. If neither is detected, the text is treated as having no specific domain (the general domain).
# File 'lib/ispras-api/texterra/nlp.rb', line 92
def domain_detection_annotate(text)
  preset_nlp(:domainDetection, text)
end
#domain_polarity_detection_annotate(text, domain = '') ⇒ Array
Detects whether the given text has positive, negative, or no sentiment, with respect to domain. If no domain is provided, domain detection is applied first so that the method can achieve the best results. If no domain is detected, the general-domain algorithm is applied.
# File 'lib/ispras-api/texterra/nlp.rb', line 119
def domain_polarity_detection_annotate(text, domain = '')
  specs = NLP_SPECS[:domainPolarityDetection]
  domain = "(#{domain})" unless domain.empty?
  result = POST(specs[:path] % domain, specs[:params], text: text)[:nlp_document][:annotations][:i_annotation]
  return [] if result.nil?
  result = [].push result unless result.is_a? Array
  result.map { |e| assign_text(e, text) }
end
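Two details of the method body are easy to miss: a non-empty domain is wrapped in parentheses before being substituted into the request path, and the response is normalized so that the caller always receives an Array. The sketch below isolates both steps. The path template and the sample annotation are hypothetical (the real values live in `NLP_SPECS` and the service response, neither of which is shown here), and `domain_segment` and `normalize_annotations` are illustrative helper names, not part of the gem.

```ruby
# Mirrors the source: wrap a non-empty domain in parentheses, leave '' as is.
def domain_segment(domain)
  domain.empty? ? domain : "(#{domain})"
end

# Mirrors the source: nil becomes [], a single result becomes a one-element Array.
def normalize_annotations(result)
  return [] if result.nil?

  result.is_a?(Array) ? result : [result]
end

# Hypothetical template; the real one is stored in NLP_SPECS.
path_template = 'nlp/polarity%s'

with_domain    = path_template % domain_segment('movie')
without_domain = path_template % domain_segment('')

single = normalize_annotations({ value: 'example' }) # made-up annotation
empty  = normalize_annotations(nil)
```

With an empty domain the template receives an empty string, so the request path carries no domain segment at all.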
#key_concepts_annotate(text) ⇒ Array
Key concepts are concepts that give a short, informative description of a text. This service extracts a set of key concepts for a given text.
# File 'lib/ispras-api/texterra/nlp.rb', line 82
def key_concepts_annotate(text)
  preset_nlp(:keyConcepts, text)
end
#language_detection_annotate(text) ⇒ Array
Detects the language of a given text.
# File 'lib/ispras-api/texterra/nlp.rb', line 9
def language_detection_annotate(text)
  preset_nlp(:languageDetection, text)
end
#lemmatization_annotate(text) ⇒ Array
Detects the lemma of each word in a given text.
# File 'lib/ispras-api/texterra/nlp.rb', line 33
def lemmatization_annotate(text)
  preset_nlp(:lemmatization, text)
end
#named_entities_annotate(text) ⇒ Array
Finds all named entity occurrences in a given text.
# File 'lib/ispras-api/texterra/nlp.rb', line 57
def named_entities_annotate(text)
  preset_nlp(:namedEntities, text)
end
#polarity_detection_annotate(text) ⇒ Array
Detects whether the given text has positive, negative or no sentiment
# File 'lib/ispras-api/texterra/nlp.rb', line 108
def polarity_detection_annotate(text)
  preset_nlp(:polarityDetection, text)
end
#pos_tagging_annotate(text) ⇒ Array
Detects the part-of-speech tag for each word in a given text.
# File 'lib/ispras-api/texterra/nlp.rb', line 41
def pos_tagging_annotate(text)
  preset_nlp(:posTagging, text)
end
#sentence_detection_annotate(text) ⇒ Array
Detects boundaries of sentences in a given text
# File 'lib/ispras-api/texterra/nlp.rb', line 17
def sentence_detection_annotate(text)
  preset_nlp(:sentenceDetection, text)
end
#spelling_correction_annotate(text) ⇒ Array
Tries to correct misprints and other spelling errors in a given text.
# File 'lib/ispras-api/texterra/nlp.rb', line 49
def spelling_correction_annotate(text)
  preset_nlp(:spellingCorrection, text)
end
#subjectivity_detection_annotate(text) ⇒ Array
Detects whether the given text is subjective or not
# File 'lib/ispras-api/texterra/nlp.rb', line 100
def subjectivity_detection_annotate(text)
  preset_nlp(:subjectivityDetection, text)
end
#syntax_detection(text) ⇒ Array
Detects syntactic relations in a given text. Only works for Russian texts.
# File 'lib/ispras-api/texterra/nlp.rb', line 141
def syntax_detection(text)
  preset_nlp(:syntaxDetection, text).each do |an|
    an[:value][:parent_token] = assign_text(an[:value][:parent_token], text) if an[:value] && an[:value][:parent_token]
  end
end
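Unlike the plain wrappers, this method post-processes each annotation: when an annotation has a `:value` with a `:parent_token`, that parent token is passed through `assign_text`. The sketch below reproduces the loop on canned data. The `assign_text` defined here is a hypothetical stand-in, since the real helper is not shown in this section; it assumes annotations carry `:start` and `:end` character offsets and attaches the covered substring under `:text`. `attach_parent_text` and the sample annotations are also invented for illustration.

```ruby
# Hypothetical stand-in for assign_text; the real helper is not shown here.
# Assumption: annotations carry :start/:end character offsets into the text.
def assign_text(annotation, text)
  annotation.merge(text: text[annotation[:start]...annotation[:end]])
end

# Same loop shape as syntax_detection, applied to canned annotations.
def attach_parent_text(annotations, text)
  annotations.each do |an|
    if an[:value] && an[:value][:parent_token]
      an[:value][:parent_token] = assign_text(an[:value][:parent_token], text)
    end
  end
end

# Transliterated placeholder; the real service expects Russian text.
text = 'mama myla ramu'
annotations = [
  { start: 0, end: 4, value: { parent_token: { start: 5, end: 9 } } },
  { start: 5, end: 9, value: nil } # no :value, so nothing is rewritten
]

result = attach_parent_text(annotations, text)
```

The guard on `an[:value]` matters because an annotation without a value, such as a root token with no parent, would otherwise raise a `NoMethodError` on `nil`.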
#term_detection_annotate(text) ⇒ Array
Extracts non-overlapping terms from a given text; a term is a textual representation of some real-world concept.
# File 'lib/ispras-api/texterra/nlp.rb', line 65
def term_detection_annotate(text)
  preset_nlp(:termDetection, text)
end
#tokenization_annotate(text) ⇒ Array
Detects all tokens (minimal significant text parts) in a given text
# File 'lib/ispras-api/texterra/nlp.rb', line 25
def tokenization_annotate(text)
  preset_nlp(:tokenization, text)
end
#tweet_normalization(text) ⇒ Array
Detects Twitter-specific entities: hashtags, user names, emoticons, and URLs. It also detects stop words and misspellings, and provides spelling suggestions and corrections.
# File 'lib/ispras-api/texterra/nlp.rb', line 133
def tweet_normalization(text)
  preset_nlp(:tweetNormalization, text)
end