Class: Spacy::Doc
Overview
See also the spaCy Python API documentation for Doc.
Instance Attribute Summary
-
#py_doc ⇒ Object
readonly
A Python Doc instance accessible via PyCall.
-
#py_nlp ⇒ Object
readonly
A Python Language instance accessible via PyCall.
-
#text ⇒ String
readonly
A text string of the document.
Instance Method Summary
-
#[](range) ⇒ Object
Returns a span when given a range object, or a token when given an integer position in the doc.
-
#displacy(style: "dep", compact: false) ⇒ String
Visualize the document in one of two styles: "dep" (dependencies) or "ent" (named entities).
-
#each ⇒ Object
Iterates over the elements in the doc yielding a token instance each time.
-
#ents ⇒ Array<Span>
Returns an array of spans each representing a named entity.
-
#initialize(nlp, py_doc: nil, text: nil, max_retrial: MAX_RETRIAL, retrial: 0) ⇒ Doc
constructor
Using the Language#read method is the recommended way to create a doc.
-
#method_missing(name, *args) ⇒ Object
Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism.
-
#noun_chunks ⇒ Array<Span>
Returns an array of spans representing noun chunks.
-
#respond_to_missing?(sym) ⇒ Boolean
-
#retokenize(start_index, end_index, attributes = {}) ⇒ Object
Retokenizes the text merging a span into a single token.
-
#retokenize_split(pos_in_doc, split_array, head_pos_in_split, ancestor_pos, attributes = {}) ⇒ Object
Retokenizes the text splitting the specified token.
-
#sents ⇒ Array<Span>
Returns an array of spans each representing a sentence.
-
#similarity(other) ⇒ Float
Returns a semantic similarity estimate.
-
#span(range_or_start, optional_size = nil) ⇒ Span
Returns a span of the specified range within the doc.
-
#to_s ⇒ String
String representation of the document.
-
#tokens ⇒ Array<Token>
Returns an array of tokens contained in the doc.
Constructor Details
#initialize(nlp, py_doc: nil, text: nil, max_retrial: MAX_RETRIAL, retrial: 0) ⇒ Doc
Using the Language#read method is the recommended way to create a doc. If you need to
call #initialize directly, there are two signatures:
Spacy::Doc.new(nlp_id, py_doc: Object) and Spacy::Doc.new(nlp_id, text: String).
    # File 'lib/ruby-spacy.rb', line 62
    def initialize(nlp, py_doc: nil, text: nil, max_retrial: MAX_RETRIAL, retrial: 0)
      @py_nlp = nlp
      @py_doc = py_doc || @py_doc = nlp.call(text)
      @text = @py_doc.text
    rescue StandardError
      retrial += 1
      raise "Error: Failed to construct a Doc object" unless retrial <= max_retrial

      sleep 0.5
      initialize(nlp, py_doc: py_doc, text: text, max_retrial: max_retrial, retrial: retrial)
    end
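The constructor's retry behaviour can be illustrated without Python. The sketch below is a minimal stand-in, not the library's code. FlakyPipeline and RetryingDoc are hypothetical names, MAX_RETRIAL = 5 is an assumed value (this page does not show the constant), and the delay parameter is added only so the example runs instantly. Ruby's retry keyword is swapped in for the recursive call to initialize used in the listing.

```ruby
# Hypothetical stand-in for the Python pipeline: fails a fixed number of
# times, then succeeds. Not part of ruby-spacy.
class FlakyPipeline
  Result = Struct.new(:text)

  attr_reader :calls

  def initialize(failures)
    @failures = failures
    @calls = 0
  end

  def call(text)
    @calls += 1
    raise IOError, "pipeline not ready" if @calls <= @failures

    Result.new(text)
  end
end

# Hypothetical class that mirrors only the retry logic of the constructor.
class RetryingDoc
  MAX_RETRIAL = 5 # assumed value, for illustration only

  attr_reader :text

  # `delay` is an added parameter; the listing sleeps a fixed 0.5 seconds.
  def initialize(nlp, text:, max_retrial: MAX_RETRIAL, delay: 0)
    attempts = 0
    begin
      @text = nlp.call(text).text
    rescue StandardError
      attempts += 1
      raise "Error: Failed to construct a Doc object" if attempts > max_retrial

      sleep delay
      retry
    end
  end
end

pipeline = FlakyPipeline.new(2)
doc = RetryingDoc.new(pipeline, text: "Hello world")
puts doc.text        # constructed after two simulated failures
puts pipeline.calls  # total number of calls made to the pipeline
```

In this sketch the third call succeeds. When the simulated failures outnumber max_retrial, the constructor gives up and raises the same error message as the listing.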
Dynamic Method Handling
This class handles dynamic methods through the method_missing method.
#method_missing(name, *args) ⇒ Object
Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism.
    # File 'lib/ruby-spacy.rb', line 202
    def method_missing(name, *args)
      @py_doc.send(name, *args)
    end
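The forwarding idea can be tried in plain Ruby. The following is a minimal sketch under stated assumptions, not the library's implementation: FakePyDoc and DocWrapper are hypothetical stand-ins, no Python object is involved, and the sketch uses public_send and consults the wrapped object in respond_to_missing?, which differs from the listing.

```ruby
# Hypothetical stand-in for a wrapped Python object.
class FakePyDoc
  def lang_
    "en"
  end

  def char_span(start_char, end_char)
    "chars #{start_char}...#{end_char}"
  end
end

# Hypothetical wrapper that forwards unknown messages to the wrapped object.
class DocWrapper
  def initialize(py_doc)
    @py_doc = py_doc
  end

  # Forward any message the wrapper itself does not define.
  def method_missing(name, *args, &block)
    return super unless @py_doc.respond_to?(name)

    @py_doc.public_send(name, *args, &block)
  end

  # Keep respond_to? consistent with what method_missing can forward.
  def respond_to_missing?(name, include_private = false)
    @py_doc.respond_to?(name) || super
  end
end

doc = DocWrapper.new(FakePyDoc.new)
puts doc.lang_              # forwarded to FakePyDoc#lang_
puts doc.char_span(0, 5)    # arguments are passed through unchanged
puts doc.respond_to?(:lang_)
puts doc.respond_to?(:nope)
```

Calling a name that neither object defines falls through to super and raises NoMethodError in this sketch.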
Instance Attribute Details
#py_doc ⇒ Object (readonly)
Returns a Python Doc instance accessible via PyCall.
    # File 'lib/ruby-spacy.rb', line 45
    def py_doc
      @py_doc
    end
#py_nlp ⇒ Object (readonly)
Returns a Python Language instance accessible via PyCall.
    # File 'lib/ruby-spacy.rb', line 42
    def py_nlp
      @py_nlp
    end
#text ⇒ String (readonly)
Returns a text string of the document.
    # File 'lib/ruby-spacy.rb', line 48
    def text
      @text
    end
Instance Method Details
#[](range) ⇒ Object
Returns a span when given a range object, or a token when given an integer position in the doc.
    # File 'lib/ruby-spacy.rb', line 177
    def [](range)
      if range.is_a?(Range)
        py_span = @py_doc[range]
        Span.new(self, start_index: py_span.start, end_index: py_span.end - 1)
      else
        Token.new(@py_doc[range])
      end
    end
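The `- 1` in the listing converts an exclusive end offset (the convention of Python spans) into the inclusive end_index that Span.new receives. A minimal plain-Ruby sketch of that dispatch and conversion follows. PySpanLike, InclusiveSpan and MiniDoc are hypothetical stand-ins that mimic only the index handling, and plain strings play the role of tokens.

```ruby
# Hypothetical stand-ins, not ruby-spacy classes.
PySpanLike = Struct.new(:start, :end)                 # `end` is exclusive
InclusiveSpan = Struct.new(:start_index, :end_index)  # `end_index` is inclusive

class MiniDoc
  def initialize(words)
    @words = words
  end

  # Range -> span bounds, Integer -> single element.
  def [](index_or_range)
    if index_or_range.is_a?(Range)
      py_span = slice(index_or_range)
      # Same conversion as in the listing: exclusive end -> inclusive end_index.
      InclusiveSpan.new(py_span.start, py_span.end - 1)
    else
      @words[index_or_range]
    end
  end

  private

  # Mimics a Python-side slice that reports an exclusive end offset.
  def slice(range)
    picked = @words[range]
    PySpanLike.new(range.first, range.first + picked.size)
  end
end

doc = MiniDoc.new(%w[I saw a cat in Kyoto])
p doc[3]     # Integer: a single token
p doc[2..3]  # Range: a span covering tokens 2 and 3
```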
#displacy(style: "dep", compact: false) ⇒ String
Visualize the document in one of two styles: "dep" (dependencies) or "ent" (named entities).
    # File 'lib/ruby-spacy.rb', line 197
    def displacy(style: "dep", compact: false)
      PyDisplacy.render(py_doc, style: style, options: { compact: compact }, jupyter: false)
    end
#each ⇒ Object
Iterates over the elements in the doc yielding a token instance each time.
    # File 'lib/ruby-spacy.rb', line 113
    def each
      PyCall::List.call(@py_doc).each do |py_token|
        yield Token.new(py_token)
      end
    end
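A method like this pairs naturally with Ruby's Enumerable module. The sketch below assumes the mix-in purely for illustration, since this page does not state whether Doc includes Enumerable. WordList and Word are hypothetical stand-ins for Doc and Token, and the enum_for guard is an addition that the listing does not have.

```ruby
# Hypothetical stand-in for Token.
Word = Struct.new(:text)

# Hypothetical stand-in for Doc: wraps raw elements and yields wrappers.
class WordList
  include Enumerable # assumption made for this sketch

  def initialize(raw_words)
    @raw_words = raw_words
  end

  # Wrap each raw element before yielding it, as the listing does with Token.new.
  def each
    return enum_for(:each) unless block_given?

    @raw_words.each { |raw| yield Word.new(raw) }
  end
end

words = WordList.new(%w[I saw a cat])
puts words.map(&:text).join(" ")
puts words.select { |word| word.text.length > 1 }.map(&:text).inspect
```

Once each is defined, Enumerable supplies map, select, count and the rest without further code.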
#ents ⇒ Array<Span>
Returns an array of spans each representing a named entity.
    # File 'lib/ruby-spacy.rb', line 163
    def ents
      # so that ents can be "each"-ed in Ruby
      ent_array = []
      PyCall::List.call(@py_doc.ents).each do |ent|
        ent.define_singleton_method :label do
          label_
        end
        ent_array << ent
      end
      ent_array
    end
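The listing attaches a label reader to each entity object so that callers can write ent.label instead of ent.label_. The per-object aliasing can be sketched in plain Ruby. FakeEnt is a hypothetical stand-in for a Python span and the entity values are toy data.

```ruby
# Hypothetical stand-in for a Python span that exposes `label_`.
FakeEnt = Struct.new(:text, :label_)

raw_ents = [FakeEnt.new("Kyoto", "GPE"), FakeEnt.new("Monday", "DATE")]

ents = raw_ents.map do |ent|
  # Add a Ruby-style `label` reader to this one object only.
  ent.define_singleton_method(:label) { label_ }
  ent
end

ents.each { |ent| puts "#{ent.text}: #{ent.label}" }
```

Because the method is defined on each object's singleton class, other instances of the same class are unaffected.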
#noun_chunks ⇒ Array<Span>
Returns an array of spans representing noun chunks.
    # File 'lib/ruby-spacy.rb', line 141
    def noun_chunks
      chunk_array = []
      py_chunks = PyCall::List.call(@py_doc.noun_chunks)
      py_chunks.each do |py_chunk|
        chunk_array << Span.new(self, start_index: py_chunk.start, end_index: py_chunk.end - 1)
      end
      chunk_array
    end
#respond_to_missing?(sym) ⇒ Boolean
    # File 'lib/ruby-spacy.rb', line 206
    def respond_to_missing?(sym)
      sym ? true : super
    end
#retokenize(start_index, end_index, attributes = {}) ⇒ Object
Retokenizes the text merging a span into a single token.
    # File 'lib/ruby-spacy.rb', line 78
    def retokenize(start_index, end_index, attributes = {})
      PyCall.with(@py_doc.retokenize) do |retokenizer|
        retokenizer.merge(@py_doc[start_index..end_index], attrs: attributes)
      end
    end
#retokenize_split(pos_in_doc, split_array, head_pos_in_split, ancestor_pos, attributes = {}) ⇒ Object
Retokenizes the text splitting the specified token.
    # File 'lib/ruby-spacy.rb', line 89
    def retokenize_split(pos_in_doc, split_array, head_pos_in_split, ancestor_pos, attributes = {})
      PyCall.with(@py_doc.retokenize) do |retokenizer|
        heads = [[@py_doc[pos_in_doc], head_pos_in_split], @py_doc[ancestor_pos]]
        retokenizer.split(@py_doc[pos_in_doc], split_array, heads: heads, attrs: attributes)
      end
    end
#sents ⇒ Array<Span>
Returns an array of spans each representing a sentence.
    # File 'lib/ruby-spacy.rb', line 152
    def sents
      sentence_array = []
      py_sentences = PyCall::List.call(@py_doc.sents)
      py_sentences.each do |py_sent|
        sentence_array << Span.new(self, start_index: py_sent.start, end_index: py_sent.end - 1)
      end
      sentence_array
    end
#similarity(other) ⇒ Float
Returns a semantic similarity estimate.
    # File 'lib/ruby-spacy.rb', line 189
    def similarity(other)
      py_doc.similarity(other.py_doc)
    end
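The method delegates to Python, so the number comes from spaCy. spaCy's documentation describes the default estimate as the cosine similarity of averaged word vectors. Assuming that default, the arithmetic can be sketched in plain Ruby. cosine_similarity is a hypothetical helper, the vectors are toy values rather than real embeddings, and returning 0.0 for a zero vector is a choice made for this sketch.

```ruby
# Hypothetical helper: cosine similarity of two equal-length numeric vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  # Sketch-level choice: treat a zero vector as having no similarity.
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

puts cosine_similarity([1.0, 0.0], [1.0, 0.0])  # identical direction
puts cosine_similarity([1.0, 0.0], [0.0, 1.0])  # orthogonal vectors
puts cosine_similarity([1.0, 1.0], [1.0, 0.0])  # 45 degrees apart
```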
#span(range_or_start, optional_size = nil) ⇒ Span
Returns a span of the specified range within the doc.
The method can be called in either of two ways: Doc#span(range)
or Doc#span(start_pos, size_of_span).
    # File 'lib/ruby-spacy.rb', line 124
    def span(range_or_start, optional_size = nil)
      if optional_size
        start_index = range_or_start
        temp = tokens[start_index...start_index + optional_size]
      else
        start_index = range_or_start.first
        range = range_or_start
        temp = tokens[range]
      end

      end_index = start_index + temp.size - 1

      Span.new(self, start_index: start_index, end_index: end_index)
    end
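The two calling conventions resolve to the same pair of inclusive indices. The sketch below reproduces only that index arithmetic on a plain array. span_bounds is a hypothetical helper, and strings stand in for Token objects.

```ruby
# Hypothetical helper: returns [start_index, end_index], both inclusive.
def span_bounds(tokens, range_or_start, optional_size = nil)
  if optional_size
    start_index = range_or_start
    picked = tokens[start_index...start_index + optional_size]
  else
    start_index = range_or_start.first
    picked = tokens[range_or_start]
  end
  [start_index, start_index + picked.size - 1]
end

tokens = %w[I saw a cat in Kyoto]
p span_bounds(tokens, 2..3)   # range form
p span_bounds(tokens, 2, 2)   # start position plus size, same span
p span_bounds(tokens, 4, 10)  # a size running past the end stops at the last token
```

Because the end index is derived from the number of tokens actually sliced, an oversized request shrinks to what the array holds.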
#to_s ⇒ String
String representation of the document.
    # File 'lib/ruby-spacy.rb', line 98
    def to_s
      @text
    end