Class: Spacy::Doc

Inherits: Object

Includes: Enumerable

Defined in: lib/ruby-spacy.rb

Overview

See also the spaCy Python API documentation for Doc.

Instance Attribute Summary

#py_doc ⇒ Object readonly
A Python Doc instance accessible via PyCall.
#py_nlp ⇒ Object readonly
A Python Language instance accessible via PyCall.
#text ⇒ String readonly
A text string of the document.

Instance Method Summary

#[](range) ⇒ Object
Returns a span if given a range object; or returns a token if given an integer representing a position in the doc.
#displacy(style: "dep", compact: false) ⇒ String
Visualize the document in one of two styles: "dep" (dependencies) or "ent" (named entities).
#each ⇒ Object
Iterates over the elements in the doc yielding a token instance each time.
#ents ⇒ Array
Returns an array of spans each representing a named entity.
#initialize(nlp, py_doc: nil, text: nil, max_retrial: MAX_RETRIAL, retrial: 0) ⇒ Doc constructor
It is recommended to use the Language#read method to create a doc.
#method_missing(name, *args) ⇒ Object
Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism.
#noun_chunks ⇒ Array
Returns an array of spans representing noun chunks.
#respond_to_missing?(sym) ⇒ Boolean
#retokenize(start_index, end_index, attributes = {}) ⇒ Object
Retokenizes the text merging a span into a single token.
#retokenize_split(pos_in_doc, split_array, head_pos_in_split, ancestor_pos, attributes = {}) ⇒ Object
Retokenizes the text splitting the specified token.
#sents ⇒ Array
Returns an array of spans each representing a sentence.
#similarity(other) ⇒ Float
Returns a semantic similarity estimate.
#span(range_or_start, optional_size = nil) ⇒ Span
Returns a span of the specified range within the doc.
#to_s ⇒ String
String representation of the document.
#tokens ⇒ Array<Token>
Returns an array of tokens contained in the doc.

Constructor Details

#initialize(nlp, py_doc: nil, text: nil, max_retrial: MAX_RETRIAL, retrial: 0) ⇒ `Doc`

It is recommended to use the Language#read method to create a doc. If you need to create one by calling #initialize directly, there are two call signatures: Spacy::Doc.new(nlp, py_doc: Object) and Spacy::Doc.new(nlp, text: String).

Parameters:

nlp (Language) —
an instance of Language class
py_doc (Object) (defaults to: nil) —
an instance of Python Doc class
text (String) (defaults to: nil) —
the text string to be analyzed

# File 'lib/ruby-spacy.rb', line 62

def initialize(nlp, py_doc: nil, text: nil, max_retrial: MAX_RETRIAL, retrial: 0)
  @py_nlp = nlp
  @py_doc = py_doc || nlp.call(text)
  @text = @py_doc.text
rescue StandardError
  retrial += 1
  raise "Error: Failed to construct a Doc object" unless retrial <= max_retrial

  sleep 0.5
  initialize(nlp, py_doc: py_doc, text: text, max_retrial: max_retrial, retrial: retrial)
end
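
The rescue clause above amounts to a bounded retry: each failure increments a counter, and construction is abandoned once the counter exceeds max_retrial. The sketch below reproduces that control flow in plain Ruby, without PyCall. FlakyPipeline, RetryingDoc, the wait: keyword, and the MAX_RETRIAL value used here are hypothetical stand-ins introduced for illustration; they are not part of ruby-spacy.

```ruby
# Hypothetical stand-in for a pipeline that fails a fixed number of times
# before succeeding (not part of ruby-spacy).
class FlakyPipeline
  attr_reader :calls

  def initialize(failures)
    @failures = failures
    @calls = 0
  end

  def call(text)
    @calls += 1
    raise "pipeline not ready" if @calls <= @failures

    text.split
  end
end

# Hypothetical class mirroring the constructor's bounded-retry control flow.
class RetryingDoc
  MAX_RETRIAL = 3 # arbitrary value chosen for this sketch

  attr_reader :tokens

  def initialize(nlp, text:, max_retrial: MAX_RETRIAL, retrial: 0, wait: 0.0)
    @tokens = nlp.call(text)
  rescue StandardError
    retrial += 1
    raise "Error: Failed to construct a Doc object" unless retrial <= max_retrial

    sleep wait
    # Try again with the incremented counter.
    initialize(nlp, text: text, max_retrial: max_retrial, retrial: retrial, wait: wait)
  end
end

flaky = FlakyPipeline.new(2)
doc = RetryingDoc.new(flaky, text: "Hello world")
puts doc.tokens.inspect
puts flaky.calls
```

With two simulated failures the third attempt succeeds. A pipeline that keeps failing is attempted max_retrial + 1 times in total before the error is raised.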

Dynamic Method Handling

This class handles dynamic methods through the method_missing method.

#method_missing(name, *args) ⇒ `Object`

Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism.



# File 'lib/ruby-spacy.rb', line 202

def method_missing(name, *args)
  @py_doc.send(name, *args)
end
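
As a rough illustration of the delegation pattern, the sketch below forwards unknown method calls to a wrapped object in plain Ruby. WrappedStub and Wrapper are hypothetical stand-ins, not part of ruby-spacy. Unlike the method shown above, this variant checks whether the wrapped object can handle the call before forwarding, which is an assumption about what a stricter wrapper might do rather than a description of the library.

```ruby
# Hypothetical stand-in for the wrapped Python object (not part of ruby-spacy).
class WrappedStub
  def lang
    "en"
  end

  def char_count(text)
    text.length
  end
end

# Hypothetical wrapper that forwards unknown calls to the wrapped object.
class Wrapper
  def initialize(inner)
    @inner = inner
  end

  # Forward any method this class does not define to the wrapped object.
  def method_missing(name, *args, &block)
    return super unless @inner.respond_to?(name)

    @inner.public_send(name, *args, &block)
  end

  # Report only the names the wrapped object can actually handle.
  def respond_to_missing?(name, include_private = false)
    @inner.respond_to?(name) || super
  end
end

wrapper = Wrapper.new(WrappedStub.new)
puts wrapper.lang
puts wrapper.char_count("spaCy")
```

Pairing method_missing with respond_to_missing? keeps respond_to? consistent with what the wrapper can actually do.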

Instance Attribute Details

#py_doc ⇒ `Object` (readonly)

Returns a Python Doc instance accessible via PyCall.

Returns:

(Object) —
a Python Doc instance accessible via PyCall



# File 'lib/ruby-spacy.rb', line 45

def py_doc
  @py_doc
end

#py_nlp ⇒ `Object` (readonly)

Returns a Python Language instance accessible via PyCall.

Returns:

(Object) —
a Python Language instance accessible via PyCall



# File 'lib/ruby-spacy.rb', line 42

def py_nlp
  @py_nlp
end

#text ⇒ `String` (readonly)

Returns a text string of the document.

Returns:

(String) —
a text string of the document



# File 'lib/ruby-spacy.rb', line 48

def text
  @text
end

Instance Method Details

#[](range) ⇒ `Object`

Returns a span if given a range object; or returns a token if given an integer representing a position in the doc.

Parameters:

range (Range) —
an ordinary Ruby range object such as 0..3, 1...4, or 3..-1

# File 'lib/ruby-spacy.rb', line 177

def [](range)
  if range.is_a?(Range)
    py_span = @py_doc[range]
    Span.new(self, start_index: py_span.start, end_index: py_span.end - 1)
  else
    Token.new(@py_doc[range])
  end
end

#displacy(style: "dep", compact: false) ⇒ `String`

Visualize the document in one of two styles: "dep" (dependencies) or "ent" (named entities).

Parameters:

style (String) (defaults to: "dep") —
either dep or ent
compact (Boolean) (defaults to: false) —
only relevant to the "dep" style

Returns:

(String) —
for the dep style the output is an SVG string, whereas for the ent style it is an HTML string.



# File 'lib/ruby-spacy.rb', line 197

def displacy(style: "dep", compact: false)
  PyDisplacy.render(py_doc, style: style, options: { compact: compact }, jupyter: false)
end

#each ⇒ `Object`

Iterates over the elements in the doc yielding a token instance each time.

# File 'lib/ruby-spacy.rb', line 113

def each
  PyCall::List.call(@py_doc).each do |py_token|
    yield Token.new(py_token)
  end
end
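
Because the class includes Enumerable, defining #each is enough to make methods such as map, select, and first available on a doc. The sketch below shows the same arrangement in plain Ruby. TinyDoc is a hypothetical stand-in whose elements are plain strings rather than Token instances, and the enum_for guard is an addition of this sketch, not something the method above does.

```ruby
# Hypothetical stand-in: a document whose "tokens" are plain strings.
class TinyDoc
  include Enumerable

  def initialize(words)
    @words = words
  end

  # Enumerable only requires #each; the other iteration methods derive from it.
  def each
    # Added in this sketch: return an enumerator when no block is given.
    return enum_for(:each) unless block_given?

    @words.each { |word| yield word }
  end
end

doc = TinyDoc.new(%w[Bill Gates founded Microsoft])
capitalized = doc.select { |word| word.match?(/\A[A-Z]/) }
lengths = doc.map(&:length)
puts capitalized.inspect
puts lengths.inspect
```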

#ents ⇒ `Array`

Returns an array of spans each representing a named entity.

Returns:

(Array)

# File 'lib/ruby-spacy.rb', line 163

def ents
  # so that ents can be iterated with "each" in Ruby
  ent_array = []
  PyCall::List.call(@py_doc.ents).each do |ent|
    ent.define_singleton_method :label do
      label_
    end
    ent_array << ent
  end
  ent_array
end
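
The method above attaches a label method to each entity object individually, so that callers can write label instead of label_. A minimal plain-Ruby sketch of that technique follows. EntityStub is a hypothetical stand-in for the Python span objects and is not part of ruby-spacy.

```ruby
# Hypothetical stand-in for a Python span that exposes its label as `label_`.
EntityStub = Struct.new(:text, :label_)

ent = EntityStub.new("Apple", "ORG")

# Attach `label` to this one object only; the block runs with `ent` as self.
ent.define_singleton_method(:label) { label_ }

# Another instance of the same class is unaffected.
other = EntityStub.new("London", "GPE")

puts ent.label
puts other.respond_to?(:label)
```

A singleton method changes only the object it is defined on, which is why the method has to be attached inside the loop for every entity.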

#noun_chunks ⇒ `Array`

Returns an array of spans representing noun chunks.

Returns:

(Array)

# File 'lib/ruby-spacy.rb', line 141

def noun_chunks
  chunk_array = []
  py_chunks = PyCall::List.call(@py_doc.noun_chunks)
  py_chunks.each do |py_chunk|
    chunk_array << Span.new(self, start_index: py_chunk.start, end_index: py_chunk.end - 1)
  end
  chunk_array
end

#respond_to_missing?(sym) ⇒ `Boolean`

Returns:

(Boolean)



# File 'lib/ruby-spacy.rb', line 206

def respond_to_missing?(sym)
  sym ? true : super
end

#retokenize(start_index, end_index, attributes = {}) ⇒ `Object`

Retokenizes the text merging a span into a single token.

Parameters:

start_index (Integer) —
the start position of the span to be retokenized in the document
end_index (Integer) —
the end position of the span to be retokenized in the document
attributes (Hash) (defaults to: {}) —
attributes to set on the merged token

# File 'lib/ruby-spacy.rb', line 78

def retokenize(start_index, end_index, attributes = {})
  PyCall.with(@py_doc.retokenize) do |retokenizer|
    retokenizer.merge(@py_doc[start_index..end_index], attrs: attributes)
  end
end

#retokenize_split(pos_in_doc, split_array, head_pos_in_split, ancestor_pos, attributes = {}) ⇒ `Object`

Retokenizes the text splitting the specified token.

Parameters:

pos_in_doc (Integer) —
the position in the document of the token to be split
split_array (Array<String>) —
text strings of the split results
head_pos_in_split (Integer) —
the position, within the split results, of the element that heads the first split element
ancestor_pos (Integer) —
the position of the immediate ancestor element of the split elements in the document
attributes (Hash) (defaults to: {}) —
the attributes of the split elements

# File 'lib/ruby-spacy.rb', line 89

def retokenize_split(pos_in_doc, split_array, head_pos_in_split, ancestor_pos, attributes = {})
  PyCall.with(@py_doc.retokenize) do |retokenizer|
    heads = [[@py_doc[pos_in_doc], head_pos_in_split], @py_doc[ancestor_pos]]
    retokenizer.split(@py_doc[pos_in_doc], split_array, heads: heads, attrs: attributes)
  end
end

#sents ⇒ `Array`

Returns an array of spans each representing a sentence.

Returns:

(Array)

# File 'lib/ruby-spacy.rb', line 152

def sents
  sentence_array = []
  py_sentences = PyCall::List.call(@py_doc.sents)
  py_sentences.each do |py_sent|
    sentence_array << Span.new(self, start_index: py_sent.start, end_index: py_sent.end - 1)
  end
  sentence_array
end

#similarity(other) ⇒ `Float`

Returns a semantic similarity estimate.

Parameters:

other (Doc) —
the other doc to which a similarity estimation is made

Returns:

(Float)



# File 'lib/ruby-spacy.rb', line 189

def similarity(other)
  py_doc.similarity(other.py_doc)
end
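
The wrapper forwards the call to the Python Doc's similarity method. spaCy's documentation describes the default estimate as a cosine similarity over vectors. The sketch below illustrates that measure only, in plain Ruby. The cosine_similarity helper and the vectors are made up for this example, and nothing here reproduces how spaCy derives its vectors.

```ruby
# Cosine similarity between two equal-length numeric vectors.
def cosine_similarity(a, b)
  raise ArgumentError, "vectors must have the same length" unless a.length == b.length

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  # Treat a zero vector as having no similarity to anything.
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Made-up vectors, used only to exercise the function.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
puts same_direction
puts orthogonal
```

Vectors pointing in the same direction score close to 1, and orthogonal vectors score 0.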

#span(range_or_start, optional_size = nil) ⇒ `Span`

Returns a span of the specified range within the doc. The method can be called in either of two ways: Doc#span(range) or Doc#span(start_pos, size_of_span).

Parameters:

range_or_start (Range, Integer) —
a range object, or, alternatively, an integer that represents the start position of the span
optional_size (Integer) (defaults to: nil) —
an integer representing the size of the span

Returns:

(Span)

# File 'lib/ruby-spacy.rb', line 124

def span(range_or_start, optional_size = nil)
  if optional_size
    start_index = range_or_start
    temp = tokens[start_index...start_index + optional_size]
  else
    start_index = range_or_start.first
    range = range_or_start
    temp = tokens[range]
  end

  end_index = start_index + temp.size - 1

  Span.new(self, start_index: start_index, end_index: end_index)
end
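
The index arithmetic in the method above can be traced with a plain array standing in for the tokens. In the sketch below, span_bounds is a hypothetical helper that returns the inclusive start and end positions the method would pass to Span.new. It is not part of ruby-spacy, and the sample sentence is made up.

```ruby
# Mirror of the index arithmetic, with a plain array standing in for the tokens.
def span_bounds(tokens, range_or_start, optional_size = nil)
  if optional_size
    # Form 2: a start position plus a size.
    start_index = range_or_start
    selected = tokens[start_index...start_index + optional_size]
  else
    # Form 1: a Ruby range.
    start_index = range_or_start.first
    selected = tokens[range_or_start]
  end

  # The end position is inclusive, hence the "- 1".
  [start_index, start_index + selected.size - 1]
end

tokens = %w[I like New York in autumn]
by_range = span_bounds(tokens, 2..3)
by_size = span_bounds(tokens, 2, 2)
puts by_range.inspect
puts by_size.inspect
```

Both call forms select the same two elements here, so they produce the same pair of bounds.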

#to_s ⇒ `String`

String representation of the document.

Returns:

(String)



# File 'lib/ruby-spacy.rb', line 98

def to_s
  @text
end

#tokens ⇒ `Array<Token>`

Returns an array of tokens contained in the doc.

Returns:

(Array<Token>)

# File 'lib/ruby-spacy.rb', line 104

def tokens
  results = []
  PyCall::List.call(@py_doc).each do |py_token|
    results << Token.new(py_token)
  end
  results
end
