Class: Spacy::Language

Inherits:
Object
Defined in:
lib/ruby-spacy.rb

Overview

See also the spaCy Python API documentation for Language.

Constructor Details

#initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, retrial: 0) ⇒ Language

Creates a language model instance, which is conventionally referred to by a variable named nlp.

Parameters:

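  • model (String) (defaults to: "en_core_web_sm")

    The name of a language model installed on the system

  • max_retrial (Integer) (defaults to: MAX_RETRIAL)

    The maximum number of times loading is retried after a failure

  • retrial (Integer) (defaults to: 0)

    The number of retries already made; used internally when the constructor calls itself again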



# File 'lib/ruby-spacy.rb', line 221

def initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, retrial: 0)
  @spacy_nlp_id = "nlp_#{model.object_id}"
  PyCall.exec("import spacy; #{@spacy_nlp_id} = spacy.load('#{model}')")
  @py_nlp = PyCall.eval(@spacy_nlp_id)
rescue StandardError
  retrial += 1
  raise "Error: Pycall failed to load Spacy" unless retrial <= max_retrial

  sleep 0.5
  initialize(model, max_retrial: max_retrial, retrial: retrial)
end
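
The retry behaviour of the constructor can be illustrated without spaCy installed. The following is a minimal sketch, not the library's implementation: `load_with_retry` and its `wait` parameter are hypothetical names, the default of 5 retries is an assumption (the value of `MAX_RETRIAL` is not shown here), and it uses Ruby's `retry` keyword in place of the recursive call to `initialize`.

```ruby
# Hypothetical helper: calls the block, retrying after a failure
# until max_retrial retries have been used up.
# The default of 5 is an assumption, not the library's MAX_RETRIAL.
def load_with_retry(max_retrial: 5, wait: 0.5)
  retrial = 0
  begin
    yield
  rescue StandardError
    retrial += 1
    raise "Error: loading failed after #{max_retrial} retries" unless retrial <= max_retrial

    sleep wait
    retry # re-runs the begin block
  end
end

# Usage: a fake loader that fails twice before succeeding.
attempts = 0
result = load_with_retry(wait: 0) do
  attempts += 1
  raise IOError, "not ready" if attempts < 3

  :loaded
end
```

With `max_retrial: 5`, the block runs at most six times: the first attempt plus five retries.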

Dynamic Method Handling

This class handles dynamic methods through the method_missing method.

#method_missing(name, *args) ⇒ Object

Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism....



# File 'lib/ruby-spacy.rb', line 316

def method_missing(name, *args)
  @py_nlp.send(name, *args)
end
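
The forwarding mechanism can be tried out in plain Ruby. In this sketch, `FakePyLanguage` and `LanguageSketch` are hypothetical stand-ins (the first for the PyCall-wrapped Python object, the second for this class); only the `method_missing` body is taken from the listing above.

```ruby
# Hypothetical stand-in for the Python Language object held in @py_nlp.
class FakePyLanguage
  def pipe_names
    %w[tok2vec tagger parser]
  end

  def has_pipe(name)
    pipe_names.include?(name)
  end
end

# Hypothetical wrapper that defines no methods of its own
# and forwards every unknown call to the wrapped object.
class LanguageSketch
  def initialize(py_nlp)
    @py_nlp = py_nlp
  end

  def method_missing(name, *args)
    @py_nlp.send(name, *args)
  end
end

nlp = LanguageSketch.new(FakePyLanguage.new)
nlp.has_pipe("tagger") # forwarded to FakePyLanguage#has_pipe
nlp.pipe_names         # forwarded to FakePyLanguage#pipe_names
```

A call that the wrapped object does not understand either still raises NoMethodError, raised from the wrapped object rather than from the wrapper.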

Instance Attribute Details

#py_nlp ⇒ Object (readonly)

Returns a Python Language instance accessible via PyCall.

Returns:

  • (Object)

    a Python Language instance accessible via PyCall



# File 'lib/ruby-spacy.rb', line 217

def py_nlp
  @py_nlp
end

#spacy_nlp_id ⇒ String (readonly)

Returns an identifier string that can be used to refer to the Python Language object inside PyCall::exec or PyCall::eval.

Returns:

  • (String)

    an identifier string that can be used to refer to the Python Language object inside PyCall::exec or PyCall::eval



# File 'lib/ruby-spacy.rb', line 214

def spacy_nlp_id
  @spacy_nlp_id
end

Instance Method Details

#get_lexeme(text) ⇒ Object

A utility method to get a Python Lexeme object.

Parameters:

  • text (String)

    A text string representing a lexeme

Returns:

  • (Object)



# File 'lib/ruby-spacy.rb', line 265

def get_lexeme(text)
  @py_nlp.vocab[text]
end

#matcher ⇒ Matcher

Generates a matcher for the current language model.

Returns:

  • (Matcher)



# File 'lib/ruby-spacy.rb', line 241

def matcher
  Matcher.new(@py_nlp)
end

#most_similar(vector, num) ⇒ Array<Hash{:key => Integer, :text => String, :best_row => Array<Float>, :score => Float}>

Returns the num lexemes whose vector representations are most similar to the given vector representation of a word.

Parameters:

  • vector (Object)

    A vector representation of a word (whether existing or non-existing)

  • num (Integer)

    The number of lexemes to return

Returns:

  • (Array<Hash{:key => Integer, :text => String, :best_row => Array<Float>, :score => Float}>)

    An array of hashes, each containing the key, text, best_row, and similarity score of a lexeme



# File 'lib/ruby-spacy.rb', line 279

def most_similar(vector, num)
  vec_array = Numpy.asarray([vector])
  py_result = @py_nlp.vocab.vectors.most_similar(vec_array, n: num)
  key_texts = PyCall.eval("[[str(num), #{@spacy_nlp_id}.vocab[num].text] for num in #{py_result[0][0].tolist}]")
  keys = key_texts.map { |kt| kt[0] }
  texts = key_texts.map { |kt| kt[1] }
  best_rows = PyCall::List.call(py_result[1])[0]
  scores = PyCall::List.call(py_result[2])[0]

  results = []
  num.times do |i|
    result = { key: keys[i].to_i,
               text: texts[i],
               best_row: best_rows[i],
               score: scores[i] }
    result.each_key do |key|
      result.define_singleton_method(key) { result[key] }
    end
    results << result
  end
  results
end
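
Each result is a plain Hash that also answers to reader methods named after its keys, because of the define_singleton_method loop in the listing above. The sketch below isolates that step in plain Ruby; `build_result` is a hypothetical helper and the values are made up for illustration, not real model output.

```ruby
# Hypothetical helper: builds one result hash and exposes each key
# as a reader method on that particular hash object.
def build_result(key:, text:, best_row:, score:)
  result = { key: key, text: text, best_row: best_row, score: score }
  result.each_key do |name|
    result.define_singleton_method(name) { result[name] }
  end
  result
end

# Illustrative values only.
lexeme = build_result(key: 42, text: "tokyo", best_row: 7, score: 0.91)
lexeme[:text] # hash-style access
lexeme.text   # method-style access, same value
```

Note that the singleton `key` reader replaces Hash#key on these objects, so `lexeme.key` returns the stored `:key` value instead of performing a reverse lookup.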

#pipe(texts, disable: [], batch_size: 50) ⇒ Array<Doc>

Utility function to batch process many texts.

Parameters:

  • texts (Array<String>)
  • disable (Array<String>) (defaults to: [])
  • batch_size (Integer) (defaults to: 50)

Returns:

  • (Array<Doc>)



# File 'lib/ruby-spacy.rb', line 307

def pipe(texts, disable: [], batch_size: 50)
  docs = []
  PyCall::List.call(@py_nlp.pipe(texts, disable: disable, batch_size: batch_size)).each do |py_doc|
    docs << Doc.new(@py_nlp, py_doc: py_doc)
  end
  docs
end

#pipe_names ⇒ Array<String>

A utility method to list pipeline components.

Returns:

  • (Array<String>)

    An array of text strings representing pipeline components



# File 'lib/ruby-spacy.rb', line 254

def pipe_names
  pipe_array = []
  PyCall::List.call(@py_nlp.pipe_names).each do |pipe|
    pipe_array << pipe
  end
  pipe_array
end

#read(text) ⇒ Object

Reads and analyzes the given text.

Parameters:

  • text (String)

    a text to be read and analyzed



# File 'lib/ruby-spacy.rb', line 235

def read(text)
  Doc.new(py_nlp, text: text)
end

#respond_to_missing?(sym) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/ruby-spacy.rb', line 320

def respond_to_missing?(sym)
  sym ? true : super
end
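
Because a Symbol is always truthy, the condition in the listing above answers true for any method name. The sketch below contrasts that behaviour with a variant that asks the wrapped object instead. Both classes are hypothetical, a String stands in for the Python object, and the two-parameter signature is the conventional Ruby form, used here as an assumption (the listing declares a single parameter).

```ruby
# Hypothetical class mirroring the condition in the listing above.
class AlwaysResponds
  def respond_to_missing?(sym, include_private = false)
    sym ? true : super
  end
end

# Hypothetical variant that delegates the question to a wrapped object.
class DelegatingResponds
  def initialize(target)
    @target = target
  end

  def respond_to_missing?(sym, include_private = false)
    @target.respond_to?(sym) || super
  end
end

always = AlwaysResponds.new
delegating = DelegatingResponds.new("text")

always.respond_to?(:no_such_method)     # true for any name
delegating.respond_to?(:upcase)         # true, String defines upcase
delegating.respond_to?(:no_such_method) # false
```

The delegating form keeps respond_to? in agreement with what method_missing can actually forward, at the cost of one extra lookup on the wrapped object.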

#vocab(text) ⇒ Lexeme

Returns a Ruby Lexeme object.

Parameters:

  • text (String)

    a text string representing the vocabulary item

Returns:

  • (Lexeme)



# File 'lib/ruby-spacy.rb', line 272

def vocab(text)
  Lexeme.new(@py_nlp.vocab[text])
end

#vocab_string_lookup(id) ⇒ Object

A utility method to look up the vocabulary item with the given id.

Parameters:

  • id (Integer)

    a vocabulary id

Returns:

  • (Object)



# File 'lib/ruby-spacy.rb', line 248

def vocab_string_lookup(id)
  PyCall.eval("#{@spacy_nlp_id}.vocab.strings[#{id}]")
end