Class: Spacy::Language

Inherits:
Object
Defined in:
lib/ruby-spacy.rb

Overview

See also the spaCy Python API documentation for [`Language`](spacy.io/api/language).

Constructor Details

#initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, retrial: 0, timeout: 60) ⇒ Language

Creates a language model instance, which is conventionally referred to by a variable named `nlp`.

Parameters:

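  • model (String) (defaults to: "en_core_web_sm")

    A language model installed in the system

  • max_retrial (Integer) (defaults to: MAX_RETRIAL)

    The maximum number of retries after a failed load

  • retrial (Integer) (defaults to: 0)

    The retry counter, incremented after each failed attempt

  • timeout (Integer) (defaults to: 60)

    The number of seconds to wait for the model to load before raising an error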



# File 'lib/ruby-spacy.rb', line 369

def initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, retrial: 0, timeout: 60)
  @spacy_nlp_id = "nlp_#{model.object_id}"
  begin
    Timeout.timeout(timeout) do
      PyCall.exec("import spacy; #{@spacy_nlp_id} = spacy.load('#{model}')")
    end
    @py_nlp = PyCall.eval(@spacy_nlp_id)
  rescue Timeout::Error
    raise "PyCall execution timed out after #{timeout} seconds"
  rescue StandardError => e
    retrial += 1
    if retrial <= max_retrial
      sleep 0.5
      retry
    else
      raise "Failed to initialize Spacy after #{max_retrial} attempts: #{e.message}"
    end
  end
end
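
The constructor combines a timeout with a bounded retry loop: a timeout is reported immediately, while any other error is retried up to `max_retrial` times. The pattern can be sketched in isolation as follows. `load_with_retries` is a hypothetical helper, not part of ruby-spacy; its block stands in for the `PyCall` calls, and the default of 3 retries is chosen only for this sketch (the value of `MAX_RETRIAL` is not shown on this page).

```ruby
require "timeout"

# Hypothetical helper (not part of ruby-spacy): runs the block under a timeout
# and retries on ordinary errors, mirroring the control flow of #initialize.
def load_with_retries(max_retrial: 3, timeout: 60)
  retrial = 0
  begin
    Timeout.timeout(timeout) { yield }
  rescue Timeout::Error
    # A timeout is not retried; it is reported immediately.
    raise "execution timed out after #{timeout} seconds"
  rescue StandardError => e
    retrial += 1
    raise "failed after #{max_retrial} attempts: #{e.message}" if retrial > max_retrial

    sleep 0.01 # short pause before the next attempt (the constructor sleeps 0.5)
    retry
  end
end

# Stand-in for a flaky load: fails twice, then succeeds.
attempts = 0
result = load_with_retries(max_retrial: 3, timeout: 1) do
  attempts += 1
  raise IOError, "not ready" if attempts < 3

  :loaded
end
```

Note that `Timeout::Error` is itself a `StandardError`, so its `rescue` clause has to come first, as it does in the constructor; otherwise timeouts would be retried as well.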

Dynamic Method Handling

This class handles dynamic methods through the `method_missing` method.

#method_missing(name, *args) ⇒ Object

Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism.…



# File 'lib/ruby-spacy.rb', line 472

def method_missing(name, *args)
  @py_nlp.send(name, *args)
end
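
The forwarding can be illustrated without Python. In this sketch, `FakePyNlp` and `Wrapper` are hypothetical stand-ins (not ruby-spacy classes): the wrapper defines no `lang` method of its own, so the call falls through to `method_missing` and is sent to the wrapped object.

```ruby
# Hypothetical stand-in for the Python object that PyCall would return.
class FakePyNlp
  def lang
    "en"
  end
end

# Hypothetical wrapper that forwards unknown methods the same way
# Language#method_missing does.
class Wrapper
  def initialize(target)
    @py_nlp = target
  end

  def method_missing(name, *args)
    @py_nlp.send(name, *args)
  end
end

nlp = Wrapper.new(FakePyNlp.new)
lang = nlp.lang # not defined on Wrapper, so it is forwarded to FakePyNlp#lang
```

A name that the wrapped object does not know either still fails, but the error is raised by the wrapped object rather than by the wrapper.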

Instance Attribute Details

#py_nlp ⇒ Object (readonly)

Returns a Python `Language` instance accessible via `PyCall`.

Returns:

  • (Object)

    a Python `Language` instance accessible via `PyCall`



# File 'lib/ruby-spacy.rb', line 365

def py_nlp
  @py_nlp
end

#spacy_nlp_id ⇒ String (readonly)

Returns an identifier string that can be used to refer to the Python `Language` object inside `PyCall::exec` or `PyCall::eval`.

Returns:

  • (String)

    an identifier string that can be used to refer to the Python `Language` object inside `PyCall::exec` or `PyCall::eval`



# File 'lib/ruby-spacy.rb', line 362

def spacy_nlp_id
  @spacy_nlp_id
end

Instance Method Details

#get_lexeme(text) ⇒ Object

A utility method to get a Python `Lexeme` object.

Parameters:

  • text (String)

    A text string representing a lexeme

Returns:

  • (Object)

    a Python `Lexeme` object


# File 'lib/ruby-spacy.rb', line 421

def get_lexeme(text)
  @py_nlp.vocab[text]
end

#matcher ⇒ Matcher

Generates a matcher for the current language model.

Returns:

  • (Matcher)


# File 'lib/ruby-spacy.rb', line 397

def matcher
  Matcher.new(@py_nlp)
end

#most_similar(vector, num) ⇒ Array<Hash{:key => Integer, :text => String, :best_row => Array<Float>, :score => Float}>

Returns the `num` lexemes whose vector representations are most similar to the given vector representation of a word.

Parameters:

  • vector (Object)

    A vector representation of a word (whether existing or non-existing)

  • num (Integer)

    The number of lexemes to return

Returns:

  • (Array<Hash{:key => Integer, :text => String, :best_row => Array<Float>, :score => Float}>)

    An array of hashes, each containing the `key`, `text`, `best_row`, and similarity `score` of a lexeme



# File 'lib/ruby-spacy.rb', line 435

def most_similar(vector, num)
  vec_array = Numpy.asarray([vector])
  py_result = @py_nlp.vocab.vectors.most_similar(vec_array, n: num)
  key_texts = PyCall.eval("[[str(num), #{@spacy_nlp_id}.vocab[num].text] for num in #{py_result[0][0].tolist}]")
  keys = key_texts.map { |kt| kt[0] }
  texts = key_texts.map { |kt| kt[1] }
  best_rows = PyCall::List.call(py_result[1])[0]
  scores = PyCall::List.call(py_result[2])[0]

  results = []
  num.times do |i|
    result = { key: keys[i].to_i,
               text: texts[i],
               best_row: best_rows[i],
               score: scores[i] }
    result.each_key do |key|
      result.define_singleton_method(key) { result[key] }
    end
    results << result
  end
  results
end
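
Each result is a plain `Hash` that additionally gets one singleton method per key, so its fields can be read either as `result[:text]` or as `result.text`. A minimal sketch of that step, using made-up values rather than real model output:

```ruby
# Made-up values; a real call would fill these from the Python result.
result = { key: 42, text: "apple", best_row: 7, score: 0.93 }

# Define a reader method on this one hash object for every key it holds.
result.each_key do |key|
  result.define_singleton_method(key) { result[key] }
end

text_via_method = result.text   # same value as result[:text]
score_via_method = result.score # same value as result[:score]
key_via_method = result.key     # now reads the :key entry
```

One side effect worth knowing: the singleton method `key` shadows the built-in `Hash#key` on these objects, so `result.key` returns the stored `:key` entry instead of performing a reverse lookup.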

#pipe(texts, disable: [], batch_size: 50) ⇒ Array<Doc>

A utility method to batch-process many texts.

Parameters:

  • texts (Array<String>)
  • disable (Array<String>) (defaults to: [])
  • batch_size (Integer) (defaults to: 50)

Returns:

  • (Array<Doc>)


# File 'lib/ruby-spacy.rb', line 463

def pipe(texts, disable: [], batch_size: 50)
  docs = []
  PyCall::List.call(@py_nlp.pipe(texts, disable: disable, batch_size: batch_size)).each do |py_doc|
    docs << Doc.new(@py_nlp, py_doc: py_doc)
  end
  docs
end

#pipe_names ⇒ Array<String>

A utility method to list pipeline components.

Returns:

  • (Array<String>)

    An array of text strings representing pipeline components



# File 'lib/ruby-spacy.rb', line 410

def pipe_names
  pipe_array = []
  PyCall::List.call(@py_nlp.pipe_names).each do |pipe|
    pipe_array << pipe
  end
  pipe_array
end

#read(text) ⇒ Object

Reads and analyzes the given text.

Parameters:

  • text (String)

    a text to be read and analyzed



# File 'lib/ruby-spacy.rb', line 391

def read(text)
  Doc.new(py_nlp, text: text)
end

#respond_to_missing?(sym) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/ruby-spacy.rb', line 476

def respond_to_missing?(sym)
  sym ? true : super
end
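
Because a Ruby symbol is always truthy, the condition in `sym ? true : super` always takes the `true` branch, so every method name is reported as supported. Ruby documents the hook as `respond_to_missing?(symbol, include_all)`. A stricter variant that uses this signature and asks the wrapped object is sketched below; `FakeNlp` and `StrictWrapper` are hypothetical stand-ins, this is not the ruby-spacy implementation, and whether a PyCall-wrapped object answers `respond_to?` for Python attributes is not covered here.

```ruby
# Hypothetical stand-in for the wrapped Python object.
class FakeNlp
  def lang
    "en"
  end
end

# Hypothetical wrapper; an alternative design, not the library's code.
class StrictWrapper
  def initialize(target)
    @target = target
  end

  # Forward only what the wrapped object can actually handle.
  def method_missing(name, *args)
    return super unless @target.respond_to?(name)

    @target.public_send(name, *args)
  end

  # Report a method as available only if the wrapped object has it.
  def respond_to_missing?(name, include_all = false)
    @target.respond_to?(name, include_all) || super
  end
end

wrapper = StrictWrapper.new(FakeNlp.new)
known = wrapper.respond_to?(:lang)          # FakeNlp defines it
unknown = wrapper.respond_to?(:no_such_one) # nobody defines it
```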

#vocab(text) ⇒ Lexeme

Returns a Ruby `Lexeme` object.

Parameters:

  • text (String)

    a text string representing the vocabulary item

Returns:

  • (Lexeme)


# File 'lib/ruby-spacy.rb', line 428

def vocab(text)
  Lexeme.new(@py_nlp.vocab[text])
end

#vocab_string_lookup(id) ⇒ Object

A utility method to look up the vocabulary item with the given id.

Parameters:

  • id (Integer)

    a vocabulary id

Returns:

  • (Object)


# File 'lib/ruby-spacy.rb', line 404

def vocab_string_lookup(id)
  PyCall.eval("#{@spacy_nlp_id}.vocab.strings[#{id}]")
end
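
The lookup works by interpolating `@spacy_nlp_id` and the id into a Python expression string that is then handed to `PyCall.eval`. The sketch below only assembles such a string and evaluates nothing. `python_lookup_expr` is a hypothetical helper, and the `Integer(id)` check is an addition made for this sketch (the method shown above interpolates `id` unchanged).

```ruby
# Hypothetical helper: builds the Python expression that #vocab_string_lookup
# passes to PyCall.eval. Nothing is evaluated here.
def python_lookup_expr(spacy_nlp_id, id)
  # Integer() raises ArgumentError for anything that is not a whole number,
  # so arbitrary text cannot end up inside the Python expression.
  "#{spacy_nlp_id}.vocab.strings[#{Integer(id)}]"
end

model = "en_core_web_sm"
spacy_nlp_id = "nlp_#{model.object_id}" # same naming scheme as the constructor
expr = python_lookup_expr(spacy_nlp_id, 42)
puts expr
```

Since the identifier is derived from `object_id`, it differs between runs; only the `nlp_` prefix and the `.vocab.strings[...]` suffix are stable.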