Class: Spacy::Language
- Inherits: Object (Object > Spacy::Language)
- Defined in: lib/ruby-spacy.rb
Overview
See also the spaCy Python API documentation for [`Language`](spacy.io/api/language).
Instance Attribute Summary
- #py_nlp ⇒ Object (readonly)
  A Python `Language` instance accessible via `PyCall`.
- #spacy_nlp_id ⇒ String (readonly)
  An identifier string that can be used to refer to the Python `Language` object inside `PyCall::exec` or `PyCall::eval`.
Instance Method Summary
- #get_lexeme(text) ⇒ Object
  A utility method to get a Python `Lexeme` object.
- #initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, retrial: 0, timeout: 60) ⇒ Language (constructor)
  Creates a language model instance, which is conventionally referred to by a variable named `nlp`.
- #matcher ⇒ Matcher
  Generates a matcher for the current language model.
- #method_missing(name, *args) ⇒ Object
  Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism.…
- #most_similar(vector, num) ⇒ Array<Hash{:key => Integer, :text => String, :best_row => Integer, :score => Float}>
  Returns the `num` lexemes whose vector representations are most similar to the given vector representation of a word.
- #pipe(texts, disable: [], batch_size: 50) ⇒ Array<Doc>
  A utility method to batch-process many texts.
- #pipe_names ⇒ Array<String>
  A utility method to list pipeline components.
- #read(text) ⇒ Object
  Reads and analyzes the given text.
- #respond_to_missing?(sym) ⇒ Boolean
- #vocab(text) ⇒ Lexeme
  Returns a Ruby `Lexeme` object wrapping the Python lexeme for the given text.
- #vocab_string_lookup(id) ⇒ Object
  A utility method to look up the string stored under the given ID in the vocabulary's string store.
Constructor Details
#initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, retrial: 0, timeout: 60) ⇒ Language
Creates a language model instance, which is conventionally referred to by a variable named `nlp`.
# File 'lib/ruby-spacy.rb', line 369
def initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, retrial: 0, timeout: 60)
  @spacy_nlp_id = "nlp_#{model.object_id}"
  begin
    Timeout.timeout(timeout) do
      PyCall.exec("import spacy; #{@spacy_nlp_id} = spacy.load('#{model}')")
    end
    @py_nlp = PyCall.eval(@spacy_nlp_id)
  rescue Timeout::Error
    raise "PyCall execution timed out after #{timeout} seconds"
  rescue StandardError => e
    retrial += 1
    if retrial <= max_retrial
      sleep 0.5
      retry
    else
      raise "Failed to initialize Spacy after #{max_retrial} attempts: #{e.message}"
    end
  end
end
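The constructor wraps the model load in a timeout and a bounded retry loop. The sketch below reproduces that control flow in plain Ruby so it runs without Python; `load_with_retry` is a hypothetical helper, and its block stands in for the `PyCall.exec`/`PyCall.eval` calls. Two consequences of the structure become visible: a timeout is not retried, because an exception raised inside one `rescue` clause is not caught by a sibling clause, and `max_retrial: N` permits up to N + 1 attempts (the first try plus N retries), so the library's "after #{max_retrial} attempts" message counts one attempt fewer than were actually made.

```ruby
require "timeout"

# Hypothetical helper mirroring the control flow of Spacy::Language#initialize.
# The block stands in for the PyCall calls and receives the attempt number.
def load_with_retry(max_retrial:, timeout: 60, delay: 0.0)
  retrial = 0
  attempts = 0
  begin
    attempts += 1
    result = Timeout.timeout(timeout) { yield attempts }
  rescue Timeout::Error
    # Raised from inside a rescue clause, so the StandardError clause below
    # does not catch it: timeouts fail immediately instead of being retried.
    raise "execution timed out after #{timeout} seconds"
  rescue StandardError => e
    retrial += 1
    if retrial <= max_retrial
      sleep delay # the library sleeps 0.5 s here
      retry       # re-runs the begin block; retrial keeps its value
    else
      raise "failed after #{attempts} attempts: #{e.message}"
    end
  end
  [result, attempts]
end

# Usage: two transient failures, then success on the third attempt.
result, attempts = load_with_retry(max_retrial: 2) { |n| n < 3 ? raise("transient failure") : "loaded" }
puts "#{result} after #{attempts} attempts" # prints "loaded after 3 attempts"
```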
Dynamic Method Handling
This class handles dynamic methods through the `method_missing` method.
#method_missing(name, *args) ⇒ Object
Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism.…
# File 'lib/ruby-spacy.rb', line 472
def method_missing(name, *args)
  @py_nlp.send(name, *args)
end
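The delegation above forwards any unknown method name to the wrapped Python object. The self-contained sketch below shows the same pattern in plain Ruby, with a `Struct` standing in for the PyCall object; `FakePyNlp` and `LanguageSketch` are hypothetical names. As a design choice for the sketch (not the library's behavior, which forwards every name and answers `true` from `respond_to_missing?` for any symbol), it forwards only names the wrapped object responds to, passes keyword arguments and blocks through, and falls back to `super`, which raises `NoMethodError`.

```ruby
# Hypothetical stand-in for the Python Language object that PyCall would wrap.
FakePyNlp = Struct.new(:pipe_names) do
  def add_pipe(name, last: true)
    last ? pipe_names.push(name) : pipe_names.unshift(name)
    name
  end
end

# Hypothetical wrapper showing method_missing delegation in plain Ruby.
class LanguageSketch
  def initialize(py_nlp)
    @py_nlp = py_nlp
  end

  # Forward unknown calls, including keyword arguments and a block, to the
  # wrapped object; names it does not understand go to super (NoMethodError).
  def method_missing(name, *args, **kwargs, &block)
    if @py_nlp.respond_to?(name)
      @py_nlp.public_send(name, *args, **kwargs, &block)
    else
      super
    end
  end

  # Ruby calls this with (name, include_private); report only real delegates.
  def respond_to_missing?(name, include_private = false)
    @py_nlp.respond_to?(name, include_private) || super
  end
end

nlp = LanguageSketch.new(FakePyNlp.new(["tagger"]))
nlp.add_pipe("ner")
p nlp.pipe_names # => ["tagger", "ner"]
```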
Instance Attribute Details
#py_nlp ⇒ Object (readonly)
Returns a Python `Language` instance accessible via `PyCall`.
# File 'lib/ruby-spacy.rb', line 365
def py_nlp
  @py_nlp
end
#spacy_nlp_id ⇒ String (readonly)
Returns an identifier string that can be used to refer to the Python `Language` object inside `PyCall::exec` or `PyCall::eval`.
# File 'lib/ruby-spacy.rb', line 362
def spacy_nlp_id
  @spacy_nlp_id
end
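Because `#initialize` binds the Python `Language` object to this name via `PyCall.exec`, later calls can refer to it by splicing the identifier into Python source strings. The sketch below rebuilds the strings that appear in `#initialize` and `#vocab_string_lookup` in plain Ruby; the helper names are hypothetical and nothing is sent to Python.

```ruby
# Hypothetical helpers that rebuild the Python source strings ruby-spacy passes
# to PyCall.exec / PyCall.eval. Nothing is executed here.
def nlp_identifier(model)
  # Same scheme as Language#initialize: derived from the model String's object_id.
  "nlp_#{model.object_id}"
end

def load_statement(nlp_id, model)
  "import spacy; #{nlp_id} = spacy.load('#{model}')"
end

def string_lookup_expression(nlp_id, id)
  "#{nlp_id}.vocab.strings[#{id}]"
end

model = String.new("en_core_web_sm")
nlp_id = nlp_identifier(model)
puts load_statement(nlp_id, model)
puts string_lookup_expression(nlp_id, 12345)
```

Since the name depends only on the `object_id` of the model string, two instances built from the very same String object would share one Python variable name.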
Instance Method Details
#get_lexeme(text) ⇒ Object
A utility method to get a Python `Lexeme` object.
# File 'lib/ruby-spacy.rb', line 421
def get_lexeme(text)
  @py_nlp.vocab[text]
end
#matcher ⇒ Matcher
Generates a matcher for the current language model.
# File 'lib/ruby-spacy.rb', line 397
def matcher
  Matcher.new(@py_nlp)
end
#most_similar(vector, num) ⇒ Array<Hash{:key => Integer, :text => String, :best_row => Integer, :score => Float}>
Returns the `num` lexemes whose vector representations are most similar to the given vector representation of a word.
# File 'lib/ruby-spacy.rb', line 435
def most_similar(vector, num)
  vec_array = Numpy.asarray([vector])
  py_result = @py_nlp.vocab.vectors.most_similar(vec_array, n: num)
  key_texts = PyCall.eval("[[str(num), #{@spacy_nlp_id}.vocab[num].text] for num in #{py_result[0][0].tolist}]")
  keys = key_texts.map { |kt| kt[0] }
  texts = key_texts.map { |kt| kt[1] }
  best_rows = PyCall::List.call(py_result[1])[0]
  scores = PyCall::List.call(py_result[2])[0]

  results = []
  num.times do |i|
    result = { key: keys[i].to_i,
               text: texts[i],
               best_row: best_rows[i],
               score: scores[i] }
    result.each_key do |key|
      result.define_singleton_method(key) { result[key] }
    end
    results << result
  end
  results
end
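Most of `#most_similar` converts the `(keys, best_rows, scores)` tuple returned by spaCy's `Vectors.most_similar` into Ruby hashes. The sketch below isolates that assembly step, with plain Ruby arrays standing in for the converted Python results; `build_similarity_results` is a hypothetical name, and the sample keys, texts, rows, and scores are made up for illustration.

```ruby
# Hypothetical sketch of the result-assembly step in Language#most_similar.
# Keys arrive as strings (the library builds them with Python's str()),
# so they are converted with to_i as in the original.
def build_similarity_results(keys, texts, best_rows, scores, num)
  Array.new(num) do |i|
    result = { key: keys[i].to_i, text: texts[i], best_row: best_rows[i], score: scores[i] }
    # Add reader methods so result.text works as well as result[:text].
    result.each_key do |name|
      result.define_singleton_method(name) { result[name] }
    end
    result
  end
end

# Made-up sample values standing in for spaCy's (keys, best_rows, scores) output.
results = build_similarity_results(%w[101 202], %w[dog cat], [5, 9], [0.98, 0.71], 2)
results.each { |r| puts "#{r.text} (key #{r.key}, row #{r.best_row}): #{r.score}" }
```

Note that defining a singleton `key` reader shadows the built-in `Hash#key` lookup method on each result hash.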
#pipe(texts, disable: [], batch_size: 50) ⇒ Array<Doc>
A utility method to batch-process many texts.
# File 'lib/ruby-spacy.rb', line 463
def pipe(texts, disable: [], batch_size: 50)
  docs = []
  PyCall::List.call(@py_nlp.pipe(texts, disable: disable, batch_size: batch_size)).each do |py_doc|
    docs << Doc.new(@py_nlp, py_doc: py_doc)
  end
  docs
end
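`#pipe` leaves the batching to spaCy's `Language.pipe`, which processes the input in batches of `batch_size` texts. The pure-Ruby sketch below only illustrates that grouping with `each_slice`; `pipe_sketch` and its block are hypothetical stand-ins for the spaCy pipeline and `Doc` construction. Because the library wraps the result in `PyCall::List.call`, every `Doc` is materialized in memory before the method returns.

```ruby
# Hypothetical illustration of batch_size: texts are grouped into consecutive
# batches, each text is analyzed, and results keep the input order.
# The block stands in for spaCy's pipeline plus Doc construction.
def pipe_sketch(texts, batch_size: 50, &analyze)
  batches = texts.each_slice(batch_size).to_a
  docs = batches.flat_map { |batch| batch.map(&analyze) }
  [docs, batches.map(&:size)]
end

texts = (1..120).map { |i| "text #{i}" }
docs, batch_sizes = pipe_sketch(texts) { |t| t.upcase }
p batch_sizes # => [50, 50, 20]
```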
#pipe_names ⇒ Array<String>
A utility method to list pipeline components.
# File 'lib/ruby-spacy.rb', line 410
def pipe_names
  pipe_array = []
  PyCall::List.call(@py_nlp.pipe_names).each do |pipe|
    pipe_array << pipe
  end
  pipe_array
end
#read(text) ⇒ Object
Reads and analyzes the given text.
# File 'lib/ruby-spacy.rb', line 391
def read(text)
  Doc.new(py_nlp, text: text)
end
#respond_to_missing?(sym) ⇒ Boolean
# File 'lib/ruby-spacy.rb', line 476
def respond_to_missing?(sym)
  sym ? true : super
end
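Ruby's `respond_to?` calls `respond_to_missing?` with two arguments: the method name as a symbol and an include-private flag. The hypothetical `RespondProbe` class below records what it receives. It illustrates why the conventional signature is `respond_to_missing?(name, include_private = false)`: a one-parameter definition such as the one above would be called with two arguments while accepting only one.

```ruby
# Hypothetical probe class: records the arguments Ruby passes to respond_to_missing?.
class RespondProbe
  attr_reader :calls

  def initialize
    @calls = []
  end

  def respond_to_missing?(name, include_private = false)
    @calls << [name, include_private]
    true # like the library, claim every missing method
  end
end

probe = RespondProbe.new
probe.respond_to?(:anything)
probe.respond_to?(:something_else, true)
p probe.calls
```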
#vocab(text) ⇒ Lexeme
Returns a Ruby `Lexeme` object wrapping the Python lexeme for the given text.
# File 'lib/ruby-spacy.rb', line 428
def vocab(text)
  Lexeme.new(@py_nlp.vocab[text])
end
#vocab_string_lookup(id) ⇒ Object
A utility method to look up the string stored under the given ID in the vocabulary's string store.
# File 'lib/ruby-spacy.rb', line 404
def vocab_string_lookup(id)
  PyCall.eval("#{@spacy_nlp_id}.vocab.strings[#{id}]")
end