Class: Spacy::Language
- Inherits: Object (hierarchy: Object > Spacy::Language)
- Defined in: lib/ruby-spacy.rb
Overview
See also the spaCy Python API documentation for [Language](spacy.io/api/language).
Instance Attribute Summary
- #py_nlp ⇒ Object (readonly)
  A Python Language instance accessible via PyCall.
- #spacy_nlp_id ⇒ String (readonly)
  An identifier string that can be used to refer to the Python Language object inside PyCall::exec or PyCall::eval.
Instance Method Summary
- #get_lexeme(text) ⇒ Object
  A utility method to get a Python Lexeme object.
- #initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, timeout: 60) ⇒ Language (constructor)
  Creates a language model instance, which is conventionally referred to by a variable named nlp.
- #instance_variables_to_inspect ⇒ Object
- #matcher ⇒ Matcher
  Generates a matcher for the current language model.
- #memory_zone { ... } ⇒ Object
  Executes a block within spaCy’s memory zone for efficient memory management.
- #method_missing(name, *args) ⇒ Object
  Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism.
- #most_similar(vector, num) ⇒ Array<Hash{:key => Integer, :text => String, :best_rows => Array<Float>, :score => Float}>
  Returns the num lexemes whose vector representations are most similar to the given word vector.
- #phrase_matcher(attr: "ORTH") ⇒ PhraseMatcher
  Generates a phrase matcher for the current language model.
- #pipe(texts, disable: [], batch_size: 50) ⇒ Array<Doc>
  Utility method to batch-process many texts.
- #pipe_names ⇒ Array<String>
  A utility method to list pipeline components.
- #read(text) ⇒ Object
  Reads and analyzes the given text.
- #respond_to_missing?(sym, include_private = false) ⇒ Boolean
- #vocab(text) ⇒ Lexeme
  Returns a Ruby Lexeme object.
- #vocab_string_lookup(id) ⇒ Object
  A utility method to look up the vocabulary item with the given id.
- #with_openai(access_token: nil, model: "gpt-5-mini", max_completion_tokens: 1000, temperature: 0.7) {|OpenAIHelper| ... } ⇒ Object
  Yields an OpenAIHelper instance for making OpenAI API calls within a block.
Constructor Details
#initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, timeout: 60) ⇒ Language
Creates a language model instance, which is conventionally referred to by a variable named nlp.
# File 'lib/ruby-spacy.rb', line 508
def initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, timeout: 60)
  unless model.to_s.match?(/\A[a-zA-Z0-9_\-\.\/]+\z/)
    raise ArgumentError, "Invalid model name: #{model.inspect}"
  end

  @spacy_nlp_id = "nlp_#{model.object_id}"
  retrial = 0
  begin
    Timeout.timeout(timeout) do
      PyCall.exec("import spacy; #{@spacy_nlp_id} = spacy.load('#{model}')")
    end
    @py_nlp = PyCall.eval(@spacy_nlp_id)
  rescue Timeout::Error
    raise "PyCall execution timed out after #{timeout} seconds"
  rescue StandardError => e
    retrial += 1
    if retrial <= max_retrial
      sleep 0.5
      retry
    else
      raise "Failed to initialize Spacy after #{max_retrial} attempts: #{e.message}"
    end
  end
end
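The constructor combines two safeguards: a whitelist check on the model name before it is interpolated into Python code, and a retry loop bounded by a timeout. The following is a minimal pure-Ruby sketch of that pattern, not the library's code. The helper names `valid_model_name?` and `load_with_retries`, the `wait` parameter, and the value given to `MAX_RETRIAL` are assumptions made for illustration; the block stands in for the PyCall calls.

```ruby
require "timeout"

# Assumed value for illustration; the real constant is defined in ruby-spacy.
MAX_RETRIAL = 5

# Same character whitelist as the constructor: letters, digits, "_", "-", "." and "/".
def valid_model_name?(model)
  model.to_s.match?(/\A[a-zA-Z0-9_\-\.\/]+\z/)
end

# Runs the block under a timeout and retries it on any other StandardError.
# `wait` is a hypothetical parameter replacing the fixed 0.5 second sleep.
def load_with_retries(max_retrial: MAX_RETRIAL, timeout: 60, wait: 0.5)
  retrial = 0
  begin
    Timeout.timeout(timeout) { yield }
  rescue Timeout::Error
    # A timeout is not retried; it is reported immediately.
    raise "execution timed out after #{timeout} seconds"
  rescue StandardError => e
    retrial += 1
    if retrial <= max_retrial
      sleep wait
      retry
    else
      raise "failed after #{max_retrial} attempts: #{e.message}"
    end
  end
end

# Usage: a block that fails twice before succeeding.
attempts = 0
value = load_with_retries(max_retrial: 3, timeout: 5, wait: 0) do
  attempts += 1
  raise "transient failure" if attempts < 3
  "loaded"
end
puts "#{value} after #{attempts} attempts"
```

With this structure the block runs at most `max_retrial + 1` times, because the counter is compared after the first failure has already occurred.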
Dynamic Method Handling
This class handles dynamic methods through the method_missing method.
#method_missing(name, *args) ⇒ Object
Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism.
# File 'lib/ruby-spacy.rb', line 662
def method_missing(name, *args)
  @py_nlp.send(name, *args)
end
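The forwarding idea can be tried without Python. The sketch below is a hypothetical `ForwardingWrapper` class that delegates unknown methods to a wrapped Ruby object and pairs `method_missing` with `respond_to_missing?`. It is an illustration under assumptions: the wrapped object is a plain Ruby string rather than a PyCall object, `respond_to?` stands in for `Spacy.py_hasattr?`, and the guard that falls back to `super` is an addition not present in the listing above.

```ruby
# Hypothetical wrapper illustrating dynamic method forwarding.
class ForwardingWrapper
  def initialize(target)
    @target = target
  end

  # Forward any method the wrapper does not define to the wrapped object.
  # Unlike the listing above, unknown names fall through to `super`,
  # which raises NoMethodError.
  def method_missing(name, *args)
    return super unless @target.respond_to?(name)

    @target.send(name, *args)
  end

  # Keeps `respond_to?` consistent with what `method_missing` can handle.
  def respond_to_missing?(sym, include_private = false)
    @target.respond_to?(sym) || super
  end
end

wrapper = ForwardingWrapper.new("spacy")
puts wrapper.upcase                # forwarded to String#upcase
puts wrapper.respond_to?(:upcase)  # answered by respond_to_missing?
```

Defining `respond_to_missing?` alongside `method_missing` means callers can probe for a forwarded method before calling it.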
Instance Attribute Details
#py_nlp ⇒ Object (readonly)
# File 'lib/ruby-spacy.rb', line 504
def py_nlp
  @py_nlp
end
#spacy_nlp_id ⇒ String (readonly)
# File 'lib/ruby-spacy.rb', line 501
def spacy_nlp_id
  @spacy_nlp_id
end
Instance Method Details
#get_lexeme(text) ⇒ Object
A utility method to get a Python Lexeme object.
# File 'lib/ruby-spacy.rb', line 573
def get_lexeme(text)
  @py_nlp.vocab[text]
end
#instance_variables_to_inspect ⇒ Object
# File 'lib/ruby-spacy.rb', line 670
def instance_variables_to_inspect
  [:@spacy_nlp_id]
end
#matcher ⇒ Matcher
Generates a matcher for the current language model.
# File 'lib/ruby-spacy.rb', line 541
def matcher
  Matcher.new(@py_nlp)
end
#memory_zone { ... } ⇒ Object
Executes a block within spaCy’s memory zone for efficient memory management. Requires spaCy >= 3.8.
# File 'lib/ruby-spacy.rb', line 652
def memory_zone(&block)
  major, minor = SpacyVersion.split(".").map(&:to_i)
  unless major > 3 || (major == 3 && minor >= 8)
    raise NotImplementedError, "memory_zone requires spaCy >= 3.8 (current: #{SpacyVersion})"
  end

  PyCall.with(@py_nlp.memory_zone, &block)
end
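The version gate compares the major and minor components as integers rather than comparing version strings. A small sketch of the same check follows, using a hypothetical helper name `memory_zone_supported?` that takes the version string as an argument instead of reading `SpacyVersion`.

```ruby
# Hypothetical helper mirroring the version check in memory_zone.
# Expects a "major.minor[.patch]" string, as the listing above does.
def memory_zone_supported?(version)
  major, minor = version.split(".").map(&:to_i)
  major > 3 || (major == 3 && minor >= 8)
end

%w[3.7.5 3.8.0 3.10.1 4.0.0].each do |v|
  puts "#{v}: #{memory_zone_supported?(v)}"
end
```

Converting to integers matters for versions such as 3.10.1: compared as strings, "3.10" sorts before "3.8", while the integer comparison correctly treats minor version 10 as newer than 8.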
#most_similar(vector, num) ⇒ Array<Hash{:key => Integer, :text => String, :best_rows => Array<Float>, :score => Float}>
Returns the num lexemes whose vector representations are most similar to the given word vector. Each result hash built by this method stores the row index under the key :best_row (singular).
# File 'lib/ruby-spacy.rb', line 587
def most_similar(vector, num)
  vec_array = Numpy.asarray([vector])
  py_result = @py_nlp.vocab.vectors.most_similar(vec_array, n: num)
  key_texts = PyCall.eval("[[str(num), #{@spacy_nlp_id}.vocab[num].text] for num in #{py_result[0][0].tolist}]")
  keys = key_texts.map { |kt| kt[0] }
  texts = key_texts.map { |kt| kt[1] }
  best_rows = PyCall::List.call(py_result[1])[0]
  scores = PyCall::List.call(py_result[2])[0]

  results = []
  num.times do |i|
    result = { key: keys[i].to_i,
               text: texts[i],
               best_row: best_rows[i],
               score: scores[i] }
    result.each_key do |key|
      result.define_singleton_method(key) { result[key] }
    end
    results << result
  end
  results
end
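Each result is a plain Hash that also answers to reader methods named after its keys, so both `result[:text]` and `result.text` work. The sketch below isolates that technique with made-up values; the helper name `with_readers` and the sample data are assumptions for illustration, and no spaCy vectors are involved.

```ruby
# Hypothetical helper: adds one singleton reader method per key to a hash.
def with_readers(hash)
  hash.each_key do |key|
    # The block closes over `hash` and `key`, so each reader returns
    # the current value stored under its own key.
    hash.define_singleton_method(key) { hash[key] }
  end
  hash
end

# Made-up sample values shaped like one most_similar result.
result = with_readers({ key: 123, text: "apple", best_row: 7, score: 0.93 })

puts result[:text]  # hash-style access
puts result.text    # method-style access
puts result.score
```

The readers are defined on each hash individually, so other Hash instances are unaffected. Note that a reader named `key` replaces the built-in `Hash#key` lookup on that particular object.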
#phrase_matcher(attr: "ORTH") ⇒ PhraseMatcher
Generates a phrase matcher for the current language model. PhraseMatcher is more efficient than Matcher for matching large terminology lists.
# File 'lib/ruby-spacy.rb', line 553
def phrase_matcher(attr: "ORTH")
  PhraseMatcher.new(self, attr: attr)
end
#pipe(texts, disable: [], batch_size: 50) ⇒ Array<Doc>
Utility method to batch-process many texts.
# File 'lib/ruby-spacy.rb', line 615
def pipe(texts, disable: [], batch_size: 50)
  PyCall::List.call(@py_nlp.pipe(texts, disable: disable, batch_size: batch_size)).map do |py_doc|
    Doc.new(@py_nlp, py_doc: py_doc)
  end
end
#pipe_names ⇒ Array<String>
A utility method to list pipeline components.
# File 'lib/ruby-spacy.rb', line 566
def pipe_names
  PyCall::List.call(@py_nlp.pipe_names).to_a
end
#read(text) ⇒ Object
Reads and analyzes the given text.
# File 'lib/ruby-spacy.rb', line 535
def read(text)
  Doc.new(py_nlp, text: text)
end
#respond_to_missing?(sym, include_private = false) ⇒ Boolean
# File 'lib/ruby-spacy.rb', line 666
def respond_to_missing?(sym, include_private = false)
  Spacy.py_hasattr?(@py_nlp, sym) || super
end
#vocab(text) ⇒ Lexeme
Returns a Ruby Lexeme object.
# File 'lib/ruby-spacy.rb', line 580
def vocab(text)
  Lexeme.new(@py_nlp.vocab[text])
end
#vocab_string_lookup(id) ⇒ Object
A utility method to look up the vocabulary item with the given id.
# File 'lib/ruby-spacy.rb', line 560
def vocab_string_lookup(id)
  PyCall.eval("#{@spacy_nlp_id}.vocab.strings[#{Integer(id)}]")
end
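Because the id is interpolated into a Python expression, the `Integer(id)` conversion doubles as input validation: anything that is not a valid integer raises before a string is built. A sketch of just the string-building step follows, with a hypothetical helper name `vocab_lookup_expression` and a made-up identifier `nlp_42`; nothing is evaluated here.

```ruby
# Hypothetical helper: builds the Python expression without evaluating it.
def vocab_lookup_expression(nlp_id, id)
  # Kernel#Integer raises ArgumentError for non-numeric strings,
  # so arbitrary text cannot reach the expression.
  "#{nlp_id}.vocab.strings[#{Integer(id)}]"
end

puts vocab_lookup_expression("nlp_42", "1234")

begin
  vocab_lookup_expression("nlp_42", "0]; import os #")
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```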
#with_openai(access_token: nil, model: "gpt-5-mini", max_completion_tokens: 1000, temperature: 0.7) {|OpenAIHelper| ... } ⇒ Object
Yields an OpenAIHelper instance for making OpenAI API calls within a block. The helper is configured once and reused for all calls within the block, making it efficient for batch processing with #pipe.
# File 'lib/ruby-spacy.rb', line 637
def with_openai(access_token: nil, model: "gpt-5-mini", max_completion_tokens: 1000, temperature: 0.7)
  helper = OpenAIHelper.new(
    access_token: access_token,
    model: model,
    max_completion_tokens: max_completion_tokens,
    temperature: temperature
  )
  yield helper
end
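The method follows a configure-once, yield-to-block pattern: one helper object is built from the keyword arguments and handed to the block, whose return value becomes the method's return value. The sketch below reproduces only that control flow. `StubHelper`, its `describe` method, and `with_stub_helper` are hypothetical stand-ins; they make no network calls and do not reflect the actual OpenAIHelper interface.

```ruby
# Hypothetical stand-in for OpenAIHelper; it only records how often it is used.
class StubHelper
  attr_reader :model, :requests

  def initialize(model:, temperature:)
    @model = model
    @temperature = temperature
    @requests = 0
  end

  # Hypothetical method for illustration; not part of ruby-spacy.
  def describe(text)
    @requests += 1
    "#{@model}: #{text}"
  end
end

# Builds the helper once and yields it, mirroring the shape of with_openai.
def with_stub_helper(model: "gpt-5-mini", temperature: 0.7)
  helper = StubHelper.new(model: model, temperature: temperature)
  yield helper
end

# The same helper instance serves every item processed inside the block.
summaries = with_stub_helper do |ai|
  ["first text", "second text"].map { |t| ai.describe(t) }
end
puts summaries
```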