Class: Spacy::Language

Inherits:
Object
Defined in:
lib/ruby-spacy.rb

Overview

See also the spaCy Python API documentation for [Language](https://spacy.io/api/language).

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, timeout: 60) ⇒ Language

Creates a language model instance, which is conventionally referred to by a variable named nlp.



# File 'lib/ruby-spacy.rb', line 508

def initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, timeout: 60)
  unless model.to_s.match?(/\A[a-zA-Z0-9_\-\.\/]+\z/)
    raise ArgumentError, "Invalid model name: #{model.inspect}"
  end

  @spacy_nlp_id = "nlp_#{model.object_id}"
  retrial = 0
  begin
    Timeout.timeout(timeout) do
      PyCall.exec("import spacy; #{@spacy_nlp_id} = spacy.load('#{model}')")
    end
    @py_nlp = PyCall.eval(@spacy_nlp_id)
  rescue Timeout::Error
    raise "PyCall execution timed out after #{timeout} seconds"
  rescue StandardError => e
    retrial += 1
    if retrial <= max_retrial
      sleep 0.5
      retry
    else
      raise "Failed to initialize Spacy after #{max_retrial} attempts: #{e.message}"
    end
  end
end
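
The constructor's control flow (a timeout around the load, plus a bounded number of retries for other failures) can be sketched in plain Ruby without spaCy installed. This is only an illustration: `load_with_retries` and its `pause` parameter are hypothetical names, not part of ruby-spacy, and the block stands in for the `PyCall.exec` call.

```ruby
require "timeout"

# Hypothetical helper (not part of ruby-spacy): runs the block under a timeout
# and retries ordinary failures up to max_retrial times.
def load_with_retries(max_retrial: 5, timeout: 60, pause: 0.5)
  retrial = 0
  begin
    Timeout.timeout(timeout) { yield }
  rescue Timeout::Error
    # A timeout is treated as fatal and is not retried.
    raise "execution timed out after #{timeout} seconds"
  rescue StandardError => e
    retrial += 1
    if retrial <= max_retrial
      sleep pause
      retry
    end
    raise "failed after #{max_retrial} retries: #{e.message}"
  end
end

# The block fails twice, then succeeds on the third attempt.
attempts = 0
result = load_with_retries(max_retrial: 3, timeout: 5, pause: 0) do
  attempts += 1
  raise IOError, "not ready" if attempts < 3
  :loaded
end
```

Because the `Timeout::Error` clause comes first, a timeout is re-raised immediately, while any other `StandardError` is retried. With `max_retrial: 3` the block may run up to four times in total (one initial attempt plus three retries).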

Dynamic Method Handling

This class handles dynamic methods through the method_missing method.

#method_missing(name, *args) ⇒ Object

Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism.



# File 'lib/ruby-spacy.rb', line 662

def method_missing(name, *args)
  @py_nlp.send(name, *args)
end
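
The delegation idea can be tried out without Python. In the sketch below, `Wrapper` is a hypothetical stand-in class and a plain String plays the role of `@py_nlp`. It checks `respond_to?` where ruby-spacy uses `Spacy.py_hasattr?`, so treat it as an illustration of the pattern rather than the library's implementation.

```ruby
# Hypothetical stand-in: forwards unknown methods to a wrapped object.
class Wrapper
  def initialize(target)
    @target = target
  end

  # Forward any method the wrapper does not define itself.
  def method_missing(name, *args, &block)
    return super unless @target.respond_to?(name)

    @target.public_send(name, *args, &block)
  end

  # Keep respond_to? consistent with what method_missing can handle.
  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end

wrapper = Wrapper.new("hello")
shouted = wrapper.upcase                 # forwarded to the wrapped String
known   = wrapper.respond_to?(:upcase)
unknown = wrapper.respond_to?(:no_such_method)
```

Pairing `method_missing` with `respond_to_missing?` keeps `respond_to?` and `method` truthful for the forwarded methods.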

Instance Attribute Details

#py_nlp ⇒ Object (readonly)



# File 'lib/ruby-spacy.rb', line 504

def py_nlp
  @py_nlp
end

#spacy_nlp_id ⇒ String (readonly)



# File 'lib/ruby-spacy.rb', line 501

def spacy_nlp_id
  @spacy_nlp_id
end

Instance Method Details

#get_lexeme(text) ⇒ Object

A utility method to get a Python Lexeme object.



# File 'lib/ruby-spacy.rb', line 573

def get_lexeme(text)
  @py_nlp.vocab[text]
end

#instance_variables_to_inspect ⇒ Object



# File 'lib/ruby-spacy.rb', line 670

def instance_variables_to_inspect
  [:@spacy_nlp_id]
end

#matcher ⇒ Matcher

Generates a matcher for the current language model.



# File 'lib/ruby-spacy.rb', line 541

def matcher
  Matcher.new(@py_nlp)
end

#memory_zone { ... } ⇒ Object

Executes a block within spaCy’s memory zone for efficient memory management. Requires spaCy >= 3.8.

Yields:

  • the block to execute within the memory zone

Raises:

  • (NotImplementedError)

    if spaCy version does not support memory zones



# File 'lib/ruby-spacy.rb', line 652

def memory_zone(&block)
  major, minor = SpacyVersion.split(".").map(&:to_i)
  unless major > 3 || (major == 3 && minor >= 8)
    raise NotImplementedError, "memory_zone requires spaCy >= 3.8 (current: #{SpacyVersion})"
  end

  PyCall.with(@py_nlp.memory_zone, &block)
end
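
The version gate above can be checked in isolation. The sketch below repeats the same major/minor comparison on plain strings. `memory_zone_supported?` is a hypothetical helper name, and the `minor.to_i` fallback for a version string with no minor part is an added assumption.

```ruby
# Hypothetical helper: true when the version string is 3.8 or newer.
def memory_zone_supported?(version)
  major, minor = version.split(".").map(&:to_i)
  # minor is nil for a bare "3"; nil.to_i is 0.
  major > 3 || (major == 3 && minor.to_i >= 8)
end

checks = ["3.7.5", "3.8.0", "3.10.1", "4.0.0"].map do |version|
  [version, memory_zone_supported?(version)]
end
```

Comparing integers rather than strings matters here: as strings, "3.10" would sort before "3.8".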

#most_similar(vector, num) ⇒ Array<Hash{:key => Integer, :text => String, :best_row => Integer, :score => Float}>

Returns the num lexemes whose vector representations are most similar to the given word vector.



# File 'lib/ruby-spacy.rb', line 587

def most_similar(vector, num)
  vec_array = Numpy.asarray([vector])
  py_result = @py_nlp.vocab.vectors.most_similar(vec_array, n: num)
  key_texts = PyCall.eval("[[str(num), #{@spacy_nlp_id}.vocab[num].text] for num in #{py_result[0][0].tolist}]")
  keys = key_texts.map { |kt| kt[0] }
  texts = key_texts.map { |kt| kt[1] }
  best_rows = PyCall::List.call(py_result[1])[0]
  scores = PyCall::List.call(py_result[2])[0]

  results = []
  num.times do |i|
    result = { key: keys[i].to_i,
               text: texts[i],
               best_row: best_rows[i],
               score: scores[i] }
    result.each_key do |key|
      result.define_singleton_method(key) { result[key] }
    end
    results << result
  end
  results
end
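
Each result is an ordinary Hash that also gets one reader method per key, so entries can be read as `result[:text]` or `result.text`. The following sketch reproduces that trick on a hand-written hash. `with_readers` is a hypothetical helper and the values are made-up sample data, not output from a real model.

```ruby
# Hypothetical helper: give a hash one singleton reader method per key.
def with_readers(hash)
  hash.each_key do |name|
    hash.define_singleton_method(name) { hash[name] }
  end
  hash
end

# Made-up sample data shaped like one most_similar result.
row = with_readers({ key: 123, text: "apple", best_row: 7, score: 0.91 })

by_index  = row[:text]   # regular Hash access
by_method = row.text     # reader defined on this one object
```

One side effect worth knowing: a reader named `key` shadows the built-in `Hash#key(value)` on that object, so `row.key` returns the stored `:key` entry.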

#phrase_matcher(attr: "ORTH") ⇒ PhraseMatcher

Generates a phrase matcher for the current language model. PhraseMatcher is more efficient than Matcher for matching large terminology lists.

Examples:

matcher = nlp.phrase_matcher(attr: "LOWER")
matcher.add("PRODUCT", ["iPhone", "MacBook Pro"])


# File 'lib/ruby-spacy.rb', line 553

def phrase_matcher(attr: "ORTH")
  PhraseMatcher.new(self, attr: attr)
end

#pipe(texts, disable: [], batch_size: 50) ⇒ Array<Doc>

A utility method to batch-process many texts.



# File 'lib/ruby-spacy.rb', line 615

def pipe(texts, disable: [], batch_size: 50)
  PyCall::List.call(@py_nlp.pipe(texts, disable: disable, batch_size: batch_size)).map do |py_doc|
    Doc.new(@py_nlp, py_doc: py_doc)
  end
end

#pipe_names ⇒ Array<String>

A utility method to list pipeline components.



# File 'lib/ruby-spacy.rb', line 566

def pipe_names
  PyCall::List.call(@py_nlp.pipe_names).to_a
end

#read(text) ⇒ Object

Reads and analyzes the given text.



# File 'lib/ruby-spacy.rb', line 535

def read(text)
  Doc.new(py_nlp, text: text)
end

#respond_to_missing?(sym, include_private = false) ⇒ Boolean



# File 'lib/ruby-spacy.rb', line 666

def respond_to_missing?(sym, include_private = false)
  Spacy.py_hasattr?(@py_nlp, sym) || super
end

#vocab(text) ⇒ Lexeme

Returns a Ruby Lexeme object for the given text.



# File 'lib/ruby-spacy.rb', line 580

def vocab(text)
  Lexeme.new(@py_nlp.vocab[text])
end

#vocab_string_lookup(id) ⇒ Object

A utility method to look up the vocabulary item with the given id.



# File 'lib/ruby-spacy.rb', line 560

def vocab_string_lookup(id)
  PyCall.eval("#{@spacy_nlp_id}.vocab.strings[#{Integer(id)}]")
end
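
The `Integer(id)` coercion matters because the id is interpolated into a Python expression that is then evaluated. The sketch below builds the same kind of string without evaluating it. `lookup_expression` and the `"nlp_1"` identifier are hypothetical, chosen only for illustration.

```ruby
# Hypothetical helper: builds the Python expression but does not evaluate it.
def lookup_expression(nlp_id, id)
  # Integer() raises ArgumentError for strings that are not a valid integer.
  "#{nlp_id}.vocab.strings[#{Integer(id)}]"
end

safe = lookup_expression("nlp_1", "42")

rejected =
  begin
    lookup_expression("nlp_1", "42]; import os")
    false
  rescue ArgumentError
    true
  end
```

Only values that parse cleanly as an integer reach the generated expression, which keeps arbitrary text out of the evaluated string.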

#with_openai(access_token: nil, model: "gpt-5-mini", max_completion_tokens: 1000, temperature: 0.7) {|OpenAIHelper| ... } ⇒ Object

Yields an OpenAIHelper instance for making OpenAI API calls within a block. The helper is configured once and reused for all calls within the block, making it efficient for batch processing with #pipe.

Examples:

Batch processing with pipe

nlp.with_openai(model: "gpt-5-mini") do |ai|
  nlp.pipe(texts).map do |doc|
    ai.chat(system: "Analyze.", user: doc.linguistic_summary)
  end
end

Yields:

  • (OpenAIHelper)

    the helper instance for making API calls



# File 'lib/ruby-spacy.rb', line 637

def with_openai(access_token: nil, model: "gpt-5-mini",
                max_completion_tokens: 1000, temperature: 0.7)
  helper = OpenAIHelper.new(
    access_token: access_token,
    model: model,
    max_completion_tokens: max_completion_tokens,
    temperature: temperature
  )
  yield helper
end
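
The configure-once, yield-to-block shape of this method can be exercised without any API access. In the sketch below, `FakeHelper` and `with_helper` are hypothetical stand-ins: the fake only records calls and formats a string, and its `chat(system:, user:)` signature simply mirrors the example above rather than documenting OpenAIHelper.

```ruby
# Hypothetical stand-in for OpenAIHelper; it never contacts any API.
class FakeHelper
  attr_reader :calls

  def initialize(model:, temperature:)
    @model = model
    @temperature = temperature
    @calls = []
  end

  # Record the request and return a canned reply.
  def chat(system:, user:)
    @calls << user
    "[#{@model}] #{system} #{user}"
  end
end

# Build one helper, hand it to the block, and return the block's value.
def with_helper(model: "demo-model", temperature: 0.7)
  helper = FakeHelper.new(model: model, temperature: temperature)
  yield helper
end

texts = ["first text", "second text"]
replies = with_helper(model: "demo-model") do |ai|
  texts.map { |text| ai.chat(system: "Analyze.", user: text) }
end
```

Since `yield` is the last expression, the method returns whatever the block returns, so results can be collected directly from the call.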