Class: Spacy::Language
- Inherits: Object
  - Object
  - Spacy::Language
- Defined in: lib/ruby-spacy.rb
Overview
See also the spaCy Python API documentation for Language.
Instance Attribute Summary
- #py_nlp ⇒ Object (readonly)
  A Python Language instance accessible via PyCall.
- #spacy_nlp_id ⇒ String (readonly)
  An identifier string that can be used to refer to the Python Language object inside PyCall::exec or PyCall::eval.
Instance Method Summary
- #get_lexeme(text) ⇒ Object
  A utility method to get a Python Lexeme object.
- #initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, retrial: 0) ⇒ Language (constructor)
  Creates a language model instance, which is conventionally referred to by a variable named nlp.
- #matcher ⇒ Matcher
  Generates a matcher for the current language model.
- #method_missing(name, *args) ⇒ Object
  Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism....
- #most_similar(vector, num) ⇒ Array<Hash{:key => Integer, :text => String, :best_rows => Array<Float>, :score => Float}>
  Returns the n lexemes whose vector representations are most similar to the given vector representation of a word.
- #pipe(texts, disable: [], batch_size: 50) ⇒ Array<Doc>
  Utility function to batch process many texts.
- #pipe_names ⇒ Array<String>
  A utility method to list pipeline components.
- #read(text) ⇒ Object
  Reads and analyzes the given text.
- #respond_to_missing?(sym) ⇒ Boolean
- #vocab(text) ⇒ Lexeme
  Returns a Ruby lexeme object.
- #vocab_string_lookup(id) ⇒ Object
  A utility method to look up a vocabulary item of the given id.
Constructor Details
#initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, retrial: 0) ⇒ Language
Creates a language model instance, which is conventionally referred to by a variable named nlp.

# File 'lib/ruby-spacy.rb', line 221

def initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, retrial: 0)
  @spacy_nlp_id = "nlp_#{model.object_id}"
  PyCall.exec("import spacy; #{@spacy_nlp_id} = spacy.load('#{model}')")
  @py_nlp = PyCall.eval(@spacy_nlp_id)
rescue StandardError
  retrial += 1
  raise "Error: Pycall failed to load Spacy" unless retrial <= max_retrial

  sleep 0.5
  initialize(model, max_retrial: max_retrial, retrial: retrial)
end
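The rescue-and-retry flow of the constructor can be tried out without Python. The following is a minimal sketch, not the library's code: RetryingLoader, its loader argument, the delay option, and the MAX_RETRIAL value of 5 are all hypothetical stand-ins, and the callable replaces the PyCall load step. It mirrors the recursion in the listing, where a failed attempt increments retrial and the method calls itself until retrial exceeds max_retrial.

```ruby
# Hypothetical stand-in for the constructor's retry logic; no PyCall involved.
class RetryingLoader
  MAX_RETRIAL = 5 # assumed value; the real constant's value is not shown in this page

  attr_reader :attempts

  # loader: any callable standing in for the spaCy load step (an assumption).
  def initialize(loader, max_retrial: MAX_RETRIAL, delay: 0.0)
    @loader = loader
    @max_retrial = max_retrial
    @delay = delay
    @attempts = 0
  end

  # Calls the loader; on failure, retries recursively until the retry
  # counter exceeds max_retrial, then raises.
  def load(retrial: 0)
    @attempts += 1
    @loader.call
  rescue StandardError
    retrial += 1
    raise "Error: loader failed after #{@max_retrial} retries" unless retrial <= @max_retrial

    sleep @delay
    load(retrial: retrial)
  end
end

# A loader that fails twice before succeeding.
calls = 0
flaky = lambda do
  calls += 1
  raise IOError, "not ready" if calls < 3

  :loaded
end

loader = RetryingLoader.new(flaky, max_retrial: 3)
puts loader.load.inspect
puts loader.attempts
```

With this counting scheme, the total number of attempts is at most max_retrial + 1: the first call plus one per permitted retry.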
Dynamic Method Handling
This class handles dynamic methods through the method_missing method.
#method_missing(name, *args) ⇒ Object
Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism....
# File 'lib/ruby-spacy.rb', line 316

def method_missing(name, *args)
  @py_nlp.send(name, *args)
end
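The forwarding idea can be illustrated in plain Ruby. This is a sketch under stated assumptions, not the library's implementation: FakeNlp is a hypothetical stand-in for the Python Language object, and LanguageLike is a hypothetical wrapper. Unlike the listing above, this variant forwards only the methods the wrapped object actually responds to, and pairs method_missing with a two-argument respond_to_missing? so that respond_to? stays consistent with what can be forwarded.

```ruby
# Hypothetical stand-in for the wrapped Python object.
class FakeNlp
  def pipe_names
    %w[tok2vec tagger parser]
  end

  def has_pipe(name)
    pipe_names.include?(name)
  end
end

# Hypothetical wrapper that forwards unknown calls to its target.
class LanguageLike
  def initialize(target)
    @target = target
  end

  # Forward any call this wrapper does not define to the wrapped object;
  # fall back to the default NoMethodError otherwise.
  def method_missing(name, *args, &block)
    return super unless @target.respond_to?(name)

    @target.public_send(name, *args, &block)
  end

  # Report only the methods that method_missing can actually forward.
  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end

nlp = LanguageLike.new(FakeNlp.new)
puts nlp.has_pipe("tagger")       # forwarded to FakeNlp#has_pipe
puts nlp.respond_to?(:has_pipe)
puts nlp.respond_to?(:no_such_method)
```

The design choice in this sketch is to guard the forward with respond_to?, so a misspelled method name fails on the Ruby side with NoMethodError instead of being passed through.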
Instance Attribute Details
#py_nlp ⇒ Object (readonly)
Returns a Python Language instance accessible via PyCall.

# File 'lib/ruby-spacy.rb', line 217

def py_nlp
  @py_nlp
end
#spacy_nlp_id ⇒ String (readonly)
Returns an identifier string that can be used to refer to the Python Language object inside PyCall::exec or PyCall::eval.

# File 'lib/ruby-spacy.rb', line 214

def spacy_nlp_id
  @spacy_nlp_id
end
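To make the role of this identifier concrete, the sketch below builds the same kind of Python snippets that the listings on this page pass to PyCall, using string interpolation only. The helper names nlp_id_for, python_load_snippet, and python_lookup_snippet are hypothetical and exist only for illustration; nothing here calls PyCall or Python.

```ruby
# Builds an identifier in the same shape as the constructor listing:
# "nlp_" followed by the object_id of the model-name string.
def nlp_id_for(model)
  "nlp_#{model.object_id}"
end

# Python source that would load the model and bind it to the identifier.
def python_load_snippet(nlp_id, model)
  "import spacy; #{nlp_id} = spacy.load('#{model}')"
end

# Python expression that would look up a vocabulary string by id.
def python_lookup_snippet(nlp_id, id)
  "#{nlp_id}.vocab.strings[#{id}]"
end

model = "en_core_web_sm"
nlp_id = nlp_id_for(model)

puts python_load_snippet(nlp_id, model)
puts python_lookup_snippet(nlp_id, 42)
```

Because the identifier names a variable on the Python side, any snippet built this way refers to the same loaded Language object for as long as that variable exists.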
Instance Method Details
#get_lexeme(text) ⇒ Object
A utility method to get a Python Lexeme object.

# File 'lib/ruby-spacy.rb', line 265

def get_lexeme(text)
  @py_nlp.vocab[text]
end
#matcher ⇒ Matcher
Generates a matcher for the current language model.
# File 'lib/ruby-spacy.rb', line 241

def matcher
  Matcher.new(@py_nlp)
end
#most_similar(vector, num) ⇒ Array<Hash{:key => Integer, :text => String, :best_rows => Array<Float>, :score => Float}>
Returns the n lexemes whose vector representations are most similar to the given vector representation of a word.

# File 'lib/ruby-spacy.rb', line 279

def most_similar(vector, num)
  vec_array = Numpy.asarray([vector])
  py_result = @py_nlp.vocab.vectors.most_similar(vec_array, n: num)
  key_texts = PyCall.eval("[[str(num), #{@spacy_nlp_id}.vocab[num].text] for num in #{py_result[0][0].tolist}]")
  keys = key_texts.map { |kt| kt[0] }
  texts = key_texts.map { |kt| kt[1] }
  best_rows = PyCall::List.call(py_result[1])[0]
  scores = PyCall::List.call(py_result[2])[0]

  results = []
  num.times do |i|
    result = { key: keys[i].to_i,
               text: texts[i],
               best_row: best_rows[i],
               score: scores[i] }
    result.each_key do |key|
      result.define_singleton_method(key) { result[key] }
    end
    results << result
  end
  results
end
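Each element of the returned array is a Hash that also answers to reader methods named after its keys, as the define_singleton_method loop in the listing shows. The following plain-Ruby sketch reproduces only that part; build_result is a hypothetical helper and the values are made-up placeholders, not real model output. Note that the key used in the listing is :best_row (singular).

```ruby
# Builds a result hash and adds one reader method per key, in the same
# way as the loop in the most_similar listing.
def build_result(key:, text:, best_row:, score:)
  result = { key: key, text: text, best_row: best_row, score: score }
  result.each_key do |name|
    result.define_singleton_method(name) { result[name] }
  end
  result
end

# Placeholder values for illustration only.
r = build_result(key: 123, text: "queen", best_row: 7, score: 0.78)

puts r[:text]   # hash-style access
puts r.text     # method-style access, same value
puts r.score
```

One side effect worth knowing: the reader named key replaces Hash#key on that particular object, so r.key returns the stored value instead of performing a reverse lookup.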
#pipe(texts, disable: [], batch_size: 50) ⇒ Array<Doc>
Utility function to batch process many texts.

# File 'lib/ruby-spacy.rb', line 307

def pipe(texts, disable: [], batch_size: 50)
  docs = []
  PyCall::List.call(@py_nlp.pipe(texts, disable: disable, batch_size: batch_size)).each do |py_doc|
    docs << Doc.new(@py_nlp, py_doc: py_doc)
  end
  docs
end
#pipe_names ⇒ Array<String>
A utility method to list pipeline components.
# File 'lib/ruby-spacy.rb', line 254

def pipe_names
  pipe_array = []
  PyCall::List.call(@py_nlp.pipe_names).each do |pipe|
    pipe_array << pipe
  end
  pipe_array
end
#read(text) ⇒ Object
Reads and analyzes the given text.

# File 'lib/ruby-spacy.rb', line 235

def read(text)
  Doc.new(py_nlp, text: text)
end
#respond_to_missing?(sym) ⇒ Boolean
# File 'lib/ruby-spacy.rb', line 320

def respond_to_missing?(sym)
  sym ? true : super
end
#vocab(text) ⇒ Lexeme
Returns a Ruby lexeme object.

# File 'lib/ruby-spacy.rb', line 272

def vocab(text)
  Lexeme.new(@py_nlp.vocab[text])
end
#vocab_string_lookup(id) ⇒ Object
A utility method to look up a vocabulary item of the given id.

# File 'lib/ruby-spacy.rb', line 248

def vocab_string_lookup(id)
  PyCall.eval("#{@spacy_nlp_id}.vocab.strings[#{id}]")
end