Class: Poliqarp::Client

Inherits:
Object
Defined in:
lib/poliqarpr/client.rb

Overview

Author

Aleksander Pohl ([email protected])

License

MIT License

This class is the implementation of the Poliqarp server client.

Constant Summary

GROUPS = [:left_context, :left_match, :right_match, :right_context]

Constructor Details

#initialize(session_name = "RUBY", debug = false) ⇒ Client

Creates a new Poliqarp server client.

Parameters:

  • session_name the name of the client session. Defaults to “RUBY”.

  • debug if set to true, all messages sent to and received from the server are printed to standard output. Defaults to false.



# File 'lib/poliqarpr/client.rb', line 23

def initialize(session_name="RUBY", debug=false)
  @session_name = session_name
  @left_context = 5
  @right_context = 5
  @debug = debug
  @buffer_size = 500000
  @connector = Connector.new(debug)
  @answer_queue = Queue.new
  new_session
end

Instance Attribute Details

#buffer_size=(value) ⇒ Object (writeonly)

The buffer size is the maximum number of excerpts returned for a single query.



# File 'lib/poliqarpr/client.rb', line 15

def buffer_size=(value)
  @buffer_size = value
end

#debug=(value) ⇒ Object (writeonly)

If debug is turned on, the communication between server and client is logged to standard output.



# File 'lib/poliqarpr/client.rb', line 11

def debug=(value)
  @debug = value
end

Class Method Details

.const_missing(const) ⇒ Object

Raises a hint about installing the default corpus gem when the DEFAULT_CORPUS constant is missing.



# File 'lib/poliqarpr/client.rb', line 35

def self.const_missing(const)
  if const.to_s =~ /DEFAULT_CORPUS/ 
    raise "You need to install 'apohllo-poliqarpr-corpus' to use the default corpus"
  end
  super
end
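
The hook can be tried in isolation. The following is a minimal sketch on a stand-in class that mirrors the method above. DemoClient and missing_constant_error are hypothetical names used only for illustration, not part of the library.

```ruby
# DemoClient is a hypothetical stand-in for Poliqarp::Client.
class DemoClient
  def self.const_missing(const)
    if const.to_s =~ /DEFAULT_CORPUS/
      raise "You need to install 'apohllo-poliqarpr-corpus' to use the default corpus"
    end
    super # any other missing constant still raises NameError
  end
end

# Hypothetical helper: looks up a constant on DemoClient and returns the
# exception that was raised, or nil if the lookup succeeded.
def missing_constant_error(name)
  DemoClient.const_get(name)
  nil
rescue StandardError => e
  e
end

error = missing_constant_error(:DEFAULT_CORPUS)
puts "#{error.class}: #{error.message}"
```

A lookup of DEFAULT_CORPUS produces a RuntimeError carrying the installation hint, while any other unknown constant falls through to the usual NameError.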

Instance Method Details

#close ⇒ Object

Closes the opened session.



# File 'lib/poliqarpr/client.rb', line 58

def close
  talk "CLOSE-SESSION" 
  @session = false
end

#close_corpus ⇒ Object

Closes the opened corpus.



# File 'lib/poliqarpr/client.rb', line 64

def close_corpus
  talk "CLOSE"
end

#context(query, index) ⇒ Object

Returns the long context of the excerpt identified by the given (query, index) pair.



# File 'lib/poliqarpr/client.rb', line 254

def context(query,index)
  make_query(query)
  result = []
  talk "GET-CONTEXT #{index}"
  # 1st part
  result << read_word 
  # 2nd part
  result << read_word 
  # 3rd part
  result << read_word 
  # 4th part
  result << read_word 
  result
end

#count(query) ⇒ Object

Returns the number of results for the given query.



# File 'lib/poliqarpr/client.rb', line 248

def count(query)
  count_results(make_query(query)) 
end

#find(query, options = {}) ⇒ Object Also known as: query

Sends the query to the opened corpus.

Options:

  • index the index of the single result to be returned. The index is relative to the beginning of the query result. Normally you should first query the corpus without an index to see which results are returned, then use the same query with an index to retrieve one result. The pair (query, index) acts as a unique identifier of the excerpt.

  • page_size the size of the page of results. If the page size is 0, then all results are returned on one page. It is ignored if the index option is present. Defaults to 0.

  • page_index the index of the page of results (the first page has index 1, not 0). It is ignored if the index option is present. Defaults to 1.



# File 'lib/poliqarpr/client.rb', line 237

def find(query,options={})
  if options[:index]
    find_one(query, options[:index])
  else
    find_many(query, options)
  end
end
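
The page_size and page_index rules described above can be illustrated on a plain array. This is only a sketch of the documented semantics: page_slice is a hypothetical helper, and how find_many actually retrieves a page is not shown on this page.

```ruby
# Hypothetical helper: applies the documented paging rules to an in-memory
# array. It does not talk to a Poliqarp server.
def page_slice(results, page_size = 0, page_index = 1)
  return results if page_size == 0 # 0 means "all results on one page"
  raise ArgumentError, "page_index starts at 1" if page_index < 1

  # Pages are 1-based, so page 1 starts at offset 0.
  results[(page_index - 1) * page_size, page_size] || []
end

all = (1..10).to_a
p page_slice(all, 3, 2) # second page of three results
p page_slice(all)       # defaults: everything on one page
```

With ten results and a page size of 3, page 2 holds the 4th to 6th results, page 4 holds only the 10th, and any later page is empty.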

#left_context=(value) ⇒ Object

Sets the size of the left short context. It must be greater than 0.

The size of the left short context is the number of segments displayed in the found excerpts to the left of the matched segment(s).



# File 'lib/poliqarpr/client.rb', line 73

def left_context=(value)
  if correct_context_value?(value) 
    result = talk("SET left-context-width #{value}")
    @left_context = value if result =~ /^R OK/
  else
    raise "Invalid argument: #{value}. It must be fixnum greater than 0."
  end
end
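
The private check correct_context_value? is not shown on this page. Going by the error message, it presumably accepts only positive integers. A hedged sketch of such a check follows, under a hypothetical name. It tests against Integer because Fixnum was unified into Integer in Ruby 2.4.

```ruby
# Hypothetical re-implementation; the library's own correct_context_value?
# may differ in detail.
def valid_context_value?(value)
  value.is_a?(Integer) && value > 0
end

[5, 0, -1, "5", 2.5].each do |candidate|
  puts "#{candidate.inspect}: #{valid_context_value?(candidate)}"
end
```

Only 5 passes: zero, negative numbers, strings and floats are all rejected.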

#lemmata=(options = {}) ⇒ Object

Sets the lemmata flags. There are four groups of segments to which the flags apply:

  • left_context

  • left_match

  • right_match

  • right_context

If the flag for a given group is set to true, all segments in the group are returned with their base forms (lemmata). E.g.:

c.find("kotu")
...
"kotu" base_form: "kot"

You can pass :all to turn on the flags for all groups.



# File 'lib/poliqarpr/client.rb', line 134

def lemmata=(options={})
  options = set_all_flags if options == :all
  @lemmata_flags = options
  flags = ""
  GROUPS.each do |flag|
    flags << (options[flag] ? "1" : "0")
  end
  talk("SET retrieve-lemmata #{flags}")
end
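
The flag string sent to the server has one character per group, in the order given by GROUPS. The sketch below reproduces that encoding outside the client. FLAG_GROUPS, flags_string and all_flags are hypothetical names, and the expansion of :all is an assumption, since set_all_flags is not shown on this page.

```ruby
# Same order as Poliqarp::Client::GROUPS.
FLAG_GROUPS = [:left_context, :left_match, :right_match, :right_context]

# Builds the four-character string appended to "SET retrieve-lemmata".
def flags_string(options)
  FLAG_GROUPS.map { |group| options[group] ? "1" : "0" }.join
end

# Hypothetical expansion of :all: every group switched on.
def all_flags
  FLAG_GROUPS.each_with_object({}) { |group, flags| flags[group] = true }
end

puts flags_string({})                                    # 0000
puts flags_string(left_match: true, right_match: true)   # 0110
puts flags_string(all_flags)                             # 1111
```

An empty hash, as passed by new_session, therefore turns every flag off.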

#metadata(query, index) ⇒ Object

Returns the metadata of the excerpt identified by the given (query, index) pair.



# File 'lib/poliqarpr/client.rb', line 271

def metadata(query, index)
  make_query(query)
  result = {}
  answer = talk("METADATA #{index}")
  count = answer.split(" ")[1].to_i
  count.times do
    type = read_word.gsub(/[^a-zA-Z]/,"").to_sym
    value = read_word[2..-1]
    unless value.nil?
      result[type] ||= []
      result[type] << value
    end
  end
  result
end
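
The parsing loop collects repeated metadata types into arrays. The sketch below replays that loop against an in-memory list instead of a server connection. parse_metadata is a hypothetical helper, and the sample words, including their two-character prefix, are made up for illustration; the real wire format may differ.

```ruby
# words stands in for the successive results of read_word.
def parse_metadata(count, words)
  words = words.dup
  result = {}
  count.times do
    # Keep only ASCII letters of the type name, as the method above does.
    type = words.shift.gsub(/[^a-zA-Z]/, "").to_sym
    # Drop the first two characters of the value line.
    value = words.shift[2..-1]
    unless value.nil?
      result[type] ||= []
      result[type] << value
    end
  end
  result
end

sample = ["author:", "T Jan Kowalski", "author:", "T Anna Nowak", "title:", "T Przyklad"]
p parse_metadata(3, sample)
```

Both author values end up under the single :author key, so every value in the returned hash is an array.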

#metadata_types ⇒ Object

TODO: not implemented yet. Calling this method raises "Not implemented".



# File 'lib/poliqarpr/client.rb', line 197

def metadata_types
  raise "Not implemented"
end

#new_session(port = 4567) ⇒ Object

Creates a new session for the client with the name given in the constructor. If a session is already open, it is closed first.

Parameters:

  • port - the port on which the poliqarpd server is accepting connections (defaults to 4567)



# File 'lib/poliqarpr/client.rb', line 47

def new_session(port=4567)
  close if @session
  @connector.open("localhost",port)
  talk("MAKE-SESSION #{@session_name}")
  talk("BUFFER-RESIZE #{@buffer_size}")
  @session = true
  self.tags = {}
  self.lemmata = {}
end

#open_corpus(path, &handler) ⇒ Object

Opens the corpus given as path. The server answers this request asynchronously. To open the default corpus, pass :default as the argument.

If no handler is given, the method waits until the answer arrives. If you don't want to wait until the call is finished, provide a handler block for the asynchronous answer.



# File 'lib/poliqarpr/client.rb', line 149

def open_corpus(path, &handler)
  if path == :default
    open_corpus(DEFAULT_CORPUS, &handler)
  else
    real_handler = handler || lambda{|msg| @answer_queue.push msg }
    talk("OPEN #{path}", :async, &real_handler)
    do_wait if handler.nil?
  end
end
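
The handler-or-queue pattern used here can be sketched without a server. In the sketch below a background thread plays the role of the asynchronous answer. fake_async_open and the "OPENED" message are hypothetical, and the blocking pop is only an assumption about what do_wait does.

```ruby
# Hypothetical stand-in for open_corpus: the worker thread delivers the
# "answer" either to the caller's block or to the queue.
def fake_async_open(path, answer_queue, &handler)
  real_handler = handler || lambda { |msg| answer_queue.push(msg) }
  worker = Thread.new { real_handler.call("OPENED #{path}") }
  if handler.nil?
    answer_queue.pop # no handler: block until the answer arrives
  else
    worker           # handler given: return at once, caller may join later
  end
end

# Blocking use: the answer is the return value.
puts fake_async_open("/tmp/corpus", Queue.new)

# Non-blocking use: the answer is passed to the block.
worker = fake_async_open("/tmp/corpus", Queue.new) { |msg| puts "handler got: #{msg}" }
worker.join
```

Falling back to a lambda that pushes onto a queue lets one code path serve both the blocking and the non-blocking caller.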

#ping ⇒ Object

Server diagnostics: returns :pong if the server answers the ping, nil otherwise.



# File 'lib/poliqarpr/client.rb', line 160

def ping 
  :pong if talk("PING") =~ /PONG/
end

#right_context=(value) ⇒ Object

Sets the size of the right short context. It must be greater than 0.

The size of the right short context is the number of segments displayed in the found excerpts to the right of the matched segment(s).



# File 'lib/poliqarpr/client.rb', line 87

def right_context=(value)
  if correct_context_value?(value)
    result = talk("SET right-context-width #{value}")
    @right_context = value if result =~ /^R OK/
  else
    raise "Invalid argument: #{value}. It must be fixnum greater than 0."
  end
end

#stats ⇒ Object

Returns corpus statistics:

  • :segment_tokens the number of segments in the corpus (two segments which look exactly the same are counted separately)

  • :segment_types the number of segment types in the corpus (two segments which look exactly the same are counted as one type)

  • :lemmata the number of lemma (lexeme) types (all forms of an inflected word, e.g. ‘kot’, ‘kotu’, …, are treated as one lemma)

  • :tags the number of different grammar tags (each combination of atomic tags is treated as different “tag”)



# File 'lib/poliqarpr/client.rb', line 179

def stats
  stats = {}
  talk("CORPUS-STATS").split.each_with_index do |value, index|
    case index
    when 1 
      stats[:segment_tokens] = value.to_i
    when 2
      stats[:segment_types] = value.to_i
    when 3
      stats[:lemmata] = value.to_i
    when 4
      stats[:tags] = value.to_i
    end
  end
  stats
end
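
The method maps the second to fifth whitespace-separated fields of the server answer onto the four keys. A compact sketch of the same mapping follows. parse_stats is a hypothetical helper and the sample answer string is made up; only the field positions are taken from the code above.

```ruby
# Maps fields 1..4 of the answer onto the statistics keys (field 0 is skipped).
def parse_stats(answer)
  keys = [:segment_tokens, :segment_types, :lemmata, :tags]
  values = (answer.split[1, 4] || []).map(&:to_i)
  # Pair each value with its key; a short answer yields fewer entries.
  values.each_with_index.map { |value, i| [keys[i], value] }.to_h
end

stats = parse_stats("OK 1000 400 250 90")
puts stats[:segment_tokens] # 1000
puts stats[:tags]           # 90
```

As in the original, a key is only present when the answer contains the corresponding field.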

#tags=(options = {}) ⇒ Object

Sets the tag flags. There are four groups of segments to which the flags apply:

  • left_context

  • left_match

  • right_match

  • right_context

If the flag for a given group is set to true, all segments in the group are annotated with grammatical tags. E.g.:

c.find("kot")
...
"kot" tags: "subst:sg:nom:m2"

You can pass :all to turn on the flags for all groups.



# File 'lib/poliqarpr/client.rb', line 110

def tags=(options={})
  options = set_all_flags if options == :all
  @tag_flags = options
  flags = ""
  GROUPS.each do |flag|
    flags << (options[flag] ? "1" : "0")
  end
  talk("SET retrieve-tags #{flags}")
end
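
The tag in the example above, "subst:sg:nom:m2", looks like a colon-separated combination of atomic tags. Assuming that separator holds for the corpus in use, a returned tag string could be taken apart as sketched below. atomic_tags is a hypothetical helper and not part of the client.

```ruby
# Assumes atomic tags are joined with ":" as in the example above.
def atomic_tags(tag_string)
  tag_string.split(":").map(&:to_sym)
end

p atomic_tags("subst:sg:nom:m2") # [:subst, :sg, :nom, :m2]
```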

#tagset ⇒ Object

Returns the tagset used in the corpus. It is divided into two groups:

  • :categories lists the tags belonging to each grammatical category (each category has a list of its tags, e.g. gender: m1 m2 m3 f n means that there are 5 genders: masculine (1, 2, 3), feminine and neuter)

  • :classes lists the grammatical classes (each class has a list of the categories used to describe it, e.g. adj: degree gender case number means that adjectives are described in terms of degree, gender, case and number)



# File 'lib/poliqarpr/client.rb', line 210

def tagset
  answer = talk("GET-TAGSET")
  counters = answer.split
  result = {}
  [:categories, :classes].each_with_index do |type, type_index|
    result[type] = {}
    counters[type_index+1].to_i.times do |index|
      values = read_word.split
      result[type][values[0].to_sym] = values[1..-1].map{|v| v.to_sym}
    end
  end
  result
end
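
The shape of the returned hash is easiest to see on sample data. The sketch below runs the same loop over an in-memory list of lines instead of read_word. parse_tagset is a hypothetical helper, and the header and lines are made up to match the positions the code reads; the real server output may be formatted differently.

```ruby
# answer: header whose 2nd and 3rd fields give the number of category and
# class lines; lines stands in for the successive results of read_word.
def parse_tagset(answer, lines)
  lines = lines.dup
  counters = answer.split
  result = {}
  [:categories, :classes].each_with_index do |type, type_index|
    result[type] = {}
    counters[type_index + 1].to_i.times do
      values = lines.shift.split
      # The first word names the entry, the remaining words are its members.
      result[type][values[0].to_sym] = values[1..-1].map(&:to_sym)
    end
  end
  result
end

tagset = parse_tagset("OK 2 1", [
  "gender m1 m2 m3 f n",
  "number sg pl",
  "adj degree gender case number"
])
p tagset[:categories][:gender] # [:m1, :m2, :m3, :f, :n]
p tagset[:classes][:adj]       # [:degree, :gender, :case, :number]
```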

#version ⇒ Object

Returns the server version.



# File 'lib/poliqarpr/client.rb', line 165

def version 
  talk("VERSION")
end