Class: Poliqarp::Client
- Inherits:
-
Object
- Object
- Poliqarp::Client
- Defined in:
- lib/poliqarpr/client.rb
Overview
- Author
-
Aleksander Pohl ([email protected])
- License
-
MIT License
This class is the implementation of the Poliqarp server client.
Constant Summary
- GROUPS =
[:left_context, :left_match, :right_match, :right_context]
Instance Attribute Summary
-
#buffer_size ⇒ Object
writeonly
The size of the buffer is the maximum number of excerpts returned for a single query.
-
#debug ⇒ Object
writeonly
If debug is turned on, the communication between server and client is logged to standard output.
Class Method Summary
-
.const_missing(const) ⇒ Object
A hint about installation of default corpus gem.
Instance Method Summary
-
#close ⇒ Object
Closes the opened session.
-
#close_corpus ⇒ Object
Closes the opened corpus.
-
#context(query, index) ⇒ Object
Returns the long context of the excerpt which is identified by given (query, index) pair.
-
#count(query) ⇒ Object
Returns the number of results for given query.
-
#find(query, options = {}) ⇒ Object
(also: #query)
Send the query to the opened corpus.
-
#initialize(session_name = "RUBY", debug = false) ⇒ Client
constructor
Creates a new Poliqarp server client.
-
#left_context=(value) ⇒ Object
Sets the size of the left short context.
-
#lemmata=(options = {}) ⇒ Object
Sets the lemmata flags.
-
#metadata(query, index) ⇒ Object
Returns the metadata of the excerpt which is identified by given (query, index) pair.
-
#metadata_types ⇒ Object
TODO.
-
#new_session(port = 4567) ⇒ Object
Creates a new session for the client with the name given in the constructor.
-
#open_corpus(path, &handler) ⇒ Object
Asynchronously opens the corpus given as path.
-
#ping ⇒ Object
Server diagnostics – the result should be :pong.
-
#right_context=(value) ⇒ Object
Sets the size of the right short context.
-
#stats ⇒ Object
Returns corpus statistics:
* :segment_tokens the number of segments in the corpus (two segments which look exactly the same are counted separately)
* :segment_types the number of segment types in the corpus (two segments which look exactly the same are counted as one type)
* :lemmata the number of lemma (lexeme) types (all forms of an inflected word, e.g. ‘kot’, ‘kotu’, …, are treated as one “word”, the lemma)
* :tags the number of different grammar tags (each combination of atomic tags is treated as a different “tag”).
-
#tags=(options = {}) ⇒ Object
Sets the tags’ flags.
-
#tagset ⇒ Object
Returns the tag-set used in the corpus.
-
#version ⇒ Object
Returns server version.
Constructor Details
#initialize(session_name = "RUBY", debug = false) ⇒ Client
Creates a new Poliqarp server client.
Parameters:
-
session_name
the name of the client session. Defaults to “RUBY”. -
debug
if set to true, all messages sent and received from server are printed to standard output. Defaults to false.
# File 'lib/poliqarpr/client.rb', line 23
def initialize(session_name="RUBY", debug=false)
  @session_name = session_name
  @left_context = 5
  @right_context = 5
  @debug = debug
  @buffer_size = 500000
  @connector = Connector.new(debug)
  @answer_queue = Queue.new
  new_session
end
Instance Attribute Details
#buffer_size=(value) ⇒ Object (writeonly)
The size of the buffer is the maximum number of excerpts returned for a single query.
# File 'lib/poliqarpr/client.rb', line 15
def buffer_size=(value)
  @buffer_size = value
end
#debug=(value) ⇒ Object (writeonly)
If debug is turned on, the communication between server and client is logged to standard output.
# File 'lib/poliqarpr/client.rb', line 11
def debug=(value)
  @debug = value
end
Class Method Details
.const_missing(const) ⇒ Object
A hint about installation of default corpus gem
# File 'lib/poliqarpr/client.rb', line 35
def self.const_missing(const)
  if const.to_s =~ /DEFAULT_CORPUS/
    raise "You need to install 'apohllo-poliqarpr-corpus' to use the default corpus"
  end
  super
end
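The hook above relies on Ruby calling const_missing on a class when a constant referenced inside it cannot be resolved. A minimal standalone sketch of the same pattern follows; the Demo class and the hint_for_missing_corpus helper are hypothetical stand-ins and not part of the gem.

```ruby
# Hypothetical stand-in for Poliqarp::Client; only the const_missing hook is mirrored.
class Demo
  def self.const_missing(const)
    if const.to_s =~ /DEFAULT_CORPUS/
      raise "You need to install 'apohllo-poliqarpr-corpus' to use the default corpus"
    end
    super
  end

  # Referencing the undefined constant here triggers the hook above.
  def self.default_corpus
    DEFAULT_CORPUS
  end
end

# Returns the hint text instead of letting the error propagate.
def hint_for_missing_corpus
  Demo.default_corpus
rescue RuntimeError => e
  e.message
end
```

Any other missing constant falls through to super and raises the usual NameError.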
Instance Method Details
#close ⇒ Object
Closes the opened session.
# File 'lib/poliqarpr/client.rb', line 58
def close
  talk "CLOSE-SESSION"
  @session = false
end
#close_corpus ⇒ Object
Closes the opened corpus.
# File 'lib/poliqarpr/client.rb', line 64
def close_corpus
  talk "CLOSE"
end
#context(query, index) ⇒ Object
Returns the long context of the excerpt which is identified by given (query, index) pair.
# File 'lib/poliqarpr/client.rb', line 254
def context(query,index)
  make_query(query)
  result = []
  talk "GET-CONTEXT #{index}"
  # 1st part
  result << read_word
  # 2nd part
  result << read_word
  # 3rd part
  result << read_word
  # 4th part
  result << read_word
  result
end
#count(query) ⇒ Object
Returns the number of results for given query.
# File 'lib/poliqarpr/client.rb', line 248
def count(query)
  count_results(make_query(query))
end
#find(query, options = {}) ⇒ Object Also known as: query
Send the query to the opened corpus.
Options:
-
index
the index of the single result to be returned. The index is relative to the beginning of the query result. Normally you should first query the corpus without specifying the index, to see what results are returned. Then you can use the index with the same query to retrieve one result. The pair (query, index) acts as a unique identifier of the excerpt. -
page_size
the size of the page of results. If the page size is 0, then all results are returned on one page. It is ignored if the index option is present. Defaults to 0. -
page_index
the index of the page of results (the first page has index 1, not 0). It is ignored if the index option is present. Defaults to 1.
# File 'lib/poliqarpr/client.rb', line 237
def find(query,options={})
  if options[:index]
    find_one(query, options[:index])
  else
    find_many(query, options)
  end
end
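The way the three options interact can be illustrated on a plain array. This is only a sketch of the documented semantics: select_results is a hypothetical helper, not the gem's find_one or find_many, and it assumes that index is zero-based, which the text above does not state.

```ruby
# Hypothetical helper illustrating the documented option semantics
# on an in-memory array of results.
def select_results(results, options = {})
  # :index takes precedence and yields exactly one result
  # (assumed zero-based here).
  return [results[options[:index]]] if options[:index]

  page_size  = options[:page_size]  || 0
  page_index = options[:page_index] || 1
  # A page size of 0 means "all results on one page".
  return results if page_size == 0

  # Pages are numbered from 1, so page 1 starts at offset 0.
  results[(page_index - 1) * page_size, page_size] || []
end
```

With five results, select_results(results, :page_size => 2, :page_index => 2) returns the third and fourth entries, while adding :index makes both paging options irrelevant.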
#left_context=(value) ⇒ Object
Sets the size of the left short context. It must be > 0.
The size of the left short context is the number of segments displayed in the found excerpts to the left of the matched segment(s).
# File 'lib/poliqarpr/client.rb', line 73
def left_context=(value)
  if correct_context_value?(value)
    result = talk("SET left-context-width #{value}")
    @left_context = value if result =~ /^R OK/
  else
    raise "Invalid argument: #{value}. It must be fixnum greater than 0."
  end
end
#lemmata=(options = {}) ⇒ Object
Sets the lemmata flags. There are four groups of segments to which the flags apply:
-
left_context
-
left_match
-
right_match
-
right_context
If the flag for a given group is set to true, all segments in the group are returned with their base form (lemma). E.g.:
c.find("kotu")
...
"kotu" base_form: "kot"
You can pass :all to turn on the flags for all groups.
# File 'lib/poliqarpr/client.rb', line 134
def lemmata=(options={})
  options = set_all_flags if options == :all
  @lemmata_flags = options
  flags = ""
  GROUPS.each do |flag|
    flags << (options[flag] ? "1" : "0")
  end
  talk("SET retrieve-lemmata #{flags}")
end
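The flag string sent to the server has one character per group, in the order given by GROUPS. The sketch below reproduces that encoding in isolation. flag_string is a hypothetical name, and since set_all_flags is not shown on this page, the sketch assumes that :all simply enables every group.

```ruby
GROUPS = [:left_context, :left_match, :right_match, :right_context]

# Builds the four-character string of "1"/"0" flags, one per group.
# Assumption: :all expands to a hash with every group set to true.
def flag_string(options = {})
  if options == :all
    options = GROUPS.each_with_object({}) { |group, all| all[group] = true }
  end
  GROUPS.map { |group| options[group] ? "1" : "0" }.join
end
```

For example, enabling only the two match groups gives "0110", and an empty hash (the value used when a new session is created) gives "0000".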
#metadata(query, index) ⇒ Object
Returns the metadata of the excerpt which is identified by given (query, index) pair.
# File 'lib/poliqarpr/client.rb', line 271
def metadata(query, index)
  make_query(query)
  result = {}
  answer = talk("METADATA #{index}")
  count = answer.split(" ")[1].to_i
  count.times do |index|
    type = read_word.gsub(/[^a-zA-Z]/,"").to_sym
    value = read_word[2..-1]
    unless value.nil?
      result[type] ||= []
      result[type] << value
    end
  end
  result
end
#metadata_types ⇒ Object
TODO
# File 'lib/poliqarpr/client.rb', line 197
def metadata_types
  raise "Not implemented"
end
#new_session(port = 4567) ⇒ Object
Creates a new session for the client with the name given in the constructor. If a session is already open, it is closed first.
Parameters:
-
port
- the port on which the poliqarpd server is accepting connections (defaults to 4567)
# File 'lib/poliqarpr/client.rb', line 47
def new_session(port=4567)
  close if @session
  @connector.open("localhost",port)
  talk("MAKE-SESSION #{@session_name}")
  talk("BUFFER-RESIZE #{@buffer_size}")
  @session = true
  self.tags = {}
  self.lemmata = {}
end
#open_corpus(path, &handler) ⇒ Object
Asynchronously opens the corpus given as path. To open the default corpus, pass :default as the argument.
If you don’t want to wait until the call is finished, you have to provide a handler for the asynchronous answer.
# File 'lib/poliqarpr/client.rb', line 149
def open_corpus(path, &handler)
  if path == :default
    open_corpus(DEFAULT_CORPUS, &handler)
  else
    real_handler = handler || lambda{|msg| @answer_queue.push msg }
    talk("OPEN #{path}", :async, &real_handler)
    do_wait if handler.nil?
  end
end
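The blocking and non-blocking behaviour comes from one pattern: when no handler is given, a default handler pushes the reply onto a queue and the caller waits on that queue. The sketch below shows the pattern with a simulated reply. AsyncDemo, its open method and the reply text are all hypothetical, and the real talk and do_wait methods are not shown on this page.

```ruby
require "thread"

# Hypothetical stand-in for the asynchronous part of the client:
# the "server" answers from another thread and the answer is routed to a handler.
class AsyncDemo
  def initialize
    @answer_queue = Queue.new
  end

  # With a block: returns the worker thread at once; the block gets the reply later.
  # Without a block: the reply goes through the queue and the call blocks until it arrives.
  def open(path, &handler)
    real_handler = handler || lambda { |msg| @answer_queue.push(msg) }
    worker = Thread.new { real_handler.call("opened #{path}") } # simulated reply
    handler.nil? ? @answer_queue.pop : worker
  end
end
```

Calling open without a block returns the reply itself, while calling it with a block hands control back immediately and delivers the reply to the block once the worker thread runs.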
#ping ⇒ Object
Server diagnostics – the result should be :pong
# File 'lib/poliqarpr/client.rb', line 160
def ping
  :pong if talk("PING") =~ /PONG/
end
#right_context=(value) ⇒ Object
Sets the size of the right short context. It must be > 0.
The size of the right short context is the number of segments displayed in the found excerpts to the right of the matched segment(s).
# File 'lib/poliqarpr/client.rb', line 87
def right_context=(value)
  if correct_context_value?(value)
    result = talk("SET right-context-width #{value}")
    @right_context = value if result =~ /^R OK/
  else
    raise "Invalid argument: #{value}. It must be fixnum greater than 0."
  end
end
#stats ⇒ Object
Returns corpus statistics:
-
:segment_tokens
the number of segments in the corpus (two segments which look exactly the same are counted separately) -
:segment_types
the number of segment types in the corpus (two segments which look exactly the same are counted as one type) -
:lemmata
the number of lemma (lexeme) types (all forms of an inflected word, e.g. ‘kot’, ‘kotu’, …, are treated as one “word”, the lemma) -
:tags
the number of different grammar tags (each combination of atomic tags is treated as different “tag”)
# File 'lib/poliqarpr/client.rb', line 179
def stats
  stats = {}
  talk("CORPUS-STATS").split.each_with_index do |value, index|
    case index
    when 1
      stats[:segment_tokens] = value.to_i
    when 2
      stats[:segment_types] = value.to_i
    when 3
      stats[:lemmata] = value.to_i
    when 4
      stats[:tags] = value.to_i
    end
  end
  stats
end
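The method maps the second to fifth whitespace-separated fields of the server reply onto the four keys and ignores the first field. A standalone sketch of that mapping is given below; parse_stats and STAT_KEYS are hypothetical names, and the sample reply in the note is made up, since the actual wire format is not documented here.

```ruby
STAT_KEYS = [:segment_tokens, :segment_types, :lemmata, :tags]

# Maps fields 1..4 of a whitespace-separated reply onto the statistic keys.
# Field 0 (the reply prefix) and any extra fields are ignored.
def parse_stats(reply)
  stats = {}
  reply.split.each_with_index do |value, index|
    next unless index.between?(1, STAT_KEYS.size)
    stats[STAT_KEYS[index - 1]] = value.to_i
  end
  stats
end
```

Given an invented reply such as "R 1000 200 50 12", the result has :segment_tokens set to 1000 and :tags set to 12; a shorter reply simply leaves the later keys out.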
#tags=(options = {}) ⇒ Object
Sets the tags’ flags. There are four groups of segments to which the flags apply:
-
left_context
-
left_match
-
right_match
-
right_context
If the flag for given group is set to true, all segments in the group are annotated with grammatical tags. E.g.:
c.find("kot")
...
"kot" tags: "subst:sg:nom:m2"
You can pass :all to turn on the flags for all groups.
# File 'lib/poliqarpr/client.rb', line 110
def tags=(options={})
  options = set_all_flags if options == :all
  @tag_flags = options
  flags = ""
  GROUPS.each do |flag|
    flags << (options[flag] ? "1" : "0")
  end
  talk("SET retrieve-tags #{flags}")
end
#tagset ⇒ Object
Returns the tag-set used in the corpus. It is divided into two groups:
-
:categories
lists the tags belonging to each grammatical category (each category has a list of its tags, e.g. gender: m1 m2 m3 f n means that there are 5 genders: masculine (1, 2, 3), feminine and neuter) -
:classes
lists the grammatical categories used to describe each class (each class has a list of categories used to describe it, e.g. adj: degree gender case number means that adjectives are described in terms of degree, gender, case and number)
# File 'lib/poliqarpr/client.rb', line 210
def tagset
  answer = talk("GET-TAGSET")
  counters = answer.split
  result = {}
  [:categories, :classes].each_with_index do |type, type_index|
    result[type] = {}
    counters[type_index+1].to_i.times do |index|
      values = read_word.split
      result[type][values[0].to_sym] = values[1..-1].map{|v| v.to_sym}
    end
  end
  result
end
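The shape of the returned hash is easier to see when the same loop runs over in-memory data. In the sketch below, parse_tagset is a hypothetical helper that reads from an array instead of the server. The header string and the two data lines are invented for illustration and only borrow the gender and adj examples from the description above.

```ruby
# Mirrors the loop in #tagset, taking lines from an array instead of read_word.
# Assumptions: fields 1 and 2 of the header hold the number of category and
# class lines, and each line is a name followed by its values.
def parse_tagset(header, lines)
  counters = header.split
  remaining = lines.dup
  result = {}
  [:categories, :classes].each_with_index do |type, type_index|
    result[type] = {}
    counters[type_index + 1].to_i.times do
      values = remaining.shift.split
      result[type][values[0].to_sym] = values[1..-1].map { |v| v.to_sym }
    end
  end
  result
end

# Invented sample input: one category line and one class line.
tagset = parse_tagset("R 1 1", ["gender m1 m2 m3 f n", "adj degree gender case number"])
```

Here tagset[:categories][:gender] holds the five gender tags as symbols and tagset[:classes][:adj] holds the four categories that describe adjectives.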
#version ⇒ Object
Returns server version
# File 'lib/poliqarpr/client.rb', line 165
def version
  talk("VERSION")
end