Class: Bio::NCBI::REST
Constant Summary
- NCBI_INTERVAL = 1
  Make no more than one request per second. (NCBI’s restriction is “Make no more than 3 requests every 1 second”, but the limit here is one request per second, partly to keep the value an integer.)
- @@last_access = nil
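The two constants above suggest a simple throttle: remember when the last request was made and sleep until the interval has passed. The library's own wait routine is not shown on this page, so the following is only a minimal sketch of that idea. The class name `AccessThrottle` and its injectable `clock` and `sleeper` arguments are assumptions made for illustration and are not part of BioRuby.

```ruby
# Hypothetical throttle, not BioRuby API: enforces a minimum gap between calls.
class AccessThrottle
  def initialize(interval, clock: -> { Time.now.to_f }, sleeper: ->(sec) { sleep(sec) })
    @interval = interval      # minimum number of seconds between two requests
    @clock = clock            # returns the current time in seconds
    @sleeper = sleeper        # blocks for the given number of seconds
    @last_access = nil        # time of the previous request, nil before the first
  end

  # Waits until at least @interval seconds have passed since the previous call.
  # Returns the number of seconds it had to wait.
  def wait
    waited = 0.0
    if @last_access
      elapsed = @clock.call - @last_access
      if elapsed < @interval
        waited = @interval - elapsed
        @sleeper.call(waited)
      end
    end
    @last_access = @clock.call
    waited
  end
end
```

A caller would create one throttle, for example `AccessThrottle.new(1)`, and call `wait` immediately before each request.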
Instance Method Summary
- #efetch(ids, hash = {}, step = 100) ⇒ Object
  Retrieve database entries by the given IDs using the E-Utils (efetch) service.
- #einfo ⇒ Object
  List the NCBI database names using the E-Utils (einfo) service.
- #esearch(str, hash = {}, limit = nil, step = 10000) ⇒ Object
  Search the NCBI database by the given keywords using the E-Utils (esearch) service and return an array of entry IDs.
- #esearch_count(str, hash = {}) ⇒ Object
  Arguments: same as the esearch method. Returns: the number of results.
Class Method Details
.efetch(*args) ⇒ Object
# File 'lib/bio/io/ncbirest.rb', line 252
def self.efetch(*args)
  self.new.efetch(*args)
end
.einfo ⇒ Object
# File 'lib/bio/io/ncbirest.rb', line 240
def self.einfo
  self.new.einfo
end
.esearch(*args) ⇒ Object
# File 'lib/bio/io/ncbirest.rb', line 244
def self.esearch(*args)
  self.new.esearch(*args)
end
.esearch_count(*args) ⇒ Object
# File 'lib/bio/io/ncbirest.rb', line 248
def self.esearch_count(*args)
  self.new.esearch_count(*args)
end
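Each class method above creates a throwaway instance and forwards its arguments to the instance method of the same name. A toy sketch of that delegation pattern follows. The class `EchoService` and its method `call_service` are hypothetical stand-ins and make no network request.

```ruby
# Hypothetical class illustrating the class-method-to-instance-method forwarding.
class EchoService
  # Instance method doing the real work (here it only formats its arguments).
  def call_service(name, opts = {})
    "#{name}:#{opts.sort.map { |k, v| "#{k}=#{v}" }.join('&')}"
  end

  # Class-level shortcut: build a new instance and pass every argument through.
  def self.call_service(*args)
    new.call_service(*args)
  end
end
```

With this shape, `EchoService.call_service("x", {"db" => "protein"})` and `EchoService.new.call_service("x", {"db" => "protein"})` give the same result, which is why both call styles appear in the usage sections.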
Instance Method Details
#efetch(ids, hash = {}, step = 100) ⇒ Object
Retrieve database entries by the given IDs using the E-Utils (efetch) service.
For information on the possible arguments, see the NCBI E-utilities documentation for efetch.
Usage
ncbi = Bio::NCBI::REST.new
ncbi.efetch("185041", {"db"=>"nucleotide", "rettype"=>"gb", "retmode" => "xml"})
ncbi.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb", "retmode"=>"xml"})
ncbi.efetch("AAA52805", {"db"=>"protein", "rettype"=>"gb"})
Bio::NCBI::REST.efetch("185041", {"db"=>"nucleotide", "rettype"=>"gb", "retmode" => "xml"})
Bio::NCBI::REST.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb"})
Bio::NCBI::REST.efetch("AAA52805", {"db"=>"protein", "rettype"=>"gb"})
Arguments:
- ids: list of NCBI entry IDs (required)
- hash: hash of E-Utils options, e.g. {"db" => "nuccore", "rettype" => "gb"}
  - db: “sequences”, “nucleotide”, “protein”, “pubmed”, “omim”, …
  - retmode: “text”, “xml”, “html”, …
  - rettype: “gb”, “gbc”, “medline”, “count”, …
- step: maximum number of entries retrieved at a time
Returns:
- String
# File 'lib/bio/io/ncbirest.rb', line 212
def efetch(ids, hash = {}, step = 100)
  serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
  opts = {
    "tool" => "bioruby",
    "retmode" => "text",
  }
  opts.update(hash)

  case ids
  when Array
    list = ids
  else
    list = ids.to_s.split(/\s*,\s*/)
  end

  result = ""
  0.step(list.size, step) do |i|
    opts["id"] = list[i, step].join(',')
    unless opts["id"].empty?
      ncbi_access_wait
      response = Bio::Command.post_form(serv, opts)
      result += response.body
    end
  end
  return result.strip
  #return result.strip.split(/\n\n+/)
end
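The listing above accepts either an Array or a comma-separated String of IDs and sends them in batches of at most `step`. The sketch below isolates that batching step so it can be tried without network access. The helper name `id_batches` is an assumption for illustration, and it uses `each_slice` instead of the `0.step` loop, which should produce the same non-empty batches.

```ruby
# Hypothetical helper, not part of BioRuby: normalise the ids argument and
# return one comma-joined string per batch of at most `step` IDs.
def id_batches(ids, step = 100)
  # Arrays are used as they are; anything else is split on commas.
  list = ids.is_a?(Array) ? ids : ids.to_s.split(/\s*,\s*/)
  # Each slice becomes the value of the "id" parameter for one request.
  list.each_slice(step).map { |chunk| chunk.join(",") }
end
```

For instance, `id_batches("185041, J00231,AAA52805", 2)` gives two batches, the first holding two IDs and the second holding one.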
#einfo ⇒ Object
List the NCBI database names using the E-Utils (einfo) service, for example:
pubmed protein nucleotide nuccore nucgss nucest structure genome
books cancerchromosomes cdd gap domains gene genomeprj gensat geo
gds homologene journals mesh ncbisearch nlmcatalog omia omim pmc
popset probe proteinclusters pcassay pccompound pcsubstance snp
taxonomy toolkit unigene unists
Usage
ncbi = Bio::NCBI::REST.new
ncbi.einfo
Bio::NCBI::REST.einfo
Returns:
- array of strings (database names)
# File 'lib/bio/io/ncbirest.rb', line 68
def einfo
  serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"
  opts = {}
  response = Bio::Command.post_form(serv, opts)
  result = response.body
  list = result.scan(/<DbName>(.*?)<\/DbName>/m).flatten
  return list
end
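The method above extracts the database names from the response body with a regular expression rather than an XML parser. The sketch below applies the same `scan` call to a small sample document. The sample XML is made up for illustration, and a real einfo response may contain further elements.

```ruby
# Made-up sample body; only the <DbName> elements matter for this sketch.
def sample_einfo_xml
  <<~XML
    <eInfoResult>
      <DbList>
        <DbName>pubmed</DbName>
        <DbName>protein</DbName>
        <DbName>nuccore</DbName>
      </DbList>
    </eInfoResult>
  XML
end

# Hypothetical helper: collect the text of every <DbName> element.
def db_names(xml)
  # scan returns one single-element array per match; flatten merges them.
  xml.scan(/<DbName>(.*?)<\/DbName>/m).flatten
end
```

Calling `db_names(sample_einfo_xml)` returns the three names in document order, and a body without any `<DbName>` element yields an empty array.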
#esearch(str, hash = {}, limit = nil, step = 10000) ⇒ Object
Search the NCBI database by the given keywords using the E-Utils (esearch) service and return an array of entry IDs.
For information on the possible arguments, see
- eutils.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html
- www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helppubmed.section.pubmedhelp.Search_Field_Descrip
Usage
ncbi = Bio::NCBI::REST.new
ncbi.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"count"})
ncbi.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"gb"})
ncbi.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmax"=>5})
Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"count"})
Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"gb"})
Bio::NCBI::REST.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmax"=>5})
Arguments:
- str: query string (required)
- hash: hash of E-Utils options, e.g. {"db" => "nuccore", "rettype" => "gb"}
  - db: “sequences”, “nucleotide”, “protein”, “pubmed”, “taxonomy”, …
  - retmode: “text”, “xml”, “html”, …
  - rettype: “gb”, “medline”, “count”, …
  - retmax: integer (default 100)
  - retstart: integer
  - field:
    - “titl”: Title [TI]
    - “tiab”: Title/Abstract [TIAB]
    - “word”: Text words [TW]
    - “auth”: Author [AU]
    - “affl”: Affiliation [AD]
    - “jour”: Journal [TA]
    - “vol”: Volume [VI]
    - “iss”: Issue [IP]
    - “page”: First page [PG]
    - “pdat”: Publication date [DP]
    - “ptyp”: Publication type [PT]
    - “lang”: Language [LA]
    - “mesh”: MeSH term [MH]
    - “majr”: MeSH major topic [MAJR]
    - “subh”: MeSH subheadings [SH]
    - “mhda”: MeSH date [MHDA]
    - “ecno”: EC/RN Number [rn]
    - “si”: Secondary source ID [SI]
    - “uid”: PubMed ID (PMID) [UI]
    - “fltr”: Filter [FILTER] [SB]
    - “subs”: Subset [SB]
  - reldate: 365
  - mindate: 2001
  - maxdate: 2002/01/01
  - datetype: “edat”
- limit: maximum number of entries to be returned (0 for unlimited; nil to use the “retmax” value in the hash, or the internal default of 100 if none is given)
- step: maximum number of entries retrieved at a time
Returns:
- array of entry IDs or a number of results
# File 'lib/bio/io/ncbirest.rb', line 135
def esearch(str, hash = {}, limit = nil, step = 10000)
  serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
  opts = {
    "tool" => "bioruby",
    "term" => str,
  }
  opts.update(hash)

  case opts["rettype"]
  when "count"
    count = esearch_count(str, opts)
    return count
  else
    retstart = 0
    retstart = hash["retstart"].to_i if hash["retstart"]

    limit ||= hash["retmax"].to_i if hash["retmax"]
    limit ||= 100 # default limit is 100
    limit = esearch_count(str, opts) if limit == 0 # unlimit

    list = []
    0.step(limit, step) do |i|
      retmax = [step, limit - i].min
      opts.update("retmax" => retmax, "retstart" => i + retstart)
      ncbi_access_wait
      response = Bio::Command.post_form(serv, opts)
      result = response.body
      list += result.scan(/<Id>(.*?)<\/Id>/m).flatten
    end
    return list
  end
end
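The loop in the listing above pages through the results by advancing `retstart` and capping `retmax` at the number of entries still wanted. The sketch below computes those request windows without contacting NCBI. The helper name `esearch_windows` is an assumption for illustration. One deliberate difference: it skips a window whose `retmax` would be 0, which, as far as the listing can be read, the original loop would still request when `limit` is an exact multiple of `step`.

```ruby
# Hypothetical helper, not part of BioRuby: list the [retstart, retmax] pairs
# needed to collect `limit` IDs in pages of at most `step` entries.
def esearch_windows(limit, step = 10000, retstart = 0)
  windows = []
  0.step(limit, step) do |i|
    # Ask for a full page, or for whatever is left on the final page.
    retmax = [step, limit - i].min
    # Skip the empty trailing window instead of issuing a request for 0 IDs.
    windows << [i + retstart, retmax] if retmax > 0
  end
  windows
end
```

With `limit = 25` and `step = 10`, this yields three windows covering 10, 10, and 5 entries. A non-zero `retstart` shifts every window by that offset.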
#esearch_count(str, hash = {}) ⇒ Object
Arguments:
- same as the esearch method
Returns:
- the number of results (Integer)
# File 'lib/bio/io/ncbirest.rb', line 170
def esearch_count(str, hash = {})
  serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
  opts = {
    "tool" => "bioruby",
    "term" => str,
  }
  opts.update(hash)
  opts.update("rettype" => "count")
  #ncbi_access_wait
  response = Bio::Command.post_form(serv, opts)
  result = response.body
  count = result.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
  return count
end
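The method above reads the result count from the first `<Count>` element of the response and converts it with `to_i`. The sketch below applies the same expression to a sample body. The sample XML is made up for illustration, and `parse_count` is a hypothetical helper name.

```ruby
# Made-up sample body; only the <Count> element matters for this sketch.
def sample_esearch_xml
  <<~XML
    <eSearchResult>
      <Count>42</Count>
      <RetMax>0</RetMax>
    </eSearchResult>
  XML
end

# Hypothetical helper: take the first <Count> value and convert it to an Integer.
def parse_count(xml)
  # When no <Count> element is present, first is nil and nil.to_i is 0.
  xml.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
end
```

Because `nil.to_i` is 0, a response without a `<Count>` element is reported as zero results rather than raising an error.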