Class: Bio::NCBI::REST

Inherits:
Object
Defined in:
lib/bio/io/ncbirest.rb

Direct Known Subclasses

PubMed

Defined Under Namespace

Classes: EFetch, ESearch

Constant Summary

NCBI_INTERVAL =

Make no more than one request per second. (NCBI’s restriction is “Make no more than 3 requests every 1 second”, but the limit here is 1/sec, partly to keep the value an integer.)

1
@@last_access =
nil
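
The throttling idea behind NCBI_INTERVAL and @@last_access can be sketched as follows. This is only an illustration: the IntervalThrottle class and its method names are hypothetical, and it is not the library's ncbi_access_wait, whose source is not shown on this page. Time values are passed in explicitly so the example runs without sleeping.

```ruby
# Hypothetical helper illustrating a minimum interval between requests.
# Not part of Bio::NCBI::REST.
class IntervalThrottle
  def initialize(interval)
    @interval = interval      # seconds between requests (cf. NCBI_INTERVAL = 1)
    @last_access = nil        # time of the previous request (cf. @@last_access)
  end

  # Seconds a caller would still have to wait at time `now`.
  def wait_time(now)
    return 0 if @last_access.nil?
    [@interval - (now - @last_access), 0].max
  end

  # Remember when the latest request was made.
  def record(now)
    @last_access = now
  end
end

throttle = IntervalThrottle.new(1)
first_wait = throttle.wait_time(10.0)    # no earlier request, so no wait
throttle.record(10.0)
second_wait = throttle.wait_time(10.25)  # 0.75 seconds remain
later_wait = throttle.wait_time(12.0)    # interval already elapsed
```

A real caller would pass the result of wait_time to sleep before issuing the request.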

Class Method Details

.efetch(*args) ⇒ Object



# File 'lib/bio/io/ncbirest.rb', line 252

def self.efetch(*args)
  self.new.efetch(*args)
end

.einfo ⇒ Object



# File 'lib/bio/io/ncbirest.rb', line 240

def self.einfo
  self.new.einfo
end

.esearch(*args) ⇒ Object



# File 'lib/bio/io/ncbirest.rb', line 244

def self.esearch(*args)
  self.new.esearch(*args)
end

.esearch_count(*args) ⇒ Object



# File 'lib/bio/io/ncbirest.rb', line 248

def self.esearch_count(*args)
  self.new.esearch_count(*args)
end

Instance Method Details

#efetch(ids, hash = {}, step = 100) ⇒ Object

Retrieve database entries for the given IDs using the E-Utils (efetch) service.

For information on the possible arguments, see

Usage

ncbi = Bio::NCBI::REST.new
ncbi.efetch("185041", {"db"=>"nucleotide", "rettype"=>"gb", "retmode" => "xml"})
ncbi.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb", "retmode"=>"xml"})
ncbi.efetch("AAA52805", {"db"=>"protein", "rettype"=>"gb"})

Bio::NCBI::REST.efetch("185041", {"db"=>"nucleotide", "rettype"=>"gb", "retmode" => "xml"})
Bio::NCBI::REST.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb"})
Bio::NCBI::REST.efetch("AAA52805", {"db"=>"protein", "rettype"=>"gb"})

Arguments:

  • ids: list of NCBI entry IDs (required)

  • hash: hash of E-Utils options, e.g. {“db” => “nuccore”, “rettype” => “gb”}

    • db: “sequences”, “nucleotide”, “protein”, “pubmed”, “omim”, …

    • retmode: “text”, “xml”, “html”, …

    • rettype: “gb”, “gbc”, “medline”, “count”,…

  • step: maximum number of entries retrieved at a time

Returns

String



# File 'lib/bio/io/ncbirest.rb', line 212

def efetch(ids, hash = {}, step = 100)
  serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
  opts = {
    "tool"     => "bioruby",
    "retmode"  => "text",
  }
  opts.update(hash)

  case ids
  when Array
    list = ids
  else
    list = ids.to_s.split(/\s*,\s*/)
  end

  result = ""
  0.step(list.size, step) do |i|
    opts["id"] = list[i, step].join(',')
    unless opts["id"].empty?
      ncbi_access_wait
      response = Bio::Command.post_form(serv, opts)
      result += response.body
    end
  end
  return result.strip
  #return result.strip.split(/\n\n+/)
end
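
The way efetch normalizes its ids argument and splits it into requests of at most step IDs can be shown without any network access. The id_batches helper below is a hypothetical name introduced for illustration; it uses each_slice rather than the 0.step loop in the listing above, which gives the same batches while skipping empty ones.

```ruby
# Hypothetical helper: returns the comma-joined "id" value of each request
# that a call with these ids and this step would send.
def id_batches(ids, step = 100)
  # Accept either an Array or a comma-separated String, as efetch does.
  list = ids.is_a?(Array) ? ids : ids.to_s.split(/\s*,\s*/)
  list.each_slice(step).map { |chunk| chunk.join(",") }
end

batches = id_batches("185041, J00231,AAA52805", 2)
# batches holds two strings: "185041,J00231" and "AAA52805"
```

With the default step of 100, a list of 250 IDs would therefore be fetched in three requests.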

#einfo ⇒ Object

List the NCBI database names using the E-Utils (einfo) service.

pubmed protein nucleotide nuccore nucgss nucest structure genome
books cancerchromosomes cdd gap domains gene genomeprj gensat geo
gds homologene journals mesh ncbisearch nlmcatalog omia omim pmc
popset probe proteinclusters pcassay pccompound pcsubstance snp
taxonomy toolkit unigene unists

Usage

ncbi = Bio::NCBI::REST.new
ncbi.einfo

Bio::NCBI::REST.einfo

Returns

array of strings (database names)



# File 'lib/bio/io/ncbirest.rb', line 68

def einfo
  serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"
  opts = {}
  response = Bio::Command.post_form(serv, opts)
  result = response.body
  list = result.scan(/<DbName>(.*?)<\/DbName>/m).flatten
  return list
end
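
The extraction step in einfo can be tried offline. The XML below is an abridged, made-up sample in the shape the regular expression expects, not a captured NCBI response.

```ruby
# Assumed sample body; a real einfo response lists many more databases.
sample_body = <<~XML
  <eInfoResult>
    <DbList>
      <DbName>pubmed</DbName>
      <DbName>protein</DbName>
      <DbName>nuccore</DbName>
    </DbList>
  </eInfoResult>
XML

# Same pattern as in einfo: scan returns one single-element array per
# match, and flatten turns that into a flat array of names.
db_names = sample_body.scan(/<DbName>(.*?)<\/DbName>/m).flatten
# db_names is ["pubmed", "protein", "nuccore"]
```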

#esearch(str, hash = {}, limit = nil, step = 10000) ⇒ Object

Search an NCBI database for the given keywords using the E-Utils (esearch) service and return an array of entry IDs.

For information on the possible arguments, see

Usage

ncbi = Bio::NCBI::REST.new
ncbi.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"count"})
ncbi.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"gb"})
ncbi.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmax"=>5})

Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"count"})
Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"gb"})
Bio::NCBI::REST.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmax"=>5})

Arguments:

  • str: query string (required)

  • hash: hash of E-Utils options, e.g. {“db” => “nuccore”, “rettype” => “gb”}

    • db: “sequences”, “nucleotide”, “protein”, “pubmed”, “taxonomy”, …

    • retmode: “text”, “xml”, “html”, …

    • rettype: “gb”, “medline”, “count”, …

    • retmax: integer (default 100)

    • retstart: integer

    • field:

      • “titl”: Title [TI]

      • “tiab”: Title/Abstract [TIAB]

      • “word”: Text words [TW]

      • “auth”: Author [AU]

      • “affl”: Affiliation [AD]

      • “jour”: Journal [TA]

      • “vol”: Volume [VI]

      • “iss”: Issue [IP]

      • “page”: First page [PG]

      • “pdat”: Publication date [DP]

      • “ptyp”: Publication type [PT]

      • “lang”: Language [LA]

      • “mesh”: MeSH term [MH]

      • “majr”: MeSH major topic [MAJR]

      • “subh”: MeSH subheadings [SH]

      • “mhda”: MeSH date [MHDA]

      • “ecno”: EC/RN Number [rn]

      • “si”: Secondary source ID [SI]

      • “uid”: PubMed ID (PMID) [UI]

      • “fltr”: Filter [FILTER] [SB]

      • “subs”: Subset [SB]

    • reldate: 365

    • mindate: 2001

    • maxdate: 2002/01/01

    • datetype: “edat”

  • limit: maximum number of entries to be returned (0 for unlimited; nil for the “retmax” value in the hash or the internal default value (=100))

  • step: maximum number of entries retrieved at a time
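
The precedence between the limit argument and the “retmax” option can be sketched as a small function. resolve_limit is a hypothetical name, and total_count stands in for the value that esearch obtains from esearch_count when limit is 0.

```ruby
# Hypothetical helper mirroring how esearch settles on its effective limit.
def resolve_limit(limit, hash, total_count)
  limit ||= hash["retmax"].to_i if hash["retmax"]  # fall back to "retmax"
  limit ||= 100                                    # internal default
  limit = total_count if limit == 0                # 0 means unlimited
  limit
end

default_limit  = resolve_limit(nil, {}, 500)               # 100
retmax_limit   = resolve_limit(nil, { "retmax" => 5 }, 500) # 5
explicit_limit = resolve_limit(20, { "retmax" => 5 }, 500)  # 20
unlimited      = resolve_limit(0, {}, 500)                  # 500
```

An explicit limit therefore wins over “retmax”, and “retmax” is consulted only when limit is nil.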

Returns

array of entry IDs, or the number of results when “rettype” is “count”



# File 'lib/bio/io/ncbirest.rb', line 135

def esearch(str, hash = {}, limit = nil, step = 10000)
  serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
  opts = {
    "tool"   => "bioruby",
    "term"   => str,
  }
  opts.update(hash)

  case opts["rettype"]
  when "count"
    count = esearch_count(str, opts)
    return count
  else
    retstart = 0
    retstart = hash["retstart"].to_i if hash["retstart"]

    limit ||= hash["retmax"].to_i if hash["retmax"]
    limit ||= 100 # default limit is 100
    limit = esearch_count(str, opts) if limit == 0   # unlimit

    list = []
    0.step(limit, step) do |i|
      retmax = [step, limit - i].min
      opts.update("retmax" => retmax, "retstart" => i + retstart)
      ncbi_access_wait
      response = Bio::Command.post_form(serv, opts)
      result = response.body
      list += result.scan(/<Id>(.*?)<\/Id>/m).flatten
    end
    return list
  end
end
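
The paging loop in esearch can be examined on its own by collecting the “retstart” and “retmax” values it would send. esearch_windows is a hypothetical helper that copies the loop arithmetic from the listing above and performs no requests.

```ruby
# Hypothetical helper: lists the paging parameters of each request that
# esearch would issue for a given effective limit, step and retstart.
def esearch_windows(limit, step = 10000, retstart = 0)
  windows = []
  0.step(limit, step) do |i|
    retmax = [step, limit - i].min
    windows << { "retstart" => i + retstart, "retmax" => retmax }
  end
  windows
end

windows = esearch_windows(250, 100)
# three windows: retstart 0, 100, 200 with retmax 100, 100, 50

exact = esearch_windows(200, 100)
# also three windows; the last one has retmax 0
```

Because 0.step includes its upper bound, a limit that is an exact multiple of step produces one trailing window whose “retmax” is 0.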

#esearch_count(str, hash = {}) ⇒ Object

Arguments

same as esearch method

Returns

number of results (Integer)



# File 'lib/bio/io/ncbirest.rb', line 170

def esearch_count(str, hash = {})
  serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
  opts = {
    "tool"   => "bioruby",
    "term"   => str,
  }
  opts.update(hash)
  opts.update("rettype" => "count")
  #ncbi_access_wait
  response = Bio::Command.post_form(serv, opts)
  result = response.body
  count = result.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
  return count
end
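
The count extraction can likewise be checked against a sample body. The XML strings below are made up for illustration and only assume that the response carries a Count element, as the regular expression in esearch_count expects.

```ruby
# Assumed sample body in the shape esearch_count parses.
sample_body = "<eSearchResult><Count>42</Count><RetMax>0</RetMax></eSearchResult>"

# Same expression as in esearch_count: take the first <Count> and
# convert it to an Integer.
count = sample_body.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i

# When no <Count> element is present, first returns nil and nil.to_i is 0.
missing = "<eSearchResult></eSearchResult>".scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
```

A result of 0 can therefore mean either no matches or a response without a Count element.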