Class: Bio::NCBI::REST

Inherits: Object

Defined in: lib/bio/io/ncbirest.rb

Direct Known Subclasses

PubMed

Defined Under Namespace

Classes: EFetch, ESearch

Constant Summary

NCBI_INTERVAL: Make no more than one request every 3 seconds.

@@last_access = nil
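The instance methods on this page call ncbi_access_wait before each request, but its body is not part of this page. As a rough sketch of how a throttle built on an interval constant and a last-access timestamp could work, assuming nothing about the real implementation (the Throttle class, its wait method, and the injectable clock and sleeper are hypothetical names chosen for illustration):

```ruby
# Hypothetical throttle; NOT the BioRuby implementation of ncbi_access_wait.
class Throttle
  INTERVAL = 3 # seconds between requests, following the NCBI_INTERVAL description

  # clock and sleeper are injectable so the logic can be exercised without real waiting.
  def initialize(clock: -> { Time.now.to_f }, sleeper: ->(s) { sleep(s) })
    @clock = clock
    @sleeper = sleeper
    @last_access = nil
  end

  # Sleeps just long enough that successive calls are at least INTERVAL seconds apart.
  # Returns the number of seconds waited (0.0 when no wait was needed).
  def wait
    waited = 0.0
    if @last_access
      elapsed = @clock.call - @last_access
      if elapsed < INTERVAL
        waited = INTERVAL - elapsed
        @sleeper.call(waited)
      end
    end
    @last_access = @clock.call
    waited
  end
end
```

Calling Throttle.new.wait before each request would space requests out; the first call never waits because no previous access has been recorded.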

Class Method Summary

.efetch(*args) ⇒ Object
.einfo ⇒ Object
.esearch(*args) ⇒ Object
.esearch_count(*args) ⇒ Object

Instance Method Summary

#efetch(ids, hash = {}, step = 100) ⇒ Object

Retrieves database entries by the given IDs using the E-Utils (efetch) service.

#einfo ⇒ Object

Lists the NCBI database names using the E-Utils (einfo) service.

#esearch(str, hash = {}, limit = 100, step = 10000) ⇒ Object

Searches an NCBI database by the given keywords using the E-Utils (esearch) service and returns an array of entry IDs.

#esearch_count(str, hash = {}) ⇒ Object

Takes the same str and hash arguments as esearch and returns the number of results.

Class Method Details

.efetch(*args) ⇒ `Object`




# File 'lib/bio/io/ncbirest.rb', line 245

def self.efetch(*args)
  self.new.efetch(*args)
end

.einfo ⇒ `Object`




# File 'lib/bio/io/ncbirest.rb', line 233

def self.einfo
  self.new.einfo
end

.esearch(*args) ⇒ `Object`




# File 'lib/bio/io/ncbirest.rb', line 237

def self.esearch(*args)
  self.new.esearch(*args)
end

.esearch_count(*args) ⇒ `Object`




# File 'lib/bio/io/ncbirest.rb', line 241

def self.esearch_count(*args)
  self.new.esearch_count(*args)
end
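All four class methods follow the same pattern: build a fresh instance and forward every argument to the instance method of the same name. A minimal sketch of that delegation, using a stand-in class so it runs without network access (FakeRest and its echoing esearch_count are hypothetical, not part of BioRuby):

```ruby
# Stand-in class (hypothetical) that mirrors the delegation pattern only.
class FakeRest
  # Instance method: here it just echoes its arguments instead of calling NCBI.
  def esearch_count(str, hash = {})
    [str, hash]
  end

  # Class-level shortcut: create a throwaway instance and forward all arguments.
  def self.esearch_count(*args)
    new.esearch_count(*args)
  end
end

# Both call styles reach the same instance method with the same arguments.
via_class    = FakeRest.esearch_count("tardigrada", { "db" => "nucleotide" })
via_instance = FakeRest.new.esearch_count("tardigrada", { "db" => "nucleotide" })
puts via_class == via_instance
```

Because each class-level call creates a new object, no per-instance state is shared between calls made this way.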

Instance Method Details

#efetch(ids, hash = {}, step = 100) ⇒ `Object`

Retrieves database entries by the given IDs using the E-Utils (efetch) service.

For information on the possible arguments, see

eutils.ncbi.nlm.nih.gov/entrez/query/static/efetch_help.html

Usage

ncbi = Bio::NCBI::REST.new
ncbi.efetch("185041", {"db"=>"nucleotide", "rettype"=>"gb", "retmode" => "xml"})
ncbi.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb", "retmode"=>"xml"})
ncbi.efetch("AAA52805", {"db"=>"protein", "rettype"=>"gb"})

Bio::NCBI::REST.efetch("185041", {"db"=>"nucleotide", "rettype"=>"gb", "retmode" => "xml"})
Bio::NCBI::REST.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb"})
Bio::NCBI::REST.efetch("AAA52805", {"db"=>"protein", "rettype"=>"gb"})

Arguments:

ids: list of NCBI entry IDs (required); an Array or a comma-separated String
hash: hash of E-Utils options, e.g. {"db" => "nuccore", "rettype" => "gb"}
- db: "sequences", "nucleotide", "protein", "pubmed", "omim", ...
- retmode: "text", "xml", "html", ... (defaults to "text")
- rettype: "gb", "gbc", "medline", "count", ...
step: maximum number of entries retrieved per request (default 100)

Returns: String

# File 'lib/bio/io/ncbirest.rb', line 205

def efetch(ids, hash = {}, step = 100)
  serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
  opts = {
    "tool"     => "bioruby",
    "retmode"  => "text",
  }
  opts.update(hash)

  case ids
  when Array
    list = ids
  else
    list = ids.to_s.split(/\s*,\s*/)
  end

  result = ""
  0.step(list.size, step) do |i|
    opts["id"] = list[i, step].join(',')
    unless opts["id"].empty?
      ncbi_access_wait
      response = Bio::Command.post_form(serv, opts)
      result += response.body
    end
  end
  return result.strip
  #return result.strip.split(/\n\n+/)
end
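The ID handling in efetch can be tried in isolation. The sketch below copies only the normalization and batching steps from the listing above and leaves out the HTTP call; id_batches is a hypothetical helper name, not a BioRuby method:

```ruby
# Hypothetical helper that reproduces efetch's ID splitting and batching only.
def id_batches(ids, step = 100)
  # Accept either an Array or a comma-separated String, as efetch does.
  list = ids.is_a?(Array) ? ids : ids.to_s.split(/\s*,\s*/)

  batches = []
  0.step(list.size, step) do |i|
    chunk = list[i, step].join(",")
    # The final iteration can yield an empty slice; efetch skips it too.
    batches << chunk unless chunk.empty?
  end
  batches
end

p id_batches("185041, J00231,AAA52805", 2)
```

Each element of the returned array corresponds to the "id" parameter of one request, so three IDs with step = 2 lead to two requests.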

#einfo ⇒ `Object`

Lists the NCBI database names using the E-Utils (einfo) service.

eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi

pubmed protein nucleotide nuccore nucgss nucest structure genome
books cancerchromosomes cdd gap domains gene genomeprj gensat geo
gds homologene journals mesh ncbisearch nlmcatalog omia omim pmc
popset probe proteinclusters pcassay pccompound pcsubstance snp
taxonomy toolkit unigene unists

Usage

ncbi = Bio::NCBI::REST.new
ncbi.einfo

Bio::NCBI::REST.einfo

Returns: array of string (database names)

# File 'lib/bio/io/ncbirest.rb', line 66

def einfo
  serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"
  opts = {}
  response = Bio::Command.post_form(serv, opts)
  result = response.body
  list = result.scan(/<DbName>(.*?)<\/DbName>/m).flatten
  return list
end
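The extraction step in einfo is a plain regular-expression scan, which can be checked offline. In this sketch the XML is a hand-written sample shaped to match the pattern, not a captured NCBI response, and db_names is a hypothetical helper name:

```ruby
# Hand-written sample in the shape the regex expects; not a real einfo response.
SAMPLE_EINFO_XML = <<~XML
  <eInfoResult>
    <DbList>
      <DbName>pubmed</DbName>
      <DbName>protein</DbName>
      <DbName>nuccore</DbName>
    </DbList>
  </eInfoResult>
XML

# Hypothetical helper using the same scan as einfo.
def db_names(xml)
  # scan returns one single-element array per match; flatten turns that into a flat list.
  xml.scan(/<DbName>(.*?)<\/DbName>/m).flatten
end

p db_names(SAMPLE_EINFO_XML)
```

A body without any DbName elements, such as an error page, produces an empty array rather than an exception.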

#esearch(str, hash = {}, limit = 100, step = 10000) ⇒ `Object`

Searches an NCBI database by the given keywords using the E-Utils (esearch) service and returns an array of entry IDs.

For information on the possible arguments, see the NCBI E-Utils esearch help page.

Usage

ncbi = Bio::NCBI::REST.new
ncbi.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"count"})
ncbi.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"gb"})
ncbi.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmax"=>5})

Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"count"})
Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"gb"})
Bio::NCBI::REST.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmax"=>5})

Arguments:

str: query string (required)
hash: hash of E-Utils options, e.g. {"db" => "nuccore", "rettype" => "gb"}
- db: "sequences", "nucleotide", "protein", "pubmed", "taxonomy", ...
- retmode: "text", "xml", "html", ...
- rettype: "gb", "medline", "count", ...
- retmax: integer (default 100)
- retstart: integer
- field:
  - "titl": Title [TI]
  - "tiab": Title/Abstract [TIAB]
  - "word": Text words [TW]
  - "auth": Author [AU]
  - "affl": Affiliation [AD]
  - "jour": Journal [TA]
  - "vol": Volume [VI]
  - "iss": Issue [IP]
  - "page": First page [PG]
  - "pdat": Publication date [DP]
  - "ptyp": Publication type [PT]
  - "lang": Language [LA]
  - "mesh": MeSH term [MH]
  - "majr": MeSH major topic [MAJR]
  - "subh": MeSH subheadings [SH]
  - "mhda": MeSH date [MHDA]
  - "ecno": EC/RN number [RN]
  - "si": Secondary source ID [SI]
  - "uid": PubMed ID (PMID) [UI]
  - "fltr": Filter [FILTER] [SB]
  - "subs": Subset [SB]
- reldate: 365
- mindate: 2001
- maxdate: 2002/01/01
- datetype: "edat"
limit: maximum number of entries to be returned (default 100; 0 for unlimited)
step: maximum number of entries retrieved per request (default 10000)

Returns: array of entry IDs, or the number of results when rettype is "count"
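To see how the hash argument combines with the built-in defaults, the following sketch reproduces only the option-merging step of esearch. It sends no request; esearch_options is a hypothetical helper name, and the example query values are taken from the argument list above:

```ruby
# Hypothetical helper that mirrors how esearch builds its request options.
def esearch_options(str, hash = {})
  opts = { "tool" => "bioruby", "term" => str }
  # Caller-supplied options are merged last, so they override the defaults.
  opts.update(hash)
  opts
end

# A PubMed title/abstract search restricted by Entrez date.
opts = esearch_options("yeast kinase", {
  "db"       => "pubmed",
  "field"    => "tiab",
  "mindate"  => "2001",
  "maxdate"  => "2002/01/01",
  "datetype" => "edat"
})
p opts
```

Note that in the esearch source shown on this page, "retmax" and "retstart" are set inside the request loop after this merge, so the limit and step arguments determine the values actually sent.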

# File 'lib/bio/io/ncbirest.rb', line 133

def esearch(str, hash = {}, limit = 100, step = 10000)
  serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
  opts = {
    "tool"   => "bioruby",
    "term"   => str,
  }
  opts.update(hash)

  case opts["rettype"]
  when "count"
    count = esearch_count(str, opts)
    return count
  else
    limit = esearch_count(str, opts) if limit == 0   # unlimit

    list = []
    0.step(limit, step) do |i|
      retmax = [step, limit - i].min
      opts.update("retmax" => retmax, "retstart" => i)
      ncbi_access_wait
      response = Bio::Command.post_form(serv, opts)
      result = response.body
      list += result.scan(/<Id>(.*?)<\/Id>/m).flatten
    end
    return list
  end
end
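The paging arithmetic in the loop above is easier to follow with concrete numbers. This sketch computes the (retstart, retmax) pair for each iteration without contacting NCBI; esearch_windows is a hypothetical helper name:

```ruby
# Hypothetical helper listing the [retstart, retmax] pair of each loop iteration.
def esearch_windows(limit, step)
  windows = []
  # Integer#step includes the upper bound, exactly as in the esearch loop.
  0.step(limit, step) do |i|
    windows << [i, [step, limit - i].min]
  end
  windows
end

p esearch_windows(100, 10000) # the method defaults: a single request
p esearch_windows(250, 100)   # three requests, the last one shorter
p esearch_windows(200, 100)   # limit is an exact multiple of step
```

With the defaults (limit = 100, step = 10000) a single request is made. When limit is an exact multiple of step, the loop as written runs one extra iteration whose retmax is 0.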

#esearch_count(str, hash = {}) ⇒ `Object`

Arguments: same str and hash as the esearch method
Returns: the number of results (Integer)

# File 'lib/bio/io/ncbirest.rb', line 163

def esearch_count(str, hash = {})
  serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
  opts = {
    "tool"   => "bioruby",
    "term"   => str,
  }
  opts.update(hash)
  opts.update("rettype" => "count")
  #ncbi_access_wait
  response = Bio::Command.post_form(serv, opts)
  result = response.body
  count = result.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
  return count
end
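The count extraction can also be checked offline. The XML strings below are hand-written samples shaped to match the regular expression, not captured responses, and parse_count is a hypothetical helper name:

```ruby
# Hypothetical helper using the same expression as esearch_count.
def parse_count(xml)
  # Take the first <Count> element; nil.to_i is 0, so a missing element yields 0.
  xml.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
end

p parse_count("<eSearchResult><Count>42</Count></eSearchResult>")
p parse_count("<eSearchResult></eSearchResult>")
```

Because of the trailing to_i, a response with no Count element is indistinguishable from a genuine count of zero, which is worth keeping in mind when a query unexpectedly returns 0.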
