Class: Bio::SPTR
- Includes:
- EMBLDB::Common
- Defined in:
- lib/bio/db/embl/sptr.rb
Overview
Parser class for UniProtKB/SwissProt and TrEMBL database entries.
Constant Summary
- @@entry_regrexp =
/[A-Z0-9]{1,4}_[A-Z0-9]{1,5}/
- @@data_class =
["STANDARD", "PRELIMINARY"]
- @@ac_regrexp =
Accession pattern used by Bio::EMBLDB::Common#ac -> ary
(#accessions -> ary, #accession -> String, i.e. accessions.first)
/[OPQ][0-9][A-Z0-9]{3}[0-9]/
- @@cc_topics =
['PHARMACEUTICAL', 'BIOTECHNOLOGY', 'TOXIC DOSE', 'ALLERGEN', 'RNA EDITING', 'POLYMORPHISM', 'BIOPHYSICOCHEMICAL PROPERTIES', 'MASS SPECTROMETRY', 'WEB RESOURCE', 'ENZYME REGULATION', 'DISEASE', 'INTERACTION', 'DEVELOPMENTAL STAGE', 'INDUCTION', 'CAUTION', 'ALTERNATIVE PRODUCTS', 'DOMAIN', 'PTM', 'MISCELLANEOUS', 'TISSUE SPECIFICITY', 'COFACTOR', 'PATHWAY', 'SUBUNIT', 'CATALYTIC ACTIVITY', 'SUBCELLULAR LOCATION', 'FUNCTION', 'SIMILARITY']
- @@dr_database_identifier =
returns database cross-references in the DR lines.
-
Bio::SPTR#dr -> Hash w/in Array
DR Line; database cross-reference (>=0)
DR database_identifier; primary_identifier; secondary_identifier.
One cross-reference per line.
-
['EMBL','CARBBANK','DICTYDB','ECO2DBASE', 'ECOGENE', 'FLYBASE','GCRDB','HIV','HSC-2DPAGE','HSSP','INTERPRO','MAIZEDB', 'MAIZE-2DPAGE','MENDEL','MGD','MIM','PDB','PFAM','PIR','PRINTS', 'PROSITE','REBASE','AARHUS/GHENT-2DPAGE','SGD','STYGENE','SUBTILIST', 'SWISS-2DPAGE','TIGR','TRANSFAC','TUBERCULIST','WORMPEP','YEPD','ZFIN']
Constants included from EMBLDB::Common
EMBLDB::Common::DELIMITER, EMBLDB::Common::RS, EMBLDB::Common::TAGSIZE
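The entry-name and accession patterns in the constant summary can be tried out on their own. The following is a minimal sketch, not part of Bio::SPTR: the constant names `ENTRY_REGEXP` and `AC_REGEXP` and the two helper methods are hypothetical, and the patterns are anchored here so that the whole string has to match.

```ruby
# Patterns copied from the constant summary; the constant names are
# illustrative stand-ins for the class variables.
ENTRY_REGEXP = /[A-Z0-9]{1,4}_[A-Z0-9]{1,5}/
AC_REGEXP    = /[OPQ][0-9][A-Z0-9]{3}[0-9]/

# Hypothetical helper: true when the whole string looks like an entry name.
def entry_name?(str)
  !!(str =~ /\A#{ENTRY_REGEXP}\z/)
end

# Hypothetical helper: true when the whole string looks like an accession.
def accession?(str)
  !!(str =~ /\A#{AC_REGEXP}\z/)
end

entry_name?('P53_HUMAN')  # => true
accession?('P04637')      # => true
accession?('A04637')      # => false (first letter must be O, P or Q)
```

The unanchored patterns also match inside longer strings, which is why the sketch wraps them in `\A...\z`.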
Instance Method Summary
-
#cc(topic = nil) ⇒ Object
returns contents in the CC lines.
-
#cc_web_resource(data) ⇒ Object
CC -!- WEB RESOURCE: NAME=ResourceName[; NOTE=FreeText][; URL=WWWAddress].
-
#dr(key = nil) ⇒ Object
Bio::SPTR#dr.
-
#dt(key = nil) ⇒ Object
returns a Hash of information in the DT lines.
-
#embl_dr ⇒ Object
Backup Bio::EMBLDB#dr as embl_dr.
-
#entry_id ⇒ Object
(also: #entry_name, #entry)
returns the ENTRY_NAME in the ID line.
-
#ft(feature_key = nil) ⇒ Object
returns contents in the feature table.
-
#gene_name ⇒ Object
returns a String of the first gene name in the GN line.
-
#gene_names ⇒ Object
returns an Array of gene names in the GN line.
-
#gn ⇒ Object
returns gene names in the GN line.
-
#hi ⇒ Object
The HI line Bio::SPTR#hi #=> hash.
-
#id_line(key = nil) ⇒ Object
returns a Hash of the ID line.
-
#molecule ⇒ Object
(also: #molecule_type)
returns the MOLECULE_TYPE in the ID line.
-
#oh ⇒ Object
The OH line.
-
#os(num = nil) ⇒ Object
returns an Array of Hashes, or a String of the OS line when an index is given.
-
#ox ⇒ Object
returns a Hash of organism taxonomy cross-references.
-
#protein_name ⇒ Object
returns the proposed official name of the protein.
-
#ref ⇒ Object
returns contents in the R lines.
-
#references ⇒ Object
returns a Bio::References object built from Bio::EMBLDB::Common#ref.
-
#seq ⇒ Object
(also: #aaseq)
returns a Bio::Sequence::AA of the amino acid sequence.
-
#sequence_length ⇒ Object
(also: #aalen)
returns the SEQUENCE_LENGTH in the ID line.
-
#set_RN(data) ⇒ Object
-
#sq(key = nil) ⇒ Object
returns a Hash of contents in the SQ lines.
-
#synonyms ⇒ Object
returns an array of synonyms (unofficial names).
Methods included from EMBLDB::Common
#ac, #accession, #de, #initialize, #kw, #oc, #og
Methods inherited from EMBLDB
Methods inherited from DB
#exists?, #fetch, #get, open, #tags
Instance Method Details
#cc(topic = nil) ⇒ Object
returns contents in the CC lines.
-
Bio::SPTR#cc -> Hash
returns an object of contents in the TOPIC.
-
Bio::SPTR#cc(TOPIC) -> Array w/in Hash, Hash
returns contents of the “ALTERNATIVE PRODUCTS”.
-
Bio::SPTR#cc('ALTERNATIVE PRODUCTS') -> Hash
{'Event' => str, 'Named isoforms' => int, 'Comment' => str, 'Variants'=>[{'Name' => str, 'Synonyms' => str, 'IsoId' => str, 'Sequence' => []}]}
CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative splicing; Named isoforms=15;
...
CC placentae isoforms. All tissues differentially splice exon 13;
CC Name=A; Synonyms=no del;
CC IsoId=P15529-1; Sequence=Displayed;
returns contents of the “DATABASE”.
-
Bio::SPTR#cc('DATABASE') -> Array
[{'NAME'=>str,'NOTE'=>str, 'WWW'=>URI,'FTP'=>URI}, ...]
CC -!- DATABASE: NAME=Text[; NOTE=Text][; WWW="Address"][; FTP="Address"].
returns contents of the “MASS SPECTROMETRY”.
-
Bio::SPTR#cc('MASS SPECTROMETRY') -> Array
[{'MW'=>float,'MW_ERR'=>float, 'METHOD'=>str,'RANGE'=>str}, ...]
CC -!- MASS SPECTROMETRY: MW=XXX[; MW_ERR=XX][; METHOD=XX][;RANGE=XX-XX].
CC lines (>=0, optional)
CC -!- TISSUE SPECIFICITY: HIGHEST LEVELS FOUND IN TESTIS. ALSO PRESENT
CC IN LIVER, KIDNEY, LUNG AND BRAIN.
CC -!- TOPIC: FIRST LINE OF A COMMENT BLOCK;
CC SECOND AND SUBSEQUENT LINES OF A COMMENT BLOCK.
# File 'lib/bio/db/embl/sptr.rb', line 612
def cc(topic = nil)
  unless @data['CC']
    cc = Hash.new
    comment_border = '-' * (77 - 4 + 1)
    dlm = /-!- /

    # 12KD_MYCSM has no CC lines.
    return cc if get('CC').size == 0

    cc_raw = fetch('CC')

    # Removing the copyright statement.
    cc_raw.sub!(/ *---.+---/m, '')

    # No CC lines remain once the copyright statement is removed.
    return cc if cc_raw == ''

    begin
      cc_raw, copyright = cc_raw.split(/#{comment_border}/)[0]
      cc_raw = cc_raw.sub(dlm,'')
      cc_raw.split(dlm).each do |tmp|
        tmp = tmp.strip

        if /(^[A-Z ]+[A-Z]): (.+)/ =~ tmp
          key  = $1
          body = $2
          body.gsub!(/- (?!AND)/,'-')
          body.strip!
          unless cc[key]
            cc[key] = [body]
          else
            cc[key].push(body)
          end
        else
          raise ["Error: [#{entry_id}]: CC Lines", '"', tmp, '"',
                 '', get('CC'),''].join("\n")
        end
      end
    rescue NameError
      if fetch('CC') == ''
        return {}
      else
        raise ["Error: Invalid CC Lines: [#{entry_id}]: ",
               "\n'#{self.get('CC')}'\n", "(#{$!})"].join
      end
    rescue NoMethodError
    end

    @data['CC'] = cc
  end

  case topic
  when 'ALLERGEN'
    return @data['CC'][topic]
  when 'ALTERNATIVE PRODUCTS'
    return cc_alternative_products(@data['CC'][topic])
  when 'BIOPHYSICOCHEMICAL PROPERTIES'
    return cc_biophysiochemical_properties(@data['CC'][topic])
  when 'BIOTECHNOLOGY'
    return @data['CC'][topic]
  when 'CATALYTIC ACTIVITY'
    return cc_catalytic_activity(@data['CC'][topic])
  when 'CAUTION'
    return cc_caution(@data['CC'][topic])
  when 'COFACTOR'
    return @data['CC'][topic]
  when 'DEVELOPMENTAL STAGE'
    return @data['CC'][topic].join('')
  when 'DISEASE'
    return @data['CC'][topic].join('')
  when 'DOMAIN'
    return @data['CC'][topic]
  when 'ENZYME REGULATION'
    return @data['CC'][topic].join('')
  when 'FUNCTION'
    return @data['CC'][topic].join('')
  when 'INDUCTION'
    return @data['CC'][topic].join('')
  when 'INTERACTION'
    return cc_interaction(@data['CC'][topic])
  when 'MASS SPECTROMETRY'
    return cc_mass_spectrometry(@data['CC'][topic])
  when 'MISCELLANEOUS'
    return @data['CC'][topic]
  when 'PATHWAY'
    return cc_pathway(@data['CC'][topic])
  when 'PHARMACEUTICAL'
    return @data['CC'][topic]
  when 'POLYMORPHISM'
    return @data['CC'][topic]
  when 'PTM'
    return @data['CC'][topic]
  when 'RNA EDITING'
    return cc_rna_editing(@data['CC'][topic])
  when 'SIMILARITY'
    return @data['CC'][topic]
  when 'SUBCELLULAR LOCATION'
    return cc_subcellular_location(@data['CC'][topic])
  when 'SUBUNIT'
    return @data['CC'][topic]
  when 'TISSUE SPECIFICITY'
    return @data['CC'][topic]
  when 'TOXIC DOSE'
    return @data['CC'][topic]
  when 'WEB RESOURCE'
    return cc_web_resource(@data['CC'][topic])
  when 'DATABASE'
    # DATABASE: NAME=Text[; NOTE=Text][; WWW="Address"][; FTP="Address"].
    tmp = Array.new
    db = @data['CC']['DATABASE']
    return db unless db

    db.each do |e|
      db = {'NAME' => nil, 'NOTE' => nil, 'WWW' => nil, 'FTP' => nil}
      e.sub(/.$/,'').split(/;/).each do |line|
        case line
        when /NAME=(.+)/
          db['NAME'] = $1
        when /NOTE=(.+)/
          db['NOTE'] = $1
        when /WWW="(.+)"/
          db['WWW'] = $1
        when /FTP="(.+)"/
          db['FTP'] = $1
        end
      end
      tmp.push(db)
    end
    return tmp
  when nil
    return @data['CC']
  else
    return @data['CC'][topic]
  end
end
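The first half of #cc groups comment blocks by topic: it splits the CC text on the `-!- ` delimiter and reads the upper-case topic in front of the first colon. A rough standalone sketch of that step follows. The method name `split_cc_blocks` and the sample text are made up for illustration, and the input is assumed to have its `CC` line prefixes already removed and its lines joined with spaces.

```ruby
# Hypothetical helper (not part of Bio::SPTR): group CC comment blocks by topic.
def split_cc_blocks(cc_text)
  topics = Hash.new { |hash, key| hash[key] = [] }
  cc_text.split(/-!- /).each do |block|
    block = block.strip
    next if block.empty?                      # text before the first delimiter
    if block =~ /\A([A-Z ]+[A-Z]): (.+)\z/m
      topics[$1] << $2.strip
    else
      raise ArgumentError, "unrecognised CC block: #{block.inspect}"
    end
  end
  topics
end

# Made-up sample text in the "-!- TOPIC: body" layout.
text = '-!- FUNCTION: Binds DNA. -!- SUBUNIT: Homodimer. -!- FUNCTION: Also binds RNA.'
blocks = split_cc_blocks(text)
blocks['FUNCTION']  # => ["Binds DNA.", "Also binds RNA."]
blocks['SUBUNIT']   # => ["Homodimer."]
```

A topic that occurs more than once accumulates one Array element per block, which is why several branches of #cc join the elements or hand them to a topic-specific parser.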
#cc_web_resource(data) ⇒ Object
CC -!- WEB RESOURCE: NAME=ResourceName[; NOTE=FreeText][; URL=WWWAddress].
# File 'lib/bio/db/embl/sptr.rb', line 924
def cc_web_resource(data)
  data.map {|x|
    entry = {'NAME' => nil, 'NOTE' => nil, 'URL' => nil}
    x.split(';').each do |y|
      case y
      when /NAME=(.+)/
        entry['NAME'] = $1.strip
      when /NOTE=(.+)/
        entry['NOTE'] = $1.strip
      when /URL="(.+)"/
        entry['URL'] = $1.strip
      end
    end
    entry
  }
end
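To see what this produces for a single comment, here is a small sketch that applies the same field matching to one string. `parse_web_resource` is a hypothetical standalone name, and the resource name, note and URL in the sample are invented.

```ruby
# Hypothetical helper: parse one WEB RESOURCE comment body into a Hash.
def parse_web_resource(comment)
  entry = { 'NAME' => nil, 'NOTE' => nil, 'URL' => nil }
  comment.split(';').each do |field|
    case field
    when /NAME=(.+)/  then entry['NAME'] = $1.strip
    when /NOTE=(.+)/  then entry['NOTE'] = $1.strip
    when /URL="(.+)"/ then entry['URL']  = $1.strip
    end
  end
  entry
end

# Invented sample in the NAME=...; NOTE=...; URL="..." layout.
sample = 'NAME=Example resource; NOTE=Made-up note; URL="http://www.example.org/".'
resource = parse_web_resource(sample)
resource['NAME']  # => "Example resource"
resource['URL']   # => "http://www.example.org/"
```

Because the fields are split on `;`, a note or URL that itself contains a semicolon would be cut short by this approach.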
#dr(key = nil) ⇒ Object
Bio::SPTR#dr
# File 'lib/bio/db/embl/sptr.rb', line 959
def dr(key = nil)
  unless key
    embl_dr
  else
    (embl_dr[key] or []).map {|x|
      {'Accession' => x[0],
       'Version' => x[1],
       ' ' => x[2],
       'Molecular Type' => x[3]}
    }
  end
end
#dt(key = nil) ⇒ Object
returns a Hash of information in the DT lines.
hash keys:
['created', 'sequence', 'annotation']
Symbols are also planned as acceptable keys (the code shown accepts String keys only):
[:created, :sequence, :annotation]
returns a String of information in the DT lines for a given key.
DT Line; date (3/entry)
DT DD-MMM-YYYY (rel. NN, Created)
DT DD-MMM-YYYY (rel. NN, Last sequence update)
DT DD-MMM-YYYY (rel. NN, Last annotation update)
# File 'lib/bio/db/embl/sptr.rb', line 123
def dt(key = nil)
  return dt[key] if key
  return @data['DT'] if @data['DT']

  part = self.get('DT').split(/\n/)
  @data['DT'] = {
    'created'    => part[0].sub(/\w{2} /,'').strip,
    'sequence'   => part[1].sub(/\w{2} /,'').strip,
    'annotation' => part[2].sub(/\w{2} /,'').strip
  }
end
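The three DT lines map onto the three hash keys by position. A standalone sketch of that mapping is given below, assuming exactly three lines in the order created, sequence update, annotation update. `parse_dt` is a hypothetical name and the dates and release numbers are made up.

```ruby
# Hypothetical helper: map the three DT lines onto the documented hash keys.
def parse_dt(dt_text)
  created, sequence, annotation =
    dt_text.split("\n").map { |line| line.sub(/\ADT\s+/, '').strip }
  { 'created' => created, 'sequence' => sequence, 'annotation' => annotation }
end

# Made-up dates and release numbers in the documented layout.
dates = parse_dt(<<~TEXT)
  DT   01-JAN-1990 (Rel. 13, Created)
  DT   01-FEB-1991 (Rel. 17, Last sequence update)
  DT   01-MAR-2004 (Rel. 43, Last annotation update)
TEXT

dates['created']     # => "01-JAN-1990 (Rel. 13, Created)"
dates['annotation']  # => "01-MAR-2004 (Rel. 43, Last annotation update)"
```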
#embl_dr ⇒ Object
Backup Bio::EMBLDB#dr as embl_dr
# File 'lib/bio/db/embl/sptr.rb', line 956
alias :embl_dr :dr
#entry_id ⇒ Object Also known as: entry_name, entry
returns the ENTRY_NAME in the ID line.
# File 'lib/bio/db/embl/sptr.rb', line 79
def entry_id
  id_line('ENTRY_NAME')
end
#ft(feature_key = nil) ⇒ Object
returns contents in the feature table.
Examples
sp = Bio::SPTR.new(entry)
ft = sp.ft
ft.class #=> Hash
ft.keys.each do |feature_key|
ft[feature_key].each do |feature|
feature['From'] #=> 1
feature['To'] #=> 21
feature['Description'] #=> ''
feature['FTId'] #=> ''
feature['diff'] #=> []
feature['original'] #=> [feature_key, '1', '21', '', '']
end
end
-
Bio::SPTR#ft -> Hash
{FEATURE_KEY => [{'From' => int, 'To' => int, 'Description' => aStr, 'FTId' => aStr, 'diff' => [original_residues, changed_residues], 'original' => aAry }],...}
returns an Array of the information about the feature_name in the feature table.
-
Bio::SPTR#ft(feature_name) -> Array of Hash
[{'From' => str, 'To' => str, 'Description' => str, 'FTId' => str},...]
FT Line; feature table data (>=0, optional)
Col Data item
----- -----------------
1- 2 FT
6-13 Feature name
15-20 `FROM' endpoint
22-27 `TO' endpoint
35-75 Description (>=0 per key)
----- -----------------
Note: 'FROM' and 'TO' endpoints are allowed to use non-numerical characters including '<', '>' or '?' (e.g. '<1', '?42').
# File 'lib/bio/db/embl/sptr.rb', line 1024
def ft(feature_key = nil)
  return ft[feature_key] if feature_key
  return @data['FT'] if @data['FT']

  table = []
  begin
    get('FT').split("\n").each do |line|
      if line =~ /^FT   \w/
        feature = line.chomp.ljust(74)
        table << [feature[ 5..12].strip,   # Feature Name
                  feature[14..19].strip,   # From
                  feature[21..26].strip,   # To
                  feature[34..74].strip ]  # Description
      else
        table.last << line.chomp.sub!(/^FT +/, '')
      end
    end

    # Joining Description lines
    table = table.map { |feature|
      ftid = feature.pop if feature.last =~ /FTId=/
      if feature.size > 4
        feature = [feature[0],
                   feature[1],
                   feature[2],
                   feature[3, feature.size - 3].join(" ")]
      end
      feature << if ftid then ftid else '' end
    }

    hash = {}
    table.each do |feature|
      hash[feature[0]] = [] unless hash[feature[0]]

      hash[feature[0]] << {
        # Removing '<', '>' or '?' in FROM/TO endpoint.
        'From' => feature[1].sub(/\D/, '').to_i,
        'To'   => feature[2].sub(/\D/, '').to_i,
        'Description' => feature[3],
        'FTId' => feature[4].to_s.sub(/\/FTId=/, '').sub(/\.$/, ''),
        'diff' => [],
        'original' => feature
      }

      case feature[0]
      when 'VARSPLIC', 'VARIANT', 'VAR_SEQ', 'CONFLICT'
        case hash[feature[0]].last['Description']
        when /(\w[\w ]*\w*) - ?> (\w[\w ]*\w*)/
          original_res = $1
          changed_res = $2
          original_res = original_res.gsub(/ /,'').strip
          changed_res = changed_res.gsub(/ /,'').strip
        when /Missing/i
          original_res = seq.subseq(hash[feature[0]].last['From'],
                                    hash[feature[0]].last['To'])
          changed_res = ''
        end
        hash[feature[0]].last['diff'] = [original_res, changed_res]
      end
    end
  rescue
    raise "Invalid FT Lines(#{$!}) in #{entry_id}:, \n'#{self.get('FT')}'\n"
  end

  @data['FT'] = hash
end
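The column table above translates directly into zero-based string slices. The sketch below illustrates that for a single line. `parse_ft_line` and `endpoint_to_i` are hypothetical names, and the sample line is built from padded pieces so that each field lands in its documented columns.

```ruby
# Hypothetical helper: slice one FT line by the documented 1-based columns
# (6-13 feature name, 15-20 FROM, 22-27 TO, 35-75 description).
def parse_ft_line(line)
  padded = line.chomp.ljust(75)
  {
    'Feature'     => padded[5..12].strip,
    'From'        => padded[14..19].strip,
    'To'          => padded[21..26].strip,
    'Description' => padded[34..74].strip
  }
end

# Hypothetical helper: drop one non-digit marker ('<', '>' or '?') and convert.
def endpoint_to_i(endpoint)
  endpoint.sub(/\D/, '').to_i
end

# Sample line assembled with explicit padding; the feature values are made up.
sample = ('FT'.ljust(5) + 'SIGNAL'.ljust(8) + ' ' +
          '<1'.rjust(6) + ' ' + '21'.rjust(6)).ljust(34) + 'Potential.'

fields = parse_ft_line(sample)
fields['Feature']              # => "SIGNAL"
endpoint_to_i(fields['From'])  # => 1
endpoint_to_i('?42')           # => 42
```

Note that an endpoint consisting only of `?` converts to 0 under this scheme, since nothing numeric is left after the marker is removed.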
#gene_name ⇒ Object
returns a String of the first gene name in the GN line.
# File 'lib/bio/db/embl/sptr.rb', line 275
def gene_name
  gene_names.first
end
#gene_names ⇒ Object
returns an Array of gene names in the GN line.
# File 'lib/bio/db/embl/sptr.rb', line 264
def gene_names
  gn # set @data['GN'] if it hasn't been already done
  if @data['GN'].first.class == Hash then
    @data['GN'].collect { |element| element[:name] }
  else
    @data['GN'].first
  end
end
#gn ⇒ Object
returns gene names in the GN line.
New UniProt/SwissProt format:
-
Bio::SPTR#gn -> [ <gene record>* ]
where <gene record> is:
{ :name => '...',
:synonyms => [ 's1', 's2', ... ],
:loci => [ 'l1', 'l2', ... ],
:orfs => [ 'o1', 'o2', ... ]
}
Old format:
-
Bio::SPTR#gn -> Array # AND
-
Bio::SPTR#gn -> Array # OR
GN Line: Gene name(s) (>=0, optional)
# File 'lib/bio/db/embl/sptr.rb', line 188
def gn
  unless @data['GN']
    case fetch('GN')
    when /Name=/,/ORFNames=/
      @data['GN'] = gn_uniprot_parser
    else
      @data['GN'] = gn_old_parser
    end
  end
  @data['GN']
end
#hi ⇒ Object
The HI line
Bio::SPTR#hi #=> Array of Hash
# File 'lib/bio/db/embl/sptr.rb', line 528
def hi
  unless @data['HI']
    @data['HI'] = []
    fetch('HI').split(/\. /).each do |hlist|
      hash = {'Category' => '', 'Keywords' => [], 'Keyword' => ''}
      hash['Category'], hash['Keywords'] = hlist.split(': ')
      hash['Keywords'] = hash['Keywords'].split('; ')
      hash['Keyword'] = hash['Keywords'].pop
      hash['Keyword'].sub!(/\.$/, '')
      @data['HI'] << hash
    end
  end
  @data['HI']
end
#id_line(key = nil) ⇒ Object
returns a Hash of the ID line.
returns the content (Int or String) of the ID line for a given key. Hash keys: ['ENTRY_NAME', 'DATA_CLASS', 'MOLECULE_TYPE', 'SEQUENCE_LENGTH']
ID Line
ID P53_HUMAN STANDARD; PRT; 393 AA.
#"ID #{ENTRY_NAME} #{DATA_CLASS}; #{MOLECULE_TYPE}; #{SEQUENCE_LENGTH}."
Examples
obj.id_line #=> {"ENTRY_NAME"=>"P53_HUMAN", "DATA_CLASS"=>"STANDARD",
"SEQUENCE_LENGTH"=>393, "MOLECULE_TYPE"=>"PRT"}
obj.id_line('ENTRY_NAME') #=> "P53_HUMAN"
# File 'lib/bio/db/embl/sptr.rb', line 63
def id_line(key = nil)
  return id_line[key] if key
  return @data['ID'] if @data['ID']

  part = @orig['ID'].split(/ +/)
  @data['ID'] = {
    'ENTRY_NAME'      => part[1],
    'DATA_CLASS'      => part[2].sub(/;/,''),
    'MOLECULE_TYPE'   => part[3].sub(/;/,''),
    'SEQUENCE_LENGTH' => part[4].to_i
  }
end
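The same whitespace split can be exercised on the ID line from the example above without loading an entry. In this sketch `parse_id_line` is a hypothetical standalone name and the spacing of the sample line is an assumption. The split collapses runs of spaces, so the exact spacing does not affect the result.

```ruby
# Hypothetical helper: split an ID line on runs of spaces, as #id_line does.
def parse_id_line(line)
  part = line.split(/ +/)
  {
    'ENTRY_NAME'      => part[1],
    'DATA_CLASS'      => part[2].sub(/;/, ''),
    'MOLECULE_TYPE'   => part[3].sub(/;/, ''),
    'SEQUENCE_LENGTH' => part[4].to_i
  }
end

id = parse_id_line('ID   P53_HUMAN      STANDARD;      PRT;   393 AA.')
id['ENTRY_NAME']       # => "P53_HUMAN"
id['SEQUENCE_LENGTH']  # => 393
```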
#molecule ⇒ Object Also known as: molecule_type
returns the MOLECULE_TYPE in the ID line.
A short-cut for Bio::SPTR#id_line('MOLECULE_TYPE').
# File 'lib/bio/db/embl/sptr.rb', line 89
def molecule
  id_line('MOLECULE_TYPE')
end
#oh ⇒ Object
The OH line.
OH NCBI_TaxID=TaxID; HostName.
br.expasy.org/sprot/userman.html#OH_line
# File 'lib/bio/db/embl/sptr.rb', line 358
def oh
  unless @data['OH']
    @data['OH'] = fetch('OH').split("\. ").map {|x|
      if x =~ /NCBI_TaxID=(\d+);/
        taxid = $1
      else
        raise ArgumentError, ["Error: Invalid OH line format (#{self.entry_id}):",
                              $!, "\n", get('OH'), "\n"].join
      end
      if x =~ /NCBI_TaxID=\d+; (.+)/
        host_name = $1
        host_name.sub!(/\.$/, '')
      else
        host_name = nil
      end
      {'NCBI_TaxID' => taxid, 'HostName' => host_name}
    }
  end
  @data['OH']
end
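A standalone sketch of the same OH parsing may make the result shape easier to see. `parse_oh` is a hypothetical name, and the input is assumed to be the OH text with line prefixes removed and multiple hosts separated by a period and a space.

```ruby
# Hypothetical helper: turn OH text into an Array of host Hashes.
def parse_oh(oh_text)
  oh_text.split('. ').map do |item|
    unless item =~ /NCBI_TaxID=(\d+);/
      raise ArgumentError, "invalid OH item: #{item.inspect}"
    end
    taxid = $1
    host_name = item[/NCBI_TaxID=\d+; (.+)/, 1]    # nil when no host name follows
    host_name = host_name.sub(/\.\z/, '') if host_name
    { 'NCBI_TaxID' => taxid, 'HostName' => host_name }
  end
end

hosts = parse_oh('NCBI_TaxID=9606; Homo sapiens (Human). ' \
                 'NCBI_TaxID=10090; Mus musculus (Mouse).')
hosts.first['NCBI_TaxID']  # => "9606"
hosts.last['HostName']     # => "Mus musculus (Mouse)"
```

The taxonomy ID stays a String, matching the listing above, so callers that need a number have to convert it themselves.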
#os(num = nil) ⇒ Object
returns an Array of Hashes, or a String of the OS line when an index is given.
-
Bio::EMBLDB#os -> Array
[{'name' => '(Human)', 'os' => 'Homo sapiens'},
{'name' => '(Rat)', 'os' => 'Rattus norvegicus'}]
-
Bio::SPTR#os[0] -> Hash
{'name' => "(Human)", 'os' => 'Homo sapiens'}
-
Bio::SPTR#os[0]['name'] -> "(Human)"
-
Bio::SPTR#os(0) -> "Homo sapiens (Human)"
OS Line; organism species (>=1)
OS Genus species (name).
OS Genus species (name0) (name1).
OS Genus species (name0) (name1).
OS Genus species (name0), G s0 (name0), and G s (name0) (name1).
OS Homo sapiens (Human), and Rattus norvegicus (Rat)
OS Hippotis sp. Clark and Watts 825.
OS unknown cyperaceous sp.
# File 'lib/bio/db/embl/sptr.rb', line 297
def os(num = nil)
  unless @data['OS']
    os = Array.new
    fetch('OS').split(/, and|, /).each do |tmp|
      if tmp =~ /(\w+ *[\w\d \:\'\+\-\.]+[\w\d\.])/
        org = $1
        tmp =~ /(\(.+\))/
        os.push({'name' => $1, 'os' => org})
      else
        raise "Error: OS Line. #{$!}\n#{fetch('OS')}\n"
      end
    end
    @data['OS'] = os
  end

  if num
    # EX. "Trifolium repens (white clover)"
    return "#{@data['OS'][num]['os']} #{@data['OS'][num]['name']}"
  else
    return @data['OS']
  end
end
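The splitting and matching used by #os can be tried on the two-organism example line from the format description. This is a sketch only: `parse_os` is a hypothetical standalone name, and the regular expressions are copied from the listing above.

```ruby
# Hypothetical helper: split OS text into organisms and their bracketed names.
def parse_os(os_text)
  os_text.split(/, and|, /).map do |part|
    unless part =~ /(\w+ *[\w\d \:\'\+\-\.]+[\w\d\.])/
      raise ArgumentError, "invalid OS fragment: #{part.inspect}"
    end
    organism = $1
    common_name = part[/(\(.+\))/, 1]   # nil when no parenthesised name exists
    { 'name' => common_name, 'os' => organism }
  end
end

organisms = parse_os('Homo sapiens (Human), and Rattus norvegicus (Rat).')
organisms[0]  # => {'name' => '(Human)', 'os' => 'Homo sapiens'}
organisms[1]  # => {'name' => '(Rat)', 'os' => 'Rattus norvegicus'}
```

The parentheses stay part of the `'name'` value, which is what lets `os(0)` rebuild the original text by joining the two fields with a space.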
#ox ⇒ Object
returns a Hash of organism taxonomy cross-references.
-
Bio::SPTR#ox -> Hash
{'NCBI_TaxID' => ['1234','2345','3456','4567'], ...}
OX Line; organism taxonomy cross-reference (>=1 per entry)
OX NCBI_TaxID=1234;
OX NCBI_TaxID=1234, 2345, 3456, 4567;
# File 'lib/bio/db/embl/sptr.rb', line 341
def ox
  unless @data['OX']
    tmp = fetch('OX').sub(/\.$/,'').split(/;/).map { |e| e.strip }
    hsh = Hash.new
    tmp.each do |e|
      db,refs = e.split(/=/)
      hsh[db] = refs.split(/, */)
    end
    @data['OX'] = hsh
  end
  return @data['OX']
end
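As a quick check of the OX format, the following sketch parses the multi-ID example line shown above. `parse_ox` is a hypothetical standalone name, and skipping empty items is an addition of the sketch rather than something the listing does.

```ruby
# Hypothetical helper: parse OX text into {database => [ids]}.
def parse_ox(ox_text)
  items = ox_text.sub(/\.\z/, '').split(';').map(&:strip)
  items.each_with_object({}) do |item, xrefs|
    next if item.empty?              # tolerate stray separators
    db, refs = item.split('=')
    xrefs[db] = refs.split(/, */)
  end
end

taxonomy = parse_ox('NCBI_TaxID=1234, 2345, 3456, 4567;')
taxonomy['NCBI_TaxID']  # => ["1234", "2345", "3456", "4567"]
```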
#protein_name ⇒ Object
returns the proposed official name of the protein.
DE Line; description (>=1)
"DE #{OFFICIAL_NAME} (#{SYNONYM})"
"DE #{OFFICIAL_NAME} (#{SYNONYM}) [CONTAINS: #1; #2]."
OFFICIAL_NAME 1/entry
SYNONYM >=0
CONTAINS >=0
# File 'lib/bio/db/embl/sptr.rb', line 144
def protein_name
  name = ""
  if de_line = fetch('DE') then
    str = de_line[/^[^\[]*/] # everything preceding the first [ (the "contains" part)
    name = str[/^[^(]*/].strip
    name << ' (Fragment)' if str =~ /fragment/i
  end
  return name
end
#ref ⇒ Object
returns contents in the R lines.
-
Bio::EMBLDB::Common#ref -> [ <reference information Hash>* ]
where <reference information Hash> is:
{'RN' => '', 'RC' => '', 'RP' => '', 'RX' => '',
'RA' => '', 'RT' => '', 'RL' => '', 'RG' => ''}
R Lines
-
RN RC RP RX RA RT RL RG
# File 'lib/bio/db/embl/sptr.rb', line 394
def ref
  unless @data['R']
    @data['R'] = [get('R').split(/\nRN /)].flatten.map { |str|
      hash = {'RN' => '', 'RC' => '', 'RP' => '', 'RX' => '',
              'RA' => '', 'RT' => '', 'RL' => '', 'RG' => ''}
      str = 'RN ' + str unless /^RN / =~ str

      str.split("\n").each do |line|
        if /^(R[NPXARLCTG]) (.+)/ =~ line
          hash[$1] += $2 + ' '
        else
          raise "Invalid format in R lines, \n[#{line}]\n"
        end
      end

      hash['RN'] = set_RN(hash['RN'])
      hash['RC'] = set_RC(hash['RC'])
      hash['RP'] = set_RP(hash['RP'])
      hash['RX'] = set_RX(hash['RX'])
      hash['RA'] = set_RA(hash['RA'])
      hash['RT'] = set_RT(hash['RT'])
      hash['RL'] = set_RL(hash['RL'])
      hash['RG'] = set_RG(hash['RG'])

      hash
    }
  end
  @data['R']
end
#references ⇒ Object
returns a Bio::References object built from Bio::EMBLDB::Common#ref.
-
Bio::EMBLDB::Common#ref -> Bio::References
# File 'lib/bio/db/embl/sptr.rb', line 488
def references
  unless @data['references']
    ary = self.ref.map {|ent|
      hash = Hash.new('')
      ent.each {|key, value|
        case key
        when 'RA'
          hash['authors'] = value.split(/, /)
        when 'RT'
          hash['title'] = value
        when 'RL'
          if value =~ /(.*) (\d+) \((\d+)\), (\d+-\d+) \((\d+)\)$/
            hash['journal'] = $1
            hash['volume']  = $2
            hash['issue']   = $3
            hash['pages']   = $4
            hash['year']    = $5
          else
            hash['journal'] = value
          end
        when 'RX' # PUBMED, MEDLINE, DOI
          value.each do |tag, xref|
            hash[ tag.downcase ] = xref
          end
        end
      }
      Reference.new(hash)
    }
    @data['references'] = References.new(ary)
  end
  @data['references']
end
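The RL branch only fills in volume, issue, pages and year when the location string has the exact shape the pattern expects. Otherwise the whole value becomes the journal. A sketch of that behaviour follows. `RL_PATTERN` and `parse_rl` are hypothetical names and both sample strings are invented, so they say nothing about what real RL values look like after `set_RL`.

```ruby
# Pattern copied from the RL branch of #references.
RL_PATTERN = /(.*) (\d+) \((\d+)\), (\d+-\d+) \((\d+)\)$/

# Hypothetical helper: split a location string, falling back to journal only.
def parse_rl(value)
  match = RL_PATTERN.match(value)
  return { 'journal' => value } unless match
  {
    'journal' => match[1],
    'volume'  => match[2],
    'issue'   => match[3],
    'pages'   => match[4],
    'year'    => match[5]
  }
end

full = parse_rl('J. Example Biol. 12 (3), 100-110 (1999)')
full['journal']  # => "J. Example Biol."
full['pages']    # => "100-110"

fallback = parse_rl('Submitted to an example database')
fallback['journal']  # => "Submitted to an example database"
```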
#seq ⇒ Object Also known as: aaseq
returns a Bio::Sequence::AA of the amino acid sequence.
-
Bio::SPTR#seq -> Bio::Sequence::AA
blank Line; sequence data (>=1)
# File 'lib/bio/db/embl/sptr.rb', line 1134
def seq
  unless @data['']
    @data[''] = Sequence::AA.new( fetch('').gsub(/ |\d+/,'') )
  end
  return @data['']
end
#sequence_length ⇒ Object Also known as: aalen
returns the SEQUENCE_LENGTH in the ID line.
A short-cut for Bio::SPTR#id_line('SEQUENCE_LENGTH').
# File 'lib/bio/db/embl/sptr.rb', line 98
def sequence_length
  id_line('SEQUENCE_LENGTH')
end
#set_RN(data) ⇒ Object
# File 'lib/bio/db/embl/sptr.rb', line 425
def set_RN(data)
  data.strip
end
#sq(key = nil) ⇒ Object
returns a Hash of contents in the SQ lines.
-
Bio::SPTR#sq -> hsh
returns a value of a key given in the SQ lines.
-
Bio::SPTR#sq(key) -> int or str
-
Keys: ['MW', 'mw', 'molecular', 'weight', 'aalen', 'len', 'length',
'CRC64']
SQ Line; sequence header (1/entry)
SQ SEQUENCE 233 AA; 25630 MW; 146A1B48A1475C86 CRC64;
SQ SEQUENCE \d+ AA; \d+ MW; [0-9A-Z]+ CRC64;
MW, Dalton unit. CRC64 (64-bit Cyclic Redundancy Check, ISO 3309).
# File 'lib/bio/db/embl/sptr.rb', line 1106
def sq(key = nil)
  unless @data['SQ']
    if fetch('SQ') =~ /(\d+) AA\; (\d+) MW; (.+) CRC64;/
      @data['SQ'] = { 'aalen' => $1.to_i, 'MW' => $2.to_i, 'CRC64' => $3 }
    else
      raise "Invalid SQ Line: \n'#{fetch('SQ')}'"
    end
  end

  if key
    case key
    when /mw/, /molecular/, /weight/
      @data['SQ']['MW']
    when /len/, /length/, /AA/
      @data['SQ']['aalen']
    else
      @data['SQ'][key]
    end
  else
    @data['SQ']
  end
end
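The SQ header pattern can be checked against the example line from the format description. In this sketch `parse_sq` is a hypothetical standalone name, and the input is assumed to be separated by single spaces, as in the example above, because the pattern expects exactly one space after each semicolon.

```ruby
# Hypothetical helper: extract length, molecular weight and CRC64 from SQ text.
def parse_sq(sq_text)
  unless sq_text =~ /(\d+) AA; (\d+) MW; (.+) CRC64;/
    raise ArgumentError, "invalid SQ line: #{sq_text.inspect}"
  end
  { 'aalen' => $1.to_i, 'MW' => $2.to_i, 'CRC64' => $3 }
end

header = parse_sq('SEQUENCE 233 AA; 25630 MW; 146A1B48A1475C86 CRC64;')
header['aalen']  # => 233
header['MW']     # => 25630
header['CRC64']  # => "146A1B48A1475C86"
```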
#synonyms ⇒ Object
returns an array of synonyms (unofficial names).
Synonyms are each placed in () following the official name on the DE line.
# File 'lib/bio/db/embl/sptr.rb', line 158
def synonyms
  ary = Array.new
  if de_line = fetch('DE') then
    line = de_line.sub(/\[.*\]/,'') # ignore stuff between [ and ].  That's the "contains" part
    line.scan(/\([^)]+/) do |synonym|
      unless synonym =~ /fragment/i then
        ary << synonym[1..-1].strip # index to remove the leading (
      end
    end
  end
  return ary
end
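A DE line therefore carries the official name, any synonyms in parentheses and an optional bracketed "contains" part. The sketch below separates the first two in one pass. `split_de_line` is a hypothetical name, the sample description is invented, and unlike #protein_name the sketch does not append " (Fragment)" to the name.

```ruby
# Hypothetical helper: return [official_name, synonyms] for DE text.
def split_de_line(de_line)
  main = de_line.sub(/\[.*\]/, '')            # drop the bracketed "contains" part
  name = main[/\A[^(]*/].strip                # text before the first "("
  synonyms = main.scan(/\(([^)]+)/).flatten.map(&:strip)
                 .reject { |synonym| synonym =~ /fragment/i }
  [name, synonyms]
end

# Invented description in the documented layout.
de = 'Example protein (EP) (Sample name) (Fragment) [Contains: Piece 1; Piece 2].'
name, synonyms = split_de_line(de)
name      # => "Example protein"
synonyms  # => ["EP", "Sample name"]
```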