Class: Bio::GCG::Seq
Overview
Bio::GCG::Seq
Parser class for the GCG sequence file format (.seq or .pep).
References
-
Information about GCG Wisconsin Package®
www.accelrys.com/products/gcg_wisconsin_package
-
EMBOSS sequence formats
www.hgmp.mrc.ac.uk/Software/EMBOSS/Themes/SequenceFormats.html
-
BioPerl document
Constant Summary
-
DELIMITER = RS = nil
Delimiter used by Bio::FlatFile.
Instance Attribute Summary
-
#checksum ⇒ Object
readonly
“Check:” field, the checksum of the current sequence.
-
#date ⇒ Object
readonly
Date field of this entry.
-
#definition ⇒ Object
readonly
Description field.
-
#entry_id ⇒ Object
readonly
ID field.
-
#heading ⇒ Object
readonly
Heading line (‘!!NA_SEQUENCE 1.0’ or similar).
-
#length ⇒ Object
readonly
“Length:” field.
-
#seq_type ⇒ Object
readonly
“Type:” field, which indicates sequence type.
Class Method Summary
-
.calc_checksum(str) ⇒ Object
Calculates checksum from given string.
-
.to_gcg(hash) ⇒ Object
Creates a new GCG sequence format text.
Instance Method Summary
-
#aaseq ⇒ Object
If you know the sequence is AA, use this method.
-
#initialize(str) ⇒ Seq
constructor
Creates a new instance of this class.
-
#naseq ⇒ Object
If you know the sequence is NA, use this method.
-
#seq ⇒ Object
Sequence data.
-
#validate_checksum ⇒ Object
Validates checksum.
Constructor Details
#initialize(str) ⇒ Seq
Creates a new instance of this class. str must be a string in GCG sequence format.
# File 'lib/bio/appl/gcg/seq.rb', line 38
def initialize(str)
  @heading = str[/.*/] # '!!NA_SEQUENCE 1.0' or like this
  str = str.sub(/.*/, '')
  str.sub!(/.*\.\.$/m, '')
  @definition = $&.to_s.sub(/^.*\.\.$/, '').to_s
  desc = $&.to_s
  if m = /(.+)\s+Length\:\s+(\d+)\s+(.+)\s+Type\:\s+(\w)\s+Check\:\s+(\d+)/.match(desc) then
    @entry_id = m[1].to_s.strip
    @length   = (m[2] ? m[2].to_i : nil)
    @date     = m[3].to_s.strip
    @seq_type = m[4]
    @checksum = (m[5] ? m[5].to_i : nil)
  end
  @data = str
  @seq = nil
  @definition.strip!
end
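To illustrate how the constructor pulls the ID, length, date, type and checksum out of the header line, here is a minimal standalone sketch that applies the same regular expression to a made-up header. The helper name parse_gcg_header and the sample line (including its arbitrary Check value) are assumptions for illustration and are not part of BioRuby.

```ruby
# Header pattern copied from the constructor listing.
HEADER_RE = /(.+)\s+Length\:\s+(\d+)\s+(.+)\s+Type\:\s+(\w)\s+Check\:\s+(\d+)/

# Hypothetical helper (not a BioRuby method): returns the header fields
# as a Hash, or nil when the line does not match the pattern.
def parse_gcg_header(line)
  m = HEADER_RE.match(line)
  return nil unless m
  {
    entry_id: m[1].strip,
    length:   m[2].to_i,
    date:     m[3].strip,
    seq_type: m[4],
    checksum: m[5].to_i
  }
end

# Invented sample header; the Check value is arbitrary, not a real checksum.
sample = "gi-1234567  Length: 12  January 01, 2000 12:00  Type: N  Check: 1234  .."
p parse_gcg_header(sample)
```

Because the first and third groups are greedy, surrounding whitespace can end up inside the captures, which is why the constructor (and this sketch) calls strip on them.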
Instance Attribute Details
#checksum ⇒ Object (readonly)
“Check:” field, the checksum of the current sequence.
# File 'lib/bio/appl/gcg/seq.rb', line 74
def checksum
  @checksum
end
#date ⇒ Object (readonly)
Date field of this entry.
# File 'lib/bio/appl/gcg/seq.rb', line 67
def date
  @date
end
#definition ⇒ Object (readonly)
Description field.
# File 'lib/bio/appl/gcg/seq.rb', line 60
def definition
  @definition
end
#entry_id ⇒ Object (readonly)
ID field.
# File 'lib/bio/appl/gcg/seq.rb', line 57
def entry_id
  @entry_id
end
#heading ⇒ Object (readonly)
Heading line (‘!!NA_SEQUENCE 1.0’ or similar).
# File 'lib/bio/appl/gcg/seq.rb', line 78
def heading
  @heading
end
#length ⇒ Object (readonly)
“Length:” field. Note that this value may differ from the actual sequence length.
# File 'lib/bio/appl/gcg/seq.rb', line 64
def length
  @length
end
#seq_type ⇒ Object (readonly)
“Type:” field, which indicates sequence type. “N” means nucleic acid sequence, “P” means protein sequence.
# File 'lib/bio/appl/gcg/seq.rb', line 71
def seq_type
  @seq_type
end
Class Method Details
.calc_checksum(str) ⇒ Object
Calculates the checksum of the given string.
# File 'lib/bio/appl/gcg/seq.rb', line 141
def self.calc_checksum(str)
  # Reference: Bio::SeqIO::gcg of BioPerl-1.2.3
  idx = 0
  sum = 0
  str.upcase.tr('^A-Z.~', '').each_byte do |c|
    idx += 1
    sum += idx * c
    idx = 0 if idx >= 57
  end
  (sum % 10000)
end
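The checksum is a position-weighted sum of byte values, with the weight cycling from 1 to 57, reduced modulo 10000. The following standalone sketch repeats the listed algorithm under the hypothetical name gcg_checksum (not a BioRuby method) so that it can be run without the library.

```ruby
# Standalone copy of the algorithm in calc_checksum; the method name
# gcg_checksum is an assumption for illustration only.
def gcg_checksum(str)
  idx = 0
  sum = 0
  # Upcase, then drop everything except A-Z, '.' and '~'
  # (digits and whitespace do not contribute).
  str.upcase.tr('^A-Z.~', '').each_byte do |c|
    idx += 1
    sum += idx * c
    idx = 0 if idx >= 57 # weight restarts at 1 after position 57
  end
  sum % 10000
end

# "ACGT" has bytes 65, 67, 71, 84:
# 1*65 + 2*67 + 3*71 + 4*84 = 748
puts gcg_checksum("ACGT")        # prints 748
puts gcg_checksum("  1 acgt\n")  # prints 748 (case, digits and whitespace ignored)
```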
.to_gcg(hash) ⇒ Object
Creates a new GCG sequence format text. Parameters can be omitted.
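Examples:
seq  = Bio::Sequence::NA.new('acgtacgtacgt')  # sequence to write out
date = Time.now.strftime('%B %d, %Y %H:%M')   # same format as the default
Bio::GCG::Seq.to_gcg(:definition=>'H.sapiens DNA',
                     :seq_type=>'N', :entry_id=>'gi-1234567',
                     :seq=>seq, :date=>date)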
# File 'lib/bio/appl/gcg/seq.rb', line 161
def self.to_gcg(hash)
  seq = hash[:seq]
  if seq.is_a?(Bio::Sequence::NA) then
    seq_type = 'N'
  elsif seq.is_a?(Bio::Sequence::AA) then
    seq_type = 'P'
  else
    seq_type = (hash[:seq_type] or 'P')
  end
  if seq_type == 'N' then
    head = '!!NA_SEQUENCE 1.0'
  else
    head = '!!AA_SEQUENCE 1.0'
  end
  date = (hash[:date] or Time.now.strftime('%B %d, %Y %H:%M'))
  entry_id = hash[:entry_id].to_s.strip
  len = seq.length
  checksum = self.calc_checksum(seq)
  definition = hash[:definition].to_s.strip
  seq = seq.upcase.gsub(/.{1,50}/, "\\0\n")
  seq.gsub!(/.{10}/, "\\0 ")
  w = len.to_s.size + 1
  i = 1
  seq.gsub!(/^/) { |x| s = sprintf("\n%*d ", w, i); i += 50; s }

  [ head, "\n", definition, "\n\n",
    "#{entry_id} Length: #{len} #{date} " \
    "Type: #{seq_type} Check: #{checksum} ..\n",
    seq, "\n" ].join('')
end
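The body layout produced by the three gsub calls (50 residues per line, groups of 10, each line prefixed with the position of its first residue) can be tried in isolation. The sketch below reuses those calls on a plain String; the helper name gcg_body is hypothetical and the header part of the output is left out.

```ruby
# Hypothetical helper (not a BioRuby method) that isolates the body
# formatting steps of to_gcg for a plain String.
def gcg_body(seq)
  len  = seq.length
  body = seq.upcase.gsub(/.{1,50}/, "\\0\n") # at most 50 residues per line
  body.gsub!(/.{10}/, "\\0 ")                # a space after every 10 residues
  w = len.to_s.size + 1                      # width of the position column
  i = 1
  # Prefix each line with a blank line and the 1-based start position.
  body.gsub!(/^/) { s = format("\n%*d ", w, i); i += 50; s }
  body
end

puts gcg_body("acgt" * 15) # 60 residues: one full line and one short line
```

Note that gsub! returns nil when nothing matched, so the sketch relies on in-place mutation and never on the return value, as the original listing does.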
Instance Method Details
#aaseq ⇒ Object
If you know the sequence is AA, use this method. Returns a Bio::Sequence::AA object.
Calling naseq on a protein sequence, or aaseq on a nucleic acid sequence, raises RuntimeError.
# File 'lib/bio/appl/gcg/seq.rb', line 108
def aaseq
  if seq.is_a?(Bio::Sequence::AA) then
    @seq
  else
    raise 'seq_type != \'P\''
  end
end
#naseq ⇒ Object
If you know the sequence is NA, use this method. Returns a Bio::Sequence::NA object.
Calling naseq on a protein sequence, or aaseq on a nucleic acid sequence, raises RuntimeError.
# File 'lib/bio/appl/gcg/seq.rb', line 121
def naseq
  if seq.is_a?(Bio::Sequence::NA) then
    @seq
  else
    raise 'seq_type != \'N\''
  end
end
#seq ⇒ Object
Sequence data. The class of the sequence is Bio::Sequence::NA, Bio::Sequence::AA or Bio::Sequence::Generic, according to the sequence type.
# File 'lib/bio/appl/gcg/seq.rb', line 88
def seq
  unless @seq then
    case @seq_type
    when 'N', 'n'
      k = Bio::Sequence::NA
    when 'P', 'p'
      k = Bio::Sequence::AA
    else
      k = Bio::Sequence
    end
    @seq = k.new(@data.tr('^-a-zA-Z.~', ''))
  end
  @seq
end
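Before the sequence object is built, the raw body is filtered with tr so that only letters and the characters '-', '.' and '~' remain. A small standalone sketch of that filtering step follows; the helper name clean_gcg_body and the sample body are made up for illustration.

```ruby
# Hypothetical helper (not a BioRuby method) wrapping the tr call used in seq.
def clean_gcg_body(data)
  # '^' negates the set; the leading '-' inside the set is a literal hyphen.
  data.tr('^-a-zA-Z.~', '')
end

# Invented body text with position numbers, spaces and blank lines.
body = "\n  1 ACGTACGTAC GT..ACGT~~\n\n 51 acgt-acgt \n"
puts clean_gcg_body(body)
```

Position numbers, spaces and newlines are removed, while case and gap-like characters are kept as they appear in the file.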
#validate_checksum ⇒ Object
Validates the checksum. Returns true if the “Check:” value matches the checksum calculated from the sequence, false otherwise.
# File 'lib/bio/appl/gcg/seq.rb', line 132
def validate_checksum
  checksum == self.class.calc_checksum(seq)
end