BioDSL
Installation
gem install BioDSL
Getting started
A test script:
#!/usr/bin/env ruby
require 'BioDSL'
p = BD.new.
read_fasta(input: "input.fna").
grab(select: "ATC$", keys: :SEQ).
write_fasta(output: "output.fna").
run(progress: true)
Or use an interactive shell via the alias ibp, which you can create by adding the following to your ~/.bashrc file:
alias ibp="irb -r BioDSL --noinspect"
And then start the interactive shell:
$ ibp
irb(main):001:0> p = BD.new
=> BD.new
irb(main):002:0> p.read_fasta(input: "input.fna")
=> BD.new.read_fasta(input: "input.fna")
irb(main):003:0> p.grab(select: "ATC$", keys: :SEQ)
=> BD.new.read_fasta(input: "input.fna").grab(select: "ATC$", keys: :SEQ)
irb(main):004:0> p.write_fasta(output: "output.fna")
=> BD.new.read_fasta(input: "input.fna").grab(select: "ATC$", keys: :SEQ).write_fasta(output: "output.fna")
irb(main):005:0> p.run(progress: true)
=> BD.new.read_fasta(input: "input.fna").grab(select: "ATC$", keys: :SEQ).write_fasta(output: "output.fna").run(progress: true)
irb(main):006:0>
Or chain the commands directly:
$ ibp
irb(main):001:0> BD.new.read_fasta(input: "input.fna").grab(select: "ATC$", keys: :SEQ).write_fasta(output: "output.fna").run(progress: true)
=> BD.new.read_fasta(input: "input.fna").grab(select: "ATC$", keys: :SEQ).write_fasta(output: "output.fna").run(progress: true)
irb(main):002:0>
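The irb echoes above show that each call appends a command to the pipeline object and returns it, so the printed value replays the whole chain. The plain-Ruby sketch below illustrates that builder pattern. The `MiniPipeline` class and its `add` method are hypothetical stand-ins, not BioDSL's implementation, and nothing is executed here: the methods only record the calls.

```ruby
# Hypothetical MiniPipeline class illustrating the builder pattern suggested
# by the irb session: each command call records itself and returns the same
# object, and #inspect replays the chain. This is NOT BioDSL's actual code.
class MiniPipeline
  def initialize
    @steps = []
  end

  # Record a command name with its keyword options; return self for chaining.
  def add(name, **options)
    @steps << [name, options]
    self
  end

  # Define a few command methods that simply delegate to #add.
  %i[read_fasta grab write_fasta].each do |command|
    define_method(command) { |**options| add(command, **options) }
  end

  # Render the recorded chain the way irb echoes a BioDSL pipeline.
  def inspect
    calls = @steps.map do |name, options|
      args = options.map { |key, value| "#{key}: #{value.inspect}" }.join(", ")
      ".#{name}(#{args})"
    end
    "MiniPipeline.new" + calls.join
  end
end

pipeline = MiniPipeline.new
pipeline.read_fasta(input: "input.fna")
pipeline.grab(select: "ATC$", keys: :SEQ)
puts pipeline.inspect
# prints MiniPipeline.new.read_fasta(input: "input.fna").grab(select: "ATC$", keys: :SEQ)
```

Returning `self` from every command is what lets the same object be built up step by step in irb or in one long chained expression.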
Or run BioDSL from the command line with the alias bp, which you can create by adding the following to your ~/.bashrc file:
alias bp="ruby -r BioDSL"
Then run the following from the command line:
$ bp -e 'BD.new.read_fasta(input: "input.fna").grab(select: "ATC$", keys: :SEQ).write_fasta(output: "output.fna").run(progress: true)'
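To make the `grab` step in these examples concrete, here is a plain-Ruby sketch of the intended effect of `grab(select: "ATC$", keys: :SEQ)`: keep only records whose `:SEQ` value matches the regular expression `ATC$`, i.e. sequences ending in ATC. The record layout (one hash per FASTA entry with `:SEQ_NAME` and `:SEQ` keys) and the `grab_records` helper are assumptions made for illustration, not BioDSL's internals.

```ruby
# Hypothetical stand-in for the grab step: keep records whose value for any
# of the given keys matches the regex pattern. Not BioDSL's implementation.
def grab_records(records, select:, keys:)
  pattern = Regexp.new(select)
  records.select do |record|
    Array(keys).any? { |key| record.key?(key) && pattern.match?(record[key].to_s) }
  end
end

# Assumed record layout: one hash per FASTA entry.
records = [
  { SEQ_NAME: "seq1", SEQ: "GGTTATC" },
  { SEQ_NAME: "seq2", SEQ: "ATCGGTT" },
  { SEQ_NAME: "seq3", SEQ: "AAAATC" }
]

kept = grab_records(records, select: "ATC$", keys: :SEQ)
puts kept.map { |r| r[:SEQ_NAME] }.join(",")  # prints seq1,seq3
```

The `$` anchor is what restricts the match to the end of the sequence; without it, seq2 would be kept as well.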
Available BioDSL commands
- add_key
- align_seq_mothur
- analyze_residue_distribution
- assemble_pairs
- assemble_seq_idba
- assemble_seq_ray
- assemble_seq_spades
- classify_seq
- classify_seq_mothur
- clip_primer
- cluster_otus
- collapse_otus
- collect_otus
- complement_seq
- count
- degap_seq
- dereplicate_seq
- dump
- filter_rrna
- genecall
- grab
- index_taxonomy
- mean_scores
- merge_pair_seq
- merge_table
- merge_values
- plot_heatmap
- plot_histogram
- plot_matches
- plot_residue_distribution
- plot_scores
- random
- read_fasta
- read_fastq
- read_table
- reverse_seq
- slice_align
- slice_seq
- sort
- split_pair_seq
- split_values
- trim_primer
- trim_seq
- uchime_ref
- unique_values
- usearch_global
- write_fasta
- write_fastq
- write_table
- write_tree
Log and History
All BioDSL events are logged to ~/.BioDSL_log.
BioDSL history is saved to ~/.BioDSL_history.
Features
Progress:
Show a nifty progress table with the commands, the records read and emitted, and the elapsed time.
BD.new.read_fasta(input: "input.fna").dump.run(progress: true)
Verbose:
Output verbose messages from commands and the run status.
BD.new.read_fasta(input: "input.fna").dump.run(verbose: true)
Debug:
Output debug messages from the commands that support them.
BD.new.read_fasta(input: "input.fna").dump.run(debug: true)
E-mail notification:
Send an email when the run is complete.
BD.new.read_fasta(input: "input.fna").dump.run(email: "[email protected]", subject: "Script done!")
Report:
Create an HTML report of the run stats:
BD.new.read_fasta(input: "input.fna").dump.run(report: "status.html")
Output dir:
Put all output files from commands in a specified directory:
BD.new.read_fasta(input: "input.fna").dump.run(output_dir: "Results")
Configuration File
It is possible to pre-set options in a configuration file called .BioDSLrc located in your $HOME directory. If an option is not set explicitly, its value falls back to the one set in the configuration file. The configuration file contains three whitespace-separated columns:
- Command name
- Option
- Option value
Lines starting with '#' are considered comments and are ignored.
An example:
maasha@mel:~$ cat ~/.BioDSLrc
uchime_ref database /home/maasha/Install/QIIME1.8/data/rdp_gold.fa
uchime_ref cpus 20
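The three-column format and the fallback rule can be sketched in plain Ruby. The `parse_rc` and `options_for` helpers below are hypothetical, not part of BioDSL; they assume values are kept as strings and that an option given explicitly in a script always wins over the configuration file.

```ruby
# Hypothetical parser for a ~/.BioDSLrc-style file: three whitespace-separated
# columns (command, option, value); '#' lines and blank lines are skipped.
def parse_rc(text)
  config = Hash.new { |hash, command| hash[command] = {} }
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?("#")
    command, option, value = line.split(/\s+/, 3)
    next if value.nil? # skip malformed lines with fewer than three columns
    config[command.to_sym][option.to_sym] = value
  end
  config
end

# Options set explicitly in the script win; config values only fill the gaps.
def options_for(config, command, explicit = {})
  config.fetch(command, {}).merge(explicit)
end

rc = <<~RC
  # defaults for uchime_ref
  uchime_ref database /home/maasha/Install/QIIME1.8/data/rdp_gold.fa
  uchime_ref cpus 20
RC

config = parse_rc(rc)
# cpus comes from the script (4); database falls back to the config file.
p options_for(config, :uchime_ref, cpus: 4)
```

Using `Hash#merge` with the explicit options as the argument gives exactly the "fall back only when not set" behaviour described above.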
On compute clusters it is necessary to specify the maximum processor count, which is otherwise determined as the number of cores on the current node. To override this, add the following line:
pipeline processor_count 1000
It is also possible to change the temporary directory from the system's default by adding the following line:
pipeline tmp_dir /home/projects/ku_microbio/scratch/tmp
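A minimal sketch of how these two pipeline settings could resolve, assuming the node's core count comes from Ruby's `Etc.nprocessors` and the system default temporary directory from `Dir.tmpdir`. Both the helper names and these defaults are assumptions for illustration; BioDSL may determine them differently. The `settings` hashes stand in for already-parsed `pipeline` lines.

```ruby
require "etc"
require "tmpdir"

# Hypothetical: use the configured processor_count if present, otherwise
# assume the number of cores on the current node (Etc.nprocessors).
def processor_count(settings)
  value = settings[:processor_count]
  value ? Integer(value) : Etc.nprocessors
end

# Hypothetical: use the configured tmp_dir if present, otherwise the
# system default temporary directory.
def tmp_dir(settings)
  settings[:tmp_dir] || Dir.tmpdir
end

cluster = { processor_count: "1000", tmp_dir: "/home/projects/ku_microbio/scratch/tmp" }
laptop  = {}

puts processor_count(cluster)  # prints 1000
puts processor_count(laptop)   # core count of this machine
puts tmp_dir(cluster)          # prints /home/projects/ku_microbio/scratch/tmp
puts tmp_dir(laptop)           # system default, such as /tmp on many Unix systems
```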
Contributing
1. Fork it
2. Create your feature branch (git checkout -b my-new-feature)
3. Commit your changes (git commit -am 'Add some feature')
4. Push to the branch (git push origin my-new-feature)
5. Create a new Pull Request