http://bioinformatics.oxfordjournals.org/content/suppl/2014/03/07/btu135.DC1/OrioneSuppMat.pdf
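Galaxy workflows let you run a reproducible pipeline that chains different tools automatically on different datasets, without having to set the parameters again. We provide workflows that cover Orione's main functions (pre-processing, bacterial de novo assembly and resequencing, RNA-Seq, and metagenomics). In the workflow descriptions, the tool used in each step is given in square brackets []. Supplementary material with links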
NC_012967.1.fasta
README.txt
SRR030257_1.fastq
SRR030257_2.fastq
Select “Import to current history” and press the “Go” button.
Input:
Output:
Steps:
File to groom:
SRR030257_1.fastq
File to groom:
SRR030257_2.fastq
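FASTQ Groomer normalizes the quality encoding of the input reads to Sanger FASTQ (Phred+33). The original encoding of SRR030257 is not stated here, so the minimal sketch below simply assumes a Phred+64 (Illumina 1.3+) input and shows only the per-character offset shift; it is an illustration, not Galaxy's implementation, and `phred64_to_phred33` / `groom_record` are hypothetical names.

```python
def phred64_to_phred33(quality: str) -> str:
    """Shift each quality character from Phred+64 to Phred+33 encoding (assumed input)."""
    # Decode the Phred score (ASCII - 64), then re-encode it with offset 33.
    return "".join(chr(ord(c) - 64 + 33) for c in quality)


def groom_record(header: str, seq: str, plus: str, qual: str) -> list[str]:
    """Return one FASTQ record with its quality line converted to Phred+33."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    return [header, seq, plus, phred64_to_phred33(qual)]
```

For example, the Phred+64 character `h` (score 40) becomes `I`, which encodes the same score 40 in Phred+33.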
Short read data from your current history:
FASTQ Groomer on data *
Short read data from your current history:
FASTQ Groomer on data *
Sequencing reads:
FASTQ Groomer on data *
2nd read set (paired):
ON
Reads 2:
FASTQ Groomer on data *
Is this library mate-paired?:
Paired-end
Forward FASTQ file:
Flexbar on data * and data * (FlexbarTargetFile-1.fastq)
Reverse FASTQ file:
Flexbar on data * and data * (FlexbarTargetFile-2.fastq)
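The positional and quality trimming step takes the two Flexbar outputs, but its trimming parameters are not listed in this workflow. As a rough illustration of what quality trimming means, the sketch below removes bases from the 3' end of a single read while their Phred score is below a threshold; the threshold of 20 and the Phred+33 offset are assumptions, and `trim_3prime` is a hypothetical helper. For paired-end data, the real tool also has to keep the forward and reverse files in sync, which this sketch does not model.

```python
def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> tuple[str, str]:
    """Remove bases from the 3' end while their Phred score is below min_q (assumed 20)."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    end = len(seq)
    # Walk back from the last base until one meets the quality threshold.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]
```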
Forward FASTQ file:
FASTQ positional and quality trimming on data * and data *: trimmed forward FASTQ
Reverse FASTQ file:
FASTQ positional and quality trimming on data * and data *: trimmed reverse FASTQ
Short read data from your current history:
Paired-end compositional filtering on data * and data *: filtered forward FASTQ
Short read data from your current history:
Paired-end compositional filtering on data * and data *: filtered reverse FASTQ
Input:
Output:
Steps:
Will you select a reference genome from your history or use a built-in index?:
Use one from the history
Select a reference from history:
NC_012967.1.fasta
Is this library mate-paired?:
Paired-end
Forward FASTQ file:
trimmed forward FASTQ (filtered forward FASTQ is not offered in this list; possibly a bug)
Reverse FASTQ file:
trimmed reverse FASTQ (filtered reverse FASTQ is not offered in this list; possibly a bug)
Choose the source for the reference list:
History
SAM file to convert:
Map with BWA-MEM on data *, data *, and data *: mapped reads
Using reference file:
NC_012967.1.fasta
Choose the source for the reference list:
History
Input BAM file:
SAM-to-BAM on data * and data *: converted BAM
Using reference file:
NC_012967.1.fasta
FASTQ file to convert:
BAM to consensus on data * and data *: FASTQ
Output (FASTQ to FASTA on data *) contains a single sequence in which multiple sequences are joined by ‘n’.
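Converting the consensus FASTQ to FASTA keeps the header and sequence and drops the quality lines. The minimal sketch below assumes strict 4-line FASTQ records (one sequence line and one quality line per record); consensus FASTQ files with wrapped sequence lines would need a more careful parser. `fastq_to_fasta` is a hypothetical helper, not the Galaxy tool.

```python
def fastq_to_fasta(lines: list[str]) -> str:
    """Convert 4-line FASTQ records to FASTA, discarding the quality lines."""
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    out = []
    for i in range(0, len(lines), 4):
        header, seq = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"not a FASTQ header: {header!r}")
        # FASTQ headers start with '@'; FASTA headers start with '>'.
        out.append(">" + header[1:])
        out.append(seq)
    return "\n".join(out)
```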
Draft genome:
FASTQ to FASTA on data *
Reference genome:
NC_012967.1.fasta
Draft:
FASTQ to FASTA on data *
Minimum contig length:
300
Output (Extract contigs on data *: contigs) contains 7 sequences.
Output (Extract contigs on data *: high quality contigs) contains 100 sequences.
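Because the draft is a single sequence with ‘n’ separating the pieces, one plausible reading of “Extract contigs” with “Minimum contig length: 300” is: split the draft at runs of ‘n’ and keep the pieces of at least 300 bases. The sketch below illustrates only that assumption; the criterion the tool uses for its “high quality contigs” output is not documented here and is not modeled. `extract_contigs` is a hypothetical name.

```python
import re


def extract_contigs(draft: str, min_len: int = 300) -> list[str]:
    """Split a draft sequence at runs of 'n'/'N' and keep pieces of at least min_len bases."""
    pieces = re.split(r"[nN]+", draft)
    # Empty strings (from leading/trailing gaps) are removed by the length filter too.
    return [p for p in pieces if len(p) >= min_len]
```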
Contigs collection:
Extract contigs on data *: contigs
Estimated genome size in Mb:
4.629812
Contigs collection:
Extract contigs on data *: high quality contigs
Estimated genome size in Mb:
4.629812
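This contig-statistics step takes the genome size in megabases (4.629812 Mb, i.e. 4,629,812 bp, the same value entered in bp for CISA). The exact set of statistics the tool reports is not listed here; as an illustration, the sketch below computes two standard assembly metrics, N50 (relative to the total assembly length) and NG50 (relative to the expected genome size). `contig_stats` is a hypothetical helper.

```python
def contig_stats(lengths: list[int], genome_size_mb: float) -> dict[str, int]:
    """Basic assembly metrics; NG50 uses the expected genome size instead of the assembly size."""
    genome_bp = round(genome_size_mb * 1_000_000)  # 4.629812 Mb -> 4629812 bp
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)

    def length_at_half(target_bp: int) -> int:
        # Length of the contig at which the cumulative length first reaches half the target.
        running = 0
        for length in ordered:
            running += length
            if 2 * running >= target_bp:
                return length
        return 0  # contigs never reach half the target size

    return {
        "contigs": len(ordered),
        "total_bp": total,
        "N50": length_at_half(total),
        "NG50": length_at_half(genome_bp),
        "genome_bp": genome_bp,
    }
```

If the assembly is much smaller than the expected genome, NG50 drops to 0 while N50 still reports a value, which is why NG50 is useful when a genome size estimate is available.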
Contigs FASTA file (-s):
Extract contigs on data *: high quality contigs
Paired-end reads 1:
Paired-end compositional filtering on data * and data *: filtered forward FASTQ
Paired-end reads 2:
Paired-end compositional filtering on data * and data *: filtered reverse FASTQ
Insert size:
300
Variability (e.g. 0.25 for 25%):
0.25
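For the SSPACE scaffolding step, “Insert size: 300” and “Variability: 0.25” describe how far a read pair's distance may deviate from the expected insert size. Assuming the window is symmetric around the insert size, 0.25 × 300 = 75 bp, so pairs are accepted roughly between 225 and 375 bp. The helper below is a hypothetical illustration of that arithmetic, not SSPACE code.

```python
def insert_size_bounds(insert_size: int, variability: float) -> tuple[float, float]:
    """Return the (min, max) insert size implied by a fractional variability."""
    if not 0 <= variability < 1:
        raise ValueError("variability must be a fraction such as 0.25")
    delta = insert_size * variability
    return insert_size - delta, insert_size + delta
```

With the values above, `insert_size_bounds(300, 0.25)` returns `(225.0, 375.0)`.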
Contigs collection:
SSPACE on data *, data *, and data *: final scaffolds
Estimated genome size in Mb:
4.629812
10. Align scaffolds against reference [Mugsy]: map the scaffolds to the reference genome sequence
Reference:
NC_012967.1.fasta
Contigs/draft:
SSPACE on data *, data *, and data *: final scaffolds
MAF file to convert:
Mugsy on data * and data *: MAF
12. Reformat FASTA output with 60 nucleotides per row [FASTA Width formatter]: rewrap the sequences to 60 bases per line
Library to re-format:
MAF to FASTA on data *
New width for nucleotides strings:
60
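Re-wrapping FASTA to a fixed width means joining each record's sequence lines and cutting them into chunks of the requested length (60 here). The sketch below does this with plain slicing so that alignment gap characters (‘-’), which can appear in the MAF-to-FASTA output, stay intact. `rewrap_fasta` is a hypothetical helper, not the FASTA Width formatter itself; sequence lines appearing before any header are ignored.

```python
def rewrap_fasta(fasta: str, width: int = 60) -> str:
    """Re-wrap every FASTA record so each sequence line holds at most `width` characters."""
    records: list[tuple[str, list[str]]] = []
    for line in fasta.splitlines():
        line = line.strip()
        if line.startswith(">"):
            records.append((line, []))
        elif line and records:
            records[-1][1].append(line)  # collect sequence lines under the current header
    out: list[str] = []
    for header, parts in records:
        seq = "".join(parts)
        out.append(header)
        # Fixed-width slicing keeps alignment gap characters ('-') intact.
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out)
```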
13. Annotate draft/contigs [Prokka]: annotate the draft genome (contig) sequences
Contigs to annotate:
FASTA Width on data *
Fast mode (--fast):
Skip CDS /product searching
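This workflow performs de novo assembly of a bacterial genome with several de novo assemblers (Velvet, ABySS, SPAdes). Velvet is run three times with different k-mer values, and the resulting contigs are merged and clustered with CD-HIT (similarity 1.0). The contigs are then integrated with CISA, basic statistics are computed with the “Check bacterial contigs” tool, and the sequences are finally annotated with Prokka.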
Input:
Output:
• Contigs/Scaffolds from each assembler (FASTA)
• Integrated contig sequences (FASTA)
• Sequence annotations (multiple formats available)
• Report with de novo assembly statistics
Steps:
Hash Length:
21
Input Files
Input Files 1
file format:
fastq
read type:
-shortPaired reads
Dataset:
Paired-end compositional filtering on data * and data *: filtered forward FASTQ
Input Files 2
file format:
fastq
read type:
-shortPaired2 reads
Dataset:
Paired-end compositional filtering on data * and data *: filtered reverse FASTQ
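In velveth, “Hash Length” is the k-mer length (21 here) used to index the reads. As a simplified illustration of that idea, the sketch below counts canonical k-mers (each k-mer merged with its reverse complement) and skips k-mers containing ambiguous bases. An odd k such as 21 also guarantees that no k-mer is its own reverse complement. This is not Velvet's data structure; `count_kmers` and `canonical` are hypothetical names.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def count_kmers(reads: list[str], k: int = 21) -> Counter[str]:
    """Count canonical k-mers, skipping any k-mer that contains a non-ACGT base."""
    counts: Counter[str] = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= VALID:
                counts[canonical(kmer)] += 1
    return counts
```

Running the same reads with different values of `k` mirrors, in miniature, why the workflow runs Velvet several times: shorter k-mers tolerate low coverage better, longer ones resolve more repeats.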
Velvet Dataset:
velveth on data * and data *
Sequences to cluster:
velvetg on data *: Contigs
Similarity threshold: 0.75 - 1.0:
1.0
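With a similarity threshold of 1.0, CD-HIT effectively groups sequences that are identical to (part of) a longer sequence, which removes redundant contigs produced by the three Velvet runs. The sketch below illustrates greedy clustering at 100% identity in the spirit of CD-HIT: sequences are processed from longest to shortest, and each one joins the first representative that contains it exactly. It is a simplification under stated assumptions (exact substring matching on the forward strand only, no alignment), and `cluster_identical` is a hypothetical helper.

```python
def cluster_identical(seqs: dict[str, str]) -> dict[str, list[str]]:
    """Greedy 100%-identity clustering: the longest sequences become representatives."""
    clusters: dict[str, list[str]] = {}
    # CD-HIT-style greedy order: process sequences from longest to shortest.
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        seq = seqs[name].upper()
        for rep in clusters:
            # Exact containment on the forward strand stands in for a 100% identity alignment.
            if seq in seqs[rep].upper():
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters
```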
Paired Reads Files
Paired Reads Files 1
Paired read sequences:
Paired-end compositional filtering on data * and data *: filtered forward FASTQ
Paired Reads Files 2
Paired read sequences:
Paired-end compositional filtering on data * and data *: filtered reverse FASTQ
[-k] K-mer size:
21
Tool execution generated the following error message: ERROR RUNNING COMMAND: gunzip /SHARE/USERFS/els7/users/biobank/galaxy/central/job_wd/000/167/167659/dataset_285127_files/abyss-3.sam.gz gzip: /SHARE/USERFS/els7/users/biobank/galaxy/central/job_wd/000/167/167659/dataset_285127_files/abyss-3.sam.gz: No such file or directory
Forward reads:
Paired-end compositional filtering on data * and data *: filtered forward FASTQ
Reverse reads:
Paired-end compositional filtering on data * and data *: filtered reverse FASTQ
Contigs file 1
Contigs file:
SPAdes scaffolds (fasta)
Contigs file 2
Contigs file:
CD-HIT on data 64: representatives.fasta
Expected Genome Size (bp):
4629812
Contigs collection:
CISA Contigs Integrator on data * and data *
Estimated genome size in Mb:
4.629812
Contigs to annotate:
CISA Contigs Integrator on data * and data *
Fast mode (--fast):
Skip CDS /product searching
[Galaxy Workflow | W4 - Bacterial RNA-Seq | Paired-end](https://orione.crs4.it/workflow/display_by_username_and_slug?username=puva&slug=w4—bacterial-rna-seq–paired-end) Input data: paired-end reads
[Galaxy Workflow | W4 - Bacterial RNA-Seq | Single-end](https://orione.crs4.it/workflow/display_by_username_and_slug?username=puva&slug=w4—bacterial-rna-seq) Input data: single-end reads
Listeria_monocytogenes_EGD_e_uid61583/NC_003210.fna
Listeria_monocytogenes_EGD_e_uid61583/NC_003210.ptt
Listeria_monocytogenes_EGD_e_uid61583/NC_003210.rnt
L monocytogenes_EGD/NC_003210.fastq
Click Shared Data -> Published Workflows at the top of the screen and import “W4 - Bacterial RNA-Seq | Single-end”.
Click Workflows and run “imported: W4 - Bacterial RNA-Seq | Single-end”.
Step 1: Input dataset
Reads in FASTQ format
Reads - Single-end
L monocytogenes_EGD/NC_003210.fastq
Step 2: Get EDGE-pro files (version 1.0.1)
RefSeq Genomic Accession ID
NC_003210
Press the “Run workflow” button.