Hi Neelam,
There are several workflows, but you might find this SNP pipeline useful. It uses GATK for SNP calling and currently starts from a BWA alignment.
In a nutshell, the flow is: realign the BAM file using GATK -> call SNPs using GATK -> call indels -> filter the resulting VCF files -> annotate the called and filtered SNPs.
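To give a feel for the filtering step, here is a minimal sketch that keeps only VCF records above a QUAL threshold. It is a plain awk filter, not GATK's own filtering tool, and the function name `filter_vcf_by_qual`, the threshold of 50, and the tiny VCF are made up for illustration.

```shell
#!/usr/bin/env bash
# Keep header lines plus records whose QUAL (column 6) is at least min_qual.
# Records with a missing QUAL (".") are dropped.
filter_vcf_by_qual() {
    local min_qual="$1"
    awk -F'\t' -v q="$min_qual" '/^#/ || ($6 != "." && $6 + 0 >= q)'
}

# Tiny made-up VCF, for illustration only: one record at QUAL 75.3, one at 12.0.
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr1\t100\t.\tA\tG\t75.3\t.\tDP=20\nchr1\t200\t.\tC\tT\t12.0\t.\tDP=4\n' \
    | filter_vcf_by_qual 50
```

With a threshold of 50 the record at position 100 is kept and the one at position 200 is dropped. A real pipeline would usually filter on several annotations (depth, strand bias and so on), not on QUAL alone.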
Thanks
Hi Poonam,
Various aspects of SNP calling are covered in this recent paper, http://www.ncbi.nlm.nih.gov/pubmed/21478889, entitled "A framework for variation discovery and genotyping using next-generation DNA sequencing data", from the authors of GATK. In addition, keep an eye on the software manual at http://www.broadinstitute.org/gatk/ for up-to-date options incorporated in the toolkit.
Thanks
Hi Neelam,
Here is my GATK workflow for paired-end Illumina data. I call SNPs using the following steps:
Downloaded the SNP and indel databases from ftp://gsapubftp-anonymous@ftp.broadinstitute.org (bundle -> 1.5 -> hg19)
Obtained the exome intervals using the UCSC Table Browser http://genome.ucsc.edu/cgi-bin/hgTables?command=start
$ bwa aln -t 4 hg19.fa seq1.fastq > 1.sai
$ bwa aln -t 4 hg19.fa seq2.fastq > 2.sai
$ bwa sampe -r "@RG\tID:exomeID\tLB:exomeLB\tSM:exomeSM\tPL:illumina\tPU:exomePU" hg19.fa 1.sai 2.sai seq1.fastq seq2.fastq > original.sam
$ java -Xmx5g -jar FixMateInformation.jar I=original.sam O=fixed.sam SO=coordinate VALIDATION_STRINGENCY=LENIENT
$ java -Xmx5g -jar SortSam.jar I=fixed.sam SO=coordinate O=first.bam VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
$ java -Xmx5g -jar MarkDuplicates.jar I=first.bam O=marked.bam METRICS_FILE=metricsFile CREATE_INDEX=true VALIDATION_STRINGENCY=LENIENT REMOVE_DUPLICATES=true
$ java -Xmx5g -jar GenomeAnalysisTK.jar -nt 4 -T RealignerTargetCreator -R hg19.fa -o intervalsList -I marked.bam -known Mills_and_1000G_gold_standard.indels.hg19.vcf
$ java -Xmx5g -jar GenomeAnalysisTK.jar -T IndelRealigner -R hg19.fa -I marked.bam -targetIntervals intervalsList -known Mills_and_1000G_gold_standard.indels.hg19.vcf -o realigned.bam
$ java -Xmx5g -jar GenomeAnalysisTK.jar -nt 4 -T CountCovariates -l INFO -R hg19.fa -I realigned.bam -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -recalFile recalFile -knownSites dbsnp_135.hg19.vcf
$ java -Xmx5g -jar GenomeAnalysisTK.jar -nt 4 -T TableRecalibration -R hg19.fa -I realigned.bam -o recalibrated.bam -recalFile recalFile
$ java -Xmx5g -jar GenomeAnalysisTK.jar -nt 4 -T UnifiedGenotyper -R hg19.fa -I recalibrated.bam -o resultSNPs.vcf -D dbsnp_135.hg19.vcf -metrics UniGenMetrics -stand_call_conf 50.0 -stand_emit_conf 10.0 -dcov 1000 -A DepthOfCoverage -A AlleleBalance -L exomes.bed
Note: when using this, please bear in mind that the last step as written calls only SNPs, not indels.
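If indels are needed too, one option is to change the genotype likelihoods model of UnifiedGenotyper. The sketch below only prints the command so it can be reviewed before running. It assumes your GATK version accepts `-glm` with the values SNP, INDEL or BOTH; the helper name `build_ug_cmd` and the output name `resultVariants.vcf` are hypothetical, and the other file names are reused from the steps above.

```shell
#!/usr/bin/env bash
# Print (dry run) a UnifiedGenotyper command for a given likelihoods model.
# Assumption: -glm SNP|INDEL|BOTH is available in the GATK version in use.
build_ug_cmd() {
    local model="$1"   # SNP, INDEL or BOTH
    local out="$2"     # output VCF name
    echo "java -Xmx5g -jar GenomeAnalysisTK.jar -T UnifiedGenotyper" \
         "-R hg19.fa -I recalibrated.bam -glm ${model}" \
         "-D dbsnp_135.hg19.vcf -stand_call_conf 50.0 -stand_emit_conf 10.0" \
         "-L exomes.bed -o ${out}"
}

# Hypothetical output name; SNPs and indels end up in the same VCF.
build_ug_cmd BOTH resultVariants.vcf
```

Check the printed command against the documentation of your GATK release before running it, since option names have changed between versions.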
Thanks
Hi Neelam,
I think this deep catalog of human genetic variation from the 1000 Genomes analysis is very useful: http://www.1000genomes.org/analysis
Thanks
Hi Neelam,
Please go through this paper "A fast and accurate SNP detection algorithm for next-generation sequencing data" http://www.nature.com/ncomms/journal/v3/n12/abs/ncomms2256.html
You can call variants with the freebayes software; see the tutorial at http://clavius.bc.edu/~erik/CSHL-advanced-sequencing/freebayes-tutorial.html
https://github.com/ekg/freebayes
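A minimal freebayes call looks roughly like the sketch below. The file names (`hg19.fa`, `recalibrated.bam`, `freebayes.vcf`) and the wrapper name `call_variants_freebayes` are assumptions for illustration; the BAM is expected to be coordinate-sorted. If freebayes or the inputs are not found, the script only prints the command.

```shell
#!/usr/bin/env bash
# Assumed file names; replace them with your own reference and sorted BAM.
REF="hg19.fa"
BAM="recalibrated.bam"
OUT="freebayes.vcf"

call_variants_freebayes() {
    if command -v freebayes >/dev/null 2>&1 && [ -f "$REF" ] && [ -f "$BAM" ]; then
        # -f names the reference FASTA; the BAM is positional; VCF is written to stdout.
        freebayes -f "$REF" "$BAM" > "$OUT"
    else
        # Tool or inputs missing: show what would have been run.
        echo "dry run: freebayes -f $REF $BAM > $OUT"
    fi
}

call_variants_freebayes
```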
Thanks
Hi Neelam,
There are several free software tools and pipelines for SNP calling. I suggest you read this beginner's guide to SNP calling from high-throughput DNA-sequencing data, http://www.ncbi.nlm.nih.gov/pubmed/22886560, and try an automatic analysis pipeline for next-generation sequencing data, http://www.ncbi.nlm.nih.gov/pubmed/24929521
dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms https://peerj.com/articles/431/
snpqc – an R pipeline for quality control of Illumina SNP genotyping array data. http://onlinelibrary.wiley.com/doi/10.1111/age.12198/abstract;jsessionid=A3B89DD95DB7E06F361B0E7CB903F63F.f01t03
The basic variant-calling and annotation pipeline developed at the Victorian Life Sciences Computation Initiative (VLSCI), University of Melbourne. https://github.com/claresloggett/variant_calling_pipeline
http://compbio.ufl.edu/wp-content/uploads/2014/02/Azarian_Bioinfo_Seminar_013914.pdf
http://ngsda.blogspot.in/2010/10/snp-call-pipeline.html
Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing http://genomemedicine.com/content/5/3/28
http://www.biotech.cornell.edu/brc/genomic-diversity-facility/services/gbs-data-analysis
Thanks
Hi Neelam,
I prefer VarScan ( http://varscan.sourceforge.net/ ). It is a tool that detects variants (SNPs and indels) in next-generation sequencing data. VarScan now takes SAMtools pileup as input, so it's compatible with most SAM-friendly short read aligners. It supports SNP, indel, and consensus calling: in addition to detecting variants, VarScan calls consensus genotypes based on read counts and allele frequency. For more information see http://www.ncbi.nlm.nih.gov/pubmed/22300766
http://massgenomics.org/varscan
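As a rough sketch of how the SAMtools pileup feeds into VarScan, something like the following should work. The file names and the wrapper name `call_snps_varscan` are assumptions (the real jar carries a version in its name), and the thresholds are only example values, not recommendations. If samtools or any input is missing, the script only prints the command.

```shell
#!/usr/bin/env bash
# Assumed names: adjust the reference, BAM and VarScan jar to your setup.
REF="hg19.fa"
BAM="recalibrated.bam"
JAR="VarScan.jar"
OUT="varscan.snp.vcf"

call_snps_varscan() {
    if command -v samtools >/dev/null 2>&1 && [ -f "$REF" ] && [ -f "$BAM" ] && [ -f "$JAR" ]; then
        # samtools writes the pileup to stdout; VarScan reads it from stdin.
        samtools mpileup -f "$REF" "$BAM" \
            | java -jar "$JAR" mpileup2snp --min-coverage 8 --min-var-freq 0.2 \
                   --p-value 0.05 --output-vcf 1 > "$OUT"
    else
        # Tools or inputs missing: show what would have been run.
        echo "dry run: samtools mpileup -f $REF $BAM | java -jar $JAR mpileup2snp --min-coverage 8 --min-var-freq 0.2 --p-value 0.05 --output-vcf 1 > $OUT"
    fi
}

call_snps_varscan
```

For indels, VarScan has a matching `mpileup2indel` subcommand that takes the same kind of input.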
Cheers
Hi Neelam,
I found this tutorial very useful https://wikis.utexas.edu/display/bioiteam/Variant+calling+tutorial
Thanks
Hi Neelam,
I think Virmid (Virtual Microdissection for SNP calling) will be useful for you. It is a Java-based variant caller designed for disease-control matched samples. Virmid is also specialized for identifying potential within-individual contamination, where the disease sample cannot be purified enough. While the SNP calling rate is severely compromised by this heterogeneity, Virmid can uncover SNPs with low allele frequency by considering the level of contamination (alpha). http://sourceforge.net/p/virmid/wiki/Home/
FermiKit: assembly-based variant calling for Illumina resequencing data https://github.com/lh3/fermikit
The discoSnp++ software is designed for discovering single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) from raw set(s) of reads obtained with next-generation sequencers (NGS).
Note that the number of input read sets is not constrained: it can be one, two, or more. Note also that no other data, such as a reference genome or annotations, are needed.
The software is composed of two modules. The first module, kissnp2, detects SNPs from read sets. The second module, kissreads2, enhances the kissnp2 results by computing, per read set and for each variant found, (i) its mean read coverage and (ii) the (Phred) quality of the reads generating the polymorphism. https://colibread.inria.fr/software/discosnp/
Cheers
You should try the following as well:
Genome-Wide Association Studies
Variant Calling Pipeline: FastQ to Annotated SNPs in Hours
Hi Neelam,
Extracting single nucleotide polymorphisms (SNPs) from raw genetic sequences involves many processing steps and a diverse set of tools. The pipeline includes quality control, mapping of short reads to the reference genome, and visualization and post-processing of the alignment, including base quality recalibration. The following are useful pipeline links for SNP calling:
A simple SNP calling pipeline:
http://www.tgac.ac.uk/Event%20Docs/Summer%20School:%20Walk%20Through%20BioInf/NGS%20Challenges.pdf
A beginners guide to SNP calling from high-throughput DNA-sequencing data.
http://www.ncbi.nlm.nih.gov/pubmed/22886560
SNP Calling Workshop
http://www.ebi.ac.uk/training/sites/ebi.ac.uk.training/files/materials/2014/140217_AgriOmics/dan_bolser_snp_calling_tutorial.pdf
Calling SNPs with Samtools
http://ged.msu.edu/angus/tutorials-2012/snp_tutorial.html
Variant Callers for Next-Generation Sequencing Data: A Comparison Study
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0075619
GotCloud: Variant Calling Pipeline
http://genome.sph.umich.edu/wiki/GotCloud:_Variant_Calling_Pipeline
ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence
http://www.biomedcentral.com/1471-2164/12/285
iSVP: an integrated structural variant calling pipeline from high-throughput sequencing data
http://www.biomedcentral.com/1752-0509/7/S6/S8
QualitySNPng: a user-friendly SNP detection and visualization tool
http://nar.oxfordjournals.org/content/early/2013/04/29/nar.gkt333.full
Calling variants using BWA and GATK best practice pipeline
http://varianttools.sourceforge.net/Calling/BwaGatkHg19
SNP calling pipeline
http://www.bbmriwiki.nl/wiki/SnpCallingPipeline
dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms
https://peerj.com/preprints/314/
Pipeline for SNP Analysis
http://sniplay.cirad.fr/cgi-bin/analysis.cgi
UGP Variant Pipeline 0.0.3
http://weatherby.genetics.utah.edu/UGP/wiki/index.php/UGP_Variant_Pipeline_0.0.3
SNPs Calling
https://code.google.com/p/rseqflow/wiki/PipelineDescription#SNPs_Calling
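For the calling step that follows mapping and post-processing, a minimal sketch with bcftools could look like this. It assumes a recent bcftools (1.x) is installed; the file names and the wrapper name `call_variants_bcftools` are made up for illustration. If bcftools or the inputs are not found, the script only prints the command.

```shell
#!/usr/bin/env bash
# Assumed file names; the BAM should be coordinate-sorted.
REF="hg19.fa"
BAM="recalibrated.bam"
OUT="bcftools.calls.vcf"

call_variants_bcftools() {
    if command -v bcftools >/dev/null 2>&1 && [ -f "$REF" ] && [ -f "$BAM" ]; then
        # mpileup computes genotype likelihoods; call -m uses the multiallelic
        # caller, -v keeps variant sites only, -Ov writes uncompressed VCF.
        bcftools mpileup -f "$REF" "$BAM" | bcftools call -mv -Ov -o "$OUT"
    else
        # Tool or inputs missing: show what would have been run.
        echo "dry run: bcftools mpileup -f $REF $BAM | bcftools call -mv -Ov -o $OUT"
    fi
}

call_variants_bcftools
```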
Thanks