BOL: Rahul Nayak's pages

World promising health companies !

Rahul Nayak — Tue, 31 Dec 2019 19:10:13 -0600

The health care industry is expected to sustain stable growth over the next decade for a variety of reasons. Advances in medicine have prolonged the average lifespans of most people, requiring more health care treatments over longer terms. In years past, once people turned 65 and enrolled in Medicare, they were expected to live another 10 to 20 years.

Biohub is a medical science research center run as a collaboration between Berkeley, UCSF and Stanford, funded by a $600 million commitment from Facebook CEO and founder Mark Zuckerberg and his wife Priscilla Chan. It is also trademarked as CZ Biohub. It is currently co-led by Stephen Quake and Joseph DeRisi.

More at https://www.czbiohub.org/

Calico LLC is an American research and development biotech company founded on September 18, 2013 by Bill Maris and backed by Google with the goal of combating aging and associated diseases. In Google's 2013 Founders' Letter, Larry Page described Calico as a company focused on "health, well-being, and longevity".

More at https://www.calicolabs.com/

UnitedHealth Group, Inc. (UNH) is the largest health care services company in the world, serving over 50 million individuals in the United States as of late 2018 and 5 million in Brazil. The company provides a wide range of health care products and services, such as health maintenance organizations (HMOs), point of service plans (POS), preferred provider organizations (PPOs), and managed fee-for-service programs.

More at https://www.unitedhealthgroup.com/

Parallel Processing with Perl !

Rahul Nayak — Sat, 25 Aug 2018 11:32:40 -0500

Here is a small tutorial on how to make the best use of multiple processors for bioinformatics analysis. One good way is to use Perl threads and forks. It is important to understand how threads and forks work before implementing them, so getting familiar with both before reading this tutorial would be really useful.

Many times in bioinformatics we need to deal with huge datasets, often more than 100GB in size. The traditional way to analyze a file is to use a while loop:

while (my $line = <FILE>) {
    # do something with $line
}

This is very slow (since we are using only one processor), and if we have 500 million lines in the dataset it takes more than a day to iterate through the whole dataset. So how do we make the best use of all our processors and get the work done quickly?

Here is a very simple and efficient technique with Perl which I have been using. I am more inclined towards using Perl fork than Perl threads.

One of the oldest ways to fork is:

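my @children;
my $pid = fork();
die "Couldn't fork: $!" unless defined $pid;   # fork returns undef on failure
if ($pid) {
    # parent: remember the child's process id
    push @children, $pid;
}
else {
    # child: your code here
    exit(0);
}
## wait for the child processes to finish
foreach my $child (@children) {
    waitpid($child, 0);
}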

What a fork does is create a child process that gets its own copy of the variables and code, so it can analyze the data separately (detached from the parent process), usually on a separate processor. That's it!! One big disadvantage of forking is that it is very difficult to share variables among the different processes. I will show you how to do it easily, but it still has its own drawbacks.

Okay, if you really do not want to call fork yourself, that's okay too. There are many useful modules which do it for you very efficiently. One really useful module is Parallel::ForkManager. You can use Parallel::ForkManager to manage the number of forks you want to generate (the number of processors you want to use).
Simple usage:
use Parallel::ForkManager;
my $max_processors = 8;
my $fork = Parallel::ForkManager->new($max_processors);
foreach my $seq (@dna) {
    $fork->start and next;   # do the fork; the parent moves on to the next element
    # your code here, working on $seq
    $fork->finish;           # do the exit in the child process
}
$fork->wait_all_children;

So you will have up to 8 forks running at a time, each doing the same thing for one element of the array. When one child finishes, Parallel::ForkManager generates a new one, and thus you will be using all your processors to analyze the data. Now suppose you have generated 8 child processes and want to write the data to one file. You need to lock the file to do this, because otherwise the buffered output of the different processes can get interleaved. You can lock the file using the flock command.

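use Fcntl qw(:flock);   # provides LOCK_EX and LOCK_UN
open(my $QUAL, '>>', "myfile.txt") or die "can't open file: $!";   # append mode
flock($QUAL, LOCK_EX) or die "can't lock file: $!";
print $QUAL $output;
flock($QUAL, LOCK_UN) or die "can't unlock file: $!";
close $QUAL;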

I would not suggest using flock when dealing with multiple processes, because it decreases the processing efficiency (each child process must wait for the lock to be released by the other child processes). Instead, I would suggest having each fork write to a separate file and just concatenating the files after the processing.

Putting it all together, if you have 100GB of data you can do this:

Step 1: split the dataset equally according to the number of processors you have. This may take a few hours (about 2-3 hrs for a 100GB file).
You can use the unix "split" command for this.
For example:
# split -l counts lines, so pick a value that does not cut a multi-line record in half
my $number_split = int($number_of_entries_in_your_dataset / $max_processors);
my $split_files = `split -l $number_split "your_file.fasta" "file_name"`;
Step 2: open the directory containing your split files and start Parallel::ForkManager.
For example:
opendir(DIRECTORY, $split_files_directory) or die $!;   ### open the directory
my $super_fork = Parallel::ForkManager->new($max_processors);
while (my $file = readdir(DIRECTORY)) {   ### read the directory
    next if $file =~ /^\./;   # skip ".", ".." and hidden files
    print $file, "\n";
    ########## Start fork ##########
    my $pid = $super_fork->start and next;
    # whatever you want to do with the split file:
    # analyze my piece of $file
    ######### end fork ###############
    $super_fork->finish;
}
$super_fork->wait_all_children;
closedir(DIRECTORY);

So basically each processor will be busy with its piece of data (a split file), and thus you have created 8 processes at one time which run without interfering with each other. Again, I would not suggest writing output from each child process to one file (for the reasons above). Write the output from each fork to a separate file and finally concatenate them. That's it, you have just increased your program speed by up to 8 times!! Isn't it easy?
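The same split, process and concatenate pattern can also be sketched in Python. This is only a minimal sketch under stated assumptions, not a translation of the Perl code in this post: the function names (lines_per_chunk, process_chunk, run_all, concatenate) are made up for illustration, and the per-chunk "analysis" is a toy that just counts lines.

```python
import math
import shutil
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def lines_per_chunk(total_lines, workers):
    """Ceiling division, so the data is cut into at most `workers` chunks."""
    return max(1, math.ceil(total_lines / workers))


def process_chunk(chunk_path):
    """Toy analysis (hypothetical): count the lines of one chunk.

    Each worker writes to its own output file, so no file locking is needed.
    """
    chunk_path = Path(chunk_path)
    out_path = chunk_path.with_name(chunk_path.name + ".out")
    with open(chunk_path) as handle:
        n_lines = sum(1 for _ in handle)
    out_path.write_text(f"{chunk_path.name}\t{n_lines}\n")
    return str(out_path)


def run_all(chunk_paths, workers=8):
    """Process every chunk in a child process, at most `workers` at a time."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_chunk, chunk_paths))


def concatenate(part_paths, final_path):
    """Join the per-chunk outputs into one file, in the order given."""
    with open(final_path, "wb") as final:
        for part in part_paths:
            with open(part, "rb") as handle:
                shutil.copyfileobj(handle, final)
```

A typical call would be concatenate(run_all(sorted(chunk_files)), "final.txt"). On platforms that start worker processes by spawning a fresh interpreter, that call should sit under an if __name__ == "__main__": guard. Rounding the chunk size up instead of down avoids the small leftover chunk that int() division can produce.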

Note:
You may worry about concatenating the output each child generates, since it does take some time (remember, 100GB). I think you can use the mysql LOAD DATA LOCAL INFILE command to load all the files into a single table (this should take about 3 hrs for a 100GB dataset) and then export the whole table into one file. This should be faster than just concatenating them using the "cat" command (correct me if I am wrong).

Or a much simpler way is to use pipes:

cat output_dir/* | my_pipe or my_pipe <(file1) final_file;

That's it guys!! Enjoy programming and please do comment. I am not a computer scientist, so forgive me for any mistakes, and if you find any, please report them. Thank you.

My commonly used commands in Bioinformatics

Rahul Nayak — Thu, 26 Jul 2018 04:58:45 -0500

FYI, I've found it useful to use MUMmer to extract the specific changes that Racon makes, so I can evaluate them individually:

minimap -t 24 assembly.fasta long_reads.fastq.gz | racon -t 24 long_reads.fastq.gz - assembly.fasta racon_assembly.fasta
nucmer -p nucmer assembly.fasta racon_assembly.fasta
show-snps -C -T -r nucmer.delta

This reports Racon's changes in a table. You can exclude indels with the -I option in show-snps.

This process (Racon -> MUMmer -> SNP table) solves the problem I originally raised in this issue. So as far as I'm concerned, you can close this issue (or keep it open if you still want to implement some kind of variant table).

Interview Puzzles for Bioinformatician !

Rahul Nayak — Tue, 17 Jul 2018 05:26:18 -0500

These are some of the most famous interview puzzles asked at top tech companies.

Here is a list of the top 25 puzzles which have been asked in top tech interviews.

For Microsoft interview puzzles in particular, you may refer to:
Top 15 Microsoft Interview Puzzles
Microsoft Interview Puzzles

Other MOST COMMON Interview Puzzles-
Top 25 Tech Interview Logical Puzzles

Each of these puzzles has come up a number of times in interviews, even at top tech companies.

Understanding BLASTn output format 6 !

Rahul Nayak — Wed, 27 Jun 2018 18:38:21 -0500

BLASTn output format 6

BLASTn maps DNA against DNA, for example gene sequences against a reference genome

blastn -query genes.ffn -subject genome.fna -outfmt 6

BLASTn tabular output format 6

Column headers:
qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore

1.	qseqid	query (e.g., gene) sequence id
2.	sseqid	subject (e.g., reference genome) sequence id
3.	pident	percentage of identical matches
4.	length	alignment length
5.	mismatch	number of mismatches
6.	gapopen	number of gap openings
7.	qstart	start of alignment in query
8.	qend	end of alignment in query
9.	sstart	start of alignment in subject
10.	send	end of alignment in subject
11.	evalue	expect value
12.	bitscore	bit score
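For downstream scripting, the twelve default columns listed above can be read into named fields. The following is a minimal Python sketch, not part of BLAST itself: the function names are invented for illustration, and the choice of int or float for each column is an assumption based on the column descriptions.

```python
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_COLUMNS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}


def parse_blast6_line(line, columns=BLAST6_COLUMNS):
    """Turn one tab-separated format 6 line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
    hit = {}
    for name, value in zip(columns, fields):
        if name in INT_COLUMNS:
            hit[name] = int(value)
        elif name in FLOAT_COLUMNS:
            hit[name] = float(value)
        else:
            hit[name] = value
    return hit


def best_hit_per_query(lines):
    """Keep, for each qseqid, the hit with the highest bit score."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = parse_blast6_line(line)
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best
```

With a file of results, best_hit_per_query(open("hits.tsv")) would return one dictionary per query (hits.tsv is a placeholder name). If a custom -outfmt column list was used, the same list of names can be passed as the columns argument of parse_blast6_line.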

Define your own output format

by adding the option -outfmt, as for example:

-outfmt "6 qseqid sseqid pident qlen length mismatch gapopen evalue bitscore"

supported format specifiers are:
qseqid      Query Seq-id
qgi         Query GI
qacc        Query accession
qaccver     Query accession.version
qlen        Query sequence length
sseqid      Subject Seq-id
sallseqid   All subject Seq-id(s), separated by a ';'
sgi         Subject GI
sallgi      All subject GIs
sacc        Subject accession
saccver     Subject accession.version
sallacc     All subject accessions
slen        Subject sequence length
qstart      Start of alignment in query
qend        End of alignment in query
sstart      Start of alignment in subject
send        End of alignment in subject
qseq        Aligned part of query sequence
sseq        Aligned part of subject sequence
evalue      Expect value
bitscore    Bit score
score       Raw score
length      Alignment length
pident      Percentage of identical matches
nident      Number of identical matches
mismatch    Number of mismatches
positive    Number of positive-scoring matches
gapopen     Number of gap openings
gaps        Total number of gaps
ppos        Percentage of positive-scoring matches
frames      Query and subject frames separated by a '/'
qframe      Query frame
sframe      Subject frame
btop        Blast traceback operations (BTOP)
staxids     Subject Taxonomy ID(s), separated by a ';'
sscinames   Subject Scientific Name(s), separated by a ';'
scomnames   Subject Common Name(s), separated by a ';'
sblastnames Subject Blast Name(s), separated by a ';' (in alphabetical order)
sskingdoms  Subject Super Kingdom(s), separated by a ';' (in alphabetical order)
stitle      Subject Title
salltitles  All Subject Title(s), separated by a '<>'
sstrand     Subject Strand
qcovs       Query Coverage Per Subject
qcovhsp     Query Coverage Per HSP

default values are:
-outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore"

Gap filling or Contigs extensions tools !

Rahul Nayak — Fri, 01 Jun 2018 08:07:32 -0500

There are many tools to perform gap filling using Illumina short reads, for example "GapFiller: a de novo assembly approach to fill the gap within paired reads" or "Toward almost closed genomes with GapFiller". There are also some tools like GAPresolution that can help to perform local re-assemblies using 454 reads. We used GAPresolution, but it is not very good software; it is useful only in some specific situations.

Take a look at the PRICE software from the DeRisi lab. It's meant to do something very similar. http://derisilab.ucsf.edu/index.php?page=software

You could also look at SSPACE (http://www.baseclear.com/landingpages/basetools-a-wide-range-of-bioinformatics-solutions/sspacev12/), ATLAS tools (http://www.hgsc.bcm.tmc.edu/content/bcm-hgsc-software), and SCARPA (http://compbio.cs.toronto.edu/hapsembler/scarpa.html).

See the PAGIT protocol: http://www.sanger.ac.uk/resources/software/pagit/

In particular, take a look at the IMAGE tool: http://genomebiology.com/2010/11/4/R41

Also, SOAPdenovo has a function for scaffolding. Not sure about ABySS.

Here is a useful explanation of several tools:

https://bioinformaticsonline.com/search?q=scaffolding&entity_type=object&entity_subtype=bookmarks&offset=0&search_type=entities

I could be wrong, but the above answers to your hypothetical scenario appear to miss the point that you aren't interested in assembling the full genome, just the 100 kb part you're interested in. I suggest the following algorithm:

1. Start with the initial assembly C0 of the contigs you have identified as overlapping your region of interest, and the set S of reads those contigs contain. Let C = C0.

2. Repeat:
a. Identify paired-end reads (not in C) for which one or both ends align within, or extending, contigs in C.
b. Identify unpaired reads that align extending these new paired-end reads.
c. Construct a new assembly C' from C and the new reads identified in (a) and (b).
d. Trim C' so it does not extend more than 100 kb to either end of C0. Set C = C'.
e. Let S' denote the reads that contribute to C'. If S' does not contain any reads not present in S, stop. Otherwise, Set S = S'.

3. If you don't have a complete assembly of the region of interest, generate an STS for each end of each contig, probe a library for clones including these STSes, subclone these clones into a paired-end sequencing vector, and generate paired-end reads for this library; then try steps (1) and (2) again, adding these new sequencing reads to what you had before.

4. If your average sequencing depth for the region of interest exceeds 25 or so without filling all gaps, it is likely that the remaining gaps represent sequences that are not getting cloned in your sequencing vectors. Try different sequencing vectors.
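The stopping rule in step 2 (keep recruiting reads until a round adds nothing new) can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration: reads are (name, start, end) tuples with known coordinates, "recruiting" means overlapping the span covered so far, and reads that reach beyond the allowed flank are dropped rather than trimmed. A real run would use an aligner and an assembler in place of recruit_overlapping.

```python
def recruit_overlapping(selected, all_reads, lo, hi):
    """Toy stand-in for steps (a) and (b): reads that overlap the span
    covered by `selected` and lie entirely inside the window [lo, hi]."""
    span_start = min(read[1] for read in selected)
    span_end = max(read[2] for read in selected)
    return {
        read for read in all_reads
        if read[1] <= span_end and read[2] >= span_start
        and read[1] >= lo and read[2] <= hi
    }


def extend_region(seed_reads, all_reads, flank=100_000, max_rounds=1000):
    """Grow the read set S from the seed reads of C0 until nothing is added."""
    selected = set(seed_reads)
    # Step (d): never grow more than `flank` bases beyond either end of C0.
    lo = min(read[1] for read in selected) - flank
    hi = max(read[2] for read in selected) + flank
    for _ in range(max_rounds):
        new_reads = recruit_overlapping(selected, all_reads, lo, hi) - selected
        if not new_reads:
            break              # step (e): S' has no reads beyond S, so stop
        selected |= new_reads  # otherwise set S = S' and repeat
    return selected
```

The max_rounds argument is only a safety limit for the sketch; the loop normally ends through the step (e) check.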

Learning Python Programming - a bioinformatician perspective !

Rahul Nayak — Mon, 14 May 2018 16:33:03 -0500

Python is a general-purpose programming language that is open source, flexible, powerful and easy to use. One of the most important features of Python is its rich set of utilities and libraries for data processing and analytics tasks. In the current era of big biological data, Python and Biopython are becoming more popular because their easy-to-use features support big data processing.

In this tutorial series, I will explore features and packages of Python which are widely used in big data, NGS, and bioinformatics. I will also walk through a real biological example which shows NGS data processing with the help of Python packages and programming.

Python has a couple of points to recommend it to biologists and scientists specifically:

It's widely used in the scientific community
It has a couple of very well designed libraries for doing complex scientific computing (although we won't encounter them in this book)
It lends itself well to being integrated with other, existing tools
It has features which make it easy to manipulate strings of characters (for example, strings of DNA bases and protein amino acid residues, which we as biologists are particularly fond of)
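As a small illustration of the last point, here is a hedged sketch of two common string operations on DNA. The function names are chosen for this example only, and ambiguity codes other than A, C, G and T are passed through unchanged.

```python
# Translation table mapping each base to its complement (both cases).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]


def gc_content(dna):
    """Fraction of G and C bases in the sequence (0.0 for an empty string)."""
    dna = dna.upper()
    if not dna:
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)
```

For example, reverse_complement("ATGC") gives "GCAT" and gc_content("ATGC") gives 0.5.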

In general, the following are some of the important features of Python which make it a good fit for rapid application development.

Python is an interpreted language, so a program does not need a separate compilation step. The interpreter parses the program code and runs it directly.
Python is dynamically typed, so the type of a variable is determined automatically from the value assigned to it.
Python is strongly typed, so values of unrelated types are not converted implicitly. The developer needs to convert (cast) them explicitly.
It takes less code to do more, which makes it easy to adopt.
Python is portable, extendable and scalable.
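The difference between dynamic and strong typing in the list above is easy to mix up, so here is a minimal sketch. The function names are invented for this example.

```python
def rebinding_demo():
    """Dynamic typing: the same name can hold an int and later a str."""
    value = 42
    first_type = type(value).__name__
    value = "forty-two"
    second_type = type(value).__name__
    return first_type, second_type


def mixing_str_and_int_fails():
    """Strong typing: adding a str and an int raises TypeError."""
    try:
        "reads: " + 7
    except TypeError:
        return True
    return False


def label_with_count(label, count):
    """The fix is an explicit conversion with str()."""
    return label + str(count)
```

Here rebinding_demo() returns ("int", "str"), mixing_str_and_int_fails() returns True, and label_with_count("reads: ", 7) returns "reads: 7".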

There are two major Python versions, Python 2 and Python 3, and they are quite different. This tutorial uses Python 3, because it is more semantically correct and supports newer features.

I will post tutorials on a daily basis on this page. Check the sub-pages on the right side.

Bioinformatics OneLiner

Rahul Nayak — Tue, 10 Apr 2018 04:13:03 -0500

To remove all line ends (\n) from a Unix text file:

sed ':a;N;$!ba;s/\n//g' filename.txt > newfilename_oneline.txt

To get average for a column of numbers (here the second column $2):

awk '{ sum += $2; n++ } END { if (n > 0) print sum / n; }'

To get sequence length for all sequences in a fasta file:

awk '/^>/ {if (seqlen){print seqlen}; print ;seqlen=0;next; } { seqlen = seqlen +length($0)}END{print seqlen}' \
filename.fasta
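When the lengths are needed inside a larger script, the same bookkeeping as the awk one-liner above can be written in Python. This is a minimal sketch; fasta_lengths is a name invented for this example, and unlike the awk version it yields (header, length) pairs instead of printing alternating lines.

```python
def fasta_lengths(lines):
    """Yield (header, length) for every record in an iterable of FASTA lines."""
    header = None
    length = 0
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                yield header, length   # finish the previous record
            header = line[1:]
            length = 0
        elif header is not None:
            length += len(line)        # sequence may span several lines
    if header is not None:
        yield header, length           # do not forget the last record
```

With a file on disk this could be used as: for name, n in fasta_lengths(open("filename.fasta")): print(name, n).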

To copy (move, rename, etc) files based on their list in a text file:

cat file_list.txt | while read line; do cp "$line" complete_dataset/"$line"; done

To split bam files into sets with mapped and unmapped reads:

samtools view -F4 sample.bam > sample.mapped.sam
samtools view -f4 sample.bam > sample.unmapped.sam

To gzip all your fastq files using gnu parallel and gzip:

parallel gzip ::: *.fastq

To gzip all your fastq files using pigz:

pigz *.fastq

To count all sequences in a fasta file:

grep "^>" yourfile.fasta -c

To count all sequences in all fasta files in your current directory:

for a in *.fasta; do ls $a; grep "^>" -c $a; done

To keep only one copy of duplicated lines:

awk '!seen[$0]++'

To sum assembly size from SPAdes contigs.fasta or scaffolds.fasta file:

grep "^>" scaffolds.fasta | cut -f 4 -d '_' | paste -sd+ | bc

To remove everything after the first space at each line, e.g. to simplify fasta headers:

cut -d' ' -f1 < your_file

To count reads in all .fastq.gz files in your current folder (fast, using gnu parallel):

parallel "echo {} && gunzip -c {} | wc -l | awk '{d=\$1; print d/4;}'" ::: *.gz

To count reads in all .fastq.gz files in your current folder:

zcat *.gz | echo $((`wc -l`/4))

To count reads in all .fastq files in your current folder:

cat *.fastq | echo $((`wc -l`/4))

To count base pairs in all .fastq.gz files in your current folder:

zcat *.fastq.gz | paste - - - - | cut -f 2 | tr -d '\n' | wc -c

To split multifasta file into many fasta files:

awk '/^>/ {OUT=substr($0,2) ".fa"}; {print >> OUT; close(OUT)}' Input_File

To convert Illumina FASTQ 1.3 to 1.8:

sed -e '4~4y/@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghi/!"#$%&'\''()*+,-.\/0123456789:;<=>?@ABCDEFGHIJ/' f.fastq
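The long character table in the sed command above encodes one simple piece of arithmetic: Illumina 1.3 stores quality Q as the character with code Q + 64, Illumina 1.8 uses Q + 33, so every quality character moves down by 31. A minimal Python sketch of the same conversion follows; the function names are invented for this example, and the input is assumed to be 4-line FASTQ records.

```python
def phred64_to_phred33(quality):
    """Shift every quality character from offset 64 to offset 33."""
    converted = []
    for char in quality:
        if ord(char) < 64:
            raise ValueError(f"{char!r} is not a valid Phred+64 character")
        converted.append(chr(ord(char) - 31))
    return "".join(converted)


def convert_fastq(lines):
    """Yield FASTQ lines with every 4th line (the qualities) converted."""
    for index, line in enumerate(lines):
        if index % 4 == 3:
            newline = "\n" if line.endswith("\n") else ""
            yield phred64_to_phred33(line.rstrip("\n")) + newline
        else:
            yield line
```

Only the quality line of each record is touched, which is what the 4~4 address does in the sed version, so a header that happens to start with @ is left alone.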

To convert FASTQ to FASTA:

sed -n '1~4s/^@/>/p;2~4p'

To get fastq read length distribution:

cat reads.fastq | awk '{if(NR%4==2) print length($1)}' | sort | uniq -c

To deinterleave interleaved fastq file:

cat myf.fq | paste - - - - - - - - | tee >(cut -f 1-4 | tr "\t" "\n" > myfile_1.fq) | cut -f 5-8 | \
tr "\t" "\n" > myf2.fq

To filter and sort contig identifiers from SPAdes assembly (e.g. here length >= 4000 + coverage >= 100):

grep "^>" scaffolds.fasta | sed s"/_/ /"g | awk '{ if ($4 >= 4000 && $6 >= 100) print $0 }' | sort -k 4 -n | \
sed s"/ /_/"g

To append something to all headers of your fasta files:

sed 's/>.*/&YOURSTRING/' filename.fasta > new_filename.fasta

To replace/squeeze multiple adjacent spaces by only one space:

tr -s " " < file

To filter fastq based on length (here larger than or equal to 21, but smaller than or equal to 25):

cat your.fastq | paste - - - - | awk 'length($2)  >= 21 && length($2) <= 25' | sed 's/\t/\n/g' > filtered.fastq

To print difference between the last and first row in 5th column:

awk '{if (!first){first=$5;}; last=$5;} END {print last-first}' myfile.txt

To sample only the first 200 bases from all sequences in a multifasta file (e.g. from an assembly scaffolds.fasta file here):

awk '/^>/{ seqlen=0; print; next; } seqlen < 200 { if (seqlen + length($0) > 200) $0 = substr($0, 1, 200-seqlen);\
 seqlen += length($0); print }' scaffolds.fasta > 200bp_scaffolds.fasta

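To pipe a compressed fasta file directly into makeblastdb (here building a nucleotide database; -dbtype is required, and -title and -out must be given when reading from standard input; mydb is a placeholder name):

gunzip -c fasta.gz | makeblastdb -in - -dbtype nucl -title mydb -out mydb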

To remove sequences with duplicate fasta headers from a fasta file:

awk '/^>/{f=!d[$1];d[$1]=1}f' in.fasta > out.fasta

Linux Commands Cheat Sheet for Bioinformatics and Computational Biology Professionals

Rahul Nayak — Mon, 05 Feb 2018 18:50:41 -0600

The purpose of this cheat sheet is to introduce biologists and bioinformaticians to the frequently used tools for NGS analysis, as well as to give them experience in writing one-liners.

File System
ls — list items in current directory
ls -l — list items in current directory and show in long format to see permissions, size, and modification date
ls -a — list all items in current directory, including hidden files
ls -F — list all items in current directory and show directories with a slash and executables with a star
ls dir — list all items in directory dir
cd dir — change directory to dir
cd .. — go up one directory
cd / — go to the root directory
cd ~ — go to your home directory
cd - — go to the last directory you were just in
pwd — show present working directory
mkdir dir — make directory dir
rm file — remove file
rm -r dir — remove directory dir recursively
cp file1 file2 — copy file1 to file2
cp -r dir1 dir2 — copy directory dir1 to dir2 recursively
mv file1 file2 — move (rename) file1 to file2
ln -s file link — create symbolic link to file
touch file — create or update file
cat file — output the contents of file
less file — view file with page navigation
head file — output the first 10 lines of file
tail file — output the last 10 lines of file
tail -f file — output the contents of file as it grows, starting with the last 10 lines
vim file — edit file
alias name='command' — create an alias for a command
System
shutdown — shut down machine
reboot — restart machine
date — show the current date and time
whoami — who you are logged in as
finger user — display information about user
man command — show the manual for command
df — show disk usage
du — show directory space usage
free — show memory and swap usage
whereis app — show possible locations of app
which app — show which app will be run by default
Process Management
ps — display your currently active processes
top — display all running processes
kill pid — kill process id pid
kill -9 pid — force kill process id pid
Permissions
ls -l — list items in current directory and show permissions
chmod ugo file — change permissions of file to ugo - u is the user's permissions, g is the group's permissions, and o is everyone else's permissions. The values of u, g, and o can be any number between 0 and 7.
7 — full permissions
6 — read and write only
5 — read and execute only
4 — read only
3 — write and execute only
2 — write only
1 — execute only
0 — no permissions
chmod 600 file — you can read and write - good for files
chmod 700 file — you can read, write, and execute - good for scripts
chmod 644 file — you can read and write, and everyone else can only read - good for web pages
chmod 755 file — you can read, write, and execute, and everyone else can read and execute - good for programs that you want to share
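The digit table above follows from simple addition: read counts 4, write counts 2 and execute counts 1, and each digit is the sum of the permissions granted. A minimal Python sketch that decodes a three-digit mode into the rwx string shown by ls -l (the function names are invented for this example):

```python
def describe_digit(digit):
    """Decode one permission digit (0-7) into an 'rwx' style triple."""
    if not 0 <= digit <= 7:
        raise ValueError("a permission digit must be between 0 and 7")
    return (
        ("r" if digit & 4 else "-")
        + ("w" if digit & 2 else "-")
        + ("x" if digit & 1 else "-")
    )


def describe_mode(mode):
    """Decode a three-digit mode string: user, group, then everyone else."""
    if len(mode) != 3 or not mode.isdigit():
        raise ValueError("expected a three-digit mode such as '755'")
    return "".join(describe_digit(int(char)) for char in mode)
```

For example, describe_mode("755") gives "rwxr-xr-x" and describe_mode("644") gives "rw-r--r--".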
Networking
wget file — download a file
curl file — download a file
scp user@host:file dir — secure copy a file from remote server to the dir directory on your machine
scp file user@host:dir — secure copy a file from your machine to the dir directory on a remote server
scp -r user@host:dir dir — secure copy the directory dir from remote server to the directory dir on your machine
ssh user@host — connect to host as user
ssh -p port user@host — connect to host on port as user
ssh-copy-id user@host — add your key to host for user to enable a keyed or passwordless login
ping host — ping host and output results
whois domain — get information for domain
dig domain — get DNS information for domain
dig -x host — reverse lookup host
lsof -i tcp:1337 — list all processes running on port 1337
Searching
grep pattern files — search for pattern in files
grep -r pattern dir — search recursively for pattern in dir
grep -rn pattern dir — search recursively for pattern in dir and show the line number found
grep -r pattern dir --include='*.ext' — search recursively for pattern in dir and only search in files with .ext extension
command | grep pattern — search for pattern in the output of command
find dir -name file — find all instances of file under dir by walking the real file system
locate file — find all instances of file using indexed database built from the updatedb command. Much faster than find
sed -i 's/day/night/g' file — find all occurrences of day in a file and replace them with night - s means substitute and g means global - sed also supports regular expressions
Compression
tar cf file.tar files — create a tar named file.tar containing files
tar xf file.tar — extract the files from file.tar
tar czf file.tar.gz files — create a tar with Gzip compression
tar xzf file.tar.gz — extract a tar using Gzip
gzip file — compresses file and renames it to file.gz
gzip -d file.gz — decompresses file.gz back to file
Shortcuts
ctrl+a — move cursor to beginning of line
ctrl+e — move cursor to end of line
alt+f — move cursor forward 1 word
alt+b — move cursor backward 1 word

List of visualization tools for genome alignments

Rahul Nayak — Fri, 02 Feb 2018 13:25:33 -0600

Genome browsers are useful not only for showing final results but also for improving analysis protocols, testing data quality, and generating result drafts. Their integration in analysis pipelines allows the optimization of parameters, which leads to better results. But sometimes we need publication-ready figures of genomes. The following is a list of genome alignment visualization tools which could be useful for analysis and interpretation of results:

ABySS Explorer

Interactive Java application that uses a novel graph-based representation to display a sequence assembly and associated metadata

http://www.bcgsc.ca/platform/bioinfo/software/abyss-explorer

BamView

Genome browser and annotation tool that allows visualization of sequence features, next-generation sequencing (NGS) data and the results of analyses within the context of the sequence, and also its six-frame translation

http://www.sanger.ac.uk/resources/software/artemis/

DNannotator

Annotation web toolkit for regional genomic sequences

http://bioapp.psych.uic.edu/DNannotator.htm

JVM

Java Visual Mapping tool for NGS reads

http://www.springer.com/cda/content/document/cda_downloaddocument/9789401792448-c2.pdf?SGWID=0-0-45-1487072-p176815501

LookSeq

Web-based visualization of sequences derived from multiple sequencing technologies. Low- or high-depth read pileups and easy visualization of putative single nucleotide and structural variation

http://lookseq.sourceforge.net

MagicViewer

Visualization of short read alignment, identification of genetic variation and association with annotation information of a reference genome

http://bioinformatics.zj.cn/magicviewer/

MapView

Alignments of huge-scale single-end and pair-end short reads

http://omictools.com/mapview-s1367.html

MultiPipMaker

Computes alignments of similar regions in two DNA sequences. The resulting alignments are summarized with a ‘percent identity plot’ (pip)

http://pipmaker.bx.psu.edu/pipmaker/

PileLineGUI

Handling genome position files in NGS studies

http://sing.ei.uvigo.es/pileline/pilelinegui.html

SAMtools tview

Simple and fast text alignment viewer; NGS compatible

http://www.htslib.org/

SEWAL

Uses a locality-sensitive hashing algorithm to enumerate all unique sequences in an entire Illumina sequencing run

http://www.sourceforge.net/projects/sewal

STAR

A web-based integrated solution to management and visualization of sequencing data

http://wanglab.ucsd.edu/star/browser

SVA

Software for annotating and visualizing sequenced human genomes

http://www.svaproject.org

Integrative Genomics Viewer (IGV)

Visualization of large heterogeneous datasets, providing a smooth and intuitive user experience at all levels of genome resolution

https://www.broadinstitute.org/igv/

ZOOM Lite

NGS data mapping and visualization software

http://bioinfor.com/zoom/lite/