BOL: Related items

DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies

Jit — Wed, 19 Apr 2017 10:09:51 -0500

Our work is published in Scientific Reports:

Ye, C. et al. DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies. Sci. Rep. 6, 31900; doi: 10.1038/srep31900 (2016).

http://www.nature.com/articles/srep31900

The manual can be downloaded from:

https://github.com/yechengxi/DBG2OLC/raw/master/Manual.docx

To use the precompiled versions, please go to:

https://github.com/yechengxi/DBG2OLC/tree/master/compiled

Address of the bookmark: https://github.com/yechengxi/DBG2OLC

CoLoRMap: Correcting Long Reads by Mapping short reads

Jit — Mon, 20 Aug 2018 14:17:05 -0500

Second generation sequencing technologies paved the way to an exceptional increase in the number of sequenced genomes, both prokaryotic and eukaryotic. However, short reads are difficult to assemble and often lead to highly fragmented assemblies. The recent developments in long-read sequencing methods offer a promising way to address this issue. However, so far long reads are characterized by a high error rate, and assembling from long reads requires a high depth of coverage. This motivates the development of hybrid approaches that leverage the high quality of short reads to correct errors in long reads.

We introduce CoLoRMap, a hybrid method for correcting noisy long reads, such as the ones produced by PacBio sequencing technology, using high-quality Illumina paired-end reads mapped onto the long reads. Our algorithm is based on two novel ideas: using a classical shortest path algorithm to find a sequence of overlapping short reads that minimizes the edit score to a long read, and extending corrected regions by local assembly of unmapped mates of mapped short reads. Our results on bacterial, fungal and insect data sets show that CoLoRMap compares well with existing hybrid correction methods.

The source code of CoLoRMap is freely available for non-commercial use at https://github.com/sfu-compbio/colormap

ehaghshe@sfu.ca or cedric.chauve@sfu.ca

Address of the bookmark: https://github.com/sfu-compbio/colormap

CoverM: Read coverage calculator for metagenomics

Neel — Thu, 29 Apr 2021 23:39:14 -0500

CoverM aims to be a configurable, easy to use and fast DNA read coverage and relative abundance calculator focused on metagenomics applications.

CoverM calculates coverage of genomes/MAGs (coverm genome) or of individual contigs (coverm contig). Because it calculates coverage by read mapping, its input can be either BAM files sorted by reference, or raw reads and reference genomes in various formats.

Address of the bookmark: https://github.com/wwood/CoverM

proovread : large-scale high-accuracy PacBio correction through iterative short read consensus

Jit — Fri, 05 Jan 2018 04:12:20 -0600

- outperforms PacBioToCA/LSC in terms of accuracy and contiguity/sensitivity (http://dx.doi.org/10.1093/bioinformatics/btu392)
- is easy to install/run/configure
- supports various types of data:
  - HiSeq/MiSeq (100-500bp)
  - Unitigs
  - 454, ...

proovread maps high-coverage data to PacBio reads (bwa mem, blasr, daligner) in multiple iterations.

Address of the bookmark: https://github.com/BioInf-Wuerzburg/proovread

URMAP, an ultra-fast read mapper

Jit — Thu, 29 Oct 2020 23:03:54 -0500

URMAP is a new read mapping algorithm. It is an order of magnitude faster than BWA, with comparable accuracy on several validation tests. On a Genome in a Bottle (GIAB) variant calling test with 30× coverage 2×150 reads, URMAP achieves high accuracy (precision 0.998, sensitivity 0.982 and F-measure 0.990) with the strelka2 caller. However, GIAB reference variants are shown to be biased against repetitive regions, which are difficult to map, and may therefore pose an unrealistically easy challenge to read mappers and variant callers.

More at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7320720/

Address of the bookmark: https://github.com/rcedgar/urmap

Find certain files/documents in Linux OS

Rahul Nayak — Sun, 06 Apr 2014 23:56:18 -0500

As a bioinformatician, I know that we routinely handle large datasets and get lost among huge numbers of files and folders. Tracking down a missing file requires a powerful search command. The find command is one of the most important and most frequently used commands on Linux systems. It searches for and lists the files and directories that match the conditions you specify. You can search by permissions, users, groups, file type, date, size and other criteria.

In this article we share our day-to-day experience with the Linux find command in the form of examples. We show the 35 most used find command examples, divided into five parts, from basic to advanced usage.

Part I – Basic Find Commands for Finding Files with Names
1. Find Files Using Name in Current Directory

Find all the files whose name is gene.txt in the current working directory.

# find . -name gene.txt

./gene.txt

2. Find Files Under Home Directory

Find all the files under the /home directory with the name gene.txt.

# find /home -name gene.txt

/home/gene.txt

3. Find Files Using Name and Ignoring Case

Find all the files in the /home directory whose name is gene.txt, ignoring case, so that both gene.txt and Gene.txt match.

# find /home -iname gene.txt

/home/gene.txt
/home/Gene.txt

4. Find Directories Using Name

Find all directories whose name is Gene under the / directory.

# find / -type d -name Gene

/Gene

5. Find fasta Files Using Name

Find all regular files whose name is gene.fasta in the current working directory.

# find . -type f -name gene.fasta

./gene.fasta

6. Find all fasta Files in Directory

Find all fasta files in the current directory and its subdirectories.

# find . -type f -name "*.fasta"

./gene.fasta
./cancer.fasta
./allgene.fasta
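
The name tests from Part I can be tried safely in a throwaway directory. The following is only a sketch: it assumes GNU find on a case-sensitive file system, and every file name in it is made up for illustration.

```shell
# Build a scratch directory so nothing outside it is searched or changed.
demo=$(mktemp -d)
mkdir "$demo/Gene"
touch "$demo/gene.txt" "$demo/Gene.txt" "$demo/gene.fasta" "$demo/cancer.fasta"

# -name is case-sensitive: only gene.txt matches.
exact=$(find "$demo" -type f -name gene.txt | wc -l)

# -iname ignores case: gene.txt and Gene.txt both match.
nocase=$(find "$demo" -type f -iname gene.txt | wc -l)

# Quote the pattern so the shell hands *.fasta to find unexpanded.
fasta=$(find "$demo" -type f -name "*.fasta" | sort)

# -type d restricts the match to directories.
dirs=$(find "$demo" -type d -name Gene | wc -l)

echo "exact=$exact nocase=$nocase dirs=$dirs"
echo "$fasta"

# Clean up the scratch directory.
rm -rf "$demo"
```

Adding -type f or -type d to a name search is a cheap way to keep a directory called Gene from showing up in a search meant for files.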

Part II – Find Files Based on their Permissions
7. Find Files With 777 Permissions

Find all the files whose permissions are 777.

# find . -type f -perm 0777 -print

8. Find Files Without 777 Permissions

Find all the files without permission 777.

# find / -type f ! -perm 777

9. Find SGID Files with 644 Permissions

Find all the SGID bit files whose permissions are set to 644.

# find / -perm 2644

10. Find Sticky Bit Files with 551 Permissions

Find all the Sticky Bit set files whose permissions are 551.

# find / -perm 1551

11. Find SUID Files

Find all SUID set files.

# find / -perm /u=s

12. Find SGID Files

Find all SGID set files.

# find / -perm /g+s

13. Find Read Only Files

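Find all read-only files, that is, files that their owner can read and that nobody can write. On its own, -perm /u=r matches every file the owner can read, whether it is writable or not, so a second test excludes files with any write bit set.

# find / -perm /u=r ! -perm /a=w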

14. Find Executable Files

Find all Executable files.

# find / -perm /a=x

15. Find Files with 777 Permissions and Chmod to 644

Find all 777 permission files and use chmod command to set permissions to 644.

# find / -type f -perm 0777 -print -exec chmod 644 {} \;

16. Find Directories with 777 Permissions and Chmod to 755

Find all 777 permission directories and use chmod command to set permissions to 755.

# find / -type d -perm 777 -print -exec chmod 755 {} \;
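
Before changing permissions across a whole tree, it can help to list the matches first and only then apply chmod. The sketch below does this in a scratch directory; the file names are hypothetical, and it uses the standard "{} +" form of -exec, which passes many files to one chmod call instead of starting chmod once per file.

```shell
# Scratch directory with one world-writable file and one normal file.
demo=$(mktemp -d)
touch "$demo/open.txt" "$demo/normal.txt"
chmod 777 "$demo/open.txt"
chmod 644 "$demo/normal.txt"

# Step 1: count what would be changed, without touching anything.
before=$(find "$demo" -type f -perm 0777 | wc -l)

# Step 2: apply the change to every exact-777 file in one chmod call.
find "$demo" -type f -perm 0777 -exec chmod 644 {} +

# Step 3: verify that no 777 files are left and both files are now 644.
after=$(find "$demo" -type f -perm 0777 | wc -l)
fixed=$(find "$demo" -type f -perm 0644 | wc -l)

echo "before=$before after=$after fixed=$fixed"
rm -rf "$demo"
```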

17. Find and remove single File

To find a single file called gene.txt and remove it.

# find . -type f -name "gene.txt" -exec rm -f {} \;

18. Find and remove Multiple File

To find and remove multiple files, such as all .fa or all .gb files, use:

# find . -type f -name "*.fa" -exec rm -f {} \;

OR

# find . -type f -name "*.gb" -exec rm -f {} \;

19. Find all Empty Files

To find all empty files under a certain path.

# find /tmp -type f -empty

20. Find all Empty Directories

To find all empty directories under a certain path.

# find /tmp -type d -empty

21. Find all Hidden Files

To find all hidden files, use below command.

# find /tmp -type f -name ".*"
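
Because find with rm is unforgiving, one cautious pattern is to run the search once as a preview and then repeat it with a delete action. The sketch below assumes GNU find (for -delete and -empty) and uses made-up file names. It also shows that the two commands in example 18 can be combined into one with the -o (OR) operator.

```shell
# Scratch directory with two files to delete, two to keep, and an empty folder.
demo=$(mktemp -d)
touch "$demo/a.fa" "$demo/b.gb" "$demo/keep.txt" "$demo/.hidden"
mkdir "$demo/emptydir"

# Preview: the parenthesised -o expression matches either extension.
preview=$(find "$demo" -type f \( -name "*.fa" -o -name "*.gb" \) | sort)
echo "$preview"

# Same expression again, now with -delete instead of -exec rm.
find "$demo" -type f \( -name "*.fa" -o -name "*.gb" \) -delete

# Check what is left: keep.txt and .hidden survive.
left=$(find "$demo" -type f | wc -l)
hidden=$(find "$demo" -type f -name ".*" | wc -l)
emptydirs=$(find "$demo" -type d -empty | wc -l)

echo "left=$left hidden=$hidden emptydirs=$emptydirs"
rm -rf "$demo"
```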

Part III – Search Files Based On Owners and Groups
22. Find Single File Based on User

To find a file called gene.txt, or all files with that name, owned by root anywhere under the / directory.

# find / -user root -name gene.txt

23. Find all Files Based on User

To find all files that belong to user rahul under the /home directory.

# find /home -user rahul

24. Find all Files Based on Group

To find all files that belong to group developer under the /home directory.

# find /home -group developer

25. Find Particular Files of User

To find all .txt files of user rahul under the /home directory.

# find /home -user rahul -iname "*.txt"
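
The ownership tests can be checked against your own account without needing a user called rahul. This sketch assumes GNU find, where -user also accepts a numeric user ID; the numeric form from id -u is used here so that the example does not depend on any particular login name. File names are hypothetical.

```shell
# Scratch directory; both files are owned by whoever runs the script.
demo=$(mktemp -d)
touch "$demo/notes.txt" "$demo/reads.fa"

# All regular files owned by the current user.
mine=$(find "$demo" -type f -user "$(id -u)" | wc -l)

# Ownership combined with a case-insensitive name test, as in example 25.
mine_txt=$(find "$demo" -type f -user "$(id -u)" -iname "*.txt" | wc -l)

# "!" negates a test: files NOT owned by the current user (none here).
others=$(find "$demo" -type f ! -user "$(id -u)" | wc -l)

echo "mine=$mine mine_txt=$mine_txt others=$others"
rm -rf "$demo"
```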

Part IV – Find Files and Directories Based on Date and Time
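26. Find Files Modified 50 Days Ago

To find all the files which were last modified 50 days ago. A bare number matches that day only; use -mtime -50 for files modified within the last 50 days.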

# find / -mtime 50

27. Find Files Accessed 50 Days Ago

To find all the files which were last accessed 50 days ago. As with -mtime, use -atime -50 for files accessed within the last 50 days.

# find / -atime 50

28. Find Files Modified 50-100 Days Ago

To find all the files which were modified more than 50 days ago and less than 100 days ago.

# find / -mtime +50 -mtime -100

29. Find Changed Files in Last 1 Hour

To find all the files whose status (content, permissions or ownership) changed in the last hour.

# find / -cmin -60

30. Find Modified Files in Last 1 Hour

To find all the files which were modified in the last hour.

# find / -mmin -60

31. Find Accessed Files in Last 1 Hour

To find all the files which were accessed in the last hour.

# find / -amin -60
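
The difference between +N, -N and a bare N is easiest to see with files of known age. The following sketch back-dates two made-up files with touch -d, which is a GNU coreutils feature and an assumption here, and then runs three of the time tests against them.

```shell
# Scratch directory with one new file and two back-dated files.
demo=$(mktemp -d)
touch "$demo/fresh.txt"
touch -d "60 days ago" "$demo/old60.txt"
touch -d "200 days ago" "$demo/old200.txt"

# More than 50 and fewer than 100 full days old: only old60.txt.
window=$(find "$demo" -type f -mtime +50 -mtime -100)

# Modified within the last 60 minutes: only fresh.txt.
recent=$(find "$demo" -type f -mmin -60)

# More than 100 full days old: only old200.txt.
ancient=$(find "$demo" -type f -mtime +100)

echo "window=$(basename "$window")"
echo "recent=$(basename "$recent")"
echo "ancient=$(basename "$ancient")"
rm -rf "$demo"
```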

Part V – Find Files and Directories Based on Size
32. Find 50MB Files

To find all files that are 50MB in size (the M suffix counts units of 1,048,576 bytes), use:

# find / -size 50M

33. Find Size between 50MB – 100MB

To find all the files which are greater than 50MB and less than 100MB.

# find / -size +50M -size -100M

34. Find and Delete Files Larger than 100MB

To find all files larger than 100MB and delete them using one single command. The -type f test keeps directories out of the deletion.

# find / -type f -size +100M -exec rm -f {} \;

35. Find Specific Files and Delete

Find all .gb files larger than 10MB and delete them using one single command. Quote the pattern so that the shell does not expand it before find sees it.

# find / -type f -name "*.gb" -size +10M -exec rm {} \;
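
Size-based deletion is worth rehearsing on dummy data before it is pointed at / as root. The sketch below creates small stand-in files with head -c (the sizes and names are made up and much smaller than in the examples above), lists the matches, and only then removes them.

```shell
# Scratch directory: a 3 MiB and a 1 KiB .gb file, plus a 3 MiB .txt file.
demo=$(mktemp -d)
head -c 3145728 /dev/zero > "$demo/big.gb"
head -c 1024 /dev/zero > "$demo/small.gb"
head -c 3145728 /dev/zero > "$demo/big.txt"

# List first: quoted pattern, regular files only, larger than 2 MiB.
big_gb=$(find "$demo" -type f -name "*.gb" -size +2M)
echo "would delete: $(basename "$big_gb")"

# Then delete exactly the same selection.
find "$demo" -type f -name "*.gb" -size +2M -exec rm -f {} +

# small.gb and big.txt should still be there.
remaining=$(find "$demo" -type f | wc -l)
echo "remaining=$remaining"
rm -rf "$demo"
```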

Π-cyc: A Reference-free SNP Discovery Application using Parallel Graph Search

Jit — Tue, 28 Jan 2020 03:34:23 -0600

Reference-free SNP search for comparative population genomics: multiple samples are run simultaneously. Note: the tool is in an experimental phase, and compiles and runs with OpenMPI-1.8.8 with the Intel Compiler only.

Cycle enumeration (also known as bubble detection) as part of colored de novo de Bruijn graph assembly can be impractical for large, error-prone genomes, because the assembly process produces an excessive number of false positive cycles. Our solution is to search the graph in multicore shared-memory parallel mode using graph decomposition, then use a filtering method to generate good-quality SNPs.

https://arxiv.org/abs/1809.06700

https://github.com/redayounsi/2KP2P

/2kp2omp/bin/main_2kp2_K63_C2 -i fastq_files.txt -o fungus_bub.fasta -r stat_fungus.txt -c cov_fungus_hash.txt -k 63 -h 20 -b 100 -g 600 -l 100 -f 16 -t 5.0 -x 1 -v 0 -p 1 -y 1 -u 1

Address of the bookmark: https://github.com/redayounsi/2KP2P

Common Bioinformatics Interview Questions!

Jit — Sat, 23 Jan 2021 06:07:50 -0600

The prospect of an interview for a bioinformatics position in the life sciences can be unnerving, but in my experience the same questions come up time and again. It is therefore well worth preparing for common bioinformatics interview questions. Doing so will give you a real advantage in obtaining the position.

The following 5 questions are ones that I have heard many times during the job-search process. There is no reason not to plan responses to them.

1. Tell Us About Yourself
This is a very typical interview opener. It is an easy question to prepare for, and having an answer planned will help you concentrate and ease into the conversation. However, make sure that your response is relevant to the job you are interviewing for.
It's probably best to keep your answer professional. Try to cover these points as well: Where did your love of science and bioinformatics come from? How did you end up in this field? Why programming and scripting?

2. What is your plan for your bioinformatics career? / Where do you see yourself in five years? / What are your personal objectives for accomplishing these goals? / What is your plan for research funding?

Your CV/resume has already impressed the selection panel if you have been invited for an interview. The questions from the bioinformatics interview team give you an opportunity to market yourself and to show that you have the most relevant knowledge for the work in question.

3. What do you understand about the job description? / What would your suggested research path be if you were the successful candidate?
Summarize the specifics of the advertised bioinformatics position in your own words. Follow up with some suggestions for how you would extend your research and create your own projects within the community.

4. Do you prefer to work in a group or on your own?
This requirement varies from job to job, so answer carefully. A company or research PI may need a bioinformatician who can work on a single project autonomously, or a person who can help direct and organize a team. In your response, refer to the job description.

5. What particular methods have you used to date in your experiments?
You might have experience with all the laboratory techniques described in the job description, but stress the ones you are most experienced with. Highlight your professional abilities and stress that you are extremely capable of mastering new techniques with others ...

At the end of the day, remember that you are interviewing the panel as much as they are interviewing you. Think of some questions you would like to put to the interview panel. This shows that you have done your homework and are serious about the position.

All the best for your future job interview.

GAPPadder: A Sensitive Approach for Closing Gaps on Draft Genomes with Short Sequence Reads

Jit — Mon, 14 May 2018 05:25:48 -0500

This software is provided "as is" without warranty of any kind. In no event shall the author be held responsible for any damage resulting from the use of this software. The program package, including source codes, executables, and this documentation, is distributed free of charge. If you use this program in a publication, please cite the following reference:
Chong Chu, Xin Li, and Yufeng Wu. "GAPPadder: A Sensitive Approach for Closing Gaps on Draft Genomes with Short Sequence Reads." bioRxiv (2017): 125534.

Address of the bookmark: https://github.com/Reedwarbler/GAPPadder

Sequencing Solutions to World Health

Rahul Agarwal — Thu, 29 Aug 2013 15:05:35 -0500

"New technology that quickly, easily and economically reveals the genomes of viruses and pathogens transforms public health and medicine."

Source: Life Technologies

Address of the bookmark: http://www.lifetechnologies.com/global/en/home/communities-social/blog/blogs/sequencing-solutions-to-world-health.html?cid=social_blogseries_20130829_11098264