BOL: Related items

Rebaler: program for conducting reference-based assemblies using long reads.

Jit — Tue, 18 Sep 2018 07:52:41 -0500

Rebaler is a program for conducting reference-based assemblies using long reads. It relies mainly on minimap2 for alignment and Racon for making consensus sequences.

I made Rebaler for bacterial genomes (specifically for the task of testing basecallers). It should in principle work for non-bacterial genomes as well, but I haven't tested it.

Address of the bookmark: https://github.com/rrwick/Rebaler

Merqury: reference-free quality and phasing assessment for genome assemblies

Jit — Sat, 06 Jun 2020 05:38:34 -0500

Often, genome assembly projects have Illumina whole-genome sequencing reads available for the assembled individual. The k-mer spectrum of this read set can be used to evaluate assembly quality independently, without the need for a high-quality reference. Merqury provides a set of tools for this purpose.
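To make the k-mer idea concrete, here is a toy sketch in bash and awk. It is not Merqury or meryl. It assumes plain sequences given one per line (no FASTA/FASTQ headers) and counts forward-strand k-mers only, whereas real tools count canonical k-mers. The helper names `kmers_of`, `kmer_spectrum` and `asm_only_kmers` are hypothetical. The spectrum shows how many distinct k-mers occur once, twice and so on. K-mers that occur in the assembly but never in the reads are the kind of signal used to flag likely assembly errors.

```shell
# Toy k-mer helpers (hypothetical, not Merqury/meryl).
# Input: plain sequences, one per line; forward strand only.

# Print every k-mer occurrence, one per line.
kmers_of() {
  awk -v k="$1" '{
    s = toupper($0)
    for (i = 1; i + k - 1 <= length(s); i++) print substr(s, i, k)
  }'
}

# k-mer spectrum: "multiplicity<TAB>number of distinct k-mers with that multiplicity".
kmer_spectrum() {
  kmers_of "$1" | sort | uniq -c \
    | awk '{ hist[$1]++ } END { for (m in hist) print m "\t" hist[m] }' \
    | sort -n
}

# Number of distinct k-mers present in the assembly but absent from the reads
# (bash process substitution; comm -13 keeps lines unique to the second input).
asm_only_kmers() {
  local k="$1" reads="$2" asm="$3"
  comm -13 <(kmers_of "$k" < "$reads" | sort -u) \
           <(kmers_of "$k" < "$asm" | sort -u) | wc -l | tr -d ' '
}
```

For example, `printf 'ACGTACGT\n' | kmer_spectrum 4` prints `1	3` and `2	1`: three 4-mers occur once and ACGT occurs twice.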

https://github.com/marbl/meryl

Address of the bookmark: https://github.com/marbl/merqury

Minipolish: A tool for Racon polishing of miniasm assemblies

BioStar — Tue, 03 Dec 2019 02:40:54 -0600

Miniasm is a great long-read assembly tool: straightforward, effective and very fast. However, it does not include a polishing step, so its assemblies have a high error rate – they are essentially made of stitched-together pieces of long reads.

Racon is a great polishing tool that can be used to clean up assembly errors. It's also very fast and well suited for long-read data. However, it operates on FASTA files, not the GFA graphs that miniasm makes.

That's where Minipolish comes in. With a single command, it will use Racon to polish up a miniasm assembly, while keeping the assembly in graph form.

It also takes care of some of the other nuances of polishing a miniasm assembly:

Adding read depth information to contigs
Fixing sequence truncation that can occur in Racon
Adding circularising links to circular contigs if not already present (so they display better in Bandage)
'Rotating' circular contigs between polishing rounds to ensure clean circularisation
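The post does not show how Minipolish rotates contigs. The sketch below only illustrates what rotating a circular sequence means. `rotate_circular` and `rotate_half` are hypothetical bash helpers that work on a plain sequence string, not on a GFA record, and they use bash substring expansion.

```shell
# Hypothetical helper (not Minipolish's code): rotate a circular sequence so it
# starts at 0-based offset n (n >= 0), moving the skipped bases to the end.
rotate_circular() {
  local seq="$1" n="$2"
  local len=${#seq}
  if (( len == 0 )); then
    printf '\n'
    return
  fi
  n=$(( n % len ))                       # offsets past the end wrap around
  printf '%s\n' "${seq:n}${seq:0:n}"
}

# Rotate by half the length, so the old start/end junction ends up mid-sequence.
rotate_half() {
  rotate_circular "$1" $(( ${#1} / 2 ))
}
```

For example, `rotate_half ACGTTTGG` prints `TTGGACGT`. The bases are unchanged; only the position chosen as the start moves.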

Address of the bookmark: https://github.com/rrwick/Minipolish

merqury: Evaluate genome assemblies with k-mers

Jit — Fri, 03 Jul 2020 19:29:34 -0500

More at https://www.biorxiv.org/content/10.1101/2020.03.15.992941v1.full

Address of the bookmark: https://github.com/marbl/merqury

Syri compares alignments between two chromosome-level assemblies and identifies synteny and structural rearrangements.

Shruti Paniwala — Wed, 01 Jun 2022 02:01:13 -0500

Address of the bookmark: https://github.com/schneebergerlab/syri

Find certain files/documents in Linux OS

Rahul Nayak — Sun, 06 Apr 2014 23:56:18 -0500

As a bioinformatician, I know that we usually handle large datasets and can easily get lost among huge numbers of files and folders. Finding a missing file calls for a powerful search command. The Linux find command is one of the most important and most frequently used commands on Linux systems. It searches for and lists the files and directories that match the conditions you specify, such as permissions, owner, group, file type, date, size and other criteria.

In this article we share our day-to-day experience with the Linux find command in the form of examples: the 35 most-used find commands, divided into five parts from basic to advanced usage.

Part I – Basic Find Commands for Finding Files with Names
1. Find Files Using Name in Current Directory

Find all the files whose name is gene.txt in the current working directory.

# find . -name gene.txt

./gene.txt

2. Find Files Under Home Directory

Find all the files under /home directory with name gene.txt.

# find /home -name gene.txt

/home/gene.txt

3. Find Files Using Name and Ignoring Case

Find all the files whose name is gene.txt in the /home directory, ignoring the difference between capital and small letters.

# find /home -iname gene.txt

/home/gene.txt
/home/Gene.txt
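The difference between -name and -iname is easiest to see in a throwaway directory. This sketch assumes a case-sensitive file system such as ext4 on Linux (on macOS's default file system, gene.txt and Gene.txt would collide). The variable name `demo` is just an illustration.

```shell
# Throwaway directory for trying the examples.
demo=$(mktemp -d)
touch "$demo/gene.txt" "$demo/Gene.txt" "$demo/GENE.TXT" "$demo/genes.txt"

# -name is case-sensitive: prints only .../gene.txt
find "$demo" -name gene.txt

# -iname ignores case: prints gene.txt, Gene.txt and GENE.TXT, but not genes.txt
find "$demo" -iname gene.txt

# Clean up afterwards with: rm -rf "$demo"
```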

4. Find Directories Using Name

Find all directories whose name is Gene in the / (root) directory.

# find / -type d -name Gene

/Gene

5. Find fasta Files Using Name

Find all files whose name is gene.fasta in the current working directory.

# find . -type f -name gene.fasta

./gene.fasta

6. Find all fasta Files in Directory

Find all fasta files in a directory.

# find . -type f -name "*.fasta"

./gene.fasta
./cancer.fasta
./allgene.fasta
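The quotes around "*.fasta" matter. Without them, the shell expands the wildcard against the current directory before find runs. With exactly one matching file there, matches in subdirectories are silently missed. With two or more, find receives extra arguments and stops with an error. A sketch in a throwaway directory (bash assumed, default glob settings):

```shell
# One match at the top level and one in a subdirectory.
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/gene.fasta" "$demo/sub/cancer.fasta"
cd "$demo" || exit 1

# Quoted: find itself matches *.fasta at every depth -> 2 files
find . -type f -name "*.fasta"

# Unquoted (pitfall): the shell turns *.fasta into gene.fasta first,
# so ./sub/cancer.fasta is missed -> 1 file
find . -type f -name *.fasta
```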

Part II – Find Files Based on their Permissions
7. Find Files With 777 Permissions

Find all the files whose permissions are 777.

# find . -type f -perm 0777 -print

8. Find Files Without 777 Permissions

Find all the files without permission 777.

# find / -type f ! -perm 777

9. Find SGID Files with 644 Permissions

Find all files with the SGID bit set and permissions 644.

# find / -perm 2644

10. Find Sticky Bit Files with 551 Permissions

Find all files with the sticky bit set and permissions 551.

# find / -perm 1551

11. Find SUID Files

Find all SUID set files.

# find / -perm /u=s

12. Find SGID Files

Find all SGID set files.

# find / -perm /g+s

13. Find Read Only Files

Find all files that their owner can read but not write.

# find / -type f -perm -u=r ! -perm /u=w

14. Find Executable Files

Find all files with at least one execute bit set (owner, group or others).

# find / -perm /a=x
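The examples above use three forms of -perm. A plain mode must match exactly. A mode prefixed with "-" requires all of the listed bits to be set. A mode prefixed with "/" matches if any of the listed bits are set. A sketch in a throwaway directory, assuming GNU find (the "/" form is GNU syntax):

```shell
demo=$(mktemp -d)
touch "$demo/a" "$demo/b" "$demo/c"
chmod 644 "$demo/a"   # rw-r--r--
chmod 664 "$demo/b"   # rw-rw-r--
chmod 600 "$demo/c"   # rw-------

# Exact match: mode must be exactly 644 -> a
find "$demo" -type f -perm 644

# "-" prefix: ALL bits of 644 must be set (extra bits allowed) -> a, b
find "$demo" -type f -perm -644

# "/" prefix: ANY bit of 022 (group write or other write) -> b
find "$demo" -type f -perm /022
```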

15. Find Files with 777 Permissions and Chmod to 644

Find all files with 777 permissions and use the chmod command to change their permissions to 644.

# find / -type f -perm 0777 -print -exec chmod 644 {} \;

16. Find Directories with 777 Permissions and Chmod to 755

Find all directories with 777 permissions and use the chmod command to change their permissions to 755.

# find / -type d -perm 777 -print -exec chmod 755 {} \;
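Ending -exec with "{} +" instead of "{} \;" makes find pass many paths to a single chmod call rather than starting one chmod per file. Both forms are POSIX. Here is the same fix-up applied to a throwaway directory instead of /:

```shell
demo=$(mktemp -d)
mkdir "$demo/d1" "$demo/d2"
touch "$demo/f1" "$demo/f2"
chmod 777 "$demo/f1" "$demo/f2" "$demo/d1" "$demo/d2"

# Files: 777 -> 644, batched into as few chmod calls as possible
find "$demo" -type f -perm 0777 -exec chmod 644 {} +

# Directories: 777 -> 755 ($demo itself is 700 from mktemp, so it is skipped)
find "$demo" -type d -perm 777 -exec chmod 755 {} +
```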

17. Find and remove single File

To find a single file called gene.txt and remove it.

# find . -type f -name "gene.txt" -exec rm -f {} \;

18. Find and remove Multiple File

To find and remove multiple files, such as .fa or .gb files, use:

# find . -type f -name "*.fa" -exec rm -f {} \;

OR

# find . -type f -name "*.gb" -exec rm -f {} \;
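The two commands above can be merged by grouping the patterns with \( ... -o ... \). Running the expression with -print first is a cheap dry run before anything is deleted. A sketch in a throwaway directory:

```shell
demo=$(mktemp -d)
touch "$demo/a.fa" "$demo/b.gb" "$demo/keep.fasta"

# Dry run: list what would be removed (keep.fasta does not match "*.fa")
find "$demo" -type f \( -name "*.fa" -o -name "*.gb" \) -print

# Same expression, now deleting
find "$demo" -type f \( -name "*.fa" -o -name "*.gb" \) -exec rm -f {} +
```

Without the escaped parentheses, -type f would bind only to the first -name test, because -a binds more tightly than -o. GNU find also offers -delete as an alternative to -exec rm.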

19. Find all Empty Files

To find all empty files under a certain path.

# find /tmp -type f -empty

20. Find all Empty Directories

To find all empty directories under a certain path.

# find /tmp -type d -empty

21. Find all Hidden Files

To find all hidden files, use below command.

# find /tmp -type f -name ".*"

Part III – Search Files Based On Owners and Groups
22. Find Single File Based on User

To find all files called gene.txt that are owned by the root user, searching from the / (root) directory.

# find / -user root -name gene.txt

23. Find all Files Based on User

To find all files that belong to user rahul under the /home directory.

# find /home -user rahul

24. Find all Files Based on Group

To find all files that belong to group developer under the /home directory.

# find /home -group developer

25. Find Particular Files of User

To find all .txt files (in any letter case) of user rahul under the /home directory.

# find /home -user rahul -iname "*.txt"
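To try the owner tests without knowing any other account names, use your own login name, taken from `id -un`. The ! operator negates a test. A sketch in a throwaway directory:

```shell
demo=$(mktemp -d)
touch "$demo/notes.txt" "$demo/README.TXT" "$demo/data.csv"
me=$(id -un)   # login name of the current user

# .txt files (any letter case) owned by the current user -> notes.txt, README.TXT
find "$demo" -type f -user "$me" -iname "*.txt"

# Anything NOT owned by the current user -> nothing here
find "$demo" ! -user "$me"
```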

Part IV – Find Files and Directories Based on Date and Time
26. Find Files Modified 50 Days Ago

To find all the files that were last modified 50 days ago. find counts whole 24-hour periods; use -mtime -50 for files modified within the last 50 days.

# find / -mtime 50

27. Find Files Accessed 50 Days Ago

To find all the files that were last accessed 50 days ago. Use -atime -50 for files accessed within the last 50 days.

# find / -atime 50

28. Find Last 50-100 Days Modified Files

To find all the files that were modified more than 50 days ago and less than 100 days ago.

# find / -mtime +50 -mtime -100
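The sign in front of the number changes the meaning. +N means more than N days ago, -N means less than N days ago, and a bare N means exactly N whole 24-hour periods ago. A sketch with two files whose timestamps are set using the POSIX `touch -t` format:

```shell
demo=$(mktemp -d)
touch "$demo/new.txt"                   # modified now
touch -t 200001010000 "$demo/old.txt"   # -t [[CC]YY]MMDDhhmm -> 1 Jan 2000, 00:00

# +50: more than 50 days ago -> old.txt
find "$demo" -type f -mtime +50

# -50: less than 50 days ago -> new.txt
find "$demo" -type f -mtime -50

# 50: exactly 50 whole days ago -> nothing here
find "$demo" -type f -mtime 50
```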

29. Find Changed Files in Last 1 Hour

To find all the files whose status (content or metadata such as permissions) changed in the last hour.

# find / -cmin -60

30. Find Modified Files in Last 1 Hour

To find all the files that were modified in the last hour.

# find / -mmin -60

31. Find Accessed Files in Last 1 Hour

To find all the files that were accessed in the last hour.

# find / -amin -60

Part V – Find Files and Directories Based on Size
32. Find 50MB Files

To find all files whose size, rounded up to whole megabytes, is 50MB, use:

# find / -size 50M

33. Find Size between 50MB – 100MB

To find all the files which are greater than 50MB and less than 100MB.

# find / -size +50M -size -100M
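GNU find rounds a file's size up to whole units before comparing, so -size with an M suffix can behave unexpectedly for files that are not an exact number of megabytes (here 1M = 1,048,576 bytes). A sketch with a 1.5 MiB file, assuming GNU find:

```shell
demo=$(mktemp -d)
# 1536 KiB = 1.5 MiB of zeros
dd if=/dev/zero of="$demo/mid.bin" bs=1024 count=1536 2>/dev/null

find "$demo" -type f -size 1M    # no match: 1.5 MiB rounds UP to 2 units
find "$demo" -type f -size 2M    # match
find "$demo" -type f -size +1M   # match: more than 1 unit
find "$demo" -type f -size -2M   # no match: 2 units is not less than 2
```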

34. Find and Delete Files Larger than 100MB

To find all files larger than 100MB and delete them with a single command.

# find / -type f -size +100M -exec rm -f {} \;

35. Find Specific Files and Delete

Find all .gb files larger than 10MB and delete them with a single command.

# find / -type f -name "*.gb" -size +10M -exec rm {} \;

SMASH: An alignment-free tool to find and visualise rearrangements between pairs of DNA sequences

Jit — Thu, 21 Dec 2017 08:26:57 -0600

SMASH is a completely alignment-free method to find and visualise rearrangements between pairs of DNA sequences. Detection is based on relative compression, using a finite-context model (FCM), also known as a Markov model, of high context order (typically 20). The method is implemented in a tool, also called SMASH. For visualization, SMASH outputs an SVG image with an ideogram layout, in which the patterns are represented with several HSV values (only the value component varies).

(Example figure: information maps between human and chimpanzee for several chromosomes.)
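The relative-compression idea can be sketched in a few lines, with strong simplifications. This is not SMASH's implementation. `relative_bits` is a hypothetical bash/awk helper that trains an order-k finite-context model on a reference sequence, counting which base follows each k-base context. It then reports the average information, in bits per base, that the model assigns to a target sequence. Additive smoothing with alpha = 1 and passing sequences as arguments are assumptions of this sketch. SMASH works on whole genomes, with much higher orders, and turns the information into maps, none of which is reproduced here.

```shell
# Toy relative compression: average -log2 P(base | previous k bases) of a
# target under an order-k model trained on a reference (hypothetical helper).
relative_bits() {
  local k="$1" ref="$2" tgt="$3"
  awk -v k="$k" -v ref="$ref" -v tgt="$tgt" -v alpha=1 'BEGIN {
    ref = toupper(ref); tgt = toupper(tgt)
    # Train: count symbol s following each length-k context c in the reference
    for (i = k + 1; i <= length(ref); i++) {
      c = substr(ref, i - k, k); s = substr(ref, i, 1)
      n[c, s]++; tot[c]++
    }
    # Score the target with additive smoothing over the 4 DNA symbols
    bits = 0; m = 0
    for (i = k + 1; i <= length(tgt); i++) {
      c = substr(tgt, i - k, k); s = substr(tgt, i, 1)
      p = (n[c, s] + alpha) / (tot[c] + 4 * alpha)
      bits += -log(p) / log(2); m++
    }
    printf "%.3f\n", (m ? bits / m : 0)
  }'
}
```

For example, `relative_bits 2 ACGTACGTACGT ACGTACGTACGT` prints 0.884, while `relative_bits 2 ACGTACGTACGT TTTTGGGGCCCC` prints 2.000. None of the second target's contexts occur in the reference, so every base costs exactly 2 bits, the same as a uniform guess among A, C, G and T. Regions that the reference model predicts well cost fewer bits, which is the signal a relative-compression approach exploits.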

Address of the bookmark: https://github.com/pratas/smash

TwinBLAST: When Two Is Better than One

Jit — Sat, 07 Sep 2019 08:50:08 -0500

TwinBLAST is a web-based tool for viewing 2 BLAST reports simultaneously side-by-side. It uses ExtJS (www.sencha.com/products/extjs/) to provide 2 independently scrollable panels. BioPerl (www.bioperl.org) is used to index raw BLAST reports and Bio::Graphics is used to draw pictograms of the BLAST hits.

https://mra.asm.org/content/8/35/e00842-19

Address of the bookmark: https://github.com/IGS/twinblast

List of bioinformatics workflow management tools!

Rahul Nayak — Sat, 20 Mar 2021 00:15:25 -0500

Here is a list of workflow managers:

BigDataScript – A cross-system scripting language for working with big data pipelines in computer systems of different sizes and capabilities. [ paper-2014 | web ]
Bpipe – A small language for defining pipeline stages and linking them together to make pipelines. [ web ]
Common Workflow Language – a specification for describing analysis workflows and tools that are portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments. [ web ]
Cromwell – A Workflow Management System geared towards scientific workflows. [ web ]
Galaxy – a popular open-source, web-based platform for data intensive biomedical research. Has several features, from data analysis to workflow management to visualization tools. [ paper-2018 | web ]
Nextflow (recommended) – A fluent DSL modelled around the UNIX pipe concept, that simplifies writing parallel and scalable pipelines in a portable manner. [ paper-2018 | web ]
Ruffus – Computation Pipeline library for Python widely used in science and bioinformatics. [ paper-2010 | web ]
SeqWare – Hadoop Oozie-based workflow system focused on genomics data analysis in cloud environments. [ paper-2010 | web ]
Snakemake – A workflow management system in Python that aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment. [ paper-2018 | web ]
Workflow Description Language – Workflow standard developed by the Broad Institute. [ web ]

Snakemake workflow: dna-seq-gatk-variant-calling

Jit — Thu, 25 Jul 2019 12:55:07 -0500

This Snakemake pipeline implements the GATK best-practices workflow for calling small genomic variants.

Address of the bookmark: https://github.com/snakemake-workflows/dna-seq-gatk-variant-calling