BOL: Related items

Linux SSH Client Commands for Bioinformatics

Rahul Nayak — Thu, 13 Mar 2014 17:16:32 -0500

Let's walk through some basic command-line usage of the ssh client.

1. Check your SSH Client Version:

Checking your SSH client version is rarely needed, but sometimes it may be necessary to identify the SSH client that you are currently running and its corresponding version number. The SSH client can be identified as follows:

$ ssh -V
OpenSSH_3.9p1, OpenSSL 0.9.7a Feb 19 2003

$ ssh -V
ssh: SSH Secure Shell 3.2.9.1 (non-commercial version) on i686-pc-linux-gnu
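
When the same pipeline runs on several clusters, it can help to record which client is in use. The following is a minimal Python sketch, not part of the original post, that pulls the implementation and version out of banners shaped like the two above. The function names `parse_ssh_banner` and `local_ssh_banner` are hypothetical, and the patterns only cover these two banner styles.

```python
import re
import subprocess


def parse_ssh_banner(banner):
    """Return (implementation, version) for a banner printed by `ssh -V`."""
    # OpenSSH style: "OpenSSH_3.9p1, OpenSSL 0.9.7a ..."
    match = re.search(r"OpenSSH_([^,\s]+)", banner)
    if match:
        return ("OpenSSH", match.group(1))
    # SSH Secure Shell style: "ssh: SSH Secure Shell 3.2.9.1 (non-commercial version) ..."
    match = re.search(r"SSH Secure Shell (\S+)", banner)
    if match:
        return ("SSH Secure Shell", match.group(1))
    # Anything else is returned unparsed so the caller can inspect it.
    return ("unknown", banner.strip())


def local_ssh_banner():
    """Run `ssh -V` and return whatever it printed (stdout and stderr combined)."""
    result = subprocess.run(["ssh", "-V"], capture_output=True, text=True)
    return (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    samples = (
        "OpenSSH_3.9p1, OpenSSL 0.9.7a",
        "ssh: SSH Secure Shell 3.2.9.1 (non-commercial version) on i686-pc-linux-gnu",
    )
    for sample in samples:
        print(parse_ssh_banner(sample))
```

Calling `parse_ssh_banner(local_ssh_banner())` applies the same parsing to the client installed on the local machine.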

2. Connect and login to remote host:

The first time you log in to the remote host from the local host, the client displays a host-key-not-found message, and you can answer “yes” to continue. The host key of the remote host is then added under the .ssh2/hostkeys directory in your home directory, as shown below.

localhost$ ssh -l jit remotehost.example.com

jit@remotehost.example.com password:

remotehost.example.com$

The second time you log in to the remote host from the local host, the client prompts only for the password, because the remote host key has already been added to the ssh client's list of known hosts.

localhost$ ssh -l jit remotehost.example.com
jit@remotehost.example.com password:
remotehost.example.com$

If the host key of the remote host has changed since you first logged in, you may get a warning message as shown below. This can happen for various reasons, such as 1) the sysadmin upgraded or reinstalled the SSH server on the remote host, or 2) someone is doing something malicious. Before answering “yes” to the message, the best course of action is to call your sysadmin, find out why you got the host-key-changed message, and verify whether the new host key is correct.

localhost$ ssh -l jit remotehost.example.com

jit@remotehost.example.com's password:
remotehost$
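
One way to carry out the verification suggested above is to compare the fingerprint of the key the server presents with a fingerprint obtained from the sysadmin. The Python sketch below is an illustration only. It assumes an OpenSSH-style public key line (`key-type base64-blob comment`) and the SHA256 fingerprint format that OpenSSH prints. The SSH Secure Shell client in the transcripts stores keys differently under .ssh2/hostkeys. The helper names and the sample key are hypothetical, and the sample blob is not a real host key.

```python
import base64
import hashlib


def sha256_fingerprint(public_key_line):
    """Return an OpenSSH-style 'SHA256:...' fingerprint for a public key line."""
    fields = public_key_line.split()
    blob = base64.b64decode(fields[1])          # second field is the key blob
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest as base64 without the trailing '=' padding.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def host_key_matches(public_key_line, trusted_fingerprint):
    """True when the presented key has the fingerprint the sysadmin gave you."""
    return sha256_fingerprint(public_key_line) == trusted_fingerprint


if __name__ == "__main__":
    # Placeholder blob for illustration only; NOT a real host key.
    fake_blob = base64.b64encode(b"not-a-real-host-key").decode("ascii")
    presented = "ssh-ed25519 " + fake_blob + " remotehost.example.com"
    trusted = sha256_fingerprint(presented)     # stands in for the sysadmin's value
    print(host_key_matches(presented, trusted))
```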

4. Debug SSH Client:

Sometimes it is necessary to view debug messages to troubleshoot SSH connection issues. For this purpose, pass the -v (lowercase v) option to ssh as shown below.

Example without debug message:

        localhost$ ssh -l jit remotehost.example.com
        warning: Connecting to remotehost.example.com failed: No address associated to the name
        localhost$

Example with debug message:

        localhost$ ssh -v -l jit remotehost.example.com
        debug: SshConfig/sshconfig.c:2838/ssh2_parse_config_ext: Metaconfig parsing stopped at line 3.
        debug: SshConfig/sshconfig.c:637/ssh_config_set_param_verbose: Setting variable 'VerboseMode' to 'FALSE'.
        debug: SshConfig/sshconfig.c:3130/ssh_config_read_file_ext: Read 17 params from config file.
        debug: Ssh2/ssh2.c:1707/main: User config file not found, using defaults. (Looked for '/home/jit/.ssh2/ssh2_config')
        debug: Connecting to remotehost.example.com, port 22... (SOCKS not used)
        warning: Connecting to remotehost.example.com failed: No address associated to
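
The warning in both transcripts reads like a name-resolution failure rather than an SSH problem. Under that assumption, a quick check outside ssh is to ask the resolver directly. The Python sketch below uses the standard `socket.getaddrinfo` call, and the helper name `resolve_host` is hypothetical.

```python
import socket


def resolve_host(hostname, port=22):
    """Return (addresses, error): the resolved addresses, or the resolver error text."""
    try:
        infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror as err:
        # The resolver could not map the name to an address.
        return [], str(err)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos}), None


if __name__ == "__main__":
    addresses, error = resolve_host("remotehost.example.com")
    if error:
        print("name did not resolve:", error)
    else:
        print("resolved to:", ", ".join(addresses))
```

If the name does not resolve here either, the issue lies with DNS or the hosts file, not with the ssh client.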

5. Escape Character: (Toggle SSH session, SSH session statistics etc.)

The escape character ~ gets the SSH client's attention, and the character following the ~ determines the escape command.
Toggle SSH Session: When you’ve logged on to the remotehost using ssh from the localhost, you may want to come back to the localhost to perform some activity and go back to remote host again. In this case, you don’t need to disconnect the ssh session to the remote host. Instead follow the steps below.

i. Log in to remotehost from localhost: localhost$ ssh -l jit remotehost
ii. Now you are connected to the remotehost: remotehost$
iii. To come back to the localhost temporarily, type the escape character ~ followed by Control-Z. When you type ~, you will not see it on the screen until you press Control-Z and press Enter. So, on the remotehost, enter the following keystrokes on a new line for the step below to work: ~ Control-Z

    remotehost$ ~^Z
    [1]+ Stopped                 ssh -l jit remotehost
    localhost$

iv. Now you are back on the localhost, and the ssh session to remotehost is held as a stopped Unix job, which you can check as shown below:

    localhost$ jobs
    [1]+ Stopped                 ssh -l jit remotehost

v. You can go back to the remote host without entering the password again by bringing the stopped ssh job to the foreground on the localhost:

    localhost$ fg %1
    ssh -l jit remotehost
    remotehost$

Steps to find all the repeats in the genome!

Neel — Thu, 31 Aug 2023 02:43:28 -0500

To find repeats of length 2 to 9 in a genome, you can use the RepeatMasker tool with the "--length" option[0] and then parse its output with a Perl script. Here's a step-by-step guide:

Install RepeatMasker: First, you need to install RepeatMasker on your system. You can download it from the RepeatMasker website[0].

Prepare the genome sequence: Make sure you have the genome sequence in a FASTA file format. Let's assume the file is named "genome.fasta".

./RepeatMasker -pa THREADS -nolow -norna -no_is -div DIVERGENCE -lib RepeatMaskerLib.embl -gff -xsmall -small -poly -species SPECIES -dir OUTPUT_DIR -length MIN-MAX genome.fasta

Replace the following placeholders with appropriate values:

THREADS: The number of processors/threads you want to use for parallel processing.
DIVERGENCE: The divergence value for the species you are analyzing. You can find divergence values for different species in the RepeatMasker documentation[0].
SPECIES: The name of the species you are analyzing.
OUTPUT_DIR: The directory where you want the output files to be saved.
MIN and MAX: The minimum and maximum lengths of the repeats you want to find (in this case, 2 and 9).

Analyze the output: RepeatMasker will generate several output files, including a .out file. You can parse this file to extract the information you need. There is a Perl tool called "one_code_to_find_them_all.pl" that can help you parse RepeatMasker output files[0]. You can download it from the source provided.

Use the provided Perl script: Once you have the "one_code_to_find_them_all.pl" script, you can run it to conveniently parse the RepeatMasker output files. Here's an example of how to use it:

perl one_code_to_find_them_all.pl --rm RM_OUTPUT --length LENGTH_FILE

Replace RM_OUTPUT with the path to your RepeatMasker .out file, and LENGTH_FILE with the path to a file containing the lengths of the reference elements.

This script will generate several output files, including .log.txt and .copynumber.csv, which contain quantitative information about the identified repeat elements.

Remember to adjust the parameters and options according to your specific needs and the characteristics of your genome.
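
For short motifs of 2 to 9 bases, a direct scan of the sequence can serve as a quick sanity check alongside the workflow above. The Python sketch below is not RepeatMasker's method. It swaps in a plain regular-expression search for perfect tandem repeats, and the function names, the `min_copies` threshold, and the demo sequence are all assumptions made for illustration.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_tandem_repeats(sequence, min_unit=2, max_unit=9, min_copies=3):
    """Return (start, end, unit, copies) for each perfect tandem repeat.

    Coordinates are 0-based with an exclusive end. The lazy quantifier makes
    the search try the shortest repeat unit first at each position.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        copies = (match.end() - match.start()) // len(unit)
        hits.append((match.start(), match.end(), unit, copies))
    return hits


if __name__ == "__main__":
    # Hypothetical demo record; in practice, pass open("genome.fasta") instead.
    demo = [">chr_demo", "TTACACACAC", "GGATGATGATCC"]
    for name, seq in read_fasta(demo):
        for hit in find_tandem_repeats(seq):
            print(name, hit)
    # prints: chr_demo (2, 10, 'AC', 4)
    #         chr_demo (11, 20, 'GAT', 3)
```

This only reports exact repeats. Runs of a single base also match as a 2-base unit, so filter those out if they are not wanted.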

Basic Notions in Molecular Biology and Genetics

Antony Campos — Sun, 16 Mar 2014 18:15:29 -0500

This is a presentation about some fundamental concepts applied in molecular biology and genetics. It also includes some of the experience in molecular cloning that one of our members gained during his undergraduate years. Our research group, called "BIOPHARM" (an acronym for Laboratory of Bioinformatics and Pharmacogenetics), was established in 2007. It took a few years to make this initiative a reality, and we are now working on several projects in those fields. The group belongs to the Department of Biochemistry, Faculty of Pharmacy and Biochemistry, Universidad Nacional Mayor de San Marcos, Lima, Perú. We try to encourage and support research initiatives, and we regularly participate in courses, congresses and symposiums.

UnCoVar: Workflow for Transparent and Robust Virus Variant Calling, Genome Reconstruction and Lineage Assignment

BioStar — Mon, 05 Aug 2024 23:01:29 -0500

Using state of the art tools, easily extended for other viruses
Tool and database updates for critical components via Conda
Built using modern design patterns with Conda and Snakemake
Extensible and easy to customize
Submission Ready Genomes
Customizable reporting with comprehensive visualization

https://ikim-essen.github.io/uncovar/

GitHub: https://github.com/IKIM-Essen/uncovar

Address of the bookmark: https://ikim-essen.github.io/uncovar/

Gerstein Lab

Wed, 19 Mar 2014 12:48:20 -0500

The focus of the Gerstein Lab is interpreting personal genomes, particularly in relation to disorders, such as cancer. This endeavor has a number of related aspects described below. Moreover, the approaches we take have broad connections to a variety of data-intensive fields, within the emerging discipline of data science.

Personal Genome Variation: SVs
Human Genome Annotation: Processing Next-Gen Sequencing Data
Comparative Genomics: Pseudogenes as Molecular Fossils
Protein Structure and Function: Macromolecular Motions
Analysis of Diverse Networks
Genomics at the Forefront of Data Science

Lab page: http://www.gersteinlab.org/

Genomic architecture surrounding the fusion site of human chromosome 2

LEGE — Tue, 04 Mar 2025 12:26:29 -0600

The article "Genomic Structure and Evolution of the Ancestral Chromosome Fusion Site in 2q13–2q14.1 and Paralogous Regions on Other Human Chromosomes" (https://pmc.ncbi.nlm.nih.gov/articles/PMC187548/) explores the genomic architecture surrounding the fusion site of human chromosome 2. This fusion event is a key evolutionary marker distinguishing humans from other great apes, as humans have 46 chromosomes while chimpanzees, gorillas, and orangutans possess 48. The fusion occurred through an end-to-end joining of two ancestral chromosomes, which remain separate in nonhuman primates.

Key Findings:

Chromosomal Fusion and Its Molecular Signature:
- The fusion site is located at 2q13–2q14.1 and is characterized by degenerate telomeric sequences appearing interstitially, indicating the historical head-to-head joining of ancestral chromosomes.
- Despite being a signature of a past fusion event, these telomeric repeats are no longer functional and have undergone sequence degradation over time.
Extensive Duplications in the Surrounding Genomic Region:
- The study identifies large-scale segmental duplications flanking the fusion site, with several of these regions duplicated and scattered across multiple chromosomes.
- These duplications are predominantly located in subtelomeric and pericentromeric regions, suggesting their role in genomic instability and chromosomal evolution.
Paralogous Regions and Their Evolutionary Relationships:
- A 168-kilobase (kb) segment near the fusion site has 98%–99% sequence identity with three regions on chromosome 9 (9pter, 9p11.2, and 9q13).
- Another 67-kb region distal to the fusion site shows a high degree of homology to sequences in chromosome 22qter.
- Additionally, a 100-kb segment exhibits 96% sequence identity with a region in chromosome 2q11.2.
Comparative Genomics and Evolutionary Implications:
- By comparing the duplicated sequences and their arrangement in primates, the researchers traced the order of duplication events leading to their present distribution.
- The presence of specific repetitive elements within these duplicated segments serves as evolutionary markers that help infer their historical rearrangements.
- Some of these duplicated regions are associated with chromosomal inversion breakpoints, potentially contributing to evolutionary changes in primates.
- Recurrent structural rearrangements in these regions have been linked to human chromosomal disorders.

Conclusions and Implications:

The findings provide valuable insights into the structural evolution of human chromosome 2, which played a crucial role in human speciation.
Understanding these segmental duplications and their evolutionary trajectories sheds light on genomic instability, which may contribute to human genetic diseases.
The study highlights how large-scale chromosomal rearrangements, such as fusion and duplication, have influenced the evolutionary divergence of humans from other primates.

This research advances our understanding of human genome evolution and offers a foundation for studying the effects of structural variants in genetic disorders.

Bioinformatics PhD at University of Calcutta

Mon, 31 Mar 2014 08:41:04 -0500

University of Calcutta
Department of Biophysics, Molecular Biology & Bioinformatics

Applications are invited for admission to the Ph.D. programme in the Department of Biophysics, Molecular Biology & Bioinformatics, University of Calcutta for the year 2014 from eligible candidates who would be placed under the departmental teachers or affiliated research supervisors for the pursuance of their Ph.D. programme.

Candidates are requested to download the Ph.D. admission test application form from the University website and apply in the prescribed proforma by paying Rs. 100/- through a challan available through different University Cash counters. The challan is to be duly forwarded through the Head, Department of Biophysics, Molecular Biology & Bioinformatics, University of Calcutta.

The completed application form with a copy of the paid challan is to be submitted to the office of the Department by April 16, 2014.

Syllabus for the Test: The questions for the admission test and interview will be based on topics in the following areas:

Mathematical methods, Molecular and Cellular Biophysics, Molecular and Cell Biology, Biochemistry, Genetics, Plant Biology, Developmental biology, Neurobiology, Biotechnology and Bioinformatics.

However, the interview will be primarily based on the research emphasis of the candidate. Candidates must clearly indicate the program in which they want to apply.

Date of Admission test : April 22, 2014 (Tuesday)

Date of publication of selection list for the interview : April 22, 2014 (Tuesday)

Date of Interview : April 23, 2014 (Wednesday)

Number of vacancies for the Ph.D. programme : 12

Reservation policy will be followed as per rules.

Candidates with valid NET/GATE/M.Phil. or equivalent qualifications are not required to appear at the admission test but would need to qualify in the interview.

http://www.caluniv.ac.in/admission%20notice/PHD_BIO_PHYSICS.pdf

DEG 5.0: a database of essential genes in both prokaryotes and eukaryotes

Rahul Nayak — Tue, 30 Mar 2021 11:47:29 -0500

Essential genes are those indispensable for the survival of an organism, and their functions are therefore considered a foundation of life. Determining the minimal gene set needed to sustain a life form, a fundamental question in biology, plays a key role in the emerging field of synthetic biology.

DEG is freely available at the website http://tubic.tju.edu.cn/deg or http://www.essentialgene.org.

Address of the bookmark: http://www.essentialgene.org/

Phylogenomics/Phylogenetic website

Aaryan Lokwani — Mon, 07 Apr 2014 02:17:18 -0500

Welcome to phylobabble.org, a discussion forum for phylogenetic theory and applications. The primary goal of this forum is to discuss best practice and new developments in phylogenetics. Although we do have a Troubleshooting category for getting feedback on analyses, this is not a help site for running phylogenetics programs.

A great place to chat about phylogenetics for researchers and the broader community of students and science-interested citizens.

Address of the bookmark: http://phylobabble.org/

Scalpel

Shruti Paniwala — Wed, 20 Aug 2014 02:07:58 -0500

A team from Cold Spring Harbor Laboratory has released an algorithm, called Scalpel, for finding insertions and deletions in next generation sequencing data sets. Scalpel, which is open source and available for download on SourceForge, outperformed the popular tools GATK HaplotypeCaller and SOAPindel in test runs on both simulated and real whole human exomes.

Like other indel callers, Scalpel works by performing de novo assembly of regions of interest, so that misalignment to the reference genome cannot obscure the presence of an insertion or deletion. Scalpel's innovation is to repeatedly check its assembly before comparing to the reference genome, to account for simple sequence repeats that are a regular source of error in indel calling. When Scalpel assembles an exon, it collects reads that map to that exon (including partial matches), splits them into k-mers, and creates a de Bruijn graph to span the exon; however, if it detects repeats in the map, it iteratively increases the size of the k-mers by one base until the repeats are eliminated. This ensures that the final assembly of the exon is highly accurate while minimizing compute time.
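
The k-mer loop described above can be sketched in a few lines of Python. This is a simplified illustration, not Scalpel's implementation. It assumes that a "repeat" shows up as a cycle in the de Bruijn graph, and the function names, the starting k, and the toy reads are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Nodes are k-mers; an edge joins k-mers that are adjacent in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            graph[read[i:i + k]].add(read[i + 1:i + k + 1])
        if len(read) >= k:
            graph[read[len(read) - k:]]  # make sure the last k-mer is a node
    return dict(graph)


def has_cycle(graph):
    """Iterative depth-first search; a back edge to a node on the stack is a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    for root in graph:
        if color[root] != WHITE:
            continue
        color[root] = GRAY
        stack = [(root, iter(graph[root]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if color[nxt] == GRAY:
                    return True
                if color[nxt] == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbours explored: this node is finished.
                color[node] = BLACK
                stack.pop()
    return False


def smallest_repeat_free_k(reads, k_start, k_max):
    """Increase k by one base at a time until the graph has no cycle."""
    for k in range(k_start, k_max + 1):
        if not has_cycle(build_de_bruijn(reads, k)):
            return k
    return None


if __name__ == "__main__":
    # Toy reads covering a short AC microsatellite (hypothetical data).
    reads = ["TTACACAC", "ACACACGG"]
    print(smallest_repeat_free_k(reads, k_start=3, k_max=8))  # prints 5
```

With k = 3 or 4 the AC run makes a k-mer recur, which creates a cycle. At k = 5 every k-mer in the toy reads is distinct and the graph becomes a simple path.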

The Cold Spring Harbor team's validation of Scalpel, published over the weekend in Nature Methods, compares Scalpel's performance on a live whole exome against HaplotypeCaller and SOAPindel. The donor is an individual with serious neurological disorders, which may be linked to a high incidence of indels. One thousand indels from this individual's exome, called by one or more of the informatics pipelines, were selected for focused resequencing. This resequencing revealed a 77% true positive rate for Scalpel calls, dramatically better than the rates for either of the competing tools; Scalpel performed especially well with indels longer than five base pairs, a traditional weak point for indel callers.

Finally, the authors demonstrate Scalpel's use on a large set of genetic data from nearly 600 families who donated samples to the Simons Simplex Collection, a project of the Simons Foundation Autism Research Initiative. Scalpel found a very high enrichment for indels in children affected by autism, compared with their unaffected siblings, a pattern that persisted even after excluding common variants.