BOL: All site blogs

SLURM Commands

Shruti Paniwala — Wed, 06 Jul 2022 07:40:07 -0500


The following table shows SLURM commands on the SOE cluster.

Command	Description
sbatch	Submit batch scripts to the cluster.
scancel	Cancel or signal jobs and job steps that are under the control of SLURM.
sinfo	View information about SLURM nodes and partitions.
squeue	View information about jobs in the SLURM scheduling queue.
smap	Graphically view information about SLURM jobs, partitions, and configuration parameters.
sqlog	View information about running and finished jobs.
sacct	View resource accounting information for finished and running jobs.
sstat	View resource accounting information for running jobs.

For more information, read the man page for each command (for example, man sbatch). Some examples follow.

1. Info about the partitions and nodes
List all the partitions available to you and the nodes therein:

sinfo

Nodes in state idle can accept new jobs.

Show the configuration of a partition, for example, SOE_main:

scontrol show partition=SOE_main

Show current info about a specific node:

scontrol show node=<nodename>

You can also specify a group of nodes in the command above. For example, if your MPI job is running across soenode05, soenode06, soenode35, and soenode36, run the command below to get information on just those nodes:

scontrol show node=soenode[05-06,35-36]

A useful field to look at in the output is CPULoad, which shows how heavily your application is using the CPUs on those nodes.
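If you only want the CPULoad value for each node, you can filter the scontrol output. The function below is a minimal sketch; node_cpuload is a hypothetical helper, not a Slurm command, and it assumes scontrol prints space-separated Key=Value fields with NodeName= appearing before CPULoad= in each node record, which is the usual layout.

```bash
#!/usr/bin/env bash
# node_cpuload: hypothetical helper (not part of Slurm).
# Reads `scontrol show node` output on stdin and prints "node load" pairs.
# Assumes space-separated Key=Value fields, with NodeName= appearing
# before CPULoad= within each node record.
node_cpuload() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^NodeName=/) { node = substr($i, 10) }     # text after "NodeName="
      if ($i ~ /^CPULoad=/)  { print node, substr($i, 9) } # text after "CPULoad="
    }
  }'
}

# Usage on the cluster (requires Slurm):
#   scontrol show node=soenode[05-06,35-36] | node_cpuload
```

A load close to the number of cores you requested on each node usually means the application is keeping its allocated CPUs busy.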

2. Submit scripts
The header of a submit script specifies the job name, partition (queue), time limit, memory allocation, number of nodes, number of cores, and the files that collect standard output and standard error at run time, for example:

#!/bin/bash

#SBATCH --job-name=OMP_run     # job name, "OMP_run"
#SBATCH --partition=SOE_main   # partition (queue)
#SBATCH -t 0-2:00              # time limit: (D-HH:MM) 
#SBATCH --mem=32000            # memory per node in MB 
#SBATCH --nodes=1              # number of nodes
#SBATCH --ntasks-per-node=16   # number of cores
#SBATCH --output=slurm.out     # file to collect standard output
#SBATCH --error=slurm.err      # file to collect standard errors

If the time limit is not specified in the submit script, SLURM assigns the default run time of 3 days, so the job will be terminated by SLURM after 72 hours. The maximum allowed run time is two weeks (14-00:00).
If no memory limit is requested, SLURM assigns the default of 16 GB. The maximum allowed memory per node is 128 GB. To see how much RAM per node your job is using, run sacct or sstat and query MaxRSS for the job (see the examples below).
Depending on the type of application you need to run, the submit script may contain commands that create temporary scratch space on a compute node; see the discussion about using the file systems on the cluster.
The script then sets up the environment specific to the application and starts the application on one or more nodes; see the sample sbatch scripts in the directory /usr/local/Samples on soemaster1.hpc.rutgers.edu.
You can submit your job to the cluster with the sbatch command:

sbatch myscript.sh
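Because a job that asks for more than the 14-day maximum cannot run, it can help to check the requested time limit before calling sbatch. The functions below are a minimal sketch; slurm_time_to_minutes and check_time_limit are hypothetical helpers, not Slurm commands, and they accept only the D-HH:MM form used in this post, although SLURM itself understands several other time formats.

```bash
#!/usr/bin/env bash
# slurm_time_to_minutes: hypothetical helper (not part of Slurm).
# Converts a time limit written as D-HH:MM into minutes.
# Other formats that Slurm accepts are deliberately rejected.
slurm_time_to_minutes() {
  local re='^([0-9]+)-([0-9]{1,2}):([0-9]{2})$'
  if [[ $1 =~ $re ]]; then
    # 10# forces base 10, so values such as "08" are not read as octal.
    echo $(( 10#${BASH_REMATCH[1]} * 1440 + 10#${BASH_REMATCH[2]} * 60 + 10#${BASH_REMATCH[3]} ))
  else
    echo "unsupported time format: $1" >&2
    return 1
  fi
}

# check_time_limit: hypothetical helper that rejects limits above the
# 14-day maximum stated for the SOE cluster.
check_time_limit() {
  local requested max
  requested=$(slurm_time_to_minutes "$1") || return 1
  max=$(slurm_time_to_minutes 14-00:00)
  if (( requested > max )); then
    echo "time limit $1 exceeds the 14-day maximum" >&2
    return 1
  fi
  echo "time limit $1 = $requested minutes (OK)"
}

# Example:
#   check_time_limit 0-2:00   # prints: time limit 0-2:00 = 120 minutes (OK)
```

For reference, the 3-day default corresponds to 4320 minutes and the 14-day maximum to 20160 minutes.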

3. Query job information
List all currently submitted jobs in running and pending states for a user:

squeue -u <username>

The squeue command accepts format options that expose specific information. For example, to see when pending job 706 is scheduled to start running:

squeue -j 706 --format="%S"

START_TIME
2015-04-30T09:54:32

You can show more information by adding format options, for example:

squeue -j 706 --format="%i %P %j %u %T %l %C %S"

JOBID PARTITION   NAME    USER STATE   TIMELIMIT  CPUS START_TIME
706   SOE_main  Par_job_3 mike PENDING 3-00:00:00 64   2015-04-30T09:54:32
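The same format options make squeue output easy to summarize in a script. As a minimal sketch, the hypothetical helper count_by_state (not a Slurm command) counts your jobs per state from the %T field; the -h option suppresses the header line.

```bash
#!/usr/bin/env bash
# count_by_state: hypothetical helper (not part of Slurm).
# Reads one job state per line (e.g. from squeue -o "%T") and prints
# "STATE count" pairs sorted by state name.
count_by_state() {
  awk 'NF { n[$1]++ } END { for (s in n) print s, n[s] }' | LC_ALL=C sort
}

# Usage on the cluster (requires Slurm):
#   squeue -h -u "$USER" -o "%T" | count_by_state
```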

To see when all pending jobs in the queue are scheduled to start:

squeue --start

List all running and completed jobs for a user, or for a specific job:

sqlog -u <username>

sqlog -j <jobid>

The following abbreviations are used for the job states:

       CA   CANCELLED      Job was cancelled.
       CD   COMPLETED      Job completed normally.
       CG   COMPLETING     Job is in the process of completing.
       F    FAILED         Job terminated abnormally.
       NF   NODE_FAIL      Job terminated due to node failure.
       PD   PENDING        Job is pending allocation.
       R    RUNNING        Job currently has an allocation.
       S    SUSPENDED      Job is suspended.
       TO   TIMEOUT        Job terminated upon reaching its time limit.
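When a script works with the compact state codes (for example, squeue's %t field), a lookup from code to full name mirrors the list of states. This is a minimal sketch; state_name is a hypothetical helper, not a Slurm command, and it covers only the codes listed here.

```bash
#!/usr/bin/env bash
# state_name: hypothetical helper that expands the compact job-state
# codes listed above into their full names.
state_name() {
  case $1 in
    CA) echo CANCELLED ;;
    CD) echo COMPLETED ;;
    CG) echo COMPLETING ;;
    F)  echo FAILED ;;
    NF) echo NODE_FAIL ;;
    PD) echo PENDING ;;
    R)  echo RUNNING ;;
    S)  echo SUSPENDED ;;
    TO) echo TIMEOUT ;;
    *)  echo "unknown state code: $1" >&2; return 1 ;;
  esac
}

# Usage on the cluster (requires Slurm):
#   squeue -h -u "$USER" -o "%i %t" | while read -r id code; do
#     echo "$id $(state_name "$code")"
#   done
```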

You can choose which fields appear in the output of sqlog. To list the available fields:

sqlog --format=list

The command below, for example, shows the job ID, user name, final state, start date-time, and end date-time for job 2831:

sqlog -j 2831 --format=jid,user,state,start,end

List status info for a currently running job:

sstat -j <jobid>

To show only specific fields, use --format; for example, to see the maximum resident set size (RAM usage) on a node:

sstat --format="JobID,MaxRSS" -j <jobid>

To get statistics on completed jobs by job ID:

sacct --format="JobID,JobName,MaxRSS,Elapsed" -j <jobid>

To view the same information for all jobs of a user:

sacct --format="JobID,JobName,MaxRSS,Elapsed" -u <username>
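MaxRSS is reported per job step with a unit suffix, which makes it awkward to compare with the --mem value (in MB) from the submit script. The function below is a minimal sketch that reports the largest MaxRSS across steps in MB. It assumes the pipe-delimited output of sacct -P -n (parsable, no header) and values ending in K, M or G; values without a suffix are treated as bytes, which is an assumption. peak_maxrss_mb is a hypothetical helper, not a Slurm command.

```bash
#!/usr/bin/env bash
# peak_maxrss_mb: hypothetical helper (not part of Slurm).
# Reads "JobID|MaxRSS" lines, as printed by
#   sacct -P -n --format=JobID,MaxRSS -j <jobid>
# and prints the largest MaxRSS across job steps, in MB.
# Empty MaxRSS fields (e.g. the top-level job line) are skipped.
peak_maxrss_mb() {
  awk -F'|' '
    $2 != "" {
      v = $2; unit = substr(v, length(v), 1)
      num = substr(v, 1, length(v) - 1)
      if      (unit == "K") mb = num / 1024
      else if (unit == "M") mb = num + 0
      else if (unit == "G") mb = num * 1024
      else                  mb = v / 1048576   # assumption: bare number = bytes
      if (mb > peak) peak = mb
    }
    END { printf "%.1f\n", peak }'
}

# Usage on the cluster (requires Slurm):
#   sacct -P -n --format=JobID,MaxRSS -j 2831 | peak_maxrss_mb
```

With the example header above (--mem=32000), a peak far below 32000 suggests the memory request could be reduced.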

To print a list of fields that can be specified with the --format option:

sacct --helpformat

For example, to get the job ID, job name, final state, start date-time, and end date-time for job 2831:

sacct -j 2831 --format="JobID,JobName,State,Start,End"

Another useful command for getting information about a running job is scontrol:

scontrol show job=<jobid>

4. Cancel a job
To cancel one job:

scancel <jobid>

To cancel one job and delete the TMP directory created by the submit script on a node:

sdel

To cancel all the jobs for a user:

scancel -u <username>

To cancel one or more jobs by name:

scancel --name=<jobname>

Finding a mimicry game for online teaching, plus general resources

Shruti Paniwala — Tue, 28 Jun 2022 07:32:05 -0500

Mimicry and other resources
Mimicry games:
Great Heliconius game:
http://heliconius.org/evolving_butterflies/
(See also 
https://royalsocietypublishing.org/doi/10.1098/rspb.2020.0014)
Another one, a bit less user-friendly:
https://ccl.northwestern.edu/netlogo/models/Mimicry
Camouflage practical:
https://alexis-catherine.github.io/publication/natural-selection-and-camouflage/
(NetLogo also has one:
https://ccl.northwestern.edu/netlogo/models/BugHuntCamouflage)
Peppered moth game:
https://askabiologist.asu.edu/peppered-moths-game/play.html

General resources
The always popular Populus:
https://cbs.umn.edu/populus/overview
Drift & Gene Flow:
https://cartwrig.ht/apps/genie/
(Cock van Oosterhout has a great ppt to lead students through this)
See also https://cartwrig.ht/apps/redlynx/
https://demonstrations.wolfram.com/ReplicatorMutatorDynamicsWithThreeStrategies/
NetLogo:
http://ccl.northwestern.edu/netlogo/models/index.cgi
Population Genetics:
https://www.radford.edu/~rsheehy/Gen_flash/popgen/
Evolution in general
https://evolution.berkeley.edu/evolibrary/home.php
Mitochondrial Eve:
https://projects.ncsu.edu/cals/gn/ex/mit-eve.html
Y chromosomes:
https://projects.ncsu.edu/cals/gn/ex/y-chrom.html
A professional online package from Michael Kasumovic:
https://arludo.com/
A compilation of resources:
https://planted.botany.org/index.php?P=Home
Finally, Donald Forsdyke has some great on-line videos explaining
evolutionary principles (occasionally in a fake Scottish accent):
http://post.queensu.ca/~forsdyke/videolectures.htm

Online resources on must-read papers in evolutionary biology, for a literature club

Shruti Paniwala — Tue, 28 Jun 2022 07:29:08 -0500

1. *Nick Barton:*

- The textbook "Evolution" by Nick Barton, with resources for
  exploring the literature: Barton, N. H., Briggs, D. E. G., Eisen, J.
  A., Goldstein, D. B., & Patel, N. H. (2007). Evolution. Cold Spring
  Harbor Laboratory Press.

- Papers from a course named "Classics in Evolutionary Biology":

Evolutionary Synthesis
1. Haldane, J. B. S. 1932. The causes of evolution. Longmans. New York.
   (esp. Ch. IV).
2. Fisher, R. A. 1930. The genetical theory of natural selection. Oxford
   University Press, Oxford. Selected Sections - Fundamental Theorem.

Genetic Variation
1a. Lewontin, R. C., and J. L. Hubby. 1966. A molecular approach to
the study of genic heterozygosity in natural populations. II. Amount
of variation and degree of heterozygosity in natural populations of
Drosophila pseudoobscura. Genetics. 54:595-609.

1b. Sachidanandam et al. 2001. A map of human genome sequence variation
containing 1.42 million single nucleotide polymorphisms. Nature 409: 928-933.

2. Wright S., Dobzhansky T., Hovanitz W. 1942 Genetics of natural
populations VII The allelism of lethals in the third chromosome of
Drosophila pseudoobscura. Genetics 27: 363-394.

Recombination and evolution
1. Hill, W. G., and A. Robertson. 1966. The effect of linkage on limits
to artificial selection. Genet. Res. 8:269-294.

2. Maynard Smith and Haigh. 1974. The hitch-hiking effect of a favourable
gene. Genet. Res. 23: 23-35.

Understanding sequence variation
1. Begun D. J., Aquadro C. F., 1992 Levels of naturally occurring DNA
polymorphism correlate with recombination rate in Drosophila melanogaster.
Nature 356: 519-520.

2. Green R. E., Reich D., Pääbo S., 2010 A draft sequence of the
Neandertal genome. Science 328: 710-722.

Quantitative Genetics:  variation in complex traits
1. Galton F., 1877 Typical laws of heredity. Nature 15: 492-495,
512-514, 532-533.

2. Turelli M., 1984 Heritable genetic variation via
mutation-selection balance: Lerch's Zeta meets the abdominal
bristle. Theor. Popul. Biol. 25: 138-193.

Quantitative Genetics:  finding the genes
1. Shrimpton A. E., Robertson A., 1988 The Isolation of polygenic factors
controlling bristle score in Drosophila melanogaster II Distribution of
third chromosome bristle effects within chromosome sections. Genetics
118: 445-459.

2. Boyle E. A., Li Y. I., Pritchard J. K., 2017 An expanded view of
complex traits: from polygenic to omnigenic. Cell 169: 1177-1186.

Neutral Evolution
1. Kimura, M. 1968. Evolutionary rate at the molecular level. Nature.
217:624-626.

2a. Kern A. D., Hahn M. W., 2018 The Neutral Theory in Light of Natural
Selection. Molecular Biology and Evolution 35: 1366-1371.

2b. Jensen J. D., Payseur B. A., Stephan W., Aquadro C. F., Lynch M.,
Charlesworth D., Charlesworth B., 2019 The importance of the Neutral Theory
in 1968 and 50 years on: a response to Kern and Hahn 2018. Evolution 73:
111-114.

2c. Ellegren & Galtier. 2016. Determinants of genetic diversity. Nature
Reviews Genetics.

Mutation and Genetic Variability
1. Luria, S. E., and M. Delbrück. 1943. Mutations of Bacteria from Virus
Sensitivity to Virus Resistance. Genetics. 28(6):491-511.

2. Hill, W G. 1982. "Rates of Change in Quantitative Traits From Fixation
of New Mutations." Proceedings of the National Academy of Sciences (U.S.A.)
79: 142-45.

Testing for selection
1. McDonald & Kreitman. 1991. Adaptive protein evolution at the Adh locus
in Drosophila. Nature.

2. Begun, et al. Mol. Biol. Evol. 16, 1816-1819 (1999).

3. Siddiq et al. 2016. Experimental test and refutation of a classic case
of molecular adaptation in Drosophila melanogaster.  Nature Ecology &
Evolution.

The shifting balance
1. Wright, S. 1932. The roles of mutation, inbreeding, crossbreeding and
selection in evolution. Proceedings of the VI International Congress of
Genetics: 1. pp 356-366.

2. Coyne, J.A., N.H. Barton, and M. Turelli. 1997. A critique of Wright's
shifting balance theory of evolution.  Evolution 51: 643-671.

3. Barton. 2016. Sewall Wright on Evolution in Mendelian Populations and
the "Shifting Balance". Genetics.

Evolution of Sex
1.  Muller, H.J. 1964. The relation of recombination to mutational advance.
Mutation Res. 1(1):2-9

2. McDonald et al. 2016. Sex speeds adaptation by altering the dynamics of
molecular evolution. Nature.

Kin Selection, Cooperation, and Conflict
1. Hamilton, W. D. 1964. The genetical evolution of social behaviour I.
Journal of Theoretical Biology. 7:1-16.

2. Trivers, R. L. 1974 Parent-offspring conflict. American Zoologist.
14(1):249-264.

Sexual Selection
1. Zahavi, A. 1975. Mate selection - a selection for a handicap. J. Theor.
Biol. 53:205-214.

2. Kirkpatrick, M., and Ryan, M.J. 1991. The evolution of mating
preferences and the paradox of the lek. Nature. 350:33-38.

Fitness Landscapes
1. Dean, A. 1995. A Molecular Investigation of Genotype by Environment
Interactions. Genetics. 139:19-33.

2. Costanzo et al. 2010. The Genetic Landscape of a Cell. Science.

Speciation
1. Coyne, J. A., and H. A. Orr. 1989. Patterns of speciation in Drosophila.
Evolution. 43:362-381.

2. Corbett-Detig et al. 2013. Genetic incompatibilities are widespread
within species. Nature.

2. *Marcos Antezana:*

Van Valen, L. 1975. Energy and Evolution. University of Chicago, Department
of Biology.

3. *Remco Folkertsma:*

1. The work by Hopi Hoekstra on local adaptation and oldfield mice

2. Poelstra, J. W., Vijay, N., Bossu, C. M., Lantz, H., Ryll, B., Müller,
I., ... & Wolf, J. B. (2014). The genomic landscape underlying phenotypic
integrity in the face of gene flow in crows. Science, 344(6190), 1410-1414.

4. *Joshka Kaufmann and Leslie Turner*

They offer a link to 'papers every evolutionary biologist should read',
compiled by Leslie Turner:
https://static1.squarespace.com/static/53e8cb7ce4b02c4bc3aeeee4/t/5ab8fcb670a6ad55c67fcdf4/1522072758665/EvoBioClassicsRefList.pdf

5. *Sarah Stockwell*

Mark Ridley compiled excerpts from classic papers in evolutionary biology
in his book Evolution (Mark Ridley, Evolution, Oxford University Press,
2nd edition, 2004).

List of comparative genomics resources!

Shruti Paniwala — Tue, 28 Jun 2022 04:08:06 -0500

3D-GENOMICS -- A Database to Compare Structural and Functional Annotations of Proteins between Sequenced Genomes

Compare structural and functional annotations of proteins between sequenced genomes.

ARED Organism -- expansion of ARED reveals AU-rich element cluster variations between human and mouse

View AREs in the human transcriptome and study the comparative genomics of AREs in model organisms.

ATGC -- Alignable Tight Genomic Clusters Database

Find information about orthologous genes in prokaryotes.

AnimalQTLdb -- a livestock QTL database tool set for positional QTL information mining and beyond

Search for publicly available QTL data on livestock and other animal species.

BGDB -- Bovine Genome Database

Find information about bovine genomics data.

COMPARE -- a multi-organism system for cross-species data comparison and transfer of information

A multi-organism web-based resource system designed to easily retrieve, correlate and interpret data across species.

CONDOR -- COnserved Non-coDing Orthologous Regions

A database resource of developmentally associated conserved non-coding elements.

CORG -- A database for COmparative Regulatory Genomics

Delineate conserved non-coding blocks from upstream regions of putative orthologous gene pairs from human, mouse, rat, fugu, and zebrafish.

COXPRESdb -- a database of coexpressed gene networks in mammals

Find coexpressed gene lists and networks in human and mouse.

CVTree -- A Phylogenetic Tree Reconstruction Tool Based on Whole Genomes

Construct phylogenetic trees of microorganisms based on the oligopeptide content of their complete proteomes.

CleanEST -- the cleansed EST libraries database

A novel database server that classifies the libraries in GenBank's dbEST (database of Expressed Sequence Tags) and removes contaminants.

CoCoa -- COefficient of COAncestry software

Find information about the ancestral relationship between genes.

CoGemiR -- a comparative genomics microRNA database

Provides an overview of the genomic organization of microRNAs and extent of conservation during evolution in different metazoan species.

Comparative Genometrics (CG) -- a database dedicated to biometric comparisons of whole genomes

Conduct comparative biometric analysis of chromosomes of different organisms.

DoTS -- Database Of Transcribed Sequences

Search indices of genes and transcripts in human and mouse.

DroSpeGe -- rapid access database for new Drosophila species genomes

Search and compare 12 new and old Drosophila genomes.

ECR Browser -- A Tool for Visualizing and Accessing Data from Comparisons of Multiple Vertebrate Genomes

Access to whole genome alignments of human, mouse, rat and fish sequences.

EPGD -- Eukaryotic Paralog Group Database

Find eukaryotic paralog/paralogon information.

EVOG -- evolutionary visualizer for overlapping genes

Analyze the evolutionary process of overlapping genes when comparing different species.

GNAT -- Inter-species gene mention normalization (ISGN)

The first publicly available system reported to handle inter-species gene mention normalization.

GenColors -- annotation and comparative genomics of prokaryotes made easy

A web-based software/database system aimed at an improved and accelerated annotation of prokaryotic genomes.

GeneNest gene indices

Visualize gene indices of human, mouse, Arabidopsis, zebrafish, Drosophila, and sheep.

GenomeTrafac -- a whole genome resource for the detection of transcription factor binding site clusters associated with conventional and microRNA encoding genes conserved between mouse and human gene orthologs

Use a comparative genomics approach to characterize gene models and identify putative cis-regulatory regions of RefSeq gene orthologs.

IKMC -- International Knockout Mouse Consortium web portal

Find information about mutated mouse genes.

IMG/M -- Integrated Microbial Genomes/Metagenomes

A data management and analysis system for metagenomes.

ISED -- Influenza sequence and epitope database.

Search for influenza sequence, vaccine, and drug resistance information.

LAMHDI: The Search for Animal Models Starts Here

LAMHDI, the initiative to Link Animal Models to Human DIsease, is designed to accelerate the research process by providing biomedical researchers with a simple, comprehensive Web-based resource to find the best animal models for their research.

MANTIS -- a phylogenetic framework for multi-species genome comparisons

The missing link between multi-species full genome comparisons and functional analysis.

MBGD -- Microbial genome database for comparative analysis

Conduct comparative analysis of completely sequenced microbial genomes.

MEGA -- Molecular Evolutionary Genetics Analysis

Biologist-centric software for evolutionary analysis of DNA and protein sequences.

MamPol -- a database of nucleotide polymorphism in the Mammalia class

Measure single nucleotide polymorphism (SNP) diversity among homologous sequences from the class Mammalia.

MicrobesOnline -- Prokaryotic Genome Database

Find information about thousands of microbial genomes.

Narcisse -- a mirror view of conserved syntenies

A database dedicated to the study of genome conservation.

OMA -- the Orthologous MAtrix project

Explore orthologous relations across 352 complete genomes.

OPTIC -- orthologous and paralogous transcripts in clades

Browse complete genomes in several clades.

OrthoDB -- the hierarchical catalog of eukaryotic orthologs

Find groups of orthologous genes.

OrthoMaM -- orthologous mammalian markers

A database of orthologous genomic markers for placental mammal phylogenetics.

PEDANT -- Protein Extraction, Description and ANalysis Tool

Conduct genome-wide functional and structural analysis.

PReMod -- a database of genome-wide mammalian cis-regulatory module predictions

Conduct genome-wide cis-regulatory module (CRM) predictions for both the human and the mouse genomes.

PhenomicDB -- Comparison of phenotypes of orthologous genes in human and model organisms

Compare phenotypes of a given gene or gene set in different model organisms.

Phylemon -- A suite of web tools for molecular evolution, phylogenetics and phylogenomics

Phylemon is a web server that integrates a selected suite of more than 20 different tools from the most popular stand-alone programs of phylogenetic and evolutionary analysis.

PhyloPat -- the phylogenetic pattern database

Use this database to see where in evolution particular phylogenetic lineages arose and in which species they occur.

Pristionchus.org -- a genome-centric database of the nematode satellite species Pristionchus pacificus

Search for genomic information on nematode satellite species Pristionchus pacificus.

ProtClustDB -- NCBI Protein Clusters Database

Find information about related protein sequences.

ProtozoaDB -- database of protozoan genomes

Database hosting genomics and post-genomics data from multiple protozoans.

Pseudofam -- the pseudogene families database

A database of pseudogene families based on the protein families from the Pfam database.

RIDM - RIKEN Integrated Database of Mammals

Find genomic information about mammals.

RegPrecise -- Regulon Prediction Database

Find information about predicted regulons in prokaryotic transcription regulation.

SALAD -- Surveyed conserved motif ALignment diagram and the Associating Dendrogram

Perform systematic comparison of proteome data among species.

SGN -- SOL Genomics Network

A comparative map viewer dedicated to the biology of the Solanaceae family.

ShotgunFunctionalizeR -- R-package for functional comparison of metagenomes

Analyze data from functional analysis on fragmented microbial genetic material.

SnoopCGH -- Comparative Genomic Hybridization software

Visualize and explore comparative genomic hybridization data sets.

SwissRegulon -- a database of genome-wide annotations of regulatory sites

Search for genome-wide annotations of regulatory sites in yeast and prokaryotic genomes.

TaxonGap -- a visualization tool for intra- and inter-species variation among individual biomarkers

Compare and select individual biomarkers.

The Adaptive Evolution Database (TAED) -- a phylogeny based tool for comparative genomics

Search for information on adaptive evolution in gene families of higher plants and chordates.

The CGView Server -- a comparative genomics tool for circular genomes

Generate graphical maps of circular genomes that show sequence features, base composition plots, analysis results and sequence similarity plots.

The ERGO -- Genome analysis and discovery system

Conduct a comprehensive analysis of genes and genomes.

The Macaque Genome: Interactive Poster and Teaching Resource

An interactive online poster presentation on the Macaque genome, including high-quality images, video clips, and Web resources.

The TIGR Gene Indices -- clustering and assembling EST and known genes and integration with eukaryotic genomes

Search for annotated genetic information of expressed sequence tags (ESTs) in different eukaryotic organisms.

UniGene

Find mapping and expression information for a UniGene cluster (ESTs and full-length mRNA sequences organized into clusters that each represent a unique known or putative gene).

Uprobe -- universal overgo hybridization-based probe retrieval and design

A public online resource for identifying or designing 'universal' overgo-hybridization probes from conserved sequences that can be used to efficiently screen one or more genomic libraries from a designated group of species.

VISTA -- Computational Tools for Comparative Genomics

Comprehensive suite of programs and databases for comparative analysis of genomic sequences.

cBARBEL -- Catfish Breeder and Researcher Bioinformatics Entry Location

Find information about ictalurid catfish.

eggNOG -- evolutionary genealogy of genes: Non-supervised Orthologous Groups

Discover orthologous groups of genes.

metaTIGER -- a metabolic gene evolution resource

Find metabolic networks and phylogenomic information on a taxonomically diverse range of eukaryotes.

xBASE -- a collection of online databases for bacterial comparative genomics

Conduct bacterial comparative genomics.

Installing Elgg on Ubuntu!

Neel — Wed, 25 May 2022 02:26:05 -0500

Elgg is an open-source and highly customizable framework used for building an online social environment. It provides a simple and powerful user interface that helps to manage and build your content through a web browser. Elgg offers a rich set of features including messaging, microblogging, file-sharing, RSS support, access control, groups, and many more.

In this tutorial, we will show you how to install and configure Elgg social networking platform on Ubuntu 20.04.

Prerequisites

• A fresh Ubuntu 20.04 VPS on the Atlantic.net Cloud Platform
• A valid domain name pointed to your server IP
• A root password configured on your server

Step 1 – Create Atlantic.Net Cloud Server

First, log in to your Atlantic.Net Cloud Server. Create a new server, choosing Ubuntu 20.04 as the operating system with at least 2GB RAM. Connect to your Cloud Server via SSH and log in using the credentials highlighted at the top of the page.

Once you are logged in to your Ubuntu 20.04 server, run the following commands to refresh the package lists and upgrade your base system to the latest available packages.

apt-get update -y
apt-get upgrade -y

Step 2 – Install Apache, MariaDB and PHP

Elgg runs on the Apache web server, is written in PHP, and uses MySQL/MariaDB as its database backend, so you will need to install Apache, MariaDB, PHP, and the required PHP extensions on your server. You can install all of them with the following command:

apt-get install apache2 mariadb-server php libapache2-mod-php php-common php-sqlite3 php-curl \
  php-intl php-mbstring php-xmlrpc php-mysql php-gd php-xml php-cli php-zip unzip wget -y

After installing all the packages, edit the php.ini file and change a few recommended settings:

nano /etc/php/7.4/apache2/php.ini

Change the following values (set date.timezone to your own time zone):

max_execution_time = 300
memory_limit = 512M
upload_max_filesize = 100M
date.timezone = Asia/Kolkata

Save and close the file, then restart the Apache service to apply the configuration changes.

systemctl restart apache2
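If you prefer not to edit php.ini by hand, the same changes can be scripted with sed. The function below is a minimal sketch; set_ini_value is a hypothetical helper, it relies on GNU sed (the default on Ubuntu), and it assumes values contain no "|" or "&" characters, which would confuse the sed substitution.

```bash
#!/usr/bin/env bash
# set_ini_value: hypothetical helper for editing php.ini non-interactively.
# Replaces an existing "key = value" line (commented out with ";" or not)
# and appends the setting if the key does not appear at all.
# Usage: set_ini_value <file> <key> <value>
set_ini_value() {
  local file=$1 key=$2 value=$3 key_re
  # Escape dots so a key such as "date.timezone" matches literally.
  key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -Eq "^[;[:space:]]*${key_re}[[:space:]]*=" "$file"; then
    sed -i -E "s|^[;[:space:]]*${key_re}[[:space:]]*=.*|${key} = ${value}|" "$file"
  else
    printf '%s = %s\n' "$key" "$value" >> "$file"
  fi
}

# Example (run as root), followed by the Apache restart shown above:
#   ini=/etc/php/7.4/apache2/php.ini
#   set_ini_value "$ini" max_execution_time 300
#   set_ini_value "$ini" memory_limit 512M
#   set_ini_value "$ini" upload_max_filesize 100M
#   set_ini_value "$ini" date.timezone Asia/Kolkata
```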

Step 3 – Create a Database for Elgg

Next, you will need to create a database and user for Elgg. First, log in to the MariaDB shell with the following command:

mysql

Once logged in, create a database and user with the following commands:

CREATE DATABASE elgg;
CREATE USER 'elgg'@'localhost' IDENTIFIED BY 'secure-password';

Next, grant the elgg user all privileges on the elgg database with the following command:

GRANT ALL ON elgg.* TO 'elgg'@'localhost' IDENTIFIED BY 'secure-password' WITH GRANT OPTION;

Next, flush the privileges and exit from the MariaDB shell with the following commands:

FLUSH PRIVILEGES;
EXIT;

At this point, the MariaDB database is created for Elgg.

Step 4 – Install Elgg

First, download Elgg from its official website using the following command (3.3.13 was the latest version when this tutorial was written):

wget https://elgg.org/download/elgg-3.3.13.zip

Once the download is completed, unzip the downloaded file with the following command:

unzip elgg-3.3.13.zip

Next, move the extracted directory to the Apache root directory:

mv elgg-3.3.13 /var/www/html/elgg

Next, create a data directory and set proper ownership and permissions to the Elgg directory:

mkdir /var/www/html/data
chown -R www-data:www-data /var/www/html/elgg
chown -R www-data:www-data /var/www/html/data
chmod -R 755 /var/www/html/elgg

Once you are finished, you can proceed to the next step.

Step 5 – Configure Apache for Elgg

Next, you will need to configure Apache to serve Elgg. You can configure it by creating a new Apache virtual host configuration file:

nano /etc/apache2/sites-available/elgg.conf

Add the following lines:


<VirtualHost *:80>
    ServerAdmin admin@example.com
    DocumentRoot /var/www/html/elgg/
    ServerName elgg.example.com

    <Directory /var/www/html/elgg/>
        Options FollowSymLinks
        AllowOverride All
    </Directory>

    ErrorLog /var/log/apache2/elgg-error_log
    CustomLog /var/log/apache2/elgg-access_log common
</VirtualHost>

Save and close the file, then enable the virtual host and Apache rewrite module with the following command:

a2ensite elgg.conf
a2enmod rewrite

Finally, restart the Apache service to apply the changes:

systemctl restart apache2

Step 6 – Access Elgg Web Interface

Now, open your web browser and access the Elgg web interface using the URL http://elgg.example.com. You should see the Elgg welcome screen.

Useful Bioinformatics Analysis Tools!

Neel — Thu, 23 Dec 2021 23:10:02 -0600

CoMeta

Classifier of reads from metagenomic sequencing experiments.

• Kawulok, J., Deorowicz, S., CoMeta: Classification of Metagenomes Using k-mers, PLOS ONE, 2015; 10(4):1–23.

CoMSA

Compressor of multiple sequence alignments of proteins.

• Deorowicz, S., Walczyszyn, J., Debudaj-Grabysz, A., CoMSA: compression of protein multiple sequence alignment files, Bioinformatics, 2019; 35(2):227–234.

DSRC

Compressor of sequencing reads.

• Roguski, L., Deorowicz, S., DSRC 2: Industry-oriented compression of FASTQ files, Bioinformatics, 2014; 30(15):2213–2215.
• Deorowicz, S., Grabowski, Sz., Compression of DNA sequences in FASTQ format, Bioinformatics, 2011; 27(6):860–862.

FAMSA

Multiple sequence alignment tool designed for huge protein families (even those containing hundreds of thousands of sequences).

• Deorowicz, S., Debudaj-Grabysz, A., Gudys, A., FAMSA: Fast and accurate multiple sequence alignment of huge protein families, Scientific Reports, 2016; 6:33964.

FaStore

Compressor of FASTQ files.

• Roguski, L., Ochoa, I., Hernaez, M., Deorowicz, S., FaStore: a space-saving solution for raw sequencing data, Bioinformatics, 2018; 34(16):2748–2756.

FQSqueezer

Experimental high-end compressor of FASTQ files.

• Deorowicz, S., FQSqueezer: k-mer-based compression of sequencing data, Scientific Reports, 2020; 10:578.

GDC

Compressor of collections of genome sequences.

• Deorowicz, S., Danek, A., Niemiec, M., GDC 2: Compression of large collections of genomes, Scientific Reports, 2015; 5:11565.
• Deorowicz, S., Grabowski, Sz., Robust relative compression of genomes with random access, Bioinformatics, 2011; 27(21):2979–2986.

GTC

Genotype database compressor with support for fast queries.

• Danek, A., Deorowicz, S., GTC: how to maintain huge genotype collections in a compressed form, Bioinformatics, 2018; 34(11):1834–1840.

GTShark

Genotype compressor.

• Deorowicz, S., Danek, A., GTShark: Genotype compression in large projects, Bioinformatics, 2019; 35(22):4791–4793.

KMC

Memory-frugal k-mer counter.

• Kokot, M., Długosz, M., Deorowicz, S., KMC 3: counting and manipulating k-mer statistics, Bioinformatics, 2017; 33(17):2759–2761.
• Deorowicz, S., Kokot, M., Grabowski, Sz., Debudaj-Grabysz, A., KMC 2: Fast and resource-frugal k-mer counting, Bioinformatics, 2015; 31(10):1569–1576.
• Deorowicz, S., Debudaj-Grabysz, A., Grabowski, Sz., Disk-based k-mer counting on a PC, BMC Bioinformatics, 2013; 14:160.

Kmer-db

Tool for estimation of evolutionary distances in a collection of genomes.

• Deorowicz, S., Gudys, A., Dlugosz, M., Kokot, M., Danek, A., Kmer-db: instant evolutionary distance estimation, Bioinformatics, 2019; 35(1):133–136.

MuGI

Index allowing queries for a collection of multiple genome sequences.

• Danek, A., Deorowicz, S., Grabowski, Sz., Indexes of Large Genome Collections on a PC, PLOS ONE, 2014; 9(10):e109384.

ORCOM

Experimental compressor of sequencing reads.

• Grabowski, Sz., Deorowicz, S., Roguski, L., Disk-based compression of data from genome sequencing, Bioinformatics, 2014; 31(9):1389–1395,

PgSA

Index allowing queries for a collection of sequencing reads.

• Kowalski, T., Grabowski, Sz., Deorowicz, S., Indexing arbitrary-length k-mers in sequencing reads, PLOS ONE, 2015; 10(7):1–16,

QuickProbs

Multiple sequence alignment designed especially for GPU.

• Gudys, A., Deorowicz, S., QuickProbs 2: towards rapid construction of high-quality alignments of large protein families, Scientific Reports, 2017; 7(41553):
• Gudys, A., Deorowicz, S., QuickProbs – A Fast Multiple Sequence Alignment Algorithm Designed for Graphics Processors, PLOS ONE, 2014; 9(2):e88901,

RECKONER

Read error corrector.

• Długosz, M., Deorowicz, S., RECKONER: read error corrector based on KMC, Bioinformatics, 2017; 33(7):1086–1089.

TGC

Compressor of collections of genomes given in Variant Call Format (VCF) files.

• Deorowicz, S., Danek, A., Grabowski, Sz., Genome compression: a novel approach for large collections, Bioinformatics, 2013; 29(20):2572–2578.

VCFShark

Compressor of VCF files.

• Deorowicz, S., Danek, A., GTShark: Genotype compression in large projects, bioRxiv.org, 2020.

Whisper

Experimental mapper of whole genome sequencing data.

• Deorowicz, S., Gudys, A., Whisper 2: indel-sensitive short read mapping, bioRxiv.org, 2019.
• Deorowicz, S., Debudaj-Grabysz, A., Gudys, A., Grabowski, Sz., Whisper: read sorting allows robust mapping of DNA sequencing data, Bioinformatics, 2019; 35(12):2043–2050.
• Deorowicz, S., Debudaj-Grabysz, A., Gudys, A., Grabowski, Sz., Robust mapping of whole genome sequencing data, Poster at The Biology of Genomes Conference, 2017.

Illumina-based assembly pipeline steps!

Surabhi Chaudhary — Fri, 10 Dec 2021 06:22:54 -0600

Illumina

Merge re-sequenced FastQ files (cat)
Read QC (FastQC)
Adapter trimming (fastp)
Removal of host reads (Kraken 2; optional)
Variant calling
1. Read alignment (Bowtie 2)
2. Sort and index alignments (SAMtools)
3. Primer sequence removal (iVar; amplicon data only)
4. Duplicate read marking (picard; optional)
5. Alignment-level QC (picard, SAMtools)
6. Genome-wide and amplicon coverage QC plots (mosdepth)
7. Choice of multiple variant calling and consensus sequence generation routes (iVar variants and consensus; default for amplicon data || BCFTools, BEDTools; default for metagenomics data)
  - Variant annotation (SnpEff, SnpSift)
  - Consensus assessment report (QUAST)
  - Lineage analysis (Pangolin)
  - Clade assignment, mutation calling and sequence quality checks (Nextclade)
  - Individual variant screenshots with annotation tracks (ASCIIGenome)
8. Intersect variants across callers (BCFTools)
De novo assembly
1. Primer trimming (Cutadapt; amplicon data only)
2. Choice of multiple assembly tools (SPAdes || Unicycler || minia)
  - Blast to reference genome (blastn)
  - Contiguate assembly (ABACAS)
  - Assembly report (PlasmidID)
  - Assembly assessment report (QUAST)
Present QC and visualisation for raw read, alignment, assembly and variant calling results (MultiQC)
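The first few steps of the variant-calling branch (read QC, adapter trimming, alignment, sorting and indexing) can be sketched as plain command lines run one after another from Python. This is a minimal illustration, not the pipeline's actual implementation: the sample name, file names and Bowtie 2 index prefix are hypothetical, only basic options of each tool are used, and a real pipeline adds host-read removal, primer trimming, duplicate marking and much more.

```python
import shlex
import subprocess


def build_commands(sample, r1, r2, bt2_index, threads=4):
    """Commands for QC, trimming, alignment, sorting and indexing of one paired-end sample."""
    trimmed1 = f"{sample}_R1.trim.fastq.gz"
    trimmed2 = f"{sample}_R2.trim.fastq.gz"
    return [
        # Read QC; the "fastqc" output directory must already exist.
        ["fastqc", "-o", "fastqc", r1, r2],
        # Adapter trimming of the read pair.
        ["fastp", "-i", r1, "-I", r2, "-o", trimmed1, "-O", trimmed2],
        # Align trimmed reads against a prebuilt Bowtie 2 index.
        ["bowtie2", "-p", str(threads), "-x", bt2_index,
         "-1", trimmed1, "-2", trimmed2, "-S", f"{sample}.sam"],
        # Coordinate-sort the alignments into a BAM file, then index it.
        ["samtools", "sort", "-o", f"{sample}.sorted.bam", f"{sample}.sam"],
        ["samtools", "index", f"{sample}.sorted.bam"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False (stops on the first failure)."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical input files and index prefix.
    run(build_commands("sample1", "sample1_R1.fastq.gz", "sample1_R2.fastq.gz", "ref/genome"))
```

Keeping the commands as argument lists (rather than shell strings) avoids quoting problems with unusual file names; in practice a workflow manager such as Snakemake or Nextflow would handle the ordering and restarts.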

Classification of SARS-CoV-2 Variants!

Jit — Fri, 26 Nov 2021 12:53:12 -0600

Scientists have established guidelines for deciding whether a variant should be designated as a new branch of an existing lineage:

The variant should have spread from its original location to another "geographically distinct population", for example another country or a province of a large, populous country.
It should differ from its ancestor by at least one nucleotide.
It should be represented by at least five genomes sequenced from different samples, each covering at least 95% of the genome.
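The three guidelines can be read as a simple checklist. The sketch below encodes them as a predicate; the data structure and field names are hypothetical, and the third guideline is interpreted here as "at least five genomes from different samples, each with at least 95% of the genome sequenced", which is an assumption of this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateLineage:
    """Hypothetical summary of a candidate lineage (illustrative field names)."""
    spread_to_distinct_population: bool  # onward transmission in a geographically distinct population
    nucleotide_differences: int          # differences from the ancestral lineage
    genome_coverages: List[float]        # fraction of the genome sequenced, one value per sample


def meets_guidelines(c: CandidateLineage, min_genomes: int = 5, min_coverage: float = 0.95) -> bool:
    # Guideline 1: the variant has spread to another geographically distinct population.
    if not c.spread_to_distinct_population:
        return False
    # Guideline 2: it differs from its ancestor by at least one nucleotide.
    if c.nucleotide_differences < 1:
        return False
    # Guideline 3: enough well-covered genomes from different samples.
    well_covered = sum(1 for cov in c.genome_coverages if cov >= min_coverage)
    return well_covered >= min_genomes
```

All three conditions must hold; a candidate with strong sequencing data but no evidence of spread beyond its origin still fails the first check.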

Basic Structure of a Snakemake Pipeline Run!

Abhi — Thu, 14 Oct 2021 07:01:38 -0500

/user/snakemake-demo$ ls

config.json data envs scripts slurm-240702.out Snakefile

data = mock data for the Snakefile to use.
Snakefile = the Snakemake “formula” file that defines the workflow.
- Note: by default, snakemake looks for a file named Snakefile in the current working directory. To use a different file, pass its name with the -s option:
  - snakemake -s snakefile.py
envs = directory holding the conda environment definitions (YAML files) used by the workflow's rules.
scripts = directory for Python scripts called by the Snakefile.
config.json = JSON file with extra parameters for the Snakefile to use.
cluster.json = JSON file with job resource specifications for running on the HPC cluster (not shown in the listing above).
samples.txt = file used later in connection with config.json (not shown in the listing above).
slurm-240702.out = output log of an earlier SLURM job (SLURM's default slurm-<jobid>.out file).

Run the workflow as a dry run (using the example workflow shown above).

This builds the directed acyclic graph (DAG) of jobs to be run and lists them without actually executing anything.
snakemake --dry-run

You can also dry-run a specific target rule, for example all, call or bwa:

snakemake --dry-run all
snakemake --dry-run call
snakemake --dry-run bwa
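Conceptually, a dry run works out which jobs the requested target needs and orders them so that each job runs after the jobs producing its inputs. The toy model below illustrates that idea in plain Python; it is not Snakemake's scheduler, the rules (bwa, a hypothetical sort step, call, all) and file paths are made up, and it ignores wildcards and the file timestamps Snakemake uses to skip up-to-date jobs.

```python
from graphlib import TopologicalSorter

# Hypothetical rules: name -> (input files, output files).
RULES = {
    "bwa": (["data/genome.fa", "data/A.fastq"], ["mapped/A.bam"]),
    "sort": (["mapped/A.bam"], ["sorted/A.bam"]),
    "call": (["sorted/A.bam"], ["calls/all.vcf"]),
    "all": (["calls/all.vcf"], []),
}


def jobs_for(target, rules=RULES):
    """Rules needed to build `target`, in an order that respects file dependencies."""
    # Which rule produces each file.
    producer = {out: name for name, (_, outs) in rules.items() for out in outs}
    graph, stack = {}, [target]
    while stack:
        rule = stack.pop()
        if rule in graph:
            continue
        # A rule depends on the rules that produce its inputs; unproduced files are raw inputs.
        deps = {producer[f] for f in rules[rule][0] if f in producer}
        graph[rule] = deps
        stack.extend(deps)
    # graphlib expects node -> predecessors and yields predecessors first.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for target in ("all", "call", "bwa"):
        print(target, "->", jobs_for(target))
```

With these rules, jobs_for("call") returns bwa, sort, call, mirroring how a dry run for a single target lists only the jobs that target depends on.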

To produce an image of the DAG of jobs to be run, pipe the DAG description to Graphviz:

snakemake --dag | dot -Tsvg > dag.svg
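snakemake --dag prints the job graph as Graphviz DOT text, which dot -Tsvg then renders to an image. To show what kind of text is being piped, the sketch below emits DOT for a hypothetical four-rule workflow; Snakemake's real output adds its own node labels and styling.

```python
# Hypothetical dependencies: rule -> rules whose output it consumes.
DEPS = {"bwa": [], "sort": ["bwa"], "call": ["sort"], "all": ["call"]}


def to_dot(deps):
    """Render a dependency graph as Graphviz DOT; edges point from producer to consumer."""
    lines = ["digraph dag {"]
    for node in deps:
        lines.append(f'    "{node}";')
    for node, parents in deps.items():
        for parent in parents:
            lines.append(f'    "{parent}" -> "{node}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Render with Graphviz, e.g.: python dag_sketch.py | dot -Tsvg > dag.svg
    # (dag_sketch.py is a hypothetical name for this file)
    print(to_dot(DEPS))
```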

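Run the workflow for real (this time not as a dry run), letting Snakemake create the conda environments the rules need. Recent Snakemake versions require the number of cores to be given explicitly when running locally:

snakemake --use-conda --cores 1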

REST API

Neel — Mon, 04 Oct 2021 12:46:40 -0500

REST API

Sample Representational State Transfer (REST) clients are provided for several programming languages. To learn how to use a client, download it and run the program without any arguments.

Language	Download	Requirements
Perl	psiblast.pl	LWP and XML::Simple
Python	psiblast.py	xmltramp2

For details, see the "Environment setup for REST Web Services" and "Examples for Perl REST Web Services Clients" pages.
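REST clients like these typically submit a job, poll its status and then fetch the results over HTTP. The sketch below shows only the polling step, using the Python standard library instead of the xmltramp2-based client listed above. The base URL and the status/result URL layout are assumptions (modelled on the EMBL-EBI Job Dispatcher services) and are not stated in the original post, as are the QUEUED/RUNNING status strings; check the service documentation before relying on them. The job ID in the usage note is hypothetical.

```python
import time
import urllib.parse
import urllib.request

# ASSUMPTION: Job Dispatcher-style REST endpoints for PSI-BLAST; verify before use.
BASE_URL = "https://www.ebi.ac.uk/Tools/services/rest/psiblast"


def status_url(job_id):
    return f"{BASE_URL}/status/{urllib.parse.quote(job_id)}"


def result_url(job_id, result_type):
    # Valid result types are defined by the service (assumption: listed per job).
    return f"{BASE_URL}/result/{urllib.parse.quote(job_id)}/{result_type}"


def http_get(url):
    """Plain HTTP GET returning the response body as text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def wait_for_job(job_id, fetch=http_get, poll_seconds=5, max_polls=120):
    """Poll the status endpoint until the job is no longer queued or running."""
    for _ in range(max_polls):
        status = fetch(status_url(job_id)).strip()
        if status not in ("QUEUED", "RUNNING"):  # assumed status strings
            return status  # e.g. FINISHED, or an error state
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```

Passing fetch as a parameter keeps the polling logic testable without network access; for a real job you would call wait_for_job("your-job-id") and then download http_get(result_url("your-job-id", ...)).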
