BOL: Related items

Alvis: a tool for contig and read ALignment VISualisation and chimera detection

LEGE — Wed, 08 May 2024 07:02:55 -0500

Alvis is a simple command-line tool that generates visualisations for a number of common alignment analysis tasks. It is fast and portable, accepts input in a variety of alignment formats, and outputs production-ready vector images. Alvis also highlights potentially chimeric reads or contigs, a common source of misassemblies.

More at https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04056-0

Address of the bookmark: https://github.com/SR-Martin/alvis

HELIANO: A fast and accurate tool for detection of Helitron-like elements

LEGE — Tue, 13 Aug 2024 07:16:34 -0500

Helitron-like elements (HLE1 and HLE2) are DNA transposons. They have been found in diverse species and seem to play significant roles in the evolution of host genomes. Although known for over twenty years, Helitron sequences are still challenging to identify. Here, we propose HELIANO (Helitron-like elements annotator) as an efficient solution for detecting Helitron-like elements.

More at https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkae679/7730539?login=true

Address of the bookmark: https://github.com/Zhenlisme/heliano/

Installing croSSRoad on Ubuntu

ComBioX — Fri, 29 May 2026 05:19:45 -0500

(base) hp@hp-HP-Z2-Tower-G9-Workstation-Desktop-PC:~/jitendraTEST$ conda
usage: conda [-h] [-v] [--no-plugins] [-V] COMMAND ...

conda is a tool for managing and deploying applications, environments and packages.

options:
-h, --help Show this help message and exit.
-v, --verbose Can be used multiple times. Once for detailed output, twice for INFO logging, thrice for DEBUG logging, four times for TRACE logging.
--no-plugins Disable all plugins that are not built into conda.
-V, --version Show the conda version number and exit.

commands:
The following built-in and plugins subcommands are available.

COMMAND
activate Activate a conda environment.
clean Remove unused packages and caches.
commands List all available conda subcommands (including those from plugins). Generally only used by tab-completion.
compare Compare packages between conda environments.
config Modify configuration values in .condarc.
create Create a new conda environment from a list of specified packages.
deactivate Deactivate the current active conda environment.
doctor Display a health report for your environment.
env Create and manage conda environments.
export Export a given environment
info Display information about current conda install.
init Initialize conda for shell interaction.
install Install a list of packages into a specified conda environment.
list List installed packages in a conda environment.
notices Retrieve latest channel notifications.
package Create low-level conda packages. (EXPERIMENTAL)
remove (uninstall) Remove a list of packages from a specified conda environment.
rename Rename an existing environment.
repoquery Advanced search for repodata.
run Run an executable in a conda environment.
search Search for packages and display associated information using the MatchSpec format.
update (upgrade) Update conda packages to the latest compatible version.
(base) hp@hp-HP-Z2-Tower-G9-Workstation-Desktop-PC:~/jitendraTEST$ conda create -n jitENV
Retrieving notices: done
Channels:
- ursky
- bioconda
- conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
current version: 25.7.0
latest version: 26.5.0

Please update conda by running

$ conda update -n base -c conda-forge conda

## Package Plan ##

environment location: /home/hp/miniforge3/envs/jitENV

Proceed ([y]/n)? y

Downloading and Extracting Packages:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate jitENV
#
# To deactivate an active environment, use
#
# $ conda deactivate

(base) hp@hp-HP-Z2-Tower-G9-Workstation-Desktop-PC:~/jitendraTEST$ conda activate jitENV
(jitENV) hp@hp-HP-Z2-Tower-G9-Workstation-Desktop-PC:~/jitendraTEST$ conda install conda-forge::mamba
Channels:
- ursky
- bioconda
- conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
current version: 25.7.0
latest version: 26.5.0

Please update conda by running

$ conda update -n base -c conda-forge conda

## Package Plan ##

environment location: /home/hp/miniforge3/envs/jitENV

added / updated specs:
- conda-forge::mamba

The following packages will be downloaded:

package | build
---------------------------|-----------------
ca-certificates-2026.5.20 | hbd8a1cb_0 127 KB conda-forge
cpp-expected-1.3.1 | h171cf75_0 24 KB conda-forge
fmt-12.1.0 | hff5e90c_0 193 KB conda-forge
libarchive-3.8.7 | gpl_hc2c16d8_101 869 KB conda-forge
libcurl-8.20.0 | hcf29cc6_0 458 KB conda-forge
libgcc-15.2.0 | he0feb66_19 1017 KB conda-forge
libgcc-ng-15.2.0 | h69a702a_19 27 KB conda-forge
libgomp-15.2.0 | he0feb66_19 590 KB conda-forge
libmamba-2.6.2 | hd28c85e_0 2.7 MB conda-forge
libmsgpack-c-6.1.0 | h54a6638_6 39 KB conda-forge
libsolv-0.7.38 | h9463b59_0 509 KB conda-forge
libstdcxx-15.2.0 | h934c35e_19 5.6 MB conda-forge
libxml2-2.15.3 | h49c6c72_0 46 KB conda-forge
libxml2-16-2.15.3 | hca6bf5a_0 547 KB conda-forge
mamba-2.6.2 | hce6dcdd_0 553 KB conda-forge
ncurses-6.6 | hdb14827_0 897 KB conda-forge
nlohmann_json-abi-3.12.0 | h0f90c79_1 4 KB conda-forge
reproc-14.2.7.post0 | hb03c661_1 35 KB conda-forge
reproc-cpp-14.2.7.post0 | hecca717_1 26 KB conda-forge
simdjson-4.6.4 | hb700be7_0 310 KB conda-forge
spdlog-1.17.0 | hab81395_1 192 KB conda-forge
------------------------------------------------------------
Total: 14.6 MB

The following NEW packages will be INSTALLED:

_openmp_mutex conda-forge/linux-64::_openmp_mutex-4.5-20_gnu
bzip2 conda-forge/linux-64::bzip2-1.0.8-hda65f42_9
c-ares conda-forge/linux-64::c-ares-1.34.6-hb03c661_0
ca-certificates conda-forge/noarch::ca-certificates-2026.5.20-hbd8a1cb_0
cpp-expected conda-forge/linux-64::cpp-expected-1.3.1-h171cf75_0
fmt conda-forge/linux-64::fmt-12.1.0-hff5e90c_0
icu conda-forge/linux-64::icu-78.3-h33c6efd_0
keyutils conda-forge/linux-64::keyutils-1.6.3-hb9d3cd8_0
krb5 conda-forge/linux-64::krb5-1.22.2-ha1258a1_0
libarchive conda-forge/linux-64::libarchive-3.8.7-gpl_hc2c16d8_101
libcurl conda-forge/linux-64::libcurl-8.20.0-hcf29cc6_0
libedit conda-forge/linux-64::libedit-3.1.20250104-pl5321h7949ede_0
libev conda-forge/linux-64::libev-4.33-hd590300_2
libgcc conda-forge/linux-64::libgcc-15.2.0-he0feb66_19
libgcc-ng conda-forge/linux-64::libgcc-ng-15.2.0-h69a702a_19
libgomp conda-forge/linux-64::libgomp-15.2.0-he0feb66_19
libiconv conda-forge/linux-64::libiconv-1.18-h3b78370_2
liblzma conda-forge/linux-64::liblzma-5.8.3-hb03c661_0
libmamba conda-forge/linux-64::libmamba-2.6.2-hd28c85e_0
libmsgpack-c conda-forge/linux-64::libmsgpack-c-6.1.0-h54a6638_6
libnghttp2 conda-forge/linux-64::libnghttp2-1.68.1-h877daf1_0
libsolv conda-forge/linux-64::libsolv-0.7.38-h9463b59_0
libssh2 conda-forge/linux-64::libssh2-1.11.1-hcf80075_0
libstdcxx conda-forge/linux-64::libstdcxx-15.2.0-h934c35e_19
libxml2 conda-forge/linux-64::libxml2-2.15.3-h49c6c72_0
libxml2-16 conda-forge/linux-64::libxml2-16-2.15.3-hca6bf5a_0
libzlib conda-forge/linux-64::libzlib-1.3.2-h25fd6f3_2
lz4-c conda-forge/linux-64::lz4-c-1.10.0-h5888daf_1
lzo conda-forge/linux-64::lzo-2.10-h280c20c_1002
mamba conda-forge/linux-64::mamba-2.6.2-hce6dcdd_0
ncurses conda-forge/linux-64::ncurses-6.6-hdb14827_0
nlohmann_json-abi conda-forge/noarch::nlohmann_json-abi-3.12.0-h0f90c79_1
openssl conda-forge/linux-64::openssl-3.6.2-h35e630c_0
reproc conda-forge/linux-64::reproc-14.2.7.post0-hb03c661_1
reproc-cpp conda-forge/linux-64::reproc-cpp-14.2.7.post0-hecca717_1
simdjson conda-forge/linux-64::simdjson-4.6.4-hb700be7_0
spdlog conda-forge/linux-64::spdlog-1.17.0-hab81395_1
yaml-cpp conda-forge/linux-64::yaml-cpp-0.8.0-h3f2d84a_0
zstd conda-forge/linux-64::zstd-1.5.7-hb78ec9c_6

Proceed ([y]/n)? y

Downloading and Extracting Packages:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(jitENV) hp@hp-HP-Z2-Tower-G9-Workstation-Desktop-PC:~/jitendraTEST$ mamba install -c jitendralab -c bioconda -c conda-forge crossroad -y
jitendralab/noarch ??.?MB @ ??.?MB/s 0.3s
jitendralab/linux-64 ??.?MB @ ??.?MB/s 0.4s
bioconda/linux-64 5.6MB @ 2.9MB/s 1.9s
bioconda/noarch 5.6MB @ 2.5MB/s 2.2s
conda-forge/noarch 26.4MB @ 6.0MB/s 4.5s
conda-forge/linux-64 53.8MB @ 6.7MB/s 8.2s

Transaction

Prefix: /home/hp/miniforge3/envs/jitENV

Updating specs:

- crossroad

Package Version Build Channel Size
─────────────────────────────────────────────────────────────────────────────────────────────────
Install:
─────────────────────────────────────────────────────────────────────────────────────────────────

+ annotated-doc 0.0.4 pyhcf101f3_0 conda-forge Cached
+ annotated-types 0.7.0 pyhd8ed1ab_1 conda-forge Cached
+ anyio 4.13.0 pyhcf101f3_0 conda-forge 147kB
+ argcomplete 3.6.3 pyhd8ed1ab_0 conda-forge Cached
+ aws-c-auth 0.10.3 h3aafcba_1 conda-forge 134kB
+ aws-c-cal 0.9.14 h8e43964_1 conda-forge 57kB
+ aws-c-common 0.13.1 hb03c661_0 conda-forge 242kB
+ aws-c-compression 0.3.2 h16e98cb_1 conda-forge 22kB
+ aws-c-event-stream 0.7.1 h9be7a74_1 conda-forge 59kB
+ aws-c-http 0.11.0 hcbcd92d_1 conda-forge 230kB
+ aws-c-io 0.26.3 h955231c_3 conda-forge 182kB
+ aws-c-mqtt 0.15.2 h8af55cf_3 conda-forge 222kB
+ aws-c-s3 0.12.3 h00bea6e_2 conda-forge 153kB
+ aws-c-sdkutils 0.2.4 h16e98cb_5 conda-forge 59kB
+ aws-checksums 0.2.10 h16e98cb_1 conda-forge 102kB
+ aws-crt-cpp 0.38.3 h7b0d4b4_2 conda-forge 413kB
+ aws-sdk-cpp 1.11.747 h5a171d8_5 conda-forge 4MB
+ azure-core-cpp 1.16.2 h206d751_0 conda-forge 349kB
+ azure-identity-cpp 1.13.3 hed0cdb0_1 conda-forge 251kB
+ azure-storage-blobs-cpp 12.17.0 hf824e48_1 conda-forge 587kB
+ azure-storage-common-cpp 12.13.0 ha7a2c86_0 conda-forge 159kB
+ azure-storage-files-datalake-cpp 12.15.0 h1e5b466_0 conda-forge 304kB
+ backports.zstd 1.5.0 py314h680f03e_0 conda-forge 8kB
+ bedtools 2.31.1 h13024bc_3 bioconda Cached
+ biopython 1.87 py314h5bd0f2a_0 conda-forge 3MB
+ brotli 1.2.0 hed03a55_1 conda-forge Cached
+ brotli-bin 1.2.0 hb03c661_1 conda-forge Cached
+ brotli-python 1.2.0 py314h3de4e8d_1 conda-forge 367kB
+ certifi 2026.5.20 pyhd8ed1ab_0 conda-forge 134kB
+ charset-normalizer 3.4.7 pyhd8ed1ab_0 conda-forge Cached
+ click 8.4.1 pyhc90fa1f_0 conda-forge 105kB
+ colorama 0.4.6 pyhd8ed1ab_1 conda-forge Cached
+ contourpy 1.3.3 py314h97ea11e_4 conda-forge 324kB
+ crossroad 0.3.6 pyh7e60211_0 jitendralab 2MB
+ cycler 0.12.1 pyhcf101f3_2 conda-forge Cached
+ dnspython 2.8.0 pyhcf101f3_0 conda-forge Cached
+ email-validator 2.3.0 pyhd8ed1ab_0 conda-forge 47kB
+ email_validator 2.3.0 hd8ed1ab_0 conda-forge 7kB
+ exceptiongroup 1.3.1 pyhd8ed1ab_0 conda-forge Cached
+ expat 2.8.1 hecca717_0 conda-forge 148kB
+ fastapi 0.136.3 h5ddb490_0 conda-forge 5kB
+ fastapi-cli 0.0.23 pyhcf101f3_0 conda-forge 19kB
+ fastapi-core 0.136.3 pyhcf101f3_0 conda-forge 96kB
+ fastar 0.11.0 py314h0b738fb_0 conda-forge 423kB
+ font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge Cached
+ font-ttf-inconsolata 3.000 h77eed37_0 conda-forge Cached
+ font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge Cached
+ font-ttf-ubuntu 0.83 h77eed37_3 conda-forge Cached
+ fontconfig 2.18.0 h27c8c51_0 conda-forge 281kB
+ fonts-conda-forge 1 hc364b38_1 conda-forge Cached
+ fonttools 4.63.0 pyh7db6752_0 conda-forge 846kB
+ freetype 2.14.3 ha770c72_0 conda-forge Cached
+ gflags 2.2.2 h5888daf_1005 conda-forge 120kB
+ glog 0.7.1 hbabe93e_0 conda-forge 143kB
+ h11 0.16.0 pyhcf101f3_1 conda-forge 39kB
+ h2 4.3.0 pyhcf101f3_0 conda-forge Cached
+ hpack 4.1.0 pyhd8ed1ab_0 conda-forge Cached
+ httpcore 1.0.9 pyh29332c3_0 conda-forge Cached
+ httptools 0.7.1 py314h5bd0f2a_1 conda-forge 99kB
+ httpx 0.28.1 pyhd8ed1ab_0 conda-forge Cached
+ hyperframe 6.1.0 pyhd8ed1ab_0 conda-forge Cached
+ idna 3.17 pyhcf101f3_0 conda-forge 57kB
+ jinja2 3.1.6 pyhcf101f3_1 conda-forge Cached
+ kaleido-core 0.2.1 h3644ca4_0 conda-forge Cached
+ kiwisolver 1.5.0 py314h97ea11e_0 conda-forge 77kB
+ lcms2 2.19.1 h0c24ade_0 conda-forge 251kB
+ ld_impl_linux-64 2.45.1 default_hbd61a6d_102 conda-forge Cached
+ lerc 4.1.0 hdb68285_0 conda-forge Cached
+ libabseil 20260107.1 cxx17_h7b12aa8_0 conda-forge 1MB
+ libarrow 24.0.0 h6f10b76_3_cpu conda-forge 7MB
+ libarrow-acero 24.0.0 h635bf11_3_cpu conda-forge 592kB
+ libarrow-compute 24.0.0 h53684a4_3_cpu conda-forge 3MB
+ libarrow-dataset 24.0.0 h635bf11_3_cpu conda-forge 592kB
+ libarrow-substrait 24.0.0 hb4dd7c2_3_cpu conda-forge 502kB
+ libblas 3.11.0 8_h4a7cf45_openblas conda-forge 19kB
+ libbrotlicommon 1.2.0 hb03c661_1 conda-forge Cached
+ libbrotlidec 1.2.0 hb03c661_1 conda-forge Cached
+ libbrotlienc 1.2.0 hb03c661_1 conda-forge Cached
+ libcblas 3.11.0 8_h0358290_openblas conda-forge 19kB
+ libcrc32c 1.1.2 h9c3ff4c_0 conda-forge Cached
+ libdeflate 1.25 h17f619e_0 conda-forge Cached
+ libevent 2.1.12 hf998b51_1 conda-forge Cached
+ libexpat 2.8.1 hecca717_0 conda-forge 77kB
+ libffi 3.5.2 h3435931_0 conda-forge Cached
+ libfreetype 2.14.3 ha770c72_0 conda-forge Cached
+ libfreetype6 2.14.3 h73754d4_0 conda-forge Cached
+ libgfortran 15.2.0 h69a702a_19 conda-forge 28kB
+ libgfortran5 15.2.0 h68bc16d_19 conda-forge 2MB
+ libgoogle-cloud 3.5.0 h25dbb67_0 conda-forge 3MB
+ libgoogle-cloud-storage 3.5.0 hdbdcf42_0 conda-forge 780kB
+ libgrpc 1.78.1 h1d1128b_0 conda-forge 7MB
+ libjpeg-turbo 3.1.4.1 hb03c661_0 conda-forge Cached
+ liblapack 3.11.0 8_h47877c9_openblas conda-forge 19kB
+ libmpdec 4.0.0 hb03c661_1 conda-forge 92kB
+ libopenblas 0.3.33 pthreads_h94d23a6_0 conda-forge 6MB
+ libopentelemetry-cpp 1.26.0 h9692893_0 conda-forge 934kB
+ libopentelemetry-cpp-headers 1.26.0 ha770c72_0 conda-forge 396kB
+ libparquet 24.0.0 h7376487_3_cpu conda-forge 1MB
+ libpng 1.6.58 h421ea60_0 conda-forge 318kB
+ libprotobuf 6.33.5 h6eeba95_1 conda-forge 4MB
+ libre2-11 2025.11.05 h0dc7533_1 conda-forge 213kB
+ libsqlite 3.53.1 h0c1763c_0 conda-forge 955kB
+ libstdcxx-ng 15.2.0 hdf11a46_19 conda-forge 28kB
+ libthrift 0.22.0 h7d032f7_2 conda-forge 424kB
+ libtiff 4.7.1 h9d88235_1 conda-forge Cached
+ libutf8proc 2.11.3 hfe17d71_0 conda-forge 86kB
+ libuuid 2.42.1 h5347b49_0 conda-forge 40kB
+ libuv 1.52.1 h280c20c_0 conda-forge 420kB
+ libwebp-base 1.6.0 hd42ef1d_0 conda-forge Cached
+ libxcb 1.17.0 h8a09558_0 conda-forge Cached
+ markdown-it-py 4.2.0 pyhd8ed1ab_0 conda-forge 69kB
+ markupsafe 3.0.3 py314h67df5f8_1 conda-forge 27kB
+ mathjax 2.7.7 ha770c72_3 conda-forge Cached
+ matplotlib-base 3.10.9 py314h1194b4b_0 conda-forge 9MB
+ mdurl 0.1.2 pyhd8ed1ab_1 conda-forge Cached
+ munkres 1.0.7 py_1 bioconda Cached
+ narwhals 2.21.2 pyhcf101f3_0 conda-forge 284kB
+ nlohmann_json 3.12.0 h54a6638_1 conda-forge 136kB
+ nspr 4.38 h29cc59b_0 conda-forge Cached
+ nss 3.118 h445c969_0 conda-forge Cached
+ numpy 2.4.6 py314h2b28147_0 conda-forge 9MB
+ openjpeg 2.5.4 h55fea9a_0 conda-forge Cached
+ orc 2.3.0 h21090e2_0 conda-forge 1MB
+ packaging 26.2 pyhc364b38_0 conda-forge 92kB
+ pandas 3.0.3 py314hb4ffadd_0 conda-forge 15MB
+ perf_ssr 0.4.8 py_0 jitendralab 720kB
+ pillow 12.2.0 py314h8ec4b1a_0 conda-forge 1MB
+ pip 26.1.1 pyh145f28c_0 conda-forge 1MB
+ plotly 6.6.0 pyhd8ed1ab_0 conda-forge Cached
+ plotly-upset-hd 0.0.2 py_0 jitendralab 356kB
+ prometheus-cpp 1.3.0 ha5d0236_0 conda-forge 200kB
+ pthread-stubs 0.4 hb9d3cd8_1002 conda-forge Cached
+ pyarrow 24.0.0 py314hdafbbf9_0 conda-forge 27kB
+ pyarrow-core 24.0.0 py314h969be7f_0_cpu conda-forge 5MB
+ pydantic 2.13.4 pyhcf101f3_0 conda-forge 347kB
+ pydantic-core 2.46.4 py314h2e6c369_0 conda-forge 2MB
+ pydantic-extra-types 2.11.2 pyhcf101f3_0 conda-forge 74kB
+ pydantic-settings 2.14.1 pyhcf101f3_0 conda-forge 52kB
+ pygments 2.20.0 pyhd8ed1ab_0 conda-forge Cached
+ pyparsing 3.3.2 pyhcf101f3_0 conda-forge Cached
+ pysocks 1.7.1 pyha55dd90_7 conda-forge Cached
+ python 3.14.5 habeac84_100_cp314 conda-forge 37MB
+ python-dateutil 2.9.0.post0 pyhe01879c_2 conda-forge Cached
+ python-dotenv 1.2.2 pyhcf101f3_0 conda-forge Cached
+ python-kaleido 0.2.1 pyhd8ed1ab_0 conda-forge Cached
+ python-multipart 0.0.29 pyhcf101f3_0 conda-forge 38kB
+ python_abi 3.14 8_cp314 conda-forge 7kB
+ pyyaml 6.0.3 py314h67df5f8_1 conda-forge 202kB
+ qhull 2020.2 h434a139_5 conda-forge Cached
+ re2 2025.11.05 h5301d42_1 conda-forge 27kB
+ readline 8.3 h853b02a_0 conda-forge Cached
+ requests 2.34.2 pyhcf101f3_0 conda-forge 69kB
+ rich 15.0.0 pyhcf101f3_0 conda-forge Cached
+ rich-argparse 1.8.0 pyhd8ed1ab_0 conda-forge 27kB
+ rich-click 1.9.8 pyh8f84b5b_0 conda-forge 64kB
+ rich-toolkit 0.19.10 pyhcf101f3_0 conda-forge 33kB
+ s2n 1.7.3 hc5a330e_0 conda-forge 388kB
+ seqkit 2.13.0 he881be0_0 bioconda Cached
+ seqtk 1.5 h577a1d6_1 bioconda 142kB
+ shellingham 1.5.4 pyhd8ed1ab_2 conda-forge Cached
+ six 1.17.0 pyhe01879c_1 conda-forge Cached
+ snappy 1.2.2 h03e3b7b_1 conda-forge Cached
+ sniffio 1.3.1 pyhd8ed1ab_2 conda-forge Cached
+ sqlite 3.53.1 hbc0de68_0 conda-forge 205kB
+ starlette 1.1.0 pyhcf101f3_0 conda-forge 64kB
+ tk 8.6.13 noxft_h366c992_103 conda-forge Cached
+ tomli 2.4.1 pyhcf101f3_0 conda-forge 22kB
+ tqdm 4.67.3 pyh8f84b5b_0 conda-forge Cached
+ typer 0.26.3 pyhcf101f3_0 conda-forge 184kB
+ typing-extensions 4.15.0 h396c80c_0 conda-forge Cached
+ typing-inspection 0.4.2 pyhcf101f3_2 conda-forge 21kB
+ typing_extensions 4.15.0 pyhcf101f3_0 conda-forge Cached
+ tzdata 2025c hc9c84f9_1 conda-forge Cached
+ unicodedata2 17.0.1 py314h5bd0f2a_0 conda-forge 410kB
+ upsetplot 0.9.0 pyhd8ed1ab_1 conda-forge 28kB
+ urllib3 2.7.0 pyhd8ed1ab_0 conda-forge 104kB
+ uvicorn 0.48.0 pyhc90fa1f_0 conda-forge 56kB
+ uvicorn-standard 0.48.0 he364bde_0 conda-forge 4kB
+ uvloop 0.22.1 py314h5bd0f2a_1 conda-forge 593kB
+ watchfiles 1.2.0 py314ha5689aa_0 conda-forge 416kB
+ websockets 16.0 py314h0f05182_1 conda-forge 383kB
+ xorg-libxau 1.0.12 hb03c661_1 conda-forge Cached
+ xorg-libxdmcp 1.1.5 hb03c661_1 conda-forge Cached
+ yaml 0.2.5 h280c20c_3 conda-forge Cached
+ zlib 1.3.2 h25fd6f3_2 conda-forge Cached
+ zlib-ng 2.3.3 hceb46e0_1 conda-forge Cached

Summary:

Install: 186 packages

Total download: 142MB

─────────────────────────────────────────────────────────────────────────────────────────────────

Transaction starting
libgrpc 7.0MB @ 2.3MB/s 3.0s
numpy 8.9MB @ 2.3MB/s 3.8s
matplotlib-base 8.5MB @ 2.0MB/s 4.2s
libarrow 6.5MB @ 2.3MB/s 2.8s
pandas 15.3MB @ 2.5MB/s 6.2s
libopenblas 5.9MB @ 2.3MB/s 2.5s
pyarrow-core 4.8MB @ 1.6MB/s 3.0s
libprotobuf 3.7MB @ 2.4MB/s 1.6s
aws-sdk-cpp 3.6MB @ 3.1MB/s 1.2s
biopython 3.4MB @ 2.0MB/s 1.7s
libgfortran5 2.5MB @ 2.6MB/s 1.0s
libgoogle-cloud 2.6MB @ 2.4MB/s 1.1s
pydantic-core 1.9MB @ 2.7MB/s 0.7s
libarrow-compute 3.0MB @ 1.9MB/s 1.6s
orc 1.5MB @ 2.8MB/s 0.5s
libparquet 1.4MB @ 3.1MB/s 0.5s
pip 1.2MB @ 2.9MB/s 0.4s
libabseil 1.4MB @ 2.2MB/s 0.6s
pillow 1.1MB @ 2.7MB/s 0.4s
libsqlite 955.0kB @ 2.9MB/s 0.3s
libgoogle-cloud-storage 779.6kB @ 2.7MB/s 0.3s
fonttools 846.0kB @ 2.1MB/s 0.4s
libopentelemetry-cpp 934.3kB @ 1.8MB/s 0.5s
libarrow-acero 592.3kB @ 2.2MB/s 0.2s
uvloop 593.4kB @ 1.3MB/s 0.4s
libarrow-dataset 592.2kB @ 2.7MB/s 0.2s
libarrow-substrait 501.9kB @ 1.8MB/s 0.2s
azure-storage-blobs-cpp 587.1kB @ 1.6MB/s 0.3s
libthrift 423.9kB @ 2.8MB/s 0.2s
crossroad 1.8MB @ 663.3kB/s 2.6s
libuv 419.9kB @ 2.3MB/s 0.2s
fastar 423.4kB @ 966.7kB/s 0.3s
aws-crt-cpp 412.5kB @ 2.9MB/s 0.1s
watchfiles 415.6kB @ 1.6MB/s 0.3s
unicodedata2 409.6kB @ 1.8MB/s 0.2s
libopentelemetry-cpp-headers 396.4kB @ 2.2MB/s 0.2s
s2n 388.1kB @ 2.5MB/s 0.1s
brotli-python 367.4kB @ 1.7MB/s 0.1s
websockets 383.0kB @ 1.3MB/s 0.3s
azure-core-cpp 348.7kB @ 2.7MB/s 0.1s
pydantic 346.5kB @ 1.9MB/s 0.2s
contourpy 324.0kB @ 2.3MB/s 0.1s
libpng 317.7kB @ 1.8MB/s 0.2s
azure-storage-files-datalake-cpp 303.8kB @ 1.9MB/s 0.1s
narwhals 284.3kB @ 1.8MB/s 0.2s
fontconfig 280.9kB @ 866.6kB/s 0.2s
python 36.7MB @ 3.0MB/s 12.0s
azure-identity-cpp 250.5kB @ 1.5MB/s 0.1s
lcms2 251.1kB @ 2.0MB/s 0.1s
aws-c-common 242.3kB @ 2.8MB/s 0.1s
libre2-11 213.1kB @ 66.4kB/s 0.1s
aws-c-http 230.3kB @ 1.7MB/s 0.1s
aws-c-mqtt 221.7kB @ 307.2kB/s 0.1s
sqlite 205.4kB @ ??.?MB/s 0.1s
perf_ssr 720.0kB @ 247.3kB/s 2.3s
prometheus-cpp 199.5kB @ 962.8kB/s 0.1s
pyyaml 202.4kB @ 1.6MB/s 0.1s
typer 184.4kB @ 1.9MB/s 0.1s
aws-c-io 181.6kB @ 1.9MB/s 0.1s
aws-c-s3 153.0kB @ 2.2MB/s 0.1s
azure-storage-common-cpp 159.1kB @ 1.8MB/s 0.1s
expat 148.2kB @ ??.?MB/s 0.0s
anyio 146.8kB @ 2.2MB/s 0.1s
glog 143.5kB @ 2.6MB/s 0.1s
seqtk 141.8kB @ 1.8MB/s 0.1s
nlohmann_json 136.2kB @ 2.1MB/s 0.1s
aws-c-auth 134.4kB @ 1.5MB/s 0.1s
certifi 134.2kB @ 1.8MB/s 0.1s
click 105.0kB @ 1.5MB/s 0.1s
gflags 119.7kB @ 148.2kB/s 0.1s
urllib3 103.6kB @ ??.?MB/s 0.0s
aws-checksums 101.6kB @ ??.?MB/s 0.0s
fastapi-core 95.5kB @ ??.?MB/s 0.0s
libmpdec 92.4kB @ ??.?MB/s 0.0s
packaging 91.6kB @ ??.?MB/s 0.0s
libutf8proc 86.0kB @ ??.?MB/s 0.0s
kiwisolver 77.4kB @ ??.?MB/s 0.0s
libexpat 77.3kB @ 885.4kB/s 0.1s
pydantic-extra-types 73.9kB @ ??.?MB/s 0.0s
markdown-it-py 69.0kB @ ??.?MB/s 0.0s
requests 68.7kB @ ??.?MB/s 0.0s
rich-click 64.4kB @ ??.?MB/s 0.0s
aws-c-event-stream 59.3kB @ ??.?MB/s 0.0s
starlette 63.7kB @ ??.?MB/s 0.0s
aws-c-sdkutils 59.1kB @ ??.?MB/s 0.0s
aws-c-cal 56.9kB @ ??.?MB/s 0.0s
idna 56.9kB @ ??.?MB/s 0.0s
uvicorn 56.3kB @ ??.?MB/s 0.0s
pydantic-settings 52.3kB @ ??.?MB/s 0.0s
email-validator 46.8kB @ ??.?MB/s 0.0s
libuuid 40.2kB @ ??.?MB/s 0.0s
h11 39.1kB @ ??.?MB/s 0.0s
python-multipart 37.8kB @ ??.?MB/s 0.0s
rich-toolkit 32.9kB @ ??.?MB/s 0.0s
upsetplot 28.0kB @ ??.?MB/s 0.0s
libstdcxx-ng 27.8kB @ ??.?MB/s 0.0s
libgfortran 27.7kB @ ??.?MB/s 0.0s
re2 27.5kB @ ??.?MB/s 0.0s
markupsafe 27.4kB @ ??.?MB/s 0.0s
pyarrow 26.8kB @ ??.?MB/s 0.0s
aws-c-compression 22.0kB @ ??.?MB/s 0.0s
tomli 21.6kB @ ??.?MB/s 0.0s
typing-inspection 20.9kB @ ??.?MB/s 0.0s
fastapi-cli 18.9kB @ ??.?MB/s 0.0s
libblas 18.8kB @ ??.?MB/s 0.0s
httptools 99.0kB @ ??.?MB/s 0.4s
liblapack 18.8kB @ ??.?MB/s 0.0s
libcblas 18.8kB @ ??.?MB/s 0.0s
email_validator 7.1kB @ ??.?MB/s 0.0s
backports.zstd 7.5kB @ ??.?MB/s 0.0s
python_abi 7.0kB @ ??.?MB/s 0.0s
fastapi 4.8kB @ ??.?MB/s 0.0s
uvicorn-standard 4.1kB @ ??.?MB/s 0.0s
rich-argparse 26.8kB @ ??.?MB/s 0.2s
plotly-upset-hd 356.0kB @ 181.5kB/s 1.8s
Linking seqkit-2.13.0-he881be0_0
Linking bedtools-2.31.1-h13024bc_3
Linking seqtk-1.5-h577a1d6_1
Linking libuuid-2.42.1-h5347b49_0
Linking readline-8.3-h853b02a_0
Linking libexpat-2.8.1-hecca717_0
Linking nspr-4.38-h29cc59b_0
Linking mathjax-2.7.7-ha770c72_3
Linking libuv-1.52.1-h280c20c_0
Linking yaml-0.2.5-h280c20c_3
Linking ld_impl_linux-64-2.45.1-default_hbd61a6d_102
Linking libmpdec-4.0.0-hb03c661_1
Linking libwebp-base-1.6.0-hd42ef1d_0
Linking zlib-ng-2.3.3-hceb46e0_1
Linking libstdcxx-ng-15.2.0-hdf11a46_19
Linking pthread-stubs-0.4-hb9d3cd8_1002
Linking xorg-libxau-1.0.12-hb03c661_1
Linking xorg-libxdmcp-1.1.5-hb03c661_1
Linking libgfortran5-15.2.0-h68bc16d_19
Linking libpng-1.6.58-h421ea60_0
Linking libbrotlicommon-1.2.0-hb03c661_1
Linking libjpeg-turbo-3.1.4.1-hb03c661_0
Linking libdeflate-1.25-h17f619e_0
Linking lerc-4.1.0-hdb68285_0
Linking libsqlite-3.53.1-h0c1763c_0
Linking libffi-3.5.2-h3435931_0
Linking tk-8.6.13-noxft_h366c992_103
Linking azure-core-cpp-1.16.2-h206d751_0
Linking libabseil-20260107.1-cxx17_h7b12aa8_0
Linking libutf8proc-2.11.3-hfe17d71_0
Linking libopentelemetry-cpp-headers-1.26.0-ha770c72_0
Linking zlib-1.3.2-h25fd6f3_2
Linking snappy-1.2.2-h03e3b7b_1
Linking nlohmann_json-3.12.0-h54a6638_1
Linking aws-c-common-0.13.1-hb03c661_0
Linking s2n-1.7.3-hc5a330e_0
Linking gflags-2.2.2-h5888daf_1005
Linking libevent-2.1.12-hf998b51_1
Linking expat-2.8.1-hecca717_0
Linking libcrc32c-1.1.2-h9c3ff4c_0
Linking qhull-2020.2-h434a139_5
Linking libxcb-1.17.0-h8a09558_0
Linking libgfortran-15.2.0-h69a702a_19
Linking libfreetype6-2.14.3-h73754d4_0
Linking libbrotlienc-1.2.0-hb03c661_1
Linking libbrotlidec-1.2.0-hb03c661_1
Linking libtiff-4.7.1-h9d88235_1
Linking sqlite-3.53.1-hbc0de68_0
Linking nss-3.118-h445c969_0
Linking azure-identity-cpp-1.13.3-hed0cdb0_1
Linking azure-storage-common-cpp-12.13.0-ha7a2c86_0
Linking libprotobuf-6.33.5-h6eeba95_1
Linking libre2-11-2025.11.05-h0dc7533_1
Linking prometheus-cpp-1.3.0-ha5d0236_0
Linking aws-c-compression-0.3.2-h16e98cb_1
Linking aws-checksums-0.2.10-h16e98cb_1
Linking aws-c-sdkutils-0.2.4-h16e98cb_5
Linking aws-c-cal-0.9.14-h8e43964_1
Linking glog-0.7.1-hbabe93e_0
Linking libthrift-0.22.0-h7d032f7_2
Linking libopenblas-0.3.33-pthreads_h94d23a6_0
Linking libfreetype-2.14.3-ha770c72_0
Linking brotli-bin-1.2.0-hb03c661_1
Linking lcms2-2.19.1-h0c24ade_0
Linking openjpeg-2.5.4-h55fea9a_0
Linking azure-storage-blobs-cpp-12.17.0-hf824e48_1
Linking re2-2025.11.05-h5301d42_1
Linking aws-c-io-0.26.3-h955231c_3
Linking libblas-3.11.0-8_h4a7cf45_openblas
Linking fontconfig-2.18.0-h27c8c51_0
Linking freetype-2.14.3-ha770c72_0
Linking brotli-1.2.0-hed03a55_1
Linking azure-storage-files-datalake-cpp-12.15.0-h1e5b466_0
Linking libgrpc-1.78.1-h1d1128b_0
Linking aws-c-event-stream-0.7.1-h9be7a74_1
Linking aws-c-http-0.11.0-hcbcd92d_1
Linking libcblas-3.11.0-8_h0358290_openblas
Linking liblapack-3.11.0-8_h47877c9_openblas
Linking libopentelemetry-cpp-1.26.0-h9692893_0
Linking aws-c-auth-0.10.3-h3aafcba_1
Linking aws-c-mqtt-0.15.2-h8af55cf_3
Linking libgoogle-cloud-3.5.0-h25dbb67_0
Linking aws-c-s3-0.12.3-h00bea6e_2
Linking libgoogle-cloud-storage-3.5.0-hdbdcf42_0
Linking aws-crt-cpp-0.38.3-h7b0d4b4_2
Linking aws-sdk-cpp-1.11.747-h5a171d8_5
Linking python_abi-3.14-8_cp314
Linking font-ttf-dejavu-sans-mono-2.37-hab24e00_0
Linking tzdata-2025c-hc9c84f9_1
Linking font-ttf-ubuntu-0.83-h77eed37_3
Linking font-ttf-inconsolata-3.000-h77eed37_0
Linking font-ttf-source-code-pro-2.038-h77eed37_0
Linking fonts-conda-forge-1-hc364b38_1
Linking orc-2.3.0-h21090e2_0
Linking python-3.14.5-habeac84_100_cp314
Linking kaleido-core-0.2.1-h3644ca4_0
Linking libarrow-24.0.0-h6f10b76_3_cpu
Linking libparquet-24.0.0-h7376487_3_cpu
Linking libarrow-compute-24.0.0-h53684a4_3_cpu
Linking libarrow-acero-24.0.0-h635bf11_3_cpu
Linking libarrow-dataset-24.0.0-h635bf11_3_cpu
Linking libarrow-substrait-24.0.0-hb4dd7c2_3_cpu
Linking pip-26.1.1-pyh145f28c_0
Linking tomli-2.4.1-pyhcf101f3_0
Linking six-1.17.0-pyhe01879c_1
Linking pysocks-1.7.1-pyha55dd90_7
Linking hyperframe-6.1.0-pyhd8ed1ab_0
Linking hpack-4.1.0-pyhd8ed1ab_0
Linking backports.zstd-1.5.0-py314h680f03e_0
Linking pyparsing-3.3.2-pyhcf101f3_0
Linking cycler-0.12.1-pyhcf101f3_2
Linking sniffio-1.3.1-pyhd8ed1ab_2
Linking mdurl-0.1.2-pyhd8ed1ab_1
Linking narwhals-2.21.2-pyhcf101f3_0
Linking packaging-26.2-pyhc364b38_0
Linking charset-normalizer-3.4.7-pyhd8ed1ab_0
Linking certifi-2026.5.20-pyhd8ed1ab_0
Linking idna-3.17-pyhcf101f3_0
Linking pygments-2.20.0-pyhd8ed1ab_0
Linking shellingham-1.5.4-pyhd8ed1ab_2
Linking annotated-doc-0.0.4-pyhcf101f3_0
Linking colorama-0.4.6-pyhd8ed1ab_1
Linking typing_extensions-4.15.0-pyhcf101f3_0
Linking click-8.4.1-pyhc90fa1f_0
Linking tqdm-4.67.3-pyh8f84b5b_0
Linking python-kaleido-0.2.1-pyhd8ed1ab_0
Linking python-multipart-0.0.29-pyhcf101f3_0
Linking python-dotenv-1.2.2-pyhcf101f3_0
Linking argcomplete-3.6.3-pyhd8ed1ab_0
Linking python-dateutil-2.9.0.post0-pyhe01879c_2
Linking h2-4.3.0-pyhcf101f3_0
Linking dnspython-2.8.0-pyhcf101f3_0
Linking markdown-it-py-4.2.0-pyhd8ed1ab_0
Linking plotly-6.6.0-pyhd8ed1ab_0
Linking exceptiongroup-1.3.1-pyhd8ed1ab_0
Linking typing-inspection-0.4.2-pyhcf101f3_2
Linking typing-extensions-4.15.0-h396c80c_0
Linking h11-0.16.0-pyhcf101f3_1
Linking email-validator-2.3.0-pyhd8ed1ab_0
Linking rich-15.0.0-pyhcf101f3_0
Linking anyio-4.13.0-pyhcf101f3_0
Linking annotated-types-0.7.0-pyhd8ed1ab_1
Linking uvicorn-0.48.0-pyhc90fa1f_0
Linking email_validator-2.3.0-hd8ed1ab_0
Linking rich-toolkit-0.19.10-pyhcf101f3_0
Linking typer-0.26.3-pyhcf101f3_0
Linking rich-click-1.9.8-pyh8f84b5b_0
Linking rich-argparse-1.8.0-pyhd8ed1ab_0
Linking httpcore-1.0.9-pyh29332c3_0
Linking starlette-1.1.0-pyhcf101f3_0
Linking httpx-0.28.1-pyhd8ed1ab_0
Linking pyarrow-core-24.0.0-py314h969be7f_0_cpu
Linking unicodedata2-17.0.1-py314h5bd0f2a_0
Linking brotli-python-1.2.0-py314h3de4e8d_1
Linking pillow-12.2.0-py314h8ec4b1a_0
Linking kiwisolver-1.5.0-py314h97ea11e_0
Linking fastar-0.11.0-py314h0b738fb_0
Linking markupsafe-3.0.3-py314h67df5f8_1
Linking websockets-16.0-py314h0f05182_1
Linking uvloop-0.22.1-py314h5bd0f2a_1
Linking pyyaml-6.0.3-py314h67df5f8_1
Linking httptools-0.7.1-py314h5bd0f2a_1
Linking numpy-2.4.6-py314h2b28147_0
Linking pydantic-core-2.46.4-py314h2e6c369_0
Linking watchfiles-1.2.0-py314ha5689aa_0
Linking pyarrow-24.0.0-py314hdafbbf9_0
Linking contourpy-1.3.3-py314h97ea11e_4
Linking biopython-1.87-py314h5bd0f2a_0
Linking pandas-3.0.3-py314hb4ffadd_0
Linking munkres-1.0.7-py_1
Linking urllib3-2.7.0-pyhd8ed1ab_0
Linking jinja2-3.1.6-pyhcf101f3_1
Linking pydantic-2.13.4-pyhcf101f3_0
Linking uvicorn-standard-0.48.0-he364bde_0
Linking fonttools-4.63.0-pyh7db6752_0
Linking requests-2.34.2-pyhcf101f3_0
Linking pydantic-settings-2.14.1-pyhcf101f3_0
Linking pydantic-extra-types-2.11.2-pyhcf101f3_0
Linking fastapi-core-0.136.3-pyhcf101f3_0
Linking fastapi-cli-0.0.23-pyhcf101f3_0
Linking fastapi-0.136.3-h5ddb490_0
Linking plotly-upset-hd-0.0.2-py_0
Linking matplotlib-base-3.10.9-py314h1194b4b_0
Linking upsetplot-0.9.0-pyhd8ed1ab_1
Linking perf_ssr-0.4.8-py_0
Linking crossroad-0.3.6-pyh7e60211_0

Transaction finished
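
The installation above comes down to three commands: create the environment, add mamba to it, then install crossroad from the jitendralab, bioconda and conda-forge channels. A minimal sketch of the same steps as a script follows, assuming conda (Miniforge here) is already installed. The `DRY_RUN` switch and the `run` helper are hypothetical additions for previewing the commands. `conda run` stands in for the interactive `conda activate` used in the session, and that substitution was not tested in the transcript.

```shell
#!/bin/sh
# Recap of the installation steps from the session above.
# DRY_RUN and run() are illustrative helpers, not part of conda or croSSRoad.
set -eu

ENV_NAME="jitENV"          # environment name used in the transcript
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = execute them

run() {
    # Print the command; execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

install_crossroad() {
    # 1. Empty environment, as in `conda create -n jitENV`.
    run conda create -y -n "$ENV_NAME"
    # 2. mamba from conda-forge, installed into that environment.
    run conda install -y -n "$ENV_NAME" conda-forge::mamba
    # 3. crossroad, with the same channel order as the transcript.
    #    Assumption: `conda run` replaces the interactive `conda activate`.
    run conda run -n "$ENV_NAME" mamba install -y \
        -c jitendralab -c bioconda -c conda-forge crossroad
}

install_crossroad
```

By default the script only prints the three commands. Running it with `DRY_RUN=0` executes them.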

(jitENV) hp@hp-HP-Z2-Tower-G9-Workstation-Desktop-PC:~/jitendraTEST$ crossroad -h

Usage: crossroad [OPTIONS]

Run the main croSSRoad analysis pipeline, or manage jobs.

╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --version -v Show version, logo, citation, and links. │
│ --install-completion Install completion for the current shell. │
│ --show-completion Show completion for the current shell, to copy it or customize the installation. │
│ --help -h Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Mode Selection ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --api -a Run the Crossroad web API server. │
│ --slurm -s Submit the analysis job to a Slurm cluster. │
│ --job-status JOB_ID Query the status of a specific job ID. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Input Files (provide --input-dir OR --fasta) ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --input-dir -i PATH Directory containing: `all_genome.fa`, ``, ``. Exclusive with `--fasta`. │
│ --fasta -fa PATH Input FASTA file (e.g., `all_genome.fa`). Alternative to `--input-dir`. │
│ --categories -c PATH Genome categories TSV file. Optional if using `--fasta`. Ignored if `--input-dir` is used (looks for `genome_categories.tsv` inside). │
│ --gene-bed -b PATH Gene BED file for SSR-gene analysis. Optional. If `--input-dir` is used, looks for `gene.bed` inside. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Analysis Parameters ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --reference-id -ref TEXT Reference genome ID for comparative analysis. Optional parameter for reference-based comparisons. │
│ --output-dir -o DIRECTORY Base output directory for jobs. Overrides CROSSROAD_JOB_DIR env var. │
│ --flanks -f Process flanking regions. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ PERF SSR Detection Parameters ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --mono INTEGER Mononucleotide repeat threshold. [default: 12] │
│ --di INTEGER Dinucleotide repeat threshold. [default: 6] │
│ --tri INTEGER Trinucleotide repeat threshold. [default: 4] │
│ --tetra INTEGER Tetranucleotide repeat threshold. [default: 3] │
│ --penta INTEGER Pentanucleotide repeat threshold. [default: 3] │
│ --hexa INTEGER Hexanucleotide repeat threshold. [default: 2] │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Filtering Parameters ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --min-len -l INTEGER Minimum genome length for filtering. [default: 1000] │
│ --max-len -L INTEGER Maximum genome length for filtering. [default: 10000000] │
│ --unfair -u INTEGER Maximum number of N's allowed per genome for Crossroad analysis. [default: 0] │
│ --repeat-threshold -rc INTEGER Repeat count Threshold for hotspot filtering (keeps records > this value). [default: 1] │
│ --genome-threshold -g INTEGER Genome count Threshold for hotspot filtering (keeps records > this value). [default: 2] │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Performance & Output ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --threads -t INTEGER Number of threads for Crossroad analysis. [default: 50] │
│ --plots -p Enable plot generation. │
│ --intrim-dir TEXT Name for the intermediate files directory (within the main job output dir). [default: intrim] │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

(jitENV) hp@hp-HP-Z2-Tower-G9-Workstation-Desktop-PC:~/jitendraTEST$

mojolicious: a next generation web framework for the Perl programming language.

Jit — Fri, 12 Jan 2018 16:48:10 -0600

Back in the early days of the web, many people learned Perl because of a wonderful Perl library called CGI. It was simple enough to get started without knowing much about the language and powerful enough to keep you going; learning by doing was a lot of fun. While most of the techniques used are outdated now, the idea behind it is not. Mojolicious is a new endeavor to implement this idea using bleeding edge technologies.

Features

An amazing real-time web framework, allowing you to easily grow single file prototypes into well-structured MVC web applications.
- Powerful out of the box with RESTful routes, plugins, commands, Perl-ish templates, content negotiation, session management, form validation, testing framework, static file server, CGI/PSGI detection, first class Unicode support and much more for you to discover.
A powerful web development toolkit, that you can use for all kinds of applications, independently of the web framework.
- Full stack HTTP and WebSocket client/server implementation with IPv6, TLS, SNI, IDNA, HTTP/SOCKS5 proxy, UNIX domain socket, Comet (long polling), Promises/A+, keep-alive, connection pooling, timeout, cookie, multipart and gzip compression support.
- Built-in non-blocking I/O web server, supporting multiple event loops as well as optional pre-forking and hot deployment, perfect for building highly scalable web services.
- JSON and HTML/XML parser with CSS selector support.
Very clean, portable and object-oriented pure-Perl API with no hidden magic and no requirements besides Perl 5.24.0 (versions as old as 5.10.1 can be used too, but may require additional CPAN modules to be installed)
Fresh code based upon years of experience developing Catalyst, free and open source.
Hundreds of 3rd party extensions and high quality spin-off projects like the Minion job queue.

http://mojolicious.org/

Address of the bookmark: http://mojolicious.org/

gpsrdocker: docker-based container that contain all software/web servers developed in the field of bioinformatics.

Jit — Sun, 16 Dec 2018 13:04:46 -0600

GPSRdocker (http://webs.iiitd.edu.in/gpsrdocker/) is a Docker-based container. At present it contains software developed by G. P. S. Raghava's group (http://webs.iiitd.edu.in/raghava/ ).

The programs and the package are free software for academic users. Permission to use, copy, and modify any part of this software for educational, research and non-profit purposes is hereby granted. A number of other supporting software packages have been integrated into this package or Docker image; these may be under other licenses, as may any direct or indirect dependencies of the primary software. As with any pre-built image, it is the image user's responsibility to ensure that any use of this image complies with the relevant licenses for all software contained within.

All software packages are distributed in the hope that they will be useful but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. If you have any query, please contact at raghava@iiitd.ac.in.

Address of the bookmark: https://hub.docker.com/r/raghavagps/gpsrdocker/

Bioinformatics web development course

Jit — Wed, 06 Nov 2019 20:42:48 -0600

This web development course, targeted at Biology and Bioinformatics students, aims to teach from scratch all the skills needed to set up a fully working Linux web server and to develop and deploy web applications for Bioinformatics.

No previous programming knowledge is assumed. By following this tutorial you will learn the fundamental concepts of programming by using scripting languages: variables, types, arrays, loops, conditional statements, functions, objects, regular expressions, file reading and manipulation, et cetera.

Address of the bookmark: http://www.cellbiol.com/bioinformatics_web_development/introduction/

NCBI BLAST have added new columns to the Descriptions

Neel — Tue, 01 Dec 2020 09:56:07 -0600

NCBI BLAST has added new columns to the Descriptions Table for web BLAST output. The new columns are Scientific Name, Common Name, Taxid, and Accession Length. Common Name and Accession Length are now part of the default display. You can click 'Select columns' or 'Manage columns' to add or remove columns from the display. Your preferences will be saved for your next visit to BLAST, and when you download your results, whatever columns you have displayed will be saved. See the NCBI Insights post (https://go.usa.gov/x7fPE) for more details.

Building Web UIs With Mojolicious Perl

Jit — Tue, 26 Dec 2017 18:06:57 -0600

Mojolicious is one of the 3 leading web frameworks available in the Perl ecosystem (along with Dancer and Catalyst) and by far my favorite.

Mojolicious aims to provide a complete web development experience. It thus has no hard dependencies, and comes with a built-in development and production server and many other features one needs to build a web application. It's easy to install, and has an application generator script and many plugins and extensions.

Libraries for developing Web applications

Amon2
Catalyst - Overflowing with features. Very popular.
Dancer (Official site)
Dancer2
Gantry - Web application framework for mod_perl, cgi, etc.
Kossy - A Web framework with simple interface.
Mojolicious - An all in one framework.
Poet - a modern Perl web framework for Mason developers

Address of the bookmark: https://www.ynonperek.com/2017/09/28/perl-mojolicious-web-development/

Apollo: First instantaneous, collaborative genomic annotation editor available on the Web

Jit — Fri, 31 May 2019 19:55:39 -0500

Apollo is a plug-in for the JBrowse Genome Viewer.
In addition to genes and pseudogenes, users can annotate ncRNAs (snRNA, snoRNA, tRNA, rRNA), miRNAs, repeat regions, and transposable elements; each annotation type has its own configuration of the ‘Information Editor’.
History tracking with undo/redo functions is available.
Users are able to directly set an annotation to a specific state, choosing from the ‘History’ display.
Adding and updating PubMed IDs will prompt users with a publication title to confirm their submission.
Gene Ontology (GO) terms are supported and GO ID auto-completion has been incorporated.
Users may access a ‘Recent Changes’ page.
Help page with Apollo specific content is available.

Address of the bookmark: http://genomearchitect.github.io/

BBTools for bioinformatician !

Surabhi Chaudhary — Thu, 15 Feb 2018 16:45:52 -0600

BBMap.sh

Mapping Nanopore reads

BBMap.sh has a length cap of 6kbp. Reads longer than this will be broken into 6kbp pieces and mapped independently.

Code:

$ mapPacBio.sh -Xmx20g k=7 in=reads.fastq ref=reference.fa maxlen=1000 minlen=200 idtag ow int=f qin=33 out=mapped1.sam minratio=0.15 ignorequality slow ordered maxindel1=40 maxindel2=400

The "maxlen" flag shreds them to a max length of 1000; you can set that up to 6000. But I found 1000 gave a higher mapping rate.

Using Paired-end and single-end reads at the same time

BBMap itself can only run single-ended or paired-ended in a single run, but it has a wrapper that can accomplish it, like this:

Code:

$ bbwrap.sh in1=read1.fq,singletons.fq in2=read2.fq,null out=mapped.sam append

This will write all the reads to the same output file but only print the headers once. I have not tried that for bam output, only sam output.

Note about alignment stats: For paired reads, you can find the total percent mapped by adding the read 1 percent (where it says "mapped: N%") and read 2 percent, then dividing by 2. The different columns tell you the count/percent of each event. Considering the cigar strings from alignment, "Match Rate" is the number of symbols indicating a reference match (=) and error rate is the number indicating substitution, insertion, or deletion (X, I, D).
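
The arithmetic in the note above can be sketched in Python. This is only an illustration, not BBMap's own code: the function names are hypothetical, and dividing by the total number of =, X, I and D symbols is an assumption about the denominator.

```python
import re

def total_percent_mapped(read1_pct, read2_pct):
    """Average the read 1 and read 2 'mapped' percentages."""
    return (read1_pct + read2_pct) / 2

def match_and_error_rate(cigar):
    """Return (match_rate, error_rate) from an extended CIGAR string.

    '=' counts as a reference match; X, I and D count as errors.
    Other operations (for example soft clips) are ignored in this sketch.
    """
    counts = {"=": 0, "X": 0, "I": 0, "D": 0}
    for length, op in re.findall(r"(\d+)([=XIDMSHNP])", cigar):
        if op in counts:
            counts[op] += int(length)
    total = sum(counts.values())
    if total == 0:
        return 0.0, 0.0
    errors = total - counts["="]
    return counts["="] / total, errors / total
```

For example, read 1 at 90% and read 2 at 80% gives 85% overall, and a read with CIGAR 90=5X3I2D has a match rate of 0.9 under these assumptions.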

Exact matches when mapping small reads (e.g. miRNA)

When mapping small RNAs with BBMap, use the following flags to report only perfect matches.

Code:

ambig=all vslow perfectmode maxsites=1000

It should be very fast in that mode (despite the vslow flag). Vslow mainly removes masking of low-complexity repetitive kmers, which is not usually a problem but can be with extremely short sequences like microRNAs.

Important note about BBMap alignments

BBMap is always nondeterministic when run in paired-end mode with multiple threads, because the insert-size average is calculated on a per-thread basis, which affects mapping; and which reads are assigned to which thread is nondeterministic. The only way to avoid that would be to restrict it to a single thread (threads=1), or map the reads as single-ended and then fix pairing afterward:

Code:

bbmap.sh in=reads.fq outu=unmapped.fq int=f
repair.sh in=unmapped.fq out=paired.fq fint outs=singletons.fq

In this case you'd want to only keep the paired output.

BBSplit is based on BBMap, so it is also nondeterministic in paired mode with multiple threads. BBDuk and Seal (which can be used similarly to BBSplit) are always deterministic.

--------------------------------------------------------

Reformat.sh

Count k-mers/find unknown primers

Code:

$ reformat.sh in=reads.fq out=trimmed.fq ftr=19

This will trim all but the first 20 bases (all bases after position 19, zero-based).

Code:

$ kmercountexact.sh in=trimmed.fq out=counts.txt fastadump=f mincount=10 k=20 rcomp=f

This will generate a file containing the counts of all 20-mers that occurred at least 10 times, in a 2-column format that is easy to sort in Excel.

Code:

ACCGTTACCGTTACCGTTAC	100
AAATTTTTTTCCCCCCCCCC	85

...etc. If the primers are 20bp long, they should be pretty obvious.
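
What the two commands above compute can be sketched in a few lines of Python. This is a simplified illustration rather than the BBTools implementation: `count_leading_kmers` is a hypothetical name, reads are assumed to be plain strings, and no reverse-complementing is done (mirroring rcomp=f).

```python
from collections import Counter

def count_leading_kmers(reads, k=20, mincount=10):
    """Count the first k bases of each read and keep k-mers seen >= mincount times.

    Returns (kmer, count) pairs sorted by descending count, then by k-mer.
    Reads shorter than k are skipped.
    """
    counts = Counter(read[:k] for read in reads if len(read) >= k)
    kept = [(kmer, n) for kmer, n in counts.items() if n >= mincount]
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

If the library has a fixed 20bp primer, it should dominate the top of the returned list.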

Convert SAM format from 1.4 to 1.3 (required for many programs)

Code:

$ reformat.sh in=reads.sam out=out.sam sam=1.3

Removing N basecalls

You can use BBDuk or Reformat with "qtrim=rl trimq=1". That will trim only leading and trailing bases with a Q-score below 1, which means Q0, which means N (in either fasta or fastq format). The BBMap package automatically changes Q-scores of Ns that are above 0 to 0, and Q-scores of called bases that are below 2 to 2, since some Illumina software versions occasionally produce odd things like a handful of Q0 called bases or Ns with Q>0, neither of which makes any sense on the Phred scale.
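
The quality clean-up and end trimming described above can be illustrated as follows. This is a sketch under the assumption that qualities are already decoded to integers; the function names are hypothetical and this is not the BBMap code.

```python
def normalize_qualities(bases, quals):
    """Force Ns to Q0 and called bases to at least Q2, as described in the text."""
    fixed = []
    for base, q in zip(bases, quals):
        if base in "Nn":
            fixed.append(0)          # an N never carries information
        else:
            fixed.append(max(q, 2))  # a called base is never below Q2
    return fixed

def trim_n_ends(bases):
    """Remove leading and trailing Ns only; internal Ns are kept."""
    return bases.strip("Nn")
```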

Sampling reads

Code:

$ reformat.sh in=reads.fq out=sampled.fq sample=3000

Code:

To sample 10% of the reads:
reformat.sh in1=reads1.fq in2=reads2.fq out1=sampled1.fq out2=sampled2.fq samplerate=0.1

or more concisely:
reformat.sh in=reads#.fq out=sampled#.fq samplerate=0.1

and for exact sampling:
reformat.sh in=reads#.fq out=sampled#.fq samplereadstarget=100k

Changing fasta headers

Remove anything after the first space in fasta header.

Code:

 reformat.sh in=sequences.fasta out=renamed.fasta trd

"trd" stands for "trim read description" and will truncate everything after the first whitespace.

Extract reads from a sam file

Code:

$ reformat.sh in=reads.sam out=reads.fastq

Verify pairing and optionally de-interleave the reads

Code:

$ reformat.sh in=reads.fastq verifypairing

Verify pairing if the reads are in separate files

Code:

$ reformat.sh in1=r1.fq in2=r2.fq vpair

If that completes successfully and says the reads were correctly paired, then you can simply de-interleave reads into two files like this:

Code:

$ reformat.sh in=reads.fastq out1=r1.fastq out2=r2.fastq

Base quality histograms

Code:

$ reformat.sh in=reads.fq qchist=qchist.txt

That stands for "quality count histogram".

Filter SAM/BAM file by read length

Code:

$ reformat.sh in=x.sam out=y.sam minlength=50 maxlength=200

Filter SAM/BAM file to detect/filter spliced reads

Code:

$ reformat.sh in=mapped.bam out=filtered.bam maxdellen=50

You can set "maxdellen" to whatever length deletion event you consider the minimum to signify splicing, which depends on the organism.
-------------------------------------------------------------
Repair.sh

"Re-pair" out-of-order reads from paired-end data files

Code:

$ repair.sh in1=r1.fq.gz in2=r2.fq.gz out1=fixed1.fq.gz out2=fixed2.fq.gz outsingle=singletons.fq.gz

--------------------------------------------------------------
BBMerge.sh

BBMerge now has a new flag - "outa" or "outadapter". This allows you to automatically detect the adapter sequence of reads with short insert sizes, in case you don't know what adapters were used. It works like this:

Code:

$ bbmerge.sh in=reads.fq outa=adapters.fa reads=1m

Of course, it will only work for paired reads! The output fasta file will look like this:

Code:

>Read1_adapter
GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG
>Read2_adapter
GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTG

If you have multiplexed things with different barcodes in the adapters, the part with the barcode will show up as Ns, like this:

GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG

Note: For BBMerge with micro-RNA, you need to add the flag mininsert=17. The default is 35, which is too long for micro-RNA libraries.

Identifying adapters

If you have paired reads, and enough of the reads have inserts shorter than read length, you can identify adapter sequences with BBMerge, like this (they will be printed to adapters.fa):

Code:

$ bbmerge.sh in1=r1.fq in2=r2.fq outa=adapters.fa

-----------------------------------------------------------------

BBDuk.sh

Note: BBDuk is strictly deterministic on a per-read basis. However, by default it reorders the reads when run multithreaded. You can add the flag "ordered" to keep output reads in the same order as input reads.

Finding reads with a specific sequence at the beginning of read

Code:

$ bbduk.sh -Xmx1g in=reads.fq outm=matched.fq outu=unmatched.fq restrictleft=25 k=25 literal=AAAAACCCCCTTTTTGGGGGAAAAA

In this case, all reads starting with "AAAAACCCCCTTTTTGGGGGAAAAA" will end up in "matched.fq" and all other reads will end up in "unmatched.fq". Specifically, the command means "look for 25-mers in the leftmost 25 bp of the read", which will require an exact prefix match, though you can relax that if you want.

So you could bin all the reads with your known sequence, then look at the remaining reads to see what they have in common. You can do the same thing with the tail of the read using "restrictright" instead, though you can't use both restrictions at the same time.

Code:

$ bbduk.sh in=reads.fq outm=matched.fq literal=NNNNNNCCCCGGGGGTTTTTAAAAA k=25 copyundefined

With the "copyundefined" flag, a copy of each reference sequence will be made representing every valid combination of defined letter. So instead of increasing memory or time use by 6^75, it only increases them by 4^6 or 4096 which is completely reasonable, but it only allows substitutions at predefined locations. You can use the "copyundefined", "hdist", and "qhdist" flags together for a lot of flexibility - for example, hdist=2 qhdist=1 and 3 Ns in the reference would allow a hamming distance of 6 with much lower resource requirements than hdist=6. Just be sure to give BBDuk as much memory as possible.
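
To make the 4^6 figure concrete, here is a small Python sketch of the expansion idea: every N in a reference sequence is replaced by each defined base. It illustrates the concept only and is not how BBDuk stores k-mers internally; `expand_undefined` is a hypothetical name.

```python
from itertools import product

def expand_undefined(seq, alphabet="ACGT"):
    """Yield every sequence obtained by replacing each N with a defined base."""
    n_positions = [i for i, base in enumerate(seq) if base == "N"]
    for combo in product(alphabet, repeat=len(n_positions)):
        bases = list(seq)
        for pos, base in zip(n_positions, combo):
            bases[pos] = base
        yield "".join(bases)
```

A sequence with 6 Ns, such as the literal in the command above, expands to 4096 copies.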

Removing illumina adapters (if exact adapters not known)

If you're not sure which adapters are used, you can add "ref=truseq.fa.gz,truseq_rna.fa.gz,nextera.fa.gz" and get them all (this will increase the amount of overtrimming, though it should still be negligible).

Removing illumina control sequences/phiX reads

Code:

bbduk.sh in=trimmed.fq.gz out=filtered.fq.gz k=31 ref=artifacts,phix ordered cardinality

Identify certain reads that contain a specific sequence

Code:

$ bbduk.sh in=reads.fq out=unmatched.fq outm=matched.fq literal=ACGTACGTACGTACGTAC k=18 mm=f hdist=2

Make sure "k" is set to the exact length of the sequence. "hdist" controls the number of substitutions allowed. "outm" gets the reads that match. By default this also looks for the reverse-complement; you can disable that with "rcomp=f".

Extract sequences that share kmers with your sequences with BBDuk

Code:

$ bbduk.sh in=a.fa ref=b.fa outm=c.fa mkf=1 mm=f k=31

This will print to C all the sequences in A that share 100% of their 31-mers with sequences in B.

Extract sequences that contain N's with BBDuk

Code:

bbduk.sh in=reads.fq out=readsWithoutNs.fq outm=readsWithNs.fq maxns=0

If you have, say, 100bp reads and only want to separate out reads that consist entirely of Ns (all 100 bases), change that to "maxns=99".

General notes for BBDuk.sh

BBDuk can operate in one of 4 kmer-matching modes:
Right-trimming (ktrim=r), left-trimming (ktrim=l), masking (ktrim=n), and filtering (default). But it can only do one at a time because all kmers are stored in a single table. It can still do non-kmer-based operations such as quality trimming at the same time.

BBDuk2 can do all 4 kmer operations at once and is designed for integration into automated pipelines where you do contaminant removal and adapter-trimming in a single pass to minimize filesystem I/O. Personally, I never use BBDuk2 from the command line. Both have identical capabilities and functionality otherwise, but the syntax is different.

------------------------------------------------------------------

Randomreads.sh

Generate random reads in various formats

Code:

$ randomreads.sh ref=genome.fasta out=reads.fq len=100 reads=10000

You can specify paired reads, an insert size distribution, read lengths (or length ranges), and so forth. But because I developed it to benchmark mapping algorithms, it is specifically designed to give excellent control over mutations. You can specify the number of SNPs, insertions, deletions, and Ns per read, either exactly or probabilistically; the lengths of these events are individually customizable; alternatively, the quality values can be set so that errors are generated on the basis of quality; there's a PacBio error model; and all of the reads are annotated with their genomic origin, so you will know the correct answer when mapping.

Bear in mind that 50% of the reads are going to be generated from the plus strand and 50% from the minus strand. So, either a read will match the reference perfectly, OR its reverse-complement will match perfectly.

You can generate the same set of reads with and without SNPs by fixing the seed to a positive number, like this:

Code:

$ randomreads.sh maxsnps=0 adderrors=false out=perfect.fastq reads=1000 minlength=18 maxlength=55 seed=5

$ randomreads.sh maxsnps=2 snprate=1 adderrors=false out=2snps.fastq reads=1000 minlength=18 maxlength=55 seed=5

[As of BBMap v. 36.59] randomreads.sh gains the ability to simulate metagenomes.

coverage=X will automatically set "reads" to a level that will give X average coverage (decimal point is allowed).

metagenome will assign each scaffold a random exponential variable, which decides the probability that a read be generated from that scaffold. So, if you concatenate together 20 bacterial genomes, you can run randomreads and get a metagenomic-like distribution. It could also be used for RNA-seq when using a transcriptome reference.

The coverage is decided on a per-reference-sequence level, so if a bacterial assembly has more than one contig, you may want to glue them together first with fuse.sh before concatenating them with the other references.
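
The two ideas above can be sketched in Python. This is a rough illustration under stated assumptions, not the randomreads.sh implementation: the read count is taken as coverage × genome length / read length, each scaffold's weight is a single draw from an exponential distribution with rate 1, and scaffold length is ignored. All function names are hypothetical.

```python
import random

def reads_for_coverage(coverage, genome_length, read_length):
    """Number of reads needed for the requested average coverage."""
    return int(coverage * genome_length / read_length)

def metagenome_read_counts(scaffold_names, total_reads, seed=5):
    """Assign reads to scaffolds using random exponential weights."""
    rng = random.Random(seed)  # fixed seed gives a reproducible distribution
    weights = [rng.expovariate(1.0) for _ in scaffold_names]
    picks = rng.choices(scaffold_names, weights=weights, k=total_reads)
    counts = {name: 0 for name in scaffold_names}
    for name in picks:
        counts[name] += 1
    return counts
```

With 20 concatenated genomes, a few scaffolds typically receive most of the reads, which is the metagenome-like skew the text describes.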

Simulate a jump library

You can simulate a 4000bp jump library from your existing data like this.

Code:

$ cat assembly1.fa assembly2.fa > combined.fa
$ bbmap.sh ref=combined.fa
$ randomreads.sh reads=1000000 length=100 paired interleaved mininsert=3500 maxinsert=4500 bell perfect=1 q=35 out=jump.fq.gz

--------------------------------------------------------------
Shred.sh

Code:

$ shred.sh in=ref.fasta out=reads.fastq length=200

The difference is that RandomReads will make reads in a random order from random locations, ensuring flat coverage on average, but it won't ensure 100% coverage unless you generate many fold depth. Shred, on the other hand, gives you exactly 1x depth and exactly 100% coverage (and is not capable of modelling errors). So, the use-cases are different.
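
The "exactly 1x depth and 100% coverage" property is easy to see in a sketch. The Python below cuts a sequence into consecutive, non-overlapping pieces; it is a simplified stand-in for the idea, with a hypothetical function name, and makes no claim about how shred.sh handles the final short piece.

```python
def shred(sequence, length=200):
    """Cut a sequence into consecutive, non-overlapping pieces of at most `length` bases."""
    return [sequence[i:i + length] for i in range(0, len(sequence), length)]
```

Joining the pieces back together reproduces the input, so every base is covered exactly once.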
---------------------------------------------------------------
Demuxbyname.sh

Demultiplex fastq files when the tag is present in the fastq read header (illumina)

Code:

$ demuxbyname.sh in=r#.fq out=out_%_#.fq prefixmode=f names=GGACTCCT+GCGATCTA,TAAGGCGA+TCTACTCT,... outu=filename

"Names" can also be a text file with one barcode per line (in exactly the format found in the read header). You do have to include all of the expected barcodes, though.

In the output filename, the "%" symbol gets replaced by the barcode; in both the input and output names, the "#" symbol gets replaced by 1 or 2 for read 1 or read 2. It's optional, though; you can leave it out for interleaved input/output, or specify in1=/in2=/out1=/out2= if you want custom naming.
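
The filename substitution described above can be sketched as follows. This is an illustration of the naming scheme only; both function names are hypothetical, and matching the barcode against the end of the header is my reading of prefixmode=f.

```python
def match_barcode(header, names):
    """Return the first barcode that the read header ends with, or None."""
    for name in names:
        if header.endswith(name):
            return name
    return None

def expand_output_name(pattern, barcode, read_number=None):
    """Replace '%' with the barcode and, if given, '#' with the read number."""
    name = pattern.replace("%", barcode)
    if read_number is not None:
        name = name.replace("#", str(read_number))
    return name
```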

----------------------------------------------------------------

Readlength.sh

Plotting the length distribution of reads

Code:

$ readlength.sh in=file out=histogram.txt bin=10 max=80000

That will plot the result in bins of size 10, with everything above 80k placed in the same bin. The defaults are set for relatively short sequences so if they are many megabases long you may need to add the flag "-Xmx8g" and increase "max=" to something much higher.
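
The binning behaviour can be illustrated with a short Python sketch. It assumes that each bin is labelled by its lower bound and that lengths above the maximum are clamped into the last bin; the actual output columns of readlength.sh are not reproduced here, and `length_histogram` is a hypothetical name.

```python
from collections import Counter

def length_histogram(lengths, bin_size=10, max_length=80000):
    """Count read lengths per bin; lengths above max_length share the top bin."""
    hist = Counter()
    for length in lengths:
        capped = min(length, max_length)
        hist[(capped // bin_size) * bin_size] += 1
    return dict(sorted(hist.items()))
```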

Alternatively, if these are assemblies and you're interested in continuity information (L50, N50, etc), you can run stats on each or statswrapper on all of them:

Code:

stats.sh in=file

Code:

statswrapper.sh in=file,file,file,file…

----------------------------------------------------------------
Filterbyname.sh

By default, "filterbyname" discards reads with names in your name list, and keeps the rest. To include them and discard the others, do this:

Code:

$ filterbyname.sh in=003.fastq out=filter003.fq names=names003.txt include=t

----------------------------------------------------------------
getreads.sh

If you only know the number(s) of the fasta/fastq record(s) in a file (records start at 0) then you can use the following command to extract those reads in a new file.

Code:

$ getreads.sh in= id= out=

The first read (or pair) has ID 0, the second read (or pair) has ID 1, etc.

Parameters:
in= Specify the input file, or stdin.
out= Specify the output file, or stdout.
id= Comma delimited list of numbers or ranges, in any order.
For example: id=5,93,17-31,8,0,12-13
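
How such an id list selects records can be sketched in Python. The sketch assumes ranges are inclusive at both ends, which is not stated above, and uses hypothetical function names.

```python
def parse_ids(spec):
    """Parse a comma-delimited list of numbers and ranges into a set of record IDs."""
    ids = set()
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            ids.update(range(int(start), int(end) + 1))  # assumed inclusive
        else:
            ids.add(int(part))
    return ids

def get_records(records, spec):
    """Return the records whose zero-based position is listed in spec."""
    wanted = parse_ids(spec)
    return [rec for index, rec in enumerate(records) if index in wanted]
```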
----------------------------------------------------------------
Splitsam.sh

Splits a sam file into forward and reverse reads

Code:

splitsam.sh mapped.sam plus.sam minus.sam unmapped.sam
reformat.sh in=plus.sam out=plus.fq
reformat.sh in=minus.sam out=minus.fq rcomp

----------------------------------------------------------------
BBSplit.sh

BBSplit now has the ability to output paired reads in dual files using the # symbol. For example:

Code:

$ bbsplit.sh ref=x.fa,y.fa in1=read1.fq in2=read2.fq basename=o%_#.fq

will produce ox_1.fq, ox_2.fq, oy_1.fq, and oy_2.fq

You can use the # symbol for input also, like "in=read#.fq", and it will get expanded into 1 and 2.

Added feature: you can specify a directory for the "ref=" argument. If anything in the list is a directory, BBSplit will use all fasta files in that directory. They need a fasta extension, like .fa or .fasta, but can be compressed with an additional .gz after that. This is useful because BBSplit splits the input into one output file per reference file.

NOTE 1: By default BBSplit uses fairly strict mapping parameters; you can get the same sensitivity as BBMap by adding the flags "minid=0.76 maxindel=16k minhits=1". With those parameters it is extremely sensitive.

NOTE 2: BBSplit has different ambiguity settings for dealing with reads that map to multiple genomes. In any case, if the alignment score is higher to one genome than another, it will be associated with that genome only (this considers the combined scores of read pairs - pairs are always kept together). But when a read or pair has two identically-scoring mapping locations, on different genomes, the behavior is controlled by the "ambig2" flag - "ambig2=toss" will discard the read, "all" will send it to all output files, and "split" will send it to a separate file for ambiguously-mapped reads (one per genome to which it maps).

NOTE 3: Zero-count lines are suppressed by default, but they should be printed if you include the flag "nzo=f" (nonzeroonly=false).

NOTE 4: BBSplit needs multiple reference files as input; one per organism, or one for target and another for everything else. It only outputs one file per reference file.

Seal.sh, on the other hand, which is similar, can use a single concatenated file, as it (by default) will output one file per reference sequence within a concatenated set of references.
--------------------------------------------------------------
Pileup.sh

To generate transcript coverage stats

Code:

$ pileup.sh in=mapped.sam normcov=normcoverage.txt normb=20 stats=stats.txt

That will generate coverage per transcript, with 20 lines per transcript, each line showing the coverage for that fraction of the transcript. "stats" will contain other information like the fraction of bases in each transcript that was covered.

To calculate physical coverage stats (region covered by paired-end reads)

BBMap has a "physcov" flag that allows it to report physical rather than sequenced coverage. It can be used directly in BBMap, or with pileup, if you already have a sam file. For example:

Code:

$ pileup.sh in=mapped.sam covstats=coverage.txt physcov=t

Calculating coverage of the genome

The program accepts sam or bam input, sorted or unsorted.

Code:

$ pileup.sh in=mapped.sam out=stats.txt hist=histogram.txt

stats.txt will contain the average depth and percent covered of each reference sequence; the histogram will contain the exact number of bases at each coverage level. You can also get per-base coverage or binned coverage if you want to plot the coverage. It also generates the median, standard deviation, and so forth.
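
For readers who want to see what these statistics mean, here is a small Python sketch that derives them from a list of per-base depths for one reference sequence. It is an illustration with a hypothetical function name; the use of the population standard deviation is an assumption, and pileup.sh's own output format is not reproduced.

```python
from collections import Counter
from statistics import median, pstdev

def coverage_summary(depths):
    """Summarise per-base depths for one reference sequence."""
    covered = sum(1 for d in depths if d > 0)
    return {
        "avg_depth": sum(depths) / len(depths),
        "percent_covered": 100.0 * covered / len(depths),
        "median": median(depths),
        "std_dev": pstdev(depths),
        # number of bases observed at each coverage level
        "histogram": dict(sorted(Counter(depths).items())),
    }
```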

It's also possible to generate coverage directly from BBMap, without an intermediate sam file, like this:

Code:

$ bbmap.sh in=reads.fq ref=reference.fasta nodisk covstats=stats.txt covhist=histogram.txt

We use this a lot in situations where all you care about is coverage distributions, which is somewhat common in metagenome assemblies. It also supports most of the flags that pileup.sh supports, though the syntax is slightly different to prevent collisions. In each case you can see all the possible flags by running the shellscript with no arguments.

To bin aligned reads

Code:

$ pileup.sh in=mapped.sam out=stats.txt bincov=coverage.txt binsize=1000

That will give coverage within each bin. For read density regardless of read length, add the "startcov=t" flag.

--------------------------------------------------------------
Dedupe.sh

Dedupe ensures that there is at most one copy of any input sequence, optionally allowing contaminants (substrings) to be removed, and a variable hamming or edit distance to be specified. Usage:

Code:

$ dedupe.sh in=assembly1.fa,assembly2.fa out=merged.fa

That will absorb exact duplicates and containments. You can use "hdist" and "edist" flags to allow mismatches, or get a complete list of flags by running the shellscript with no arguments.

Dedupe will merge assemblies, but it will not produce consensus sequences or join overlapping reads; it only removes sequences that are fully contained within other sequences (allowing the specified number of mismatches or edits).

Dedupe can remove duplicate reads from multiple files simultaneously, if they are comma-delimited (e.g. in=file1.fastq,file2.fastq,file3.fastq). And if you set the flag "uniqueonly=t" then ALL copies of duplicate reads will be removed, as opposed to the default behavior of leaving one copy of duplicate reads.

However, it does not care which file a read came from; in other words, it can't remove only reads that are duplicates across multiple files but leave the ones that are duplicates within a file. That can still be accomplished, though, like this:

1) Run dedupe on each sample individually, so that there is at most one copy of a read per sample.
2) Run dedupe again on all of the samples together, with "uniqueonly=t". The only remaining duplicate reads will be the ones duplicated between samples, so that's all that will be removed.
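
The logic of the two steps can be checked with a toy Python model. It treats reads as plain strings and only handles exact duplicates; the function names are hypothetical and nothing here reflects Dedupe's internals.

```python
from collections import Counter

def dedupe(reads):
    """Keep one copy of each distinct read, preserving first-seen order."""
    return list(dict.fromkeys(reads))

def dedupe_unique_only(reads):
    """Drop every read that occurs more than once (all copies are removed)."""
    counts = Counter(reads)
    return [read for read in reads if counts[read] == 1]

def remove_cross_sample_duplicates(samples):
    """Step 1: dedupe each sample. Step 2: pool and keep unique reads only."""
    per_sample = [dedupe(sample) for sample in samples]
    pooled = [read for sample in per_sample for read in sample]
    return dedupe_unique_only(pooled)
```

A read duplicated only within one sample survives as a single copy, while a read present in two samples is removed entirely.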

--------------------------------------------------------------

Generate ROC curves from any aligner

- Index the reference

Code:

$ bbmap.sh ref=reference.fasta

- Generate random reads

Code:

$ randomreads.sh reads=100000 length=100 out=synth.fastq maxq=35 midq=25 minq=15

- Map to produce a sam file

...substitute this command with the appropriate one from your aligner of choice

Code:

$ bbmap.sh in=synth.fastq out=mapped.sam

- Generate ROC curve

Code:

$ samtoroc.sh in=mapped.sam reads=100000

--------------------------------------------------------------

Calculate heterozygous rate for sequence data

Code:

$ kmercountexact.sh in=reads.fq khist=histogram.txt peaks=peaks.txt

You can examine the histogram manually, or use the "peaks" file which tells you the number of unique kmers in each peak on the histogram. For a diploid, the first peak will be the het peak, the second will be the homozygous peak, and the rest will be repeat peaks. The peak caller is not perfect, though, so particularly with noisy data I would only rely on it for the first two peaks, and try to quantify the higher-order peaks manually if you need to (which you generally don't).

-----------------------------------------------------------------

Compare mapped reads between two files

This shows how many mapped reads (concordant or discordant, it doesn't matter) are shared between two alignment files, and how many are unique to one file or the other.

Code:

$ reformat.sh in=file1.sam out=mapped1.sam mappedonly
$ reformat.sh in=file2.sam out=mapped2.sam mappedonly

That gets you the mapped reads only. Then:

Code:

$ filterbyname.sh in=mapped1.sam names=mapped2.sam out=shared.sam include=t

...which gets you the set intersection;

Code:

$ filterbyname.sh in=mapped1.sam names=mapped2.sam out=only1.sam include=f
$ filterbyname.sh in=mapped2.sam names=mapped1.sam out=only2.sam include=f

...which get you the set subtractions.
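The same intersection and subtraction logic, expressed on read names in Python, looks roughly like this. It is a sketch that assumes standard SAM lines (name in column 1, flag in column 2, with flag bit 0x4 meaning unmapped) and works on in-memory lists of lines rather than real files.

```python
def mapped_names(sam_lines):
    """Collect names of reads whose SAM flag does not have the 0x4 bit set."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if not int(fields[1]) & 0x4:
            names.add(fields[0])
    return names

file1 = ["@HD\tVN:1.6", "r1\t0\tchr1\t100", "r2\t4\t*\t0", "r3\t16\tchr1\t200"]
file2 = ["r1\t0\tchr1\t100", "r2\t0\tchr2\t5", "r3\t4\t*\t0"]

mapped1, mapped2 = mapped_names(file1), mapped_names(file2)
shared = mapped1 & mapped2   # set intersection
only1 = mapped1 - mapped2    # mapped in file 1 only
only2 = mapped2 - mapped1    # mapped in file 2 only
print(len(shared), len(only1), len(only2))
```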

--------------------------------------------------------------

BBrename.sh

Code:

$ bbrename.sh in=old.fasta out=new.fasta

That will rename the reads as 1, 2, 3, 4, ... 222.

You can also give a custom prefix if you want. The input has to be text format, not .doc.

---------------------------------------------------------------------

BBfakereads.sh

Generating “fake” paired end reads from a single end read file

Code:

$ bbfakereads.sh in=reads.fastq out1=r1.fastq out2=r2.fastq length=100

That will generate fake pairs from the input file, with whatever length you want (up to the input read length). We use it in some cases for generating a fake LMP library for scaffolding from a set of contigs. Read 1 will be from the left end, and read 2 will be reverse-complemented and from the right end; both will retain the correct original qualities. The suffixes " /1" and " /2" will be appended to the read names.
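As an illustration of what that description means for a single read, here is a small Python sketch. It follows the behavior described above (left end for read 1, reverse-complemented right end for read 2, qualities carried along, " /1" and " /2" suffixes); the function name and tuple layout are made up for the example.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def fake_pair(name, seq, qual, length):
    """Return ((name1, seq1, qual1), (name2, seq2, qual2)) for one read."""
    length = min(length, len(seq))  # cannot exceed the input read length
    r1 = (name + " /1", seq[:length], qual[:length])
    tail_start = len(seq) - length
    # Read 2: right end, reverse-complemented; qualities reversed to match.
    r2_seq = seq[tail_start:].translate(COMPLEMENT)[::-1]
    r2_qual = qual[tail_start:][::-1]
    r2 = (name + " /2", r2_seq, r2_qual)
    return r1, r2

r1, r2 = fake_pair("read7", "AACCGGTTAC", "ABCDEFGHIJ", 4)
print(r1)
print(r2)
```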

------------------------------------------------------------------
Randomreads.sh

Generate random reads

Code:

$ randomreads.sh ref=genome.fasta out=reads.fq len=100 reads=10000

"seed=-1" will use a random seed; any other value will use that specific number as the seed

You can specify paired reads, an insert size distribution, read lengths (or length ranges), and so forth. But because I developed it to benchmark mapping algorithms, it is specifically designed to give excellent control over mutations. You can specify the number of SNPs, insertions, deletions, and Ns per read, either exactly or probabilistically; the lengths of these events are individually customizable; the quality values can alternatively be set so that errors are generated on the basis of quality; there's a PacBio error model; and all of the reads are annotated with their genomic origin, so you will know the correct answer when mapping.
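The two ideas that matter most for benchmarking, a reproducible seed and reads that carry their true origin, can be sketched as follows. This is a toy generator, not randomreads.sh: the read naming scheme, the `snp_rate` parameter, and the use of `seed=None` for a random seed are all assumptions made for the example.

```python
import random

def random_reads(genome, n_reads, length, snp_rate=0.0, seed=None):
    """Sample reads from genome; each name records the true start position."""
    rng = random.Random(seed)  # same seed -> same reads
    reads = []
    for i in range(n_reads):
        start = rng.randrange(len(genome) - length + 1)
        bases = list(genome[start:start + length])
        for j, base in enumerate(bases):
            if rng.random() < snp_rate:
                # Substitute with a different base to simulate a SNP.
                bases[j] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((f"read{i}_pos{start}", "".join(bases)))
    return reads

genome = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"
first = random_reads(genome, n_reads=5, length=10, seed=42)
second = random_reads(genome, n_reads=5, length=10, seed=42)
print(first == second)
```

Because the origin is encoded in the name, a mapper's answer for each read can be checked directly against the position it was drawn from.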

--------------------------------------------------------------------

Generate saturation curves to assess sequencing depth

Code:

$ bbcountunique.sh in=reads.fq out=histogram.txt

It works by pulling kmers from each input read, testing whether each kmer has been seen before, and then storing it in a table.

The bottom line, "first", tracks whether the first kmer of the read has been seen before (independent of whether it is read 1 or read 2).

The top line, "pair", indicates whether a combined kmer from both read 1 and read 2 has been seen before. The other lines are generally safe to ignore but they track other things, like read1- or read2-specific data, and random kmers versus the first kmer.

It plots a point every X reads (configurable, default 25000).

In noncumulative mode (default), a point indicates "for the last X reads, this percentage had never been seen before". In this mode, once the line hits zero, sequencing more is not useful.

In cumulative mode, a point indicates "for all reads, this percentage had never been seen before", but still only one point is plotted per X reads.
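The difference between the two modes is easier to see in code. The sketch below tracks only the first kmer of each read (the "first" line described above) and uses a plain Python set as the table; the default `k=25` is an assumption, and the tool's real implementation will differ.

```python
def saturation_curve(reads, k=25, interval=25000, cumulative=False):
    """Return (reads_seen, percent_new) points, one per `interval` reads."""
    seen = set()
    points = []
    new_in_window = new_total = 0
    for n, read in enumerate(reads, start=1):
        kmer = read[:k]  # first kmer of the read
        if kmer not in seen:
            seen.add(kmer)
            new_in_window += 1
            new_total += 1
        if n % interval == 0:
            if cumulative:
                pct = 100.0 * new_total / n             # over all reads so far
            else:
                pct = 100.0 * new_in_window / interval  # over the last window
            points.append((n, pct))
            new_in_window = 0
    return points

reads = ["AAAA", "CCCC", "AAAA", "GGGG", "AAAA", "CCCC", "AAAA", "CCCC"]
noncumulative = saturation_curve(reads, k=4, interval=4)
cumulative = saturation_curve(reads, k=4, interval=4, cumulative=True)
print(noncumulative)
print(cumulative)
```

In the toy data the second window contains nothing new, so the noncumulative line drops to zero while the cumulative line only declines gradually.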

-----------------------------------------------------------------
CalcTrueQuality.sh

http://seqanswers.com/forums/showthread.php?p=170904

In light of the quality-score issues with the NextSeq platform, and the possibility of future Illumina platforms (HiSeq 3000 and 4000) also using quantized quality scores, I developed CalcTrueQuality for recalibrating the scores to ensure accuracy and restore the full range of values.

-----------------------------------------------------------------

BBMapskimmer.sh

BBMap is designed to find the best mapping, and heuristics will cause it to ignore mappings that are valid but substantially worse. Therefore, I made a different version of it, BBMapSkimmer, which is designed to find all of the mappings above a certain threshold. The shellscript is bbmapskimmer.sh and the usage is similar to bbmap.sh or mapPacBio.sh. For primers, which I assume will be short, you may wish to use a lower than default K of, say, 10 or 11, and add the "slow" flag.

--------------------------------------------------------------

msa.sh and cutprimers.sh

Quoted from Brian's response directly.

I also wrote another pair of programs specifically for working with primer pairs, msa.sh and cutprimers.sh. msa.sh will forcibly align a primer sequence (or a set of primer sequences) against a set of reference sequences to find the single best matching location per reference sequence - in other words, if you have 3 primers and 100 ref sequences, it will output a sam file with exactly 100 alignments - one per ref sequence, using the primer sequence that matched best. Of course you can also just run it with 1 primer sequence.

So you run msa twice - once for the left primer, and once for the right primer - and generate 2 sam files. Then you feed those into cutprimers.sh, which will create a new fasta file containing the sequence between the primers, for each reference sequence. We used these programs to synthetically cut V4 out of full-length 16S sequences.

I should say, though, that the primer sites identified are based on the normal BBMap scoring, which is not necessarily the same as where the primers would bind naturally, though with highly conserved regions there should be no difference.
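The overall workflow (find the single best site for each primer, then keep what lies between them) can be illustrated with a toy Python version. It is heavily simplified: it scores sites by mismatch count only, with no gaps, rather than using BBMap scoring, and it assumes both primers are given in the orientation of the reference. The function names are invented.

```python
def best_site(ref, primer):
    """Start index of the window in ref with the fewest mismatches to primer."""
    best_start, best_mismatches = 0, len(primer) + 1
    for start in range(len(ref) - len(primer) + 1):
        window = ref[start:start + len(primer)]
        mismatches = sum(a != b for a, b in zip(window, primer))
        if mismatches < best_mismatches:
            best_start, best_mismatches = start, mismatches
    return best_start

def cut_between_primers(ref, left_primer, right_primer):
    """Return the reference sequence between the two best primer sites."""
    left = best_site(ref, left_primer)
    right = best_site(ref, right_primer)
    return ref[left + len(left_primer):right]

ref = "GGGACGTACTTTTTTCCGATCAAA"
# The right primer has one mismatch to the reference and is still placed.
amplicon = cut_between_primers(ref, "ACGTAC", "CCGTTC")
print(amplicon)
```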

------------------------------------------------------
testformat.sh

Identify type of Q-score encoding in sequence files

Code:

$ testformat.sh in=seq.fq.gz
sanger    fastq    gz    interleaved    150bp
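For a sense of how the quality encoding can be guessed at all, here is the common heuristic in Python: any quality character below ASCII 59 can only occur in Phred+33 ("sanger") data. This is a generic rule of thumb, not necessarily the method testformat.sh uses, and it cannot tell encodings apart when a file contains only high-quality characters.

```python
def guess_quality_offset(quality_strings):
    """Return 33 (sanger) or 64 based on the lowest quality character seen."""
    lowest = min(ord(c) for q in quality_strings for c in q)
    # Phred+64 and Solexa+64 never use characters below ';' (ASCII 59).
    return 33 if lowest < 59 else 64

print(guess_quality_offset(["IIII#5", "FFFF:F"]))
print(guess_quality_offset(["hhhfB", "gggge"]))
```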

--------------------------------------------------
kcompress.sh

Newest member of BBTools. Identify constituent k-mers.
http://seqanswers.com/forums/showthread.php?t=63258

----------------------------------------------------
commonkmers.sh

Find all k-mers for a given sequence.

Code:

$ commonkmers.sh in=reads.fq out=kmers.txt k=4 count=t display=999

This will produce output that looks like:

Code:

MISEQ05:239:000000000-A74HF:1:2110:14788:23085	ATGA=8	ATGC=6	GTCA=6	AAAT=5	AAGC=5	AATG=5	AGCA=5	ATAA=5	ATTA=5	CAAA=5	CATA=5	CATC=5	CTGC=5	AACC=4	AACG=4	AAGA=4	ACAT=4	ACCA=4	AGAA=4	ATCA=4	ATGG=4	CAAG=4	CCAA=4	CCTC=4	CTCA=4	CTGA=4	CTTC=4	GAGC=4	GGTA=4	GTAA=4	GTTA=4	AAAA=3	AAAC=3	AAGT=3	ACCG=3	ACGG=3	ACTG=3	AGAT=3	AGCT=3	AGGA=3	AGTA=3	AGTC=3	CAGC=3	CATG=3	CGAG=3	CGGA=3	CGTC=3	CTAA=3	CTCC=3	CTTA=3	GAAA=3	GACA=3	GACC=3	GAGA=3	GCAA=3	GGAC=3	TCAA=3	TGCA=3	AAAG=2	AACA=2	AATA=2	AATC=2	ACAA=2	ACCC=2	ACCT=2	ACGA=2	ACGC=2	AGAC=2	AGCG=2	AGGC=2	CAAC=2	CAGG=2	CCGC=2	GCCA=2	GCTA=2	GGAA=2	GGCA=2	TAAA=2	TAGA=2	TCCA=2	TGAA=2	AAGG=1	AATT=1	ACGT=1	AGAG=1	AGCC=1	AGGG=1	ATAC=1	ATAG=1	ATTG=1	CACA=1	CACG=1	CAGA=1	CCAC=1	CCCA=1	CCGA=1	CCTA=1	CGAC=1	CGCA=1	CGCC=1	CGCG=1	CGTA=1	CTAC=1	GAAC=1	GCGA=1	GCGC=1	GTAC=1	GTGA=1	TTAA=1
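A line in that style can be reproduced with a short Python sketch. The sample output above appears to list canonical kmers (the lexicographically smaller of a kmer and its reverse complement), sorted by descending count and then alphabetically, so the sketch assumes that behavior; this is an inference from the output, not something confirmed from the tool's documentation.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Smaller of the kmer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)

def kmer_counts(seq, k=4):
    """Tab-separated KMER=count fields, most common first."""
    counts = Counter(canonical(seq[i:i + k]) for i in range(len(seq) - k + 1))
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return "\t".join(f"{kmer}={n}" for kmer, n in ordered)

line = kmer_counts("ATGATGCAT", k=4)
print(line)
```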

-----------------------------------------------------
Mutate.sh

Simulate multiple mutants from a known reference (e.g. E. coli).

Code:

$ mutate.sh in=e_coli.fasta out=mutant.fasta id=99 
$ randomreads.sh ref=mutant.fasta out=reads.fq.gz reads=5m length=150 paired adderrors

That will create a mutant version of E. coli with 99% identity to the original, and then generate 5 million simulated read pairs from the new genome. You can repeat this multiple times; each mutant will be different.
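What "99% identity" means can be shown with a substitution-only toy in Python. This sketch assumes mutations are substitutions only and places exactly the number needed to hit the target identity; the real mutate.sh may also introduce other kinds of changes, and the `seed` parameter here is just for reproducibility of the example.

```python
import random

def mutate(seq, identity=0.99, seed=None):
    """Return a copy of seq with substitutions at (1 - identity) of positions."""
    rng = random.Random(seed)
    n_subs = round(len(seq) * (1.0 - identity))
    bases = list(seq)
    # Pick distinct positions so every substitution lowers identity.
    for pos in rng.sample(range(len(seq)), n_subs):
        bases[pos] = rng.choice([b for b in "ACGT" if b != bases[pos]])
    return "".join(bases)

ref = "ACGT" * 250
mutant = mutate(ref, identity=0.99, seed=1)
matches = sum(a == b for a, b in zip(ref, mutant))
print(matches / len(ref))
```

Calling `mutate` again with a different seed gives a different mutant at the same identity, which mirrors the "each mutant will be different" behavior described above.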

------------------------------------

Partition.sh

One can partition a large dataset with partition.sh into smaller subsets (example below splits data into 8 chunks).

Code:

$ partition.sh in=r1.fq in2=r2.fq out=r1_part%.fq out2=r2_part%.fq ways=8
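Conceptually, partitioning just deals read pairs out into a fixed number of bins while keeping mates together. The Python sketch below uses a simple round-robin rule; that assignment rule is an assumption for illustration, and partition.sh may distribute reads differently.

```python
def partition(pairs, ways):
    """Distribute read pairs round-robin into `ways` chunks."""
    chunks = [[] for _ in range(ways)]
    for i, pair in enumerate(pairs):
        chunks[i % ways].append(pair)  # both mates stay in the same chunk
    return chunks

pairs = [(f"r{i}/1", f"r{i}/2") for i in range(10)]
chunks = partition(pairs, ways=3)
print([len(c) for c in chunks])
```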

-----------------------------------
clumpify.sh

If you are concerned about file size and want the files to be as small as possible, give Clumpify a try. It can reduce filesize by around 30% losslessly by reordering the reads. I've found that this also typically accelerates subsequent analysis pipelines by a similar factor (up to 30%). Usage:

Code:

$ clumpify.sh in=reads.fastq.gz out=clumped.fastq.gz

Code:

$ clumpify.sh in1=reads_R1.fastq.gz in2=reads_R2.fastq.gz out1=clumped_R1.fastq.gz out2=clumped_R2.fastq.gz

Clumpify.sh can now mark/remove sequence duplicates (optical/PCR/otherwise) from NGS data

This does NOT require alignments so it should prove more useful compared to Picard MarkDuplicates. Relevant options for clumpify.sh command are listed below.

Code:

dedupe=f optical=f (default)
Nothing happens with regards to duplicates.

dedupe=t optical=f
All duplicates are detected, whether optical or not.  All copies except one are removed for each duplicate.

dedupe=f optical=t
Nothing happens.

dedupe=t optical=t
Only optical duplicates (those with an X or Y coordinate within dist) are detected.  All copies except one are removed for each duplicate.

The allduplicates flag causes all copies of duplicates to be removed, rather than leaving a single copy.  But like optical, it has no effect unless dedupe=t.

Note: If you set "dupedist" to anything greater than 0, "optical" gets enabled automatically.
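The way these flags interact can be summarized as a small decision function. This Python sketch only restates the rules listed above in executable form; the function and its return strings are invented for the example and are not part of Clumpify.

```python
def duplicate_handling(dedupe=False, optical=False, dupedist=0,
                       allduplicates=False):
    """Describe what happens to duplicates for a given flag combination."""
    if dupedist > 0:
        optical = True  # setting dupedist enables optical automatically
    if not dedupe:
        # optical and allduplicates have no effect without dedupe=t
        return "no duplicates removed"
    scope = "optical duplicates only" if optical else "all duplicates"
    kept = "no copies kept" if allduplicates else "one copy kept"
    return f"{scope}, {kept}"

print(duplicate_handling())
print(duplicate_handling(dedupe=True))
print(duplicate_handling(dedupe=True, dupedist=40))
print(duplicate_handling(dedupe=True, optical=True, allduplicates=True))
```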

-------------------------------------
fuse.sh

Fuse will automatically reverse-complement read 2. Pad (N) amount can be adjusted as necessary. This will for example create a full size amplicon that can be used for alignments.

Code:

$ fuse.sh in1=r1.fq in2=r2.fq pad=130 out=fused.fq fusepairs
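For a single pair, the fused sequence described above amounts to read 1, a run of N padding, and the reverse complement of read 2. The Python sketch below shows only the sequence side; how quality values are handled for the padding is not covered here, and the function name is made up for the example.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def fuse(read1, read2, pad):
    """Join read1 and reverse-complemented read2 with `pad` Ns in between."""
    return read1 + "N" * pad + read2.translate(COMPLEMENT)[::-1]

fused = fuse("AACC", "TTGA", pad=3)
print(fused)
```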