CroCo is a program that detects cross contamination events in assembled transcriptomes, using sequencing reads to determine the true origin of every transcript.
Such cross contaminations can be expected if several RNA-Seq experiments were prepared during the same period at the same lab, or by the same people, or if they were processed or sequenced by the same sequencing service facility.
Our approach first determines a subset of transcripts that are suspiciously similar across samples using a pairwise BLAST procedure. CroCo then combines all transcriptomes into a metatranscriptome and quantifies the "expression level" of every transcript using the read data of each sample in turn (e.g. several species sequenced by the same lab for a particular study), while allowing read multi-mappings.
Several mapping tools implemented in CroCo can be used to estimate expression levels (the default is RapMap).
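As an illustration of the first step, the following is a minimal sketch of how transcripts that are suspiciously similar across samples might be flagged from pairwise BLAST results. It is not CroCo's own code. It assumes standard BLAST tabular output (`-outfmt 6`, whose first four columns are query id, subject id, percent identity and alignment length), and it assumes a hypothetical naming scheme in which each transcript id is prefixed with its sample name (`sample|transcript`). The function name `suspicious_transcripts` and the thresholds `min_identity` and `min_length` are illustrative placeholders.

```python
import csv
from typing import Iterable, Set


def suspicious_transcripts(
    blast_lines: Iterable[str],
    min_identity: float = 95.0,  # assumed threshold, for illustration only
    min_length: int = 40,        # assumed threshold, for illustration only
) -> Set[str]:
    """Return ids of transcripts with a strong BLAST hit in another sample.

    Expects BLAST tabular lines (-outfmt 6): qseqid, sseqid, pident, length, ...
    Transcript ids are assumed to look like "sample|transcript" (hypothetical).
    """
    suspects: Set[str] = set()
    for row in csv.reader(blast_lines, delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        query, subject = row[0], row[1]
        identity, length = float(row[2]), int(row[3])
        # Hits within the same sample say nothing about cross contamination.
        if query.split("|")[0] == subject.split("|")[0]:
            continue
        if identity >= min_identity and length >= min_length:
            suspects.update((query, subject))
    return suspects
```

Only the transcripts returned by such a filter would need to go through the expression-based comparison; how the remaining transcripts are reported is not covered by this sketch.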
These expression levels are then used to assign each transcript to one of the following five categories:
clean: the transcript originates from the focal sample.
cross contamination: the transcript originates from an alien sample of the same experiment.
dubious: expression levels are too close between focal and alien samples to determine the true origin of the transcript.
low coverage: expression levels are too low in all samples, thus hampering our procedure (which relies on differential expression) to confidently assign it to any category.
over expressed: expression levels are very high in at least 3 samples, and CroCo will not try to categorize the transcript. Such a pattern does not match what is expected for cross contamination; it more often reflects highly conserved genes such as ribosomal genes, or an external contamination shared by several samples (e.g. Escherichia coli contamination).
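The five categories can be read as a small decision procedure. The sketch below is one possible way to express it, not CroCo's actual implementation. The function name `categorize`, the order of the checks, and all threshold values (`fold_threshold`, `min_coverage`, `overexp_level`) are assumptions chosen for illustration; only the "at least 3 samples" rule comes from the description above.

```python
from typing import Dict


def categorize(
    focal: str,
    expression: Dict[str, float],
    fold_threshold: float = 2.0,   # assumed: fold difference needed to decide an origin
    min_coverage: float = 0.2,     # assumed: below this in all samples means "low coverage"
    overexp_level: float = 300.0,  # assumed: level considered "very high"
    overexp_samples: int = 3,      # from the text: very high in at least 3 samples
) -> str:
    """Assign one transcript to a category from its per-sample expression levels.

    `expression` maps each sample name to the transcript's expression level
    when quantified with that sample's reads; `focal` is the sample whose
    assembly contains the transcript.
    """
    focal_level = expression[focal]
    # Highest expression level observed in any non-focal ("alien") sample.
    max_alien = max(
        (level for sample, level in expression.items() if sample != focal),
        default=0.0,
    )

    # Very high in several samples: not a cross-contamination pattern.
    n_high = sum(1 for level in expression.values() if level >= overexp_level)
    if n_high >= overexp_samples:
        return "over expressed"

    # Too low everywhere to compare expression levels reliably.
    if max(focal_level, max_alien) < min_coverage:
        return "low coverage"

    if focal_level >= fold_threshold * max_alien:
        return "clean"
    if max_alien >= fold_threshold * focal_level:
        return "cross contamination"
    return "dubious"
```

For example, under these assumed thresholds, a transcript assembled from sample `A` with levels `{"A": 1.0, "B": 50.0, "C": 0.0}` would be labelled as a cross contamination, since reads from `B` cover it far more strongly than reads from `A`.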