Question: What are N50 and L50 of a contig assembly file? How do I calculate them?

Gudiya Pal
2946 days ago

I am currently working with a new contig assembly dataset and am a bit confused about N50 and L50.

Answers
1

Hi Gudiya,
N50 can be defined as the largest length E such that at least half of the total assembly size is contained in contigs of length E or greater. For example, if we have a collection of contigs with sizes 7, 4, 3, 2, 2, 1, and 1 kb (total size = 20 kb), the N50 is 4 kb, because the contigs of 4 kb or longer (7 + 4 = 11 kb) cover at least 10 kb, half of the total, whereas the 7 kb contig alone does not.

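L50 is the smallest number of contigs (or scaffolds), counted from the longest, whose combined length accounts for at least 50% of the total assembly size. In the example above, L50 is 2 (the 7 kb and 4 kb contigs).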
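
To show how this can be calculated, here is a minimal Python sketch of the cumulative approach described above: sort the contig lengths from longest to shortest and add them up until the running total reaches half of the assembly size. The function name `n50_l50` and the "at least half" threshold are assumptions made for illustration; some tools handle the exact-half case differently, so treat this as one reasonable reading rather than a reference implementation.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths.

    Assumes the common convention: N50 is the length of the contig at
    which the running total (longest first) reaches at least half of
    the total assembly size, and L50 is how many contigs that took.
    """
    if not lengths:
        raise ValueError("need at least one contig length")

    ordered = sorted(lengths, reverse=True)  # longest contig first
    half = sum(ordered) / 2                  # half of the total assembly size

    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count


# The example from this answer: contigs of 7, 4, 3, 2, 2, 1 and 1 kb
contigs_kb = [7, 4, 3, 2, 2, 1, 1]
n50, l50 = n50_l50(contigs_kb)
print(n50, l50)  # 4 2
```

For a real assembly, the lengths would come from the sequences in the FASTA file; reading that file is left out here to keep the sketch short.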

Cheers

0

From a Broad Institute site:

"N50 is a statistical measure of average length of a set of sequences. It is used widely in genomics, especially in reference to contig or supercontig lengths within a draft assembly.

Given a set of sequences of varying lengths, the N50 length is defined as the length N for which 50% of all bases in the sequences are in a sequence of length L < N. This can be found mathematically as follows: Take a list L of positive integers. Create another list L', which is identical to L, except that every element n in L has been replaced with n copies of itself. Then the median of L' is the N50 of L. For example: If L = {2, 2, 2, 3, 3, 4, 8, 8}, then L' consists of six 2's, six 3's, four 4's, and sixteen 8's; the N50 of L is the median of L', which is 6."
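
The construction in the quoted text can be followed literally in a few lines of Python. This is only a sketch of that recipe: the function name `n50_broad_median` is made up for illustration, and expanding every length n into n copies is practical only for toy inputs, not for real assemblies. Note that the median-based definition can return a value that is not the length of any sequence in the list (6 here), whereas the cumulative definition in the first answer would give 8 for the same list, since the two 8's already hold half of the 32 bases.

```python
from statistics import median


def n50_broad_median(lengths):
    """N50 as described in the quoted text: the median of the list L'
    in which every length n is replaced by n copies of itself."""
    expanded = []
    for n in lengths:
        expanded.extend([n] * n)  # n copies of the value n
    return median(expanded)


# The example from the quoted text
print(n50_broad_median([2, 2, 2, 3, 3, 4, 8, 8]))  # 6.0
```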

The original site has a self-issued security certificate, but you can accept it and look here:
https://www.broad.harvard.edu/crd/wiki/index.php/N50