odgi provides an efficient and succinct dynamic DNA sequence graph model, as well as a host of algorithms that allow the use of such graphs in bioinformatic analyses. Careful encoding of graph entities allows odgi to efficiently compute and transform pangenomes with minimal overhead. odgi implements a dynamic data structure that leverages multi-core CPUs and can be updated on the fly.
The edges and path steps are recorded as deltas between the current node id and the target node id, where the node id corresponds to the rank in the global array of nodes. Graphs built from biological data sets tend to have local partial order and, when sorted, the deltas tend to be small. This allows them to be compressed with a variable-length integer representation, resulting in a small in-memory footprint at the cost of packing and unpacking.
The RAM and computational savings are substantial. In partially ordered regions of the graph, most deltas will require only a single byte.
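The delta-plus-variable-length idea can be illustrated with a minimal sketch. It is not odgi's actual implementation: the function names are hypothetical, and the choice of a zigzag mapping for signed deltas followed by a 7-bits-per-byte (LEB128-style) varint is an assumption made only to show why small deltas fit in a single byte.

```python
from typing import List, Tuple


def zigzag(n: int) -> int:
    """Map a signed delta to an unsigned value so small magnitudes stay small."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(z: int) -> int:
    """Invert zigzag(): even values are non-negative, odd values are negative."""
    return z // 2 if z % 2 == 0 else -((z + 1) // 2)


def encode_varint(value: int) -> bytes:
    """Encode an unsigned integer, 7 payload bits per byte, high bit = 'more'."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # last byte
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def pack_deltas(source_rank: int, target_ranks: List[int]) -> bytes:
    """Store each target as a varint-encoded delta from the source node rank."""
    out = bytearray()
    for target in target_ranks:
        out += encode_varint(zigzag(target - source_rank))
    return bytes(out)


def unpack_deltas(source_rank: int, packed: bytes) -> List[int]:
    """Recover the absolute target ranks from the packed deltas."""
    targets = []
    pos = 0
    while pos < len(packed):
        z, pos = decode_varint(packed, pos)
        targets.append(source_rank + unzigzag(z))
    return targets


# Hypothetical node ranks: three nearby targets and one distant target.
source = 1_000_000
near = [1_000_001, 999_998, 1_000_040]
far = [5]
packed = pack_deltas(source, near + far)
assert unpack_deltas(source, packed) == near + far
```

Under these assumptions, any delta between -64 and 63 occupies one byte, so the three nearby targets above take three bytes in total, while the single distant target takes three bytes on its own. Storing absolute ranks at a fixed width would cost the same for every entry regardless of how well the graph is sorted.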