AEGeAn Toolkit: integrated genome analysis

Author: Daniel S. Standage


Note: if you’re in a hurry, check out the installation demo for the impatient.

The AEGeAn Toolkit started as several distinct but related efforts to build tools for managing and analyzing whole-genome gene structure annotations. AEGeAn brings these efforts together into a single library that includes executable programs as well as several data structures and modules callable via a C API. The toolkit leverages a variety of parsers, data structures, and graphics capabilities available from the GenomeTools library. AEGeAn currently includes the following tools.

  • ParsEval is a program for comparing distinct sets of gene structure annotations for the same sequence(s). It calculates and reports a rich set of comparison statistics, both at the level of individual gene loci and at the level of entire sequences.
  • CanonGFF3 is a tool for preprocessing GFF3 data. It validates features related to canonical protein-coding genes, accepting data encoded in a wide variety of common conventions.
  • LocusPocus is a program for computing gene loci from one or more gene prediction sets. In the ParsEval paper cited below, a ‘gene locus’ is defined as the smallest genomic region that contains all genes that overlap with any other genes in that region. This definition can be useful when comparing two sets of gene predictions.
  • Additional tools are under development and will be released once they are a bit more stable.
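The locus definition used by LocusPocus amounts to grouping genes into maximal runs that are chained together by overlap. The Python code below is a minimal sketch of that idea, not LocusPocus's actual implementation. The `Gene` and `Locus` classes, the `compute_loci` function, and the choice to ignore strand are all illustrative assumptions. Coordinates are treated as 1-based closed intervals, as in GFF3. Each gene is tagged with the annotation set it came from, so every locus can report how many genes each set contributes. This is the kind of pairing that makes loci useful when comparing two prediction sets.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record: seqid plus 1-based, closed [start, end] coordinates."""
    seqid: str
    start: int
    end: int
    source: str   # which annotation set the gene came from, e.g. "ref" or "pred"
    gene_id: str


@dataclass
class Locus:
    """Hypothetical locus record: the region spanned by a chain of overlapping genes."""
    seqid: str
    start: int
    end: int
    genes: List[Gene] = field(default_factory=list)

    def count(self, source: str) -> int:
        """Number of genes in this locus that came from the given annotation set."""
        return sum(1 for g in self.genes if g.source == source)


def compute_loci(genes: List[Gene]) -> List[Locus]:
    """Sketch only (not AEGeAn's code): group genes into overlap-chained loci.

    Strand is ignored by assumption. After sorting by (seqid, start), a gene
    belongs to the current locus if it is on the same sequence and its start
    does not exceed the locus's running end; otherwise it starts a new locus.
    A gene that overlaps nothing forms a locus on its own in this sketch.
    """
    loci: List[Locus] = []
    for gene in sorted(genes, key=lambda g: (g.seqid, g.start, g.end)):
        current = loci[-1] if loci else None
        if current is not None and current.seqid == gene.seqid and gene.start <= current.end:
            # Overlaps the locus built so far: extend its right boundary.
            current.end = max(current.end, gene.end)
            current.genes.append(gene)
        else:
            # New sequence or a gap between genes: open a new locus.
            loci.append(Locus(gene.seqid, gene.start, gene.end, [gene]))
    return loci


# Made-up example: r1 and p2 do not overlap directly but are linked through p1.
genes = [
    Gene("chr1", 100, 500, "ref", "r1"),
    Gene("chr1", 450, 900, "pred", "p1"),
    Gene("chr1", 880, 1200, "pred", "p2"),
    Gene("chr1", 2000, 2500, "ref", "r2"),
    Gene("chr2", 100, 300, "pred", "p3"),
]
for locus in compute_loci(genes):
    print(locus.seqid, locus.start, locus.end, locus.count("ref"), locus.count("pred"))
# chr1 100 1200 1 2
# chr1 2000 2500 1 0
# chr2 100 300 0 1
```

The sorted sweep runs in O(n log n) time and needs only the running end of the current locus, because a gene starting after that end cannot overlap any gene already in the locus.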

If you have any questions regarding AEGeAn, feel free to contact me by email or, even better, open a thread on AEGeAn's issue tracker so that my response will be visible to others who may have the same questions or issues in the future.