AEGeAn C API¶
The AEGeAn Toolkit relies heavily on data
types implemented by the GenomeTools library. For data types beginning with
Gt
, see the GenomeTools API documentation at
http://genometools.org/libgenometools.html.
Class AgnFilterStream¶
-
AgnFilterStream
¶ Implements the GenomeTools
GtNodeStream
interface. This is a node stream used to remove features with certain attributes from a node stream. See the AgnFilterStream class header.
-
GtNodeStream*
agn_attribute_filter_stream_new
(GtNodeStream *in_stream, GtHashmap *filters)¶ Class constructor. The keys of the filters hashmap should be the attribute keys/value pairs (such as partial=true or pseudo=true) to test each feature node against. The values associated with each key in the hashmap can be any non-NULL value. Any feature node having an attribute key/value pair matching an entry in the hashmap will be discarded.
-
bool
agn_attribute_filter_stream_unit_test
(AgnUnitTest *test)¶ Run unit tests for this class.
Class AgnCliquePair¶
-
AgnCliquePair
¶ The AgnCliquePair class facilitates comparison of two alternative sources of annotation for the same sequence. See the AgnCliquePair class header.
-
AgnCompClassification
agn_clique_pair_classify
(AgnCliquePair *pair)¶ Based on the already-computed comparison statistics, classify this clique pair as a perfect match, a CDS match, etc. See
AgnCompClassification
.
-
void
agn_clique_pair_comparison_aggregate
(AgnCliquePair *pair, AgnComparison *comp)¶ Add this clique pair’s internal comparison stats to a larger set of aggregate stats.
-
int
agn_clique_pair_compare
(void *p1, void *p2)¶ Same as c:func:agn_clique_pair_compare_direct, but with pointer dereferencing.
-
int
agn_clique_pair_compare_direct
(AgnCliquePair *p1, AgnCliquePair *p2)¶ Determine which pair has higher comparison scores. Returns 1 if the first pair has better scores, -1 if the second pair has better scores, 0 if they are equal.
-
int
agn_clique_pair_compare_reverse
(void *p1, void *p2)¶ Negation of c:func:agn_clique_pair_compare.
-
void
agn_clique_pair_delete
(AgnCliquePair *pair)¶ Class destructor.
-
AgnTranscriptClique *
agn_clique_pair_get_pred_clique
(AgnCliquePair *pair)¶ Return a pointer to the prediction annotation from this pair.
-
AgnTranscriptClique *
agn_clique_pair_get_refr_clique
(AgnCliquePair *pair)¶ Return a pointer to the reference annotation from this pair.
-
AgnComparison *
agn_clique_pair_get_stats
(AgnCliquePair *pair)¶ Return a pointer to this clique pairs comparison statistics.
-
AgnCliquePair*
agn_clique_pair_new
(AgnTranscriptClique *refr, AgnTranscriptClique *pred)¶ Class constructor.
-
bool
agn_clique_pair_unit_test
(AgnUnitTest *test)¶ Run unit tests for this class. Returns true if all tests passed.
Class AgnCompareReportHTML¶
-
AgnCompareReportHTML
¶ The
AgnCompareReportHTML
class is an extension of theAgnCompareReport
class. This node visitor relies on its parent class to process a stream ofAgnLocus
objects (containing two alternative sources of annotation to be compared) and then produces textual reports of the comparison statistics. See the AgnCompareReportHTML class header.
-
typedef void
(*AgnCompareReportHTMLOverviewFunc)
(FILE *outstream, void *data)¶ By default, the ParsEval summary report includes an overview with the start time, filenames, and command-line arguments. Users can override this behavior by specifying a callback function that follows this signature.
-
void
agn_compare_report_html_create_summary
(AgnCompareReportHTML *rpt)¶ After the node stream has been processed, call this function to write a summary of all locus comparisons to the output directory.
-
GtNodeVisitor *
agn_compare_report_html_new
(const char *outdir, bool gff3, AgnLocusPngMetadata *pngdata, GtLogger *logger)¶ Class constructor. Creates a node visitor used to process a stream of
AgnLocus
objects containing two sources of annotation to be compared. Reports will be written inoutdir
and status messages will be written to the logger.
-
void
agn_compare_report_html_reset_summary_title
(AgnCompareReportHTML *rpt, GtStr *title_string)¶ By default, the summary report’s title will be ‘ParsEval Summary’. Use this function to replace the title text.
-
void
agn_compare_report_html_set_overview_func
(AgnCompareReportHTML *rpt, AgnCompareReportHTMLOverviewFunc func, void *funcdata)¶ Specify a callback function to be used when printing an overview on the summary report.
Class AgnCompareReportText¶
-
AgnCompareReportText
¶ The
AgnCompareReportText
class is an extension of theAgnCompareReport
class. This node visitor relies on its parent class to process a stream ofAgnLocus
objects (containing two alternative sources of annotation to be compared) and then produces textual reports of the comparison statistics. See the AgnCompareReportText class header.
-
void
agn_compare_report_text_create_summary
(AgnCompareReportText *rpt, FILE *outstream)¶ After the node stream has been processed, call this function to write a summary of all locus comparisons to
outstream
.
-
GtNodeVisitor *
agn_compare_report_text_new
(FILE *outstream, bool gff3, GtLogger *logger)¶ Class constructor. Creates a node visitor used to process a stream of
AgnLocus
objects containing two sources of annotation to be compared. Reports will be written tooutstream
and status messages will be written to the logger.
Module AgnComparison¶
Data structures and functions related to comparative assessment of gene/transcript annotations. See the AgnComparison module header.
-
AgnCompStatsBinary
¶ This struct is used to aggregate counts and statistics regarding the structural-level comparison (i.e., at the level of whole CDS segments, whole exons, and whole UTRs) and analysis of gene structure. See header file for details.
-
AgnCompStatsScaled
¶ This struct is used to aggregate counts and statistics regarding the nucleotide-level comparison and analysis of gene structure. See header file for details.
-
AgnComparison
¶ This struct aggregates all the counts and stats that go into a comparison, including structural-level and nucleotide-level counts and stats. See header file for details.
-
AgnCompClassification
¶ This enumerated type refers to all the possible outcomes when annotations from two different sources are compared:
AGN_COMP_CLASS_UNCLASSIFIED
,AGN_COMP_CLASS_PERFECT_MATCH
,AGN_COMP_CLASS_MISLABELED
,AGN_COMP_CLASS_CDS_MATCH
,AGN_COMP_CLASS_EXON_MATCH
,AGN_COMP_CLASS_UTR_MATCH
, andAGN_COMP_CLASS_NON_MATCH
.
-
AgnCompInfo
¶ This struct contains various counts to be reported in the summary report.
-
AgnCompClassDesc
¶ When reporting the results of a comparative analysis, it may be useful to (as is done by ParsEval) show some basic information about clique pairs that fall under each classification category. The counts in this struct are necessary to calculate those summary characteristics.
-
AgnCompClassSummary
¶ This struct is used to aggregate descriptions for all of the classification categories.
-
AgnComparisonData
¶ Aggregate various data related to comparison of annotations.
-
void
agn_comparison_aggregate
(AgnComparison *agg_cmp, AgnComparison *cmp)¶ Function used to combine similarity stats from many different comparisons into a single aggregate summary.
-
void
agn_comparison_data_aggregate
(AgnComparisonData *agg_data, AgnComparisonData *data)¶ Add counts and stats from
data
toagg_data
.
-
void
agn_comparison_data_init
(AgnComparisonData *data)¶ Initialize counts and stats to default values.
-
void
agn_comparison_init
(AgnComparison *comparison)¶ Initialize comparison stats to default values.
-
void
agn_comparison_print
(AgnComparison *stats, FILE *outstream)¶ Print the comparison stats to the given file.
-
void
agn_comparison_resolve
(AgnComparison *comparison)¶ Calculate stats from the given counts.
-
bool
agn_comparison_test
(AgnComparison *c1, AgnComparison *c2)¶ Returns true if c1 and c2 contain identical values, false otherwise.
-
void
agn_comp_class_desc_aggregate
(AgnCompClassDesc *agg_desc, AgnCompClassDesc *desc)¶ Add values from
desc
toagg_desc
.
-
void
agn_comp_class_desc_init
(AgnCompClassDesc *desc)¶ Initialize to default values.
-
void
agn_comp_class_summary_aggregate
(AgnCompClassSummary *agg_summ, AgnCompClassSummary *summ)¶ Add values from
summ
toagg_summ
.
-
void
agn_comp_class_summary_init
(AgnCompClassSummary *summ)¶ Initialize to default values.
-
void
agn_comp_info_aggregate
(AgnCompInfo *agg_info, AgnCompInfo *info)¶ Add values from
info
toagg_info
.
-
void
agn_comp_info_init
(AgnCompInfo *info)¶ Initialize to default values.
-
void
agn_comp_stats_binary_aggregate
(AgnCompStatsBinary *agg_stats, AgnCompStatsBinary *stats)¶ Function used to combine similarity stats from many different comparisons into a single aggregate summary.
-
void
agn_comp_stats_binary_init
(AgnCompStatsBinary *stats)¶ Initialize comparison counts/stats to default values.
-
void
agn_comp_stats_binary_print
(AgnCompStatsBinary *stats, FILE *outstream)¶ Print the comparison stats to the given file.
-
void
agn_comp_stats_binary_resolve
(AgnCompStatsBinary *stats)¶ Calculate stats from the given counts.
-
bool
agn_comp_stats_binary_test
(AgnCompStatsBinary *s1, AgnCompStatsBinary *s2)¶ Returns true if s1 and s2 contain identical values, false otherwise.
-
void
agn_comp_stats_scaled_aggregate
(AgnCompStatsScaled *agg_stats, AgnCompStatsScaled *stats)¶ Function used to combine similarity stats from many different comparisons into a single aggregate summary.
-
void
agn_comp_stats_scaled_init
(AgnCompStatsScaled *stats)¶ Initialize comparison counts/stats to default values.
-
void
agn_comp_stats_scaled_print
(AgnCompStatsScaled *stats, FILE *outstream)¶ Print the comparison stats to the given file.
-
void
agn_comp_stats_scaled_resolve
(AgnCompStatsScaled *stats)¶ Calculate stats from the given counts.
-
bool
agn_comp_stats_scaled_test
(AgnCompStatsScaled *s1, AgnCompStatsScaled *s2)¶ Returns true if s1 and s2 contain identical values, false otherwise.
Class AgnFilterStream¶
-
AgnFilterStream
Implements the GenomeTools
GtNodeStream
interface. This is a node stream used to select features of a certain type from a node stream. See the AgnFilterStream class header.
-
GtNodeStream*
agn_filter_stream_new
(GtNodeStream *in_stream, GtHashmap *typestokeep)¶ Class constructor. The keys of the
typestokeep
hashmap should be the type(s) to be kept from the node stream. Any non-NULL value can be associated with those keys.
-
bool
agn_filter_stream_unit_test
(AgnUnitTest *test)¶ Run unit tests for this class. Returns true if all tests passed.
Class AgnGaevalVisitor¶
-
AgnGaevalVisitor
¶ Implements the GenomeTools
GtNodeVisitor
interface. This is a node visitor used for calculating transcript coverage and integrity scores for gene models using alignment data. See the AgnGaevalVisitor class header.
-
AgnGaevalParams
¶ Parameters used in calculating GAEVAL integrity. See http://www.plantgdb.org/GAEVAL/docs/integrity.html
-
GtNodeStream*
agn_gaeval_stream_new
(GtNodeStream *in, GtNodeStream *astream, AgnGaevalParams gparams)¶ Constructor for a node stream based on this node visitor.
-
GtNodeVisitor*
agn_gaeval_visitor_new
(GtNodeStream *astream, AgnGaevalParams gparams)¶ Class constructor for the node visitor.
-
void
agn_gaeval_visitor_tsv_out
(AgnGaevalVisitor *v, GtStr *tsvfilename)¶ Indicate a file to be used for printing TSV output.
-
bool
agn_gaeval_visitor_unit_test
(AgnUnitTest *test)¶ Run unit tests for this class.
Class AgnGeneStream¶
-
AgnGeneStream
¶ Implements the
GtNodeStream
interface. Searches the complete feature graph of each feature node in the input for canonical protein-coding gene features. Some basic sanity checks are performed on the mRNA(s) associated with each gene, and genes are only delivered to the output stream if they include one or more valid mRNA subfeatures. See the AgnGeneStream class header.
-
GtNodeStream*
agn_gene_stream_new
(GtNodeStream *in_stream, GtLogger *logger)¶ Class constructor.
-
void
agn_gene_stream_set_source
(AgnGeneStream *gs, GtStr *source)¶ Specify a source (GFF3 column 2) to be applied to newly inferred features (default is ‘.’).
-
bool
agn_gene_stream_unit_test
(AgnUnitTest *test)¶ Run unit tests for this class. Returns true if all tests passed.
Class AgnIdFilterStream¶
-
AgnIdFilterStream
¶ Implements the GenomeTools
GtNodeStream
interface. This is a node stream used to select features from a node stream using a pre-specified list of IDs. See the AgnIdFilterStream class header.
-
GtNodeStream*
agn_id_filter_stream_new
(GtNodeStream *in_stream, GtHashmap *ids2keep)¶ Class constructor. The keys of the
ids2keep
hashmap should be strings of the IDs of features to be kept from the node stream. Any non-NULL value can be associated with those keys.
-
bool
agn_id_filter_stream_unit_test
(AgnUnitTest *test)¶ Run unit tests for this class. Returns true if all tests passed.
Class AgnInferCDSVisitor¶
-
AgnInferCDSVisitor
¶ Implements the GenomeTools
GtNodeVisitor
interface. This is a node visitor used for inferring an mRNA’s CDS from explicitly defined exon and start/stop codon features. See the AgnInferCDSVisitor class header.
-
GtNodeStream*
agn_infer_cds_stream_new
(GtNodeStream *in, GtStr *source, GtLogger *logger)¶ Constructor for a node stream based on this node visitor.
-
GtNodeVisitor *
agn_infer_cds_visitor_new
(GtLogger *logger)¶ Constructor for the node visitor.
-
void
agn_infer_cds_visitor_set_source
(AgnInferCDSVisitor *v, GtStr *source)¶ Set the source value (GFF3 column 2) that will be assigned to any inferred features (default is ‘.’).
-
bool
agn_infer_cds_visitor_unit_test
(AgnUnitTest *test)¶ Run unit tests for this class. Returns true if all tests passed.
Class AgnInferExonsVisitor¶
-
AgnInferExonsVisitor
¶ Implements the GenomeTools
GtNodeVisitor
interface. This is a node visitor used for inferring exon features when only CDS and UTR features are provided explicitly. See the AgnInferExonsVisitor class header.
-
GtNodeStream*
agn_infer_exons_stream_new
(GtNodeStream *in, GtStr *source, GtLogger *logger)¶ Constructor for a node stream based on this node visitor.
-
GtNodeVisitor*
agn_infer_exons_visitor_new
(GtLogger *logger)¶ Class constructor for the node visitor.
-
void
agn_infer_exons_visitor_set_source
(AgnInferExonsVisitor *v, GtStr *source)¶ Set the source value (GFF3 column 2) that will be assigned to any inferred features (default is ‘.’).
-
bool
agn_infer_exons_visitor_unit_test
(AgnUnitTest *test)¶ Run unit tests for this class.
Class AgnInferParentStream¶
-
AgnInferParentStream
¶ Implements the GenomeTools
GtNodeStream
interface. This node stream creates new features as parents for the specified types. For example, iftype_parents
includes an entry withtRNA
as the key andgene
as the value, this node stream will create agene
feature for anytRNA
feature that lacks a gene parent. See the AgnInferParentStream class header.
-
GtNodeStream*
agn_infer_parent_stream_new
(GtNodeStream *in_stream, GtHashmap *type_parents)¶ Class constructor. The hashmap contains a list of key-value pairs, both strings. Any time the stream encounters a top-level (parentless) feature whose type is a key in the hashmap, a parent will be created for this feature of the type associated with the key.
-
void
agn_infer_parent_stream_set_source
(AgnInferParentStream *stream, GtStr *source)¶ Set the source (GFF3 2nd column) for all inferred features.
-
bool
agn_infer_parent_stream_unit_test
(AgnUnitTest *test)¶ Run unit tests for this class. Returns true if all tests passed.
Class AgnLocus¶
-
AgnLocus
¶ The AgnLocus class represents gene loci and interval loci in memory and can be used to facilitate comparison of two different sources of annotation. Under the hood, each
AgnLocus
object is a feature node with one or more gene features as direct children. See the AgnLocus class header.
-
AgnComparisonSource
¶ When tracking the source of an annotation for comparison purposes, use this enumerated type to refer to reference (
REFERENCESOURCE
) vs prediction (PREDICTIONSOURCE
) annotations.DEFAULTSOURCE
is for when the source is not a concern.
-
AgnLocusPngMetadata
¶ This data structure provides a convenient container for metadata needed to produce a PNG graphic for pairwise comparison loci.
-
AgnLocusFilterOp
¶ Comparison operators to use when filtering loci.
-
AgnLocusFilter
¶ Data by which to filter a locus. If the value returned by
function
satisfies the criterion specified bytestvalue
andoperator
, then the locus is to be kept.
-
void
agn_locus_add
(AgnLocus *locus, GtFeatureNode *feature, AgnComparisonSource source)¶ Associate the given annotation with this locus. Rather than calling this function directly, users are recommended to use one of the following macros:
agn_locus_add_pred_feature(locus, gene)
andagn_locus_add_refr_feature(locus, gene)
, to be used when keeping track of an annotation’s source is important (i.e. for pairwise comparison); andagn_locus_add_feature(locus, gene)
otherwise.
-
AgnLocus *
agn_locus_clone
(AgnLocus *locus)¶ Do a semi-shallow copy of this data structure–for members whose data types support reference counting, the same pointer is used and the reference is incremented. For the other members a new object is created and populated with the same content.
-
GtUword
agn_locus_cds_length
(AgnLocus *locus, AgnComparisonSource src)¶ The combined length of all coding sequences associated with this locus. Rather than calling this function directly, users are encouraged to use one of the following macros:
agn_locus_refr_cds_length(locus)
for the combined length of all reference CDSs,agn_locus_pred_cds_length(locus)
for the combined length of all prediction CDSs, andagn_locus_get_cds_length(locus)
for the combined length of all CDSs.
-
void
agn_locus_comparative_analysis
(AgnLocus *locus, GtLogger *logger)¶ Compare every reference transcript clique with every prediction transcript clique. For gene loci with multiple transcript cliques, each comparison is not necessarily reported. Instead, we report the set of clique pairs that provides the optimal pairing of reference and prediction transcripts. If there are more reference transcript cliques than prediction cliques (or vice versa), these unmatched cliques are reported separately.
-
int
agn_locus_array_compare
(const void *p1, const void *p2)¶ Analog of
strcmp
for sorting AgnLocus objects. Loci are first sorted lexicographically by sequence ID, and then spatially by genomic coordinates.
-
void
agn_locus_comparison_aggregate
(AgnLocus *locus, AgnComparison *comp)¶ Add this locus’ internal comparison stats to a larger set of aggregate stats.
-
void
agn_locus_data_aggregate
(AgnLocus *locus, AgnComparisonData *data)¶ Add this locus’ internal comparison stats to a larger set of aggregate stats.
-
GtUword
agn_locus_exon_num
(AgnLocus *locus, AgnComparisonSource src)¶ Get the number of exons for the locus. Rather than calling this function directly, users are encouraged to use one of the following macros:
agn_locus_num_pred_exons(locus)
for the number of prediction exons,agn_locus_num_refr_exons(locus)
for the number of reference exons, oragn_locus_num_exons(locus)
if the source of annotation is undesignated or irrelevant.
-
void
agn_locus_filter_parse
(FILE *filterfile, GtArray *filters)¶ Parse filters from
filterfile
and placeAgnLocusFilter
objects infilters
.
-
bool
agn_locus_filter_test
(AgnLocus *locus, AgnLocusFilter *filter)¶ Return true if
locus
satisfies the given filtering criterion.
-
GtArray *
agn_locus_get
(AgnLocus *locus)¶ Return an array of the locus’ top-level children, regardless of their type.
-
GtArray *
agn_locus_get_unique_pred_cliques
(AgnLocus *locus)¶ Get a list of all the prediction transcript cliques that have no corresponding reference transcript clique.
-
GtArray *
agn_locus_get_unique_refr_cliques
(AgnLocus *locus)¶ Get a list of all the reference transcript cliques that have no corresponding prediction transcript clique.
-
GtArray *
agn_locus_genes
(AgnLocus *locus, AgnComparisonSource src)¶ Get the genes associated with this locus. Rather than calling this function directly, users are encouraged to use one of the following macros:
agn_locus_pred_genes(locus)
to retrieve prediction genes,agn_locus_refr_genes(locus)
to retrieve reference genes, oragn_locus_get_genes(locus)
if the source of annotation is undesignated or irrelevant.
-
GtArray *
agn_locus_gene_ids
(AgnLocus *locus, AgnComparisonSource src)¶ Get the gene IDs associated with this locus. Rather than calling this function directly, users are encouraged to use one of the following macros:
agn_locus_pred_gene_ids(locus)
to retrieve prediction IDs,agn_locus_refr_gene_ids(locus)
to retrieve reference IDs, oragn_locus_get_gene_ids(locus)
if the source of annotation is undesignated or irrelevant.
-
GtUword
agn_locus_gene_num
(AgnLocus *locus, AgnComparisonSource src)¶ Get the number of genes for the locus. Rather than calling this function directly, users are encouraged to use one of the following macros:
agn_locus_num_pred_genes(locus)
for the number of prediction genes,agn_locus_num_refr_genes(locus)
for the number of reference genes, oragn_locus_num_genes(locus)
if the source of annotation is undesignated or irrelevant.
-
int
agn_locus_inner_orientation
(AgnLocus *left, AgnLocus *right)¶ Given two adjacent gene-containing iLoci, determine their orientation: 0 for both forward (‘>>’), 1 for inner (‘><’), 2 for outer (‘<>’), and 3 for reverse (‘<<’).
-
GtArray *
agn_locus_mrnas
(AgnLocus *locus, AgnComparisonSource src)¶ Get the mRNAs associated with this locus. Rather than calling this function directly, users are encouraged to use one of the following macros:
agn_locus_pred_mrnas(locus)
to retrieve prediction mRNAs,agn_locus_refr_mrnas(locus)
to retrieve reference mRNAs, oragn_locus_get_mrnas(locus)
if the source of annotation is undesignated or irrelevant.
-
GtArray *
agn_locus_mrna_ids
(AgnLocus *locus, AgnComparisonSource src)¶ Get the mRNA IDs associated with this locus. Rather than calling this function directly, users are encouraged to use one of the following macros:
agn_locus_pred_mrna_ids(locus)
to retrieve prediction IDs,agn_locus_refr_mrna_ids(locus)
to retrieve reference IDs, oragn_locus_get_mrna_ids(locus)
if the source of annotation is undesignated or irrelevant.
-
GtUword
agn_locus_mrna_num
(AgnLocus *locus, AgnComparisonSource src)¶ Get the number of mRNAs for the locus. Rather than calling this function directly, users are encouraged to use one of the following macros:
agn_locus_num_pred_mrnas(locus)
for the number of prediction mRNAs,agn_locus_num_refr_mrnas(locus)
for the number of reference mRNAs, oragn_locus_num_mrnas(locus)
if the source of annotation is undesignated or irrelevant.
-
GtArray *
agn_locus_pairs_to_report
(AgnLocus *locus)¶ Return the clique pairs to be reported for this locus.
-
void
agn_locus_png_track_selector
(GtBlock *block, GtStr *track, void *data)¶ Track selector function for generating PNG graphics of pairwise comparison loci. The track name to will be written to
track
.
-
void
agn_locus_print_png
(AgnLocus *locus, AgnLocusPngMetadata *metadata)¶ Print a PNG graphic for this locus.
-
void
agn_locus_print_transcript_mapping
(AgnLocus *locus, FILE *outstream)¶ Print a mapping of the transcript(s) associated with this locus in a two-column tab-delimited format:
transcriptId<tab>locusId
.
-
void
agn_locus_set_range
(AgnLocus *locus, GtUword start, GtUword end)¶ Set the start and end coordinates for this locus.
-
double
agn_locus_splice_complexity
(AgnLocus *locus, AgnComparisonSource src)¶ Calculate the splice complexity of this gene locus. Rather than calling this method directly, users are recommended to use one of the following macros:
agn_locus_prep_splice_complexity(locus)
to calculate the splice complexity of just the prediction transcripts,agn_locus_refr_splice_complexity(locus)
to calculate the splice complexity of just the reference transcripts, andagn_locus_calc_splice_complexity(locus)
to calculate the splice complexity taking into account all transcripts.
-
bool
agn_locus_unit_test
(AgnUnitTest *test)¶ Run unit tests for this class. Returns true if all tests passed.
Class AgnLocusFilterStream¶
-
AgnLocusFilterStream
¶ Implements the GenomeTools
GtNodeStream
interface. This is a node stream used to select loci based on user-specified criteria. See the AgnLocusFilterStream class header.
-
GtNodeStream*
agn_locus_filter_stream_new
(GtNodeStream *in_stream, GtArray *filters)¶ Class constructor. The keys of the
typestokeep
hashmap should be the type(s) to be kept from the node stream. Any non-NULL value can be associated with those keys.
-
bool
agn_locus_filter_stream_unit_test
(AgnUnitTest *test)¶ Run unit tests for this class. Returns true if all tests passed.
Class AgnLocusMapVisitor¶
-
AgnLocusMapVisitor
¶ Implements the GenomeTools
GtNodeVisitor
interface. This is a node visitor used for printing out gene –> locus and mRNA –> locus relationships as part of a locus/iLocus processing stream. See the AgnLocusMapVisitor class header.
-
GtNodeStream*
agn_locus_map_stream_new
(GtNodeStream *in, FILE *genefh, FILE *mrnafh)¶ Constructor for a node stream based on this node visitor. See
agn_locus_map_visitor_new()
for a description of the function arguments.
-
GtNodeVisitor *
agn_locus_map_visitor_new
(FILE *genefh, FILE *mrnafh)¶ Constructor for the node visitor. Gene-to-locus relationships are printed to the
genefh
file handle, while mRNA-to-locus relationships are printed to themrnafh
file handle. Setting either file handle to NULL will disable printing the corresponding output.
Class AgnLocusRefineStream¶
-
AgnLocusRefineStream
¶ Implements the
GtNodeStream
interface. By default theAgnLocusStream
class will group any and all overlapping features of interest together in the same iLocus. TheAgnLocusRefineStream
class can be used to post-processAgnLocusStream
output to accommodate a more flexible handling of overlapping features, such as requiring CDS overlap for grouping genes together or creating distinct iLoci for genes within the introns of other genes. See the AgnLocusRefineStream class header.
-
GtNodeStream *
agn_locus_refine_stream_new
(GtNodeStream *in_stream, GtUword delta, GtUword minoverlap, bool by_cds)¶ Class constructor.
-
void
agn_locus_refine_stream_set_name_format
(AgnLocusRefineStream *stream, const char *format)¶ Assign a Name attribute with a serial number to each iLocus using the specified printf-style format.
-
void
agn_locus_refine_stream_set_source
(AgnLocusRefineStream *stream, const char *source)¶ Set the source value to be used for all iLoci created by this stream. Default value is ‘AEGeAn::AgnLocusStream’.
-
void
agn_locus_refine_stream_track_ilens
(AgnLocusRefineStream *stream, FILE *ilenfile)¶ Record the length of each intergenic iLocus as loci are being parsed.
-
bool
agn_locus_refine_stream_unit_test
(AgnUnitTest *test)¶ Run unit tests for this class. Returns true if all tests passed.
Class AgnLocusStream¶
-
AgnLocusStream
¶ Implements the
GtNodeStream
interface. The only feature nodes delivered by this stream have typelocus
, and the only direct children of these features are gene features present in the input stream. Any overlapping genes are children of the same locus feature. See the AgnLocusStream class header.
-
void
agn_locus_stream_label_pairwise
(AgnLocusStream *stream, const char *refrfile, const char *predfile)¶ Use the given filenames to label the direct children of each iLocus as a ‘reference’ feature or a ‘prediction’ feature, to facilitate pairwise comparison. Note that these labels carry no connotation as to the relative quality of the respective annotation sources.
-
GtNodeStream *
agn_locus_stream_new
(GtNodeStream *in_stream, GtUword delta)¶ Calculate iLoci from a node stream which may or may not include data from multiple sources. Extend each iLocus boundary as far as possible without overlapping a gene from another iLocus, or by delta nucleotides, whichever is shorter.
-
void
agn_locus_stream_set_endmode
(AgnLocusStream *stream, int endmode)¶ Terminal iLoci or ‘end loci’ are empty iLoci at either end of a sequence. To exclude terminal iLoci from the output, set endmode < 0. To output only terminal iLoci, set endmode > 0. By default (endmode == 0), terminal iLoci are reported along with all other iLoci.
-
void
agn_locus_stream_set_name_format
(AgnLocusStream *stream, const char *fmt)¶ Assign a Name attribute with a serial number to each iLocus using the specified printf-style format.
-
void
agn_locus_stream_skip_iiLoci
(AgnLocusStream *stream)¶ By default, the locus stream will produce loci containing features and loci containing no features. This function disables reporting of the latter.
-
void
agn_locus_stream_set_source
(AgnLocusStream *stream, const char *source)¶ Set the source value to be used for all iLoci created by this stream. Default value is ‘AEGeAn::AgnLocusStream’.
-
void
agn_locus_stream_track_ilens
(AgnLocusStream *stream, FILE *ilenfile)¶ Record the length of each intergenic iLocus as loci are being parsed.
-
bool
agn_locus_stream_unit_test
(AgnUnitTest *test)¶ Run unit tests for this class. Returns true if all tests passed.
Class AgnMrnaRepVisitor¶
-
AgnMrnaRepVisitor
¶ Implements the GenomeTools
GtNodeVisitor
interface. This is a node visitor used for filtering out all but the longest mRNA (as measured by CDS length) from alternatively spliced genes. See the AgnMrnaRepVisitor class header.
-
GtNodeStream*
agn_mrna_rep_stream_new
(GtNodeStream *in, FILE *mapstream)¶ Constructor for a node stream based on this node visitor.
-
GtNodeVisitor *
agn_mrna_rep_visitor_new
(FILE *mapstream)¶ Constructor for the node visitor. If mapstream is not NULL, each gene/mRNA rep pair will be written to mapstream.
-
void
agn_mrna_rep_visitor_set_parent_type
(AgnMrnaRepVisitor *v, const char *type)¶ By default, the representative mRNA for each gene will be reported. Use this function to specify an alternative top-level feature to gene (such as locus).
-
bool
agn_mrna_rep_visitor_unit_test
(AgnUnitTest *test)¶ Run unit tests for this class. Returns true if all tests passed.
Class AgnPseudogeneFixVisitor¶
-
AgnPseudogeneFixVisitor
¶ Implements the GenomeTools
GtNodeVisitor
interface. This is a node visitor used for correcting thetype
value for pseudogene features erroneously using thegene
type instead of the more appropriatepseudogene
type. See the AgnPseudogeneFixVisitor class header.
-
GtNodeStream*
agn_pseudogene_fix_stream_new
(GtNodeStream *in)¶ Constructor for a node stream based on this node visitor.
-
GtNodeVisitor *
agn_pseudogene_fix_visitor_new
()¶ Constructor for the node visitor.
-
bool
agn_pseudogene_fix_visitor_unit_test
(AgnUnitTest *test)¶ Run unit tests for this class. Returns true if all tests passed.
Class AgnRemoveChildrenVisitor¶
-
AgnRemoveChildrenVisitor
¶ Implements the GenomeTools
GtNodeVisitor
interface. This is a node visitor used for correcting removing all children of each top-level feature. Psuedo-features are not modified. See the AgnRemoveChildrenVisitor class header.
-
GtNodeStream*
agn_remove_children_stream_new
(GtNodeStream *in)¶ Constructor for a node stream based on this node visitor.
-
GtNodeVisitor *
agn_remove_children_visitor_new
()¶ Constructor for the node visitor.
-
bool
agn_remove_children_visitor_unit_test
(AgnUnitTest *test)¶ Run unit tests for this class. Returns true if all tests passed.
Class AgnTranscriptClique¶
-
AgnTranscriptClique
¶ The purpose of the AgnTranscriptClique class is to store data pertaining to an individual maximal transcript clique. This clique may only contain a single transcript, or it may contain many. The only stipulation is that the transcripts do not overlap. Under the hood, each
AgnTranscriptClique
instance is a pseudo node (a GtFeatureNode object) with one or more transcript features as direct children. See the AgnTranscriptClique class header.
-
typedef void
(*AgnCliqueVisitFunc)
(GtFeatureNode*, void*)¶ The signature that functions must match to be applied to each transcript in the given clique. The function will be called once for each transcript in the clique. The transcript will be passed as the first argument, and a second argument is available for an optional pointer to supplementary data (if needed). See
agn_transcript_clique_traverse()
.
-
void
agn_transcript_clique_add
(AgnTranscriptClique *clique, GtFeatureNode *transcript)¶ Add a transcript to this clique.
-
GtUword
agn_transcript_clique_cds_length
(AgnTranscriptClique *clique)¶ Get the combined CDS length (in base pairs) for all transcripts in this clique.
-
AgnTranscriptClique*
agn_transcript_clique_copy
(AgnTranscriptClique *clique)¶ Make a shallow copy of this transcript clique.
-
void
agn_transcript_clique_delete
(AgnTranscriptClique *clique)¶ Class destructor.
-
const char *
agn_transcript_clique_get_model_vector
(AgnTranscriptClique *clique)¶ Get a pointer to the string representing this clique’s transcript structure.
-
bool
agn_transcript_clique_has_id_in_hash
(AgnTranscriptClique *clique, GtHashmap *map)¶ Determine whether any of the transcript IDs associated with this clique are keys in the given hash map.
-
char *
agn_transcript_clique_id
(AgnTranscriptClique *clique)¶ Retrieve the ID attribute of the transcript associated with this clique. User is responsible to free the string.
-
GtArray *
agn_transcript_clique_ids
(AgnTranscriptClique *clique)¶ Retrieve the ID attributes of all transcripts associated with this clique.
-
AgnTranscriptClique *
agn_transcript_clique_new
(AgnSequenceRegion *region)¶ Class constructor.
locusrange
should be a pointer to the genomic coordinates of the locus to which this transcript clique belongs.
-
GtUword
agn_transcript_clique_num_exons
(AgnTranscriptClique *clique)¶ Get the number of exons in this clique.
-
GtUword
agn_transcript_clique_num_utrs
(AgnTranscriptClique *clique)¶ Get the number of UTR segments in this clique.
-
void
agn_transcript_clique_put_ids_in_hash
(AgnTranscriptClique *clique, GtHashmap *map)¶ Add all of the IDs associated with this clique to the given hash map.
-
GtUword
agn_transcript_clique_size
(AgnTranscriptClique *clique)¶ Get the number of transcripts in this clique.
-
GtArray*
agn_transcript_clique_to_array
(AgnTranscriptClique *clique)¶ Get an array containing all the transcripts in this clique. User is responsible for deleting the array.
-
void
agn_transcript_clique_to_gff3
(AgnTranscriptClique *clique, FILE *outstream, const char *prefix)¶ Print the transcript clique to the given outstream in GFF3 format, optionally with a prefix.
-
void
agn_transcript_clique_traverse
(AgnTranscriptClique *clique, AgnCliqueVisitFunc func, void *funcdata)¶ Apply
func
to each transcript in the clique. SeeAgnCliqueVisitFunc
.
-
bool
agn_transcript_clique_unit_test
(AgnUnitTest *test)¶ Run unit tests for this class. Returns true if all tests passed.
Module AgnTypecheck¶
Functions for testing feature types. See the AgnTypecheck module header.
-
bool
agn_typecheck_cds
(GtFeatureNode *fn)¶ Returns true if the given feature is a CDS; false otherwise.
-
GtUword
agn_typecheck_count
(GtFeatureNode *fn, bool (*func)(GtFeatureNode *))¶ Count the number of
fn
’s children that have the given type.
-
bool
agn_typecheck_exon
(GtFeatureNode *fn)¶ Returns true if the given feature is an exon; false otherwise.
-
GtUword
agn_typecheck_feature_combined_length
(GtFeatureNode *root, bool (*func)(GtFeatureNode *))¶ Traverse the feature graph starting at root and add up the length of all features matching the given selection function func.
-
bool
agn_typecheck_gene
(GtFeatureNode *fn)¶ Returns true if the given feature is a gene; false otherwise.
-
bool
agn_typecheck_intron
(GtFeatureNode *fn)¶ Returns true if the given feature is an intron; false otherwise.
-
bool
agn_typecheck_mrna
(GtFeatureNode *fn)¶ Returns true if the given feature is an mRNA; false otherwise.
-
bool
agn_typecheck_pseudogene
(GtFeatureNode *fn)¶ Returns true if the given feature is declared as a pseudogene; false otherwise.
-
GtArray *
agn_typecheck_select
(GtFeatureNode *fn, bool (*func)(GtFeatureNode *))¶ Gather the children of a given feature that have a certain type. Type is tested by
func
, which accepts a singleGtFeatureNode
object.
-
GtArray *
agn_typecheck_select_str
(GtFeatureNode *fn, const char *)¶ Gather the children of a given feature that have a certain type. Type is tested by comparing type to the type of fn.
-
bool
agn_typecheck_start_codon
(GtFeatureNode *fn)¶ Returns true if the given feature is a start codon; false otherwise.
-
bool
agn_typecheck_stop_codon
(GtFeatureNode *fn)¶ Returns true if the given feature is a stop codon; false otherwise.
-
bool
agn_typecheck_transcript
(GtFeatureNode *fn)¶ Returns true if the given feature is an mRNA, tRNA, or rRNA; false otherwise.
-
bool
agn_typecheck_utr
(GtFeatureNode *fn)¶ Returns true if the given feature is a UTR; false otherwise.
-
bool
agn_typecheck_utr3p
(GtFeatureNode *fn)¶ Returns true if the given feature is a 3’ UTR; false otherwise.
-
bool
agn_typecheck_utr5p
(GtFeatureNode *fn)¶ Returns true if the given feature is a 5’ UTR; false otherwise.
Class AgnUnitTest¶
-
AgnUnitTest
¶ Class used for unit testing of classes and modules. See the AgnUnitTest class header.
-
void
agn_unit_test_delete
(AgnUnitTest *test)¶ Destructor.
-
AgnUnitTest *
agn_unit_test_new
(const char *label, bool (*testfunc)(AgnUnitTest *))¶ Class constructor, where
label
is a label for the test andtestfunc
is a pointer to the function that will execute the test.
-
void
agn_unit_test_print
(AgnUnitTest *test, FILE *outstream)¶ Prints results of the unit test to
outstream
.
-
void
agn_unit_test_result
(AgnUnitTest *test, const char *label, bool success)¶ Add a result to this unit test.
-
bool
agn_unit_test_success
(AgnUnitTest *test)¶ Returns true if all the results checked with this unit test passed, false otherwise.
-
void
agn_unit_test_run
(AgnUnitTest *test)¶ Run the unit test.
Module AgnUtils¶
Collection of assorted functions that are otherwise unrelated. See the AgnUtils module header.
-
AgnSequenceRegion
¶ This data structure combines sequence coordinates with a sequence ID to facilitate their usage together.
-
GtArray*
agn_array_copy
(GtArray *source, size_t size)¶ Similar to
gt_array_copy
, except that array elements are treated as pointers and dereferenced before being added to the new array.
-
double
agn_calc_splice_complexity
(GtArray *transcripts)¶ Determine the splice complexity of the given set of transcripts.
-
GtUword
agn_feature_index_copy_regions
(GtFeatureIndex *dest, GtFeatureIndex *src, bool use_orig, GtError *error)¶ Copy the sequence regions from
src
todest
. Ifuse_orig
is true, regions specified by input region nodes (such as those parsed from##sequence-region
pragmas in GFF3) are used. Otherwise, regions inferred directly from the feature nodes are used.
-
GtUword
agn_feature_index_copy_regions_pairwise
(GtFeatureIndex *dest, GtFeatureIndex *refrsrc, GtFeatureIndex *predsrc, bool use_orig, GtError *error)¶ Copy the sequence regions from
refrsrc
andpredsrc
todest
. Ifuse_orig
is true, regions specified by input region nodes (such as those parsed from##sequence-region
pragmas in GFF3) are used. Otherwise, regions inferred directly from the feature nodes are used.
-
GtRange
agn_feature_node_get_cds_range
(GtFeatureNode *fn)¶ Traverse the given feature and its subfeatures and find the range occupied by coding sequence, or {0,0} if there is no coding sequence.
-
GtArray *
agn_feature_node_get_children
(GtFeatureNode *fn)¶ Return an array of the given feature node’s direct children.
-
const char *
agn_feature_node_get_label
(GtFeatureNode *fn)¶ ID attribute is required to be unique within a single GFF3 file, but is not guaranteed to be unique among all GFF3 files (that is, it is not a stable identifier). This function searches other common attributes for labels or identifiers (accession and then Name) that are likely to be more stable. If these cannot be found, the ID attribute is retrieved. Finally, if ID is unavailable, the location of the feature will be returned in the format ${featuretype}:${seqid}_${start}-${end}.
-
void
agn_feature_node_remove_tree
(GtFeatureNode *root, GtFeatureNode *fn)¶ Remove feature
fn
and all its subfeatures fromroot
. Analogous togt_feature_node_remove_leaf
with the difference thatfn
need not be a leaf feature.
-
bool
agn_feature_overlap_check
(GtArray *feats)¶ Returns true if any of the features in
feats
overlaps, false otherwise.
-
int
agn_genome_node_compare
(GtGenomeNode **gn_a, GtGenomeNode **gn_b)¶ Compare function for data type
GtGenomeNode ``, needed for sorting ``GtGenomeNode `` stored in ``GtArray
objects.
-
GtUword
agn_mrna_3putr_length
(GtFeatureNode *mrna)¶ Determine the length of an mRNA’s 3’ UTR.
-
GtUword
agn_mrna_5putr_length
(GtFeatureNode *mrna)¶ Determine the length of an mRNA’s 5’UTR.
-
GtUword
agn_mrna_cds_length
(GtFeatureNode *mrna)¶ Determine the length of an mRNA’s coding sequence.
-
GtRange
agn_multi_child_range
(GtFeatureNode *top, GtFeatureNode *rep)¶ If a top-level feature
top
contains a multifeature child (with multi representativerep
), use this function to get the complete range of the multifeature.
-
bool
agn_overlap_ilocus
(GtGenomeNode *f1, GtGenomeNode *f2, GtUword minoverlap, bool by_cds)¶ Determine if two features overlap such that they should be assigned to the same iLocus. Specify the minimum overlap (in bp) required and whether the location of the feature’s coding sequence (CDS) should be used or not.
-
void
agn_print_version
(const char *progname, FILE *outstream)¶ CLI function: provide the name of the program, and this function prints out the AEGeAn version number to the specified outstream.
-
int
agn_sprintf_comma
(GtUword n, char *buffer)¶ Format the given non-negative number with commas as the thousands separator. The resulting string will be written to
buffer
.
-
int
agn_string_compare
(const void *p1, const void *p2)¶ Dereference the given pointers and compare the resulting strings (a la
strcmp
).
-
GtStrArray*
agn_str_array_union
(GtStrArray *a1, GtStrArray *a2)¶ Find the strings that are present in either (or both) of the string arrays.