Title: CARD-LinkML: Connecting the dots to develop frameworks for multicomponent antibiotic resistance
Abstract:
Antibiotic resistance is a complex problem with many interwoven concepts, ranging from individual drug-gene interactions to entire gene clusters working in tandem to evade antibiotics. The Comprehensive Antibiotic Resistance Database (CARD) represents individual antimicrobial resistance (AMR) genes using bioinformatic models, each of which stores the necessary sequence information and uses ontologies to characterize the overall resistance profile. These models are used alongside the Resistance Gene Identifier (RGI) software to report antibiotic resistance genes (ARGs) observed in sequencing or assembly data. However, some AMR gene families comprise multiple components whose combined effect produces phenotypic resistance. Such families include glycopeptide resistance gene clusters, such as VanA in Enterococcus faecium, and efflux pump complexes, such as Mex in Pseudomonas aeruginosa. Currently, RGI can detect only individual AMR components and cannot assess their broader functional relevance without connecting them back to their respective gene clusters or efflux pumps.
To close this gap, we began developing CARD-LinkML, which leverages the Linked Data Modeling Language (LinkML) to standardize the data structure for multicomponent resistance systems. A schema written in LinkML uses classes to represent overarching categories. Each class contains parameters that specify the data it holds and the accepted data types, such as strings or identifiers. The CARD-LinkML module under development performs two overarching functions. The first converts the existing CARD reference data into a schema in which LinkML classes represent multicomponent gene clusters or efflux pumps. Parameters for each class include the necessary reference accession identifiers and the values that characterize the resistance profile of each AMR gene. Further, by leveraging the underlying CARD ontology, we can connect the required components of a gene cluster or efflux pump, each of which is represented by a separate LinkML class. The second converts RGI results into a LinkML-compatible YAML format in which observed AMR genes are grouped into their respective multicomponent gene clusters or efflux pumps. Values from RGI are cross-referenced to classes in the reference schema, enabling functional assessment of each observed cluster or efflux pump based on the AMR components detected.
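The grouping step described above can be illustrated with a minimal Python sketch. It is not the CARD-LinkML implementation and does not use the LinkML tooling: the class names, the function `group_hits`, the simplified gene names, and the lists of required components are all hypothetical and chosen only for illustration. Plain dataclasses stand in for the schema classes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MulticomponentSystem:
    """Hypothetical reference entry: a gene cluster or efflux pump."""
    name: str
    required_components: frozenset


@dataclass
class ObservedSystem:
    """Components of one reference system that were seen in a sample."""
    system: MulticomponentSystem
    observed: set = field(default_factory=set)

    @property
    def missing(self):
        # Required components that were not detected.
        return set(self.system.required_components) - self.observed

    @property
    def complete(self):
        # True only when every required component was detected.
        return not self.missing


def group_hits(hits, reference):
    """Group detected gene names into the reference systems they belong to."""
    # Index: component gene -> systems that require it.
    component_to_systems = {}
    for system in reference:
        for gene in system.required_components:
            component_to_systems.setdefault(gene, []).append(system)

    grouped = {}
    for gene in hits:
        # Genes that belong to no multicomponent system are skipped.
        for system in component_to_systems.get(gene, []):
            entry = grouped.setdefault(system.name, ObservedSystem(system))
            entry.observed.add(gene)
    return grouped


# Illustrative reference data (an assumption, not CARD's curated content).
REFERENCE = [
    MulticomponentSystem("MexAB-OprM", frozenset({"mexA", "mexB", "oprM"})),
    MulticomponentSystem(
        "VanA cluster", frozenset({"vanR", "vanS", "vanH", "vanA", "vanX"})
    ),
]

if __name__ == "__main__":
    # Hypothetical list of gene names taken from an RGI report.
    detected = ["mexA", "mexB", "oprM", "vanA", "vanH", "tetM"]
    for name, entry in group_hits(detected, REFERENCE).items():
        status = "complete" if entry.complete else "incomplete"
        print(name, status, sorted(entry.missing))
```

In this sketch, a system counts as complete only when every required component is detected. A real assessment would likely need to distinguish core from accessory components and to take the quality of each RGI hit into account.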
By using the LinkML schema as a tether connecting observed ARGs to their likely functional relevance, we hope to expand the RGI suite and develop tools that aid in interpreting RGI results for the more nuanced biology of multicomponent systems.

