TY - DATA
T1 - SAFPredDB: Bacterial synteny database
PY - 2024/11/28
AU - Aysun Urhan
AU - Bianca-Maria Cosma
AU - Ashlee M. Earl
AU - Abigail L. Manson
AU - Thomas Abeel
UR - 
DO - 10.4121/ac84802e-853f-46f1-9786-b9d29c0f7557.v2
KW - bioinformatics
KW - microbial genomics
KW - genomics
KW - protein language model
KW - bacterial genomics
KW - comparative genomics
KW - protein embeddings
KW - sequence analysis
KW - bacterial synteny
N2 - <p>SAFPredDB is a bacterial synteny database built for the gene function prediction tool SAFPred (Synteny Aware Function Predictor). The database is a collection of conserved synteny and operons found across the bacterial kingdom. First, we formulated a synteny model based on experimentally known operons and genomic features common in bacteria. We then designed a bottom-up, purely computational approach to build the database from this synteny model, using complete bacterial genome assemblies from the Genome Taxonomy Database (GTDB).</p><p><br></p><p>Although we initially built SAFPredDB only for our prediction tool, it can be used for other purposes where such a catalog is needed. As a standalone database, it can be queried to mine information about conserved genomic patterns in bacteria. In addition, it can be updated as newer assemblies are added to GTDB.</p>
ER -