%0 Generic %A Urhan, Aysun %A Cosma, Bianca-Maria %A Earl, Ashlee M. %A Manson, Abigail L. %A Abeel, Thomas %D 2024 %T SAFPredDB: Bacterial synteny database %U %R 10.4121/ac84802e-853f-46f1-9786-b9d29c0f7557.v1 %K bioinformatics %K microbial genomics %K genomics %K protein language model %K bacterial genomics %K comparative genomics %K protein embeddings %K sequence analysis %K bacterial synteny %X
SAFPredDB is a bacterial synteny database built for the gene function prediction tool SAFPred (Synteny Aware Function Predictor). The database is a collection of conserved synteny and operons found across the bacterial kingdom. First, we formulated a synteny model based on experimentally known operons and genomic features common in bacteria. We then designed a bottom-up, purely computational approach to build the database from this synteny model, using complete bacterial genome assemblies from the Genome Taxonomy Database (GTDB).
Although we initially built SAFPredDB solely for our prediction tool, it can be used for other purposes where such a catalog is needed. As a standalone database, it can be queried to mine information about conserved genomic patterns in bacteria. In addition, it can be updated as newer assemblies are added to GTDB.
%I 4TU.ResearchData