SAFPredDB: Bacterial synteny database

doi: 10.4121/ac84802e-853f-46f1-9786-b9d29c0f7557.v1
The DOI above refers to this specific version of the dataset, which is currently the latest. Newer versions may be published in the future. For a link that always points to the latest version, please use
doi: 10.4121/ac84802e-853f-46f1-9786-b9d29c0f7557
DataCite citation style:
Urhan, Aysun; Cosma, Bianca-Maria; Earl, Ashlee M.; Manson, Abigail L.; Abeel, Thomas (2024): SAFPredDB: Bacterial synteny database. Version 1. 4TU.ResearchData. dataset. https://doi.org/10.4121/ac84802e-853f-46f1-9786-b9d29c0f7557.v1
Dataset

SAFPredDB is a bacterial synteny database built for the gene function prediction tool SAFPred (Synteny Aware Function Predictor). The database is a collection of conserved synteny and operons found across the bacterial kingdom. We first formulated a synteny model based on experimentally known operons and genomic features common in bacteria. We then designed a bottom-up, purely computational approach to build the database from this synteny model, using complete bacterial genome assemblies from the Genome Taxonomy Database (GTDB).


Although we initially built SAFPredDB only for our prediction tool, it can be used for other purposes where such a catalog is needed. As a standalone database, it can be queried to mine information about conserved genomic patterns in bacteria. In addition, it can be updated as newer assemblies are added to GTDB.

history
  • 2024-04-05 first online, published, posted
publisher
4TU.ResearchData
format
Gzipped pickle file
organizations
TU Delft, Faculty of Electrical Engineering, Mathematics and Computer Science, Department of Intelligent Systems, Delft Bioinformatics Lab;
Broad Institute of MIT and Harvard, Infectious Disease and Microbiome Program
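
Since the files are distributed as gzipped pickle files, they can presumably be opened with Python's standard library alone. The snippet below is a minimal sketch, not the authors' documented loading procedure. The file name in the usage comment is hypothetical, and the layout of the stored object is not described on this page, so the helper only reports its type and size. Unpickling may also require the packages that were installed when the file was written, and pickle files should only be loaded from sources you trust.

```python
import gzip
import pickle
from pathlib import Path


def load_gzipped_pickle(path):
    """Load one gzip-compressed pickle file and return the stored object."""
    with gzip.open(Path(path), "rb") as handle:
        return pickle.load(handle)


def describe(obj):
    """Summarize an object whose internal layout is not known in advance."""
    size = len(obj) if hasattr(obj, "__len__") else None
    return {"type": type(obj).__name__, "size": size}


# Example usage (hypothetical file name; substitute one of the downloaded files):
# db = load_gzipped_pickle("safpreddb.pkl.gz")
# print(describe(db))
```

Inspecting the type first is a cautious way to find out whether the object is a dictionary, a table, or a custom class before writing any queries against it.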

DATA

files (4)