SAFPredDB: Bacterial synteny database

Datacite citation style:
Urhan, Aysun; Cosma, Bianca-Maria; Earl, Ashlee M.; Manson, Abigail L.; Abeel, Thomas (2024): SAFPredDB: Bacterial synteny database. Version 1. 4TU.ResearchData. dataset. https://doi.org/10.4121/ac84802e-853f-46f1-9786-b9d29c0f7557.v1
Dataset
versions
  • version 2 - 2024-11-28 (latest)
  • version 1 - 2024-04-05

SAFPredDB is a bacterial synteny database built for SAFPred (Synteny Aware Function Predictor), a gene function prediction tool. The database is a collection of conserved synteny and operons found across the bacterial kingdom. We first formulated a synteny model based on experimentally known operons and on genomic features common in bacteria. We then designed a bottom-up, purely computational approach to build the database from this synteny model, using complete bacterial genome assemblies from the Genome Taxonomy Database (GTDB).


Although we initially built SAFPredDB only for our prediction tool, it can be used for other purposes where such a catalog is needed. As a standalone database, it can be queried to mine information about conserved genomic patterns in bacteria. In addition, it can be updated as newer assemblies are added to GTDB.
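
Because the database is distributed as a gzipped pickle file, a first step for standalone use is simply loading it in Python. The sketch below is a minimal illustration, not the authors' own loading code. The function names and the file name in the usage comment are hypothetical, and the internal layout of the unpickled object is not described on this page, so the sketch only reports the object's type and size. Unpickling can execute arbitrary code, so load only files obtained from a source you trust.

```python
import gzip
import pickle
from pathlib import Path


def load_gzipped_pickle(path):
    """Decompress a gzipped pickle file and return the unpickled object."""
    with gzip.open(Path(path), "rb") as handle:
        return pickle.load(handle)


def describe(obj):
    """Summarize a loaded object without assuming anything about its layout."""
    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["length"] = len(obj)
    return summary


# Example usage (the file name is a placeholder, not the actual download name):
# db = load_gzipped_pickle("safpreddb.pkl.gz")
# print(describe(db))
```

Inspecting the type and length first is a cautious way to find out how the database is organized before writing any queries against it.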

history
  • 2024-04-05 first online, published, posted
publisher
4TU.ResearchData
format
Gzipped pickle file
organizations
TU Delft, Faculty of Electrical Engineering, Mathematics and Computer Science, Department of Intelligent Systems, Delft Bioinformatics Lab;
Broad Institute of MIT and Harvard, Infectious Disease and Microbiome Program

DATA

files (4)