SAFPredDB: Bacterial synteny database
doi: 10.4121/ac84802e-853f-46f1-9786-b9d29c0f7557
SAFPredDB is a bacterial synteny database built for the gene function prediction tool SAFPred (Synteny Aware Function Predictor). The database is a collection of conserved syntenic regions and operons found across the bacterial kingdom. We first formulated a synteny model based on experimentally known operons and on genomic features common in bacteria. We then designed a bottom-up, purely computational approach to build the database from this synteny model, using complete bacterial genome assemblies from the Genome Taxonomy Database (GTDB).
Although we originally built SAFPredDB for our prediction tool, it can be used for other purposes that require such a catalog. As a standalone database, it can be queried to mine information about conserved genomic patterns in bacteria. It can also be updated as newer assemblies are added to GTDB.
- First online, published, and posted: 2024-04-05
Broad Institute of MIT and Harvard, Infectious Disease and Microbiome Program
DATA
- README.md: 3,748 bytes, MD5: 5408fd75b71ae74dd68043e38205addc
- safpreddb_cluster_dict.pkl: 325,656,953 bytes, MD5: be2a05217809ad2d2339957dd1bcf385
- safpreddb_full_emb.pkl: 1,929,538,045 bytes, MD5: fd1ed5a32279aa628e273533f0fc0f1c
- safpreddb_full_nr.pkl.tar.gz: 6,111,583 bytes, MD5: f022a0fc61d66d304c203f5343b45e0a
Total size (unzipped): 2,261,310,329 bytes
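Because the files are large and each has a published MD5 checksum, it is worth verifying the downloads before use. The following is a minimal sketch, assuming all files were downloaded into a local directory (the name `safpreddb` is hypothetical). The helper names `md5sum`, `verify_downloads`, and `load_pickle` are illustrative, not part of the dataset. The internal structure of the pickled objects and the name of the file inside the `.tar.gz` archive are not described on this page, so the sketch only extracts the archive and prints the type of the loaded cluster dictionary. Unpickling may also require whatever libraries were used to create the files, which are not stated here.

```python
import hashlib
import pickle
import tarfile
from pathlib import Path

# MD5 checksums as listed on the dataset page.
EXPECTED_MD5 = {
    "README.md": "5408fd75b71ae74dd68043e38205addc",
    "safpreddb_cluster_dict.pkl": "be2a05217809ad2d2339957dd1bcf385",
    "safpreddb_full_emb.pkl": "fd1ed5a32279aa628e273533f0fc0f1c",
    "safpreddb_full_nr.pkl.tar.gz": "f022a0fc61d66d304c203f5343b45e0a",
}


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 digest, reading in 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(data_dir: Path) -> dict:
    """Map each expected filename to True if present and matching, else False."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = data_dir / name
        results[name] = path.is_file() and md5sum(path) == expected
    return results


def load_pickle(path: Path):
    """Load a pickled object. Only unpickle files from a source you trust."""
    with path.open("rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    data_dir = Path("safpreddb")  # hypothetical download directory
    for name, ok in verify_downloads(data_dir).items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")

    archive = data_dir / "safpreddb_full_nr.pkl.tar.gz"
    if archive.is_file():
        # Use the safe "data" extraction filter where this Python version supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(data_dir, **kwargs)

    cluster_path = data_dir / "safpreddb_cluster_dict.pkl"
    if cluster_path.is_file():
        cluster_dict = load_pickle(cluster_path)
        print(type(cluster_dict))
```

Note that `safpreddb_full_emb.pkl` is about 1.9 GB on disk, so loading it with `pickle` needs a comparable amount of free memory or more.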