Title: Expected and observed prokaryotic genotypic complexity: correlation between 16S-rRNA and protein domain content

Authors: Jasper J. Koehorst, Edoardo Saccenti, Vitor Martins dos Santos, Maria Suarez-Diez, Peter J. Schaap
Laboratory of Systems and Synthetic Biology, Wageningen University & Research

Corresponding author: Jasper J. Koehorst

Contact information:
jasper.koehorst@wur.nl
Laboratory of Systems and Synthetic Biology
Wageningen University & Research
P.O. Box 8033
6700 EJ Wageningen
The Netherlands

Data files:
---------------------------------------------------------------------------------------------------------------------------------------------------
Supplementary_S1_GenomeInformation.xlsx (680K)
3a2f58235eb2b0cf79c63f745c57da1c3d7b69ff8bdbf2ccd0fcfc9de508f90433b9cdc265673d72096834784e3b1f76
Information about the genomes used, with the following columns:
sample: Sample identifier used to retrieve the genome from ENA
genomesize: Size of the genome
genes: Number of genes identified
abundance: Total number of PFAM domains identified
geneCountDomain: Number of genes with at least one domain
cov: Fraction of genes covered by a domain
distinct: Number of distinct domain classes in the genome
taxid: Taxonomic identifier
name: Strain name
speciesname: Species name
genusname: Genus name
famname: Family name
classname: Class name
Reference strain: Whether the strain is labelled as a reference strain
---------------------------------------------------------------------------------------------------------------------------------------------------
Supplementary_S2_16S.pdf (211K)
db66bdd2931827ca5ceca2f6fcd13ca10dc197997bc72cb28b8e30a9fc905cb5bf813584d96f133f75258caf9753217f
PDF figure showing the length distribution of the 16S-rRNA genes.
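The cov column in Supplementary_S1_GenomeInformation.xlsx can be cross-checked against the genes and geneCountDomain columns. The sketch below assumes cov = geneCountDomain / genes, which is inferred from the column descriptions and not confirmed by the authors; the helper names and the example row are hypothetical.

```python
def domain_coverage(genes: int, gene_count_domain: int) -> float:
    """Fraction of genes carrying at least one PFAM domain.

    Assumption: cov = geneCountDomain / genes (inferred from the column
    descriptions, not confirmed by the authors).
    """
    if genes <= 0:
        raise ValueError("genes must be positive")
    if not 0 <= gene_count_domain <= genes:
        raise ValueError("geneCountDomain must lie between 0 and genes")
    return gene_count_domain / genes


def check_row(row: dict, tol: float = 1e-6) -> bool:
    """Compare a row's reported cov with geneCountDomain / genes."""
    expected = domain_coverage(row["genes"], row["geneCountDomain"])
    return abs(row["cov"] - expected) <= tol


# Hypothetical example row with made-up numbers (not taken from the dataset).
example = {"genes": 4000, "geneCountDomain": 3200, "cov": 0.8}
print(check_row(example))  # True
```

If the sheet is loaded with pandas (pd.read_excel), the same check can be applied to every genome at once by comparing df["geneCountDomain"] / df["genes"] with df["cov"].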
---------------------------------------------------------------------------------------------------------------------------------------------------
Supplementary_S3_logp.xlsx (1.7M)
17c5c2df735b058f2c9527d52f076827f4827258f239bb77b1a14b54261f7f264e0e81b1ad7a06aab783a24e292f79a6
One tab per species, each listing the log odds and global persistency of the domains found in that species.
---------------------------------------------------------------------------------------------------------------------------------------------------
Supplementary_S4_DB.ttl.gz (40G / 281G)
40d034247a001ad93f3ce315a707e628a1acaaaae5d62aa3bc7b3d05f258f16e2895c010bde740586d1d726538ac1ebe
The RDF database used in this study; it stores all samples, sequences and annotations. The data are serialized in Turtle syntax (https://en.wikipedia.org/wiki/Turtle_(syntax)) and can be loaded into any dedicated triple store, such as Apache Jena, Blazegraph, GraphDB or Stardog. The SPARQL queries used are provided as a supplementary file (Supplementary_S5_SPARQL.txt).
---------------------------------------------------------------------------------------------------------------------------------------------------
Supplementary_S5_SPARQL.txt (11K)
01047ccda61ea7c9717003ca88f67cb100aa4f3323f9889b1ccaab92ac8576c7f2737b211fac11864c5b17a9c2beac82
The SPARQL queries used in this study.
---------------------------------------------------------------------------------------------------------------------------------------------------
Supplementary_S6_quality_check.pdf
387bdc448e4c87be9ee97f9fa9258d9d48e4eefaed3d6cb126742868171d8e1edc8a87b31372a37977bb98ab3d701e13
Figure showing the linear relationship between the number of domain classes with n copies and the total number of domain classes in a genome.
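Each file above is listed with a 96-character hexadecimal checksum. That length matches a SHA-384 digest, but the algorithm is not stated in this file, so the sketch below assumes SHA-384. It hashes in fixed-size chunks so that even the 40G database dump is never loaded into memory at once; the function names are hypothetical.

```python
import hashlib


def file_digest(path, algorithm="sha384", chunk_size=1 << 20):
    """Hash a file chunk by chunk (default 1 MiB) and return the hex digest.

    Assumption: the published checksums are SHA-384 (inferred from their
    96-character length, not stated by the authors).
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # Read until EOF, feeding each chunk into the running hash.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_hex, algorithm="sha384"):
    """Return True if the file's digest matches the published hex string."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For example, verify("Supplementary_S5_SPARQL.txt", "01047ccda61ea7c9717003ca88f67cb100aa4f3323f9889b1ccaab92ac8576c7f2737b211fac11864c5b17a9c2beac82") should return True for an intact download if the SHA-384 assumption holds. If every file mismatches, the checksums may come from another 384-bit hash such as SHA3-384, which can be tried by passing algorithm="sha3_384".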
---------------------------------------------------------------------------------------------------------------------------------------------------
Supplementary_S7_Alignment.aln.gz (5.3M / 247M)
4a3e40093218bc54d869410b4bd8b9aa4dfa83af58db2a292fadab88ef5a579e2f08e31749e8f1c50ff47a3ed3e9261f
The 16S-rRNA sequences used for the distance calculations, stored as an aligned FASTA file.
---------------------------------------------------------------------------------------------------------------------------------------------------
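The aligned 16S-rRNA sequences in Supplementary_S7_Alignment.aln.gz can be read directly from the compressed file. This file does not state which distance measure was used, so the sketch below uses the uncorrected p-distance (mismatches divided by compared sites, skipping columns with a gap in either sequence) purely as an illustration; it is not necessarily the authors' method, and the function names are hypothetical.

```python
import gzip
from itertools import combinations


def read_aligned_fasta(path):
    """Parse an aligned FASTA file (plain or .gz) into {identifier: sequence}."""
    opener = gzip.open if path.endswith(".gz") else open
    records, name, parts = {}, None, []
    with opener(path, "rt") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(parts)
                # Keep only the first word of the header as the identifier.
                name, parts = line[1:].split()[0], []
            else:
                parts.append(line.upper())
    if name is not None:
        records[name] = "".join(parts)
    return records


def p_distance(a, b, gaps="-."):
    """Uncorrected p-distance, ignoring columns with a gap in either sequence."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = mismatches = 0
    for x, y in zip(a, b):
        if x in gaps or y in gaps:
            continue
        compared += 1
        mismatches += x != y
    return mismatches / compared if compared else float("nan")


def pairwise_distances(records):
    """All-vs-all p-distances keyed by (id_i, id_j) with id_i < id_j."""
    return {(i, j): p_distance(records[i], records[j])
            for i, j in combinations(sorted(records), 2)}
```

The pure-Python all-vs-all loop is meant to show the calculation; for the full alignment a vectorised or dedicated phylogenetics tool would be considerably faster.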