# README

## 1. Introductory Information
- **Title of the Dataset:**  
  Data and analyses for multivariate microbial ecology study (Overbeek, 2022).

- **Description of Files:**  
  - `metadata-KB-all-rsm-nalow-fmin1815.tsv` (34,017 bytes, MD5: `de472aca925ccdba9816124f558ed780`): Metadata file containing sample descriptions and experimental conditions.  
  - `multivariate_analyses_Overbeek_2022.R` (7,896 bytes, MD5: `f2ca5d8bfc430c464d5acba27c68897c`): R script for performing multivariate statistical analyses on the dataset.  
  - `rooted_tree.qza` (650,494 bytes, MD5: `d730b2b34498c41b4694cd70489942d2`): Phylogenetic tree in QIIME2 artifact format (`.qza`) representing evolutionary relationships among taxa.  
  - `table_2018-2019-rsm-fmin96-na-wp-noCM-low-fmin1815.qza` (932,553 bytes, MD5: `5689c6a1b7166a9c215290ae3eec1cd3`): Feature table in QIIME2 format summarizing the abundance of taxa across samples.  
  - `tax-132-v3v4-all-rep-seqs_2018-2019_rsm_fmin96.qza` (747,975 bytes, MD5: `033a136bd06644b1f8db7d4431947e59`): Representative sequences in QIIME2 format, classified against the SILVA 132 database using the V3V4 region of the 16S rRNA gene.
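
The MD5 digests listed above can be used to confirm that a download is intact. The following is a minimal sketch, assuming the five files sit together in one directory; the names `EXPECTED_MD5`, `md5_of`, and `verify` are illustrative and not part of the dataset.

```python
import hashlib
from pathlib import Path

# File names and MD5 digests copied from the file list in this README.
EXPECTED_MD5 = {
    "metadata-KB-all-rsm-nalow-fmin1815.tsv": "de472aca925ccdba9816124f558ed780",
    "multivariate_analyses_Overbeek_2022.R": "f2ca5d8bfc430c464d5acba27c68897c",
    "rooted_tree.qza": "d730b2b34498c41b4694cd70489942d2",
    "table_2018-2019-rsm-fmin96-na-wp-noCM-low-fmin1815.qza": "5689c6a1b7166a9c215290ae3eec1cd3",
    "tax-132-v3v4-all-rep-seqs_2018-2019_rsm_fmin96.qza": "033a136bd06644b1f8db7d4431947e59",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumes the script is run from the directory holding the downloaded files.
    for name, status in verify().items():
        print(f"{status:8s} {name}")
```

A `mismatch` usually points to an incomplete or altered download; a `missing` entry only means the file was not found in the chosen directory.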

- **File Naming Convention:**  
  - File names include descriptors such as year (`2018-2019`), methods (`rsm`, `fmin96`), and data type (`table`, `tax`) to provide context.  
  - Files with the `.qza` extension are QIIME2 artifacts.

- **File Formats:**  
  - `.tsv`: Tab-separated values file, readable in text editors and in data analysis tools such as Excel, R, and Python.  
  - `.R`: R script file, executable in RStudio or any R environment.  
  - `.qza`: QIIME2 artifact format, compatible with the QIIME2 platform.

- **Relationship Between Files:**  
  - The `metadata-KB-all-rsm-nalow-fmin1815.tsv` file provides sample-level information linked to data in the `table_2018-2019-rsm-fmin96-na-wp-noCM-low-fmin1815.qza` and `tax-132-v3v4-all-rep-seqs_2018-2019_rsm_fmin96.qza` files.  
  - The `rooted_tree.qza` is used in phylogenetic diversity analyses alongside the feature table.  
  - The R script (`multivariate_analyses_Overbeek_2022.R`) analyses data from the `.qza` files and metadata.
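
The `.qza` artifacts can also be inspected without a QIIME2 installation, because a QIIME2 artifact is a ZIP archive whose entries sit under a single top-level directory named after the artifact's UUID, with a `metadata.yaml` file and a `data/` folder inside. The sketch below uses only the Python standard library; `describe_artifact` is a hypothetical helper name, and the exact files under `data/` depend on the artifact type.

```python
import zipfile
from pathlib import Path


def describe_artifact(path):
    """Return (uuid_dir, metadata_text, data_files) for a QIIME2 artifact."""
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
        # Every entry lives under one top-level directory named after the UUID.
        root = names[0].split("/", 1)[0]
        # metadata.yaml records the artifact's uuid, semantic type, and format.
        metadata_text = archive.read(f"{root}/metadata.yaml").decode("utf-8")
        data_files = [
            name
            for name in names
            if name.startswith(f"{root}/data/") and not name.endswith("/")
        ]
    return root, metadata_text, data_files


if __name__ == "__main__":
    # Assumes rooted_tree.qza is in the current directory; skipped otherwise.
    artifact = Path("rooted_tree.qza")
    if artifact.is_file():
        uuid_dir, metadata_text, data_files = describe_artifact(artifact)
        print(uuid_dir)
        print(metadata_text)
        print(data_files)
```

This only lists what an artifact contains. Reading the feature table or tree for analysis is still best done through QIIME2 itself or the `qiime2R` package used by the R script.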

---

## 2. Dataset Description
- **Variables in Metadata (`metadata-KB-all-rsm-nalow-fmin1815.tsv`):**  
  - `SampleID`: Unique identifier for each sample.  
  - `SampleCode`: Alternative code used for the sample.  
  - `PoolSampleCode`: Indicates whether the sample was pooled.  
  - `Nummer`: Numerical identifier for the sample.  
  - `Treatment code`: Code indicating the treatment applied to the sample.  
  - `DNACode`: Code associated with DNA extraction.  
  - `DPWloc`: Sample location code.  
  - `DNAconc`: DNA concentration (in ng/µL).  
  - `SeqYear`: Year in which sequencing was performed.  
  - `SumFeaturesFmin1815`: Total number of features passing the filtering threshold.  
  - `SampleDescr1`, `SampleDescr2`, `ShortDescr`: Descriptions of the sample and experimental context.  
  - `Year`: Year the sample was collected.  
  - `Plant`: Plant-related information for the sample, if applicable.  
  - `Treatment`: Description of the experimental treatment.  
  - `SampleType1`, `SampleType2`, `SampleType3`: Hierarchical categorization of the sample type.  
  - `EcoliAB`: Indicates the presence of *E. coli* in the sample.  
  - `select`, `selection`: Further details on selection criteria.  
  - `pool/ind`: Specifies whether the sample is pooled or individual.  
  - `kit`: Indicates the DNA extraction kit used.  
  - `Description`: Full descriptive label for the sample.
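
As a quick way to explore these variables, the metadata file can be read with Python's standard `csv` module. This is a minimal sketch, assuming the file is UTF-8 encoded and that its first column holds the sample identifier; `load_metadata` and `count_by` are illustrative names. QIIME2 metadata files may carry an optional `#q2:types` directive row, which the sketch skips if it is present.

```python
import csv
from collections import Counter
from pathlib import Path


def load_metadata(path):
    """Read the metadata TSV into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        first_column = reader.fieldnames[0]  # assumed to be the sample ID column
        # Drop QIIME2 directive rows such as '#q2:types' if the file has them.
        return [
            row
            for row in reader
            if not (row[first_column] or "").startswith("#q2:")
        ]


def count_by(rows, column):
    """Count how many samples share each value of the given column."""
    return Counter(row[column] for row in rows)


if __name__ == "__main__":
    # Assumes the metadata file is in the current directory; skipped otherwise.
    metadata_path = Path("metadata-KB-all-rsm-nalow-fmin1815.tsv")
    if metadata_path.is_file():
        rows = load_metadata(metadata_path)
        print(len(rows), "samples")
        for value, n in count_by(rows, "Treatment").most_common():
            print(n, value)
```

Any other column from the list above, for example `SeqYear` or `SampleType1`, can be passed to `count_by` in the same way.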

- **Temporal and Spatial Coverage:**  
  - Temporal: 2018–2019  
  - Spatial: soil and manure samples from Wageningen, The Netherlands (latitude 51°59'18.0" N, longitude 5°39'40.1" E)
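
The sampling coordinates are given in degrees, minutes, and seconds, whereas many mapping tools expect decimal degrees. The conversion is `degrees + minutes/60 + seconds/3600`, negated for the southern and western hemispheres. A small sketch follows, in which `dms_to_decimal` is a hypothetical helper; the value suffixed `N` is the latitude and the value suffixed `E` is the longitude.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by convention; north and east are positive.
    return -value if hemisphere in ("S", "W") else value


# Coordinates taken from the spatial coverage entry in this README.
latitude = dms_to_decimal(51, 59, 18.0, "N")
longitude = dms_to_decimal(5, 39, 40.1, "E")
print(round(latitude, 4), round(longitude, 4))
```

This gives roughly 51.9883 degrees north and 5.6611 degrees east.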

---

## 3. Methodology
- **Data Collection:**  
  Samples were collected from manure treated with antibiotics (e.g., sulfadiazine) or *E. coli* at various concentrations and time points during 2018–2019. The metadata file describes the experimental conditions for each sample.

- **Data Processing:**  
  - Metadata were processed in R using the `qiime2R` package.  
  - QIIME2 artifacts (`.qza`) were generated with the QIIME2 pipeline for diversity, taxonomy, and phylogeny analyses.

---

## 4. Access and Licensing
- **Access Information:**  
  This dataset is hosted in the 4TU.ResearchData repository at DOI: 10.4121/21632186.

- **License:**  
  This dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0).

---

## 5. How to Cite
Overbeek, L.S. (2022). Data and analyses for multivariate microbial ecology study. 4TU.ResearchData. DOI: 10.4121/21632186.

---

## 6. Contact Information
- Name: Leo van Overbeek  
- Email: leo.vanoverbeek@wur.nl 
- Institution: Wageningen UR
