
README

1. Introductory Information

Title of the dataset:
Comprehensive Genomic and Clinical Data of Enterobacter Isolates

Description:
This dataset integrates MinION and whole genome sequencing (WGS) data, antimicrobial resistance gene analysis (ARMA), clinical chart review, and statistical summaries for Enterobacter isolates. The dataset also includes processed tables used to generate figures for resistome and other analyses.

Files included:
- Enterobacter_2025 03 06-2.xlsx (one workbook containing the following sheets):
  - MinION strains: List of isolates analyzed by MinION.
  - MinION (summary): Raw MinION sequencing data.
  - ARMA: Antimicrobial resistance gene analysis.
  - Chart Review: Clinical chart review data for patients.
  - Statistics: Summary statistics for the dataset.
  - WGS summary: Whole genome sequencing results.
  - Fig (resistome): Data used to generate resistome figures.
  - Fig: Data used for other figures.
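The sheets listed above can be loaded programmatically. The following is a minimal sketch, not part of the original workflow: it assumes the workbook tabs are named exactly as written in this README, and that pandas (with openpyxl) is installed. The function names `missing_sheets` and `load_workbook_sheets` are hypothetical helpers introduced for illustration.

```python
from typing import Dict, Iterable, List

# Sheet names as listed in this README; assumed to match the workbook tabs exactly.
EXPECTED_SHEETS = [
    "MinION strains",
    "MinION (summary)",
    "ARMA",
    "Chart Review",
    "Statistics",
    "WGS summary",
    "Fig (resistome)",
    "Fig",
]


def missing_sheets(found: Iterable[str]) -> List[str]:
    """Return the expected sheet names that are absent from `found`."""
    present = {name.strip() for name in found}
    return [name for name in EXPECTED_SHEETS if name not in present]


def load_workbook_sheets(path: str) -> Dict[str, "pandas.DataFrame"]:
    """Read every sheet of the workbook into a DataFrame, keyed by sheet name."""
    # Imported here so the sheet check above is usable without pandas.
    import pandas as pd  # third-party; reading .xlsx also requires openpyxl

    # sheet_name=None asks pandas for all sheets as a dict of DataFrames.
    sheets = pd.read_excel(path, sheet_name=None)
    absent = missing_sheets(sheets.keys())
    if absent:
        raise ValueError(f"Workbook is missing expected sheets: {absent}")
    return sheets
```

Calling `load_workbook_sheets` with the path to the downloaded workbook returns a dictionary, so a single sheet can be accessed as, for example, `sheets["ARMA"]`. If the actual tab names differ from this README, adjust `EXPECTED_SHEETS` accordingly.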

File format:
Excel (.xlsx)

Contact information:
Ki Wook Yun
Seoul National University Children's Hospital
pedwilly@snu.ac.kr


2. Methodological Information

Data collection:
Isolates were collected from clinical samples and sequenced on the Oxford Nanopore MinION and Illumina WGS platforms. Antimicrobial resistance genes were identified using the ARMA pipeline. Clinical data were collected by retrospective review of patient charts.

Data processing:
Raw sequencing data were cleaned, quality-controlled, and summarized. Resistance genes were annotated based on standard databases. Clinical variables were extracted by chart review and matched with isolate IDs.
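For users who want to repeat the matching of clinical variables to isolates, the sketch below shows one way to join rows on a shared isolate ID. It is an illustration only, not the authors' procedure: the key name "Isolate ID" is assumed from the column headings in this README, the function `match_by_isolate` is hypothetical, and the example rows are made up.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def match_by_isolate(
    isolate_rows: List[Row], chart_rows: List[Row], key: str = "Isolate ID"
) -> Tuple[List[Row], List[str]]:
    """Attach chart-review fields to each isolate row that shares the same ID.

    Returns the merged rows plus the IDs that had no chart-review record.
    """
    # Index chart-review rows by ID, ignoring stray whitespace around the ID.
    chart_by_id = {row[key].strip(): row for row in chart_rows}
    merged: List[Row] = []
    unmatched: List[str] = []
    for row in isolate_rows:
        isolate_id = row[key].strip()
        chart = chart_by_id.get(isolate_id)
        if chart is None:
            unmatched.append(isolate_id)
            continue
        # Isolate fields win on name clashes; the key is stored in cleaned form.
        merged.append({**chart, **row, key: isolate_id})
    return merged, unmatched


# Made-up rows for illustration only; not taken from the dataset.
isolates = [
    {"Isolate ID": "E001", "Coverage": "85"},
    {"Isolate ID": "E002 ", "Coverage": "102"},
    {"Isolate ID": "E003", "Coverage": "64"},
]
charts = [
    {"Isolate ID": "E001", "Age": "4", "Sex": "F"},
    {"Isolate ID": "E002", "Age": "11", "Sex": "M"},
]

merged, unmatched = match_by_isolate(isolates, charts)
print(len(merged), unmatched)
```

Reporting the unmatched IDs separately, rather than silently dropping them, makes it easier to spot isolates that lack a chart-review record.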

Software used:
- Oxford Nanopore MinKNOW (for MinION)
- ARMA pipeline (for antimicrobial resistance gene identification)
- Illumina BaseSpace (for WGS data)
- Excel (for data processing and figure preparation)

Quality assurance:
Low-quality reads and contaminants were filtered out during preprocessing. Duplicates were removed. Discrepancies in clinical chart review were double-checked by two independent reviewers.


3. Data Specific Information

Column headings:
- MinION sheets: Isolate ID, sequencing ID, coverage, depth, detected genes, quality metrics.
- ARMA: Isolate ID, resistance gene name, resistance mechanism, confidence score.
- Chart Review: Patient ID, isolate ID, age, sex, underlying conditions, clinical outcome.
- Statistics: Summary metrics for sample size, distribution, gene prevalence, etc.
- Figure sheets (Fig (resistome), Fig): Variables used for plotting resistome diversity and prevalence.

Units of measurement:
Coverage (X, i.e., fold coverage), gene counts (number), age (years)

Codes/symbols:
‘NA’ indicates missing data.
‘-’ indicates not applicable.
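
When reading the sheets programmatically, the two codes above should be kept distinct, since "missing" and "not applicable" mean different things. The following is a minimal sketch under the assumption that the cells contain the plain strings NA and -; the names `Code`, `decode_cell`, and `numeric_or_none` are hypothetical helpers, not part of the dataset.

```python
from enum import Enum
from typing import Optional, Union


class Code(Enum):
    MISSING = "NA"        # missing data, per this README
    NOT_APPLICABLE = "-"  # not applicable, per this README


def decode_cell(raw: object) -> Union[Code, str]:
    """Map a raw cell to a Code member, or return its trimmed text unchanged."""
    text = str(raw).strip()
    for code in Code:
        if text == code.value:
            return code
    return text


def numeric_or_none(raw: object) -> Optional[float]:
    """Convert a cell to float; both codes become None.

    Raises ValueError if the cell holds other non-numeric text.
    """
    value = decode_cell(raw)
    if isinstance(value, Code):
        return None
    return float(value)
```

Note that a lone "-" is treated as a code, while a negative number such as "-5" is still parsed as a number. If the workbook is read with pandas, its default settings already convert NA to a missing value but leave "-" as text, so the two codes may arrive in different forms.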


4. Sharing and Access Information

License:
[Specify your chosen license, e.g., CC-BY 4.0]

Terms of use:
This dataset is openly available for reuse in line with the specified license. Proper attribution is required.
