Dataset: Combined Annotation Dependent Depletion (CADD) scores for turkey and chicken

DOI: 10.4121/f2ff2a38-0766-48f0-99f1-65d875ba81d4.v1
The DOI displayed above is for this specific version of this dataset, which is currently the latest. Newer versions may be published in the future. For a link that will always point to the latest version, please use
DOI: 10.4121/f2ff2a38-0766-48f0-99f1-65d875ba81d4

Datacite citation style

Lensing, Kim; van Schipstal, Job; de Ridder, Dick; Groenen, Martien; Derks, Martijn (2025): Dataset: Combined Annotation Dependent Depletion (CADD) scores for turkey and chicken. Version 1. 4TU.ResearchData. dataset. https://doi.org/10.4121/f2ff2a38-0766-48f0-99f1-65d875ba81d4.v1
Other citation styles (APA, Harvard, MLA, Vancouver, Chicago, IEEE) available at Datacite

Dataset

This dataset contains genome-wide CADD (Combined Annotation Dependent Depletion) scores for chicken and turkey, generated as part of research on predicting the deleteriousness of genetic variants in non-model species. The objective of the study was to develop and apply a generic, species-agnostic pipeline that computes CADD scores from only a high-quality reference genome, the corresponding gene annotation, and a multi-species alignment (MSA) used to infer ancestral sequences. The research was computational and involved no experimental sample collection: genomic reference assemblies, available functional annotations, and an evolutionary MSA served as input features to train a machine learning model that assigns PHRED-like CADD scores to all possible single nucleotide variants across the genome. The resulting data consist of chromosome-wise, tab-delimited files containing CADD scores for chicken (chr{chr}.tsv.gz) and turkey (Turkey_chr{chr}.tsv.gz). They can be used for comparative genomics, evolutionary analyses, and prioritization of candidate variants in genomic and breeding studies. The work is described in the publication “A generic pipeline for CADD Score generation: chickenCADD and turkeyCADD”, accepted for publication in G3.
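As a rough illustration of how the per-chromosome files might be accessed, the Python sketch below builds file names from the two patterns given above and reads a few rows from a gzip-compressed, tab-delimited file. The function names are hypothetical. The column layout is not stated in this record, so rows are returned as plain lists of strings. Treating lines that start with `#` as header or comment lines is an assumption. The helper `phred_to_top_fraction` assumes the scores follow the usual PHRED convention (score = -10 · log10 of the top fraction), which this record implies with "PHRED-like" but does not spell out.

```python
import csv
import gzip
from itertools import islice
from typing import List, Optional


def cadd_filename(species: str, chrom: str) -> str:
    """Build a per-chromosome file name from the patterns in the description."""
    if species == "chicken":
        return f"chr{chrom}.tsv.gz"
    if species == "turkey":
        return f"Turkey_chr{chrom}.tsv.gz"
    raise ValueError(f"unknown species: {species!r}")


def read_rows(path: str, limit: Optional[int] = None) -> List[List[str]]:
    """Read up to `limit` tab-separated data rows from a gzip-compressed file.

    Assumption: lines starting with '#' are header/comment lines and are skipped.
    """
    with gzip.open(path, mode="rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        data = (row for row in reader if row and not row[0].startswith("#"))
        return list(islice(data, limit))


def phred_to_top_fraction(score: float) -> float:
    """Fraction of variants ranked at or above a PHRED-like score.

    Assumption: the standard PHRED relation, score = -10 * log10(fraction).
    """
    return 10 ** (-score / 10)


# Example usage (the chromosome name "1" is only an illustration):
# rows = read_rows(cadd_filename("chicken", "1"), limit=5)
```

Under the stated PHRED assumption, a score of 10 would correspond to the top 10% of possible variants and a score of 20 to the top 1%. It is worth inspecting the first few lines of a downloaded file to confirm the actual column layout before any further parsing.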

History

  • 2025-11-06 first online, published, posted

Publisher

4TU.ResearchData

Format

Gzip-compressed, tab-delimited files (.tsv.gz)

Organizations

Animal Breeding and Genomics, Wageningen University and Research;
Bioinformatics Group, Wageningen University and Research

DATA

Files (156)