Dataset: Structural, functional and evolutionary characterisation of genes in Lactuca sp. reference genomes in the context of eudicots

doi:10.4121/af1b751d-23a2-4954-ac01-4eb6c68d895b.v1
The doi above is for this specific version of this dataset, which is currently the latest. Newer versions may be published in the future. For a link that will always point to the latest version, please use
doi: 10.4121/af1b751d-23a2-4954-ac01-4eb6c68d895b
Datacite citation style:
van Workum, Dirk-Jan M.; Dick de Ridder; M. Eric Schranz; Sandra Smit (2024): Dataset: Structural, functional and evolutionary characterisation of genes in Lactuca sp. reference genomes in the context of eudicots. Version 1. 4TU.ResearchData. dataset. https://doi.org/10.4121/af1b751d-23a2-4954-ac01-4eb6c68d895b.v1
Dataset

This dataset contains easy-to-use overviews of the location, function and homologs of each transcript in the reference genomes of three Lactuca species. For L. sativa, we included both the v8 and the v11 genome of cultivar Salinas, since both are used in lettuce research. For the L. sativa v11 genome specifically, we added the submitted structural annotation to the RefSeq structural annotation wherever it did not overlap with the RefSeq annotation (the resulting GFF3 file is part of this dataset). For L. saligna and L. virosa, we included their respective reference genomes as designated by NCBI on 25 September 2024. For the structural information, we parsed the GFF3 file of each genome annotation; for the functional annotations, we obtained the protein sequences and annotated them using InterProScan; for the homologs, we constructed a panproteome from a diverse set of eudicots and grouped the proteins into homology groups using PanTools.
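The merging rule described for the L. sativa v11 annotation (keep all RefSeq genes, add submitted genes only where they do not overlap) can be illustrated with a minimal Python sketch. It assumes that "overlap" means sharing at least one base on the same sequence, regardless of strand, and it compares only `gene` features; the authors' actual criterion and tooling may differ. The names `Feature`, `parse_gene_features`, `overlaps` and `add_non_overlapping` are hypothetical and not part of the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    seqid: str
    start: int  # 1-based, inclusive (GFF3 convention)
    end: int    # 1-based, inclusive
    strand: str
    attributes: str


def parse_gene_features(lines, feature_type="gene"):
    """Yield Feature records of the given type from GFF3 lines (hypothetical helper)."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue  # not a well-formed row of the requested type
        yield Feature(cols[0], int(cols[3]), int(cols[4]), cols[6], cols[8])


def overlaps(a, b):
    # Assumption: closed intervals on the same sequence overlap if neither
    # ends before the other starts; strand is ignored.
    return a.seqid == b.seqid and a.start <= b.end and b.start <= a.end


def add_non_overlapping(refseq, submitted):
    """Keep all RefSeq genes; add submitted genes that overlap no RefSeq gene."""
    kept = [s for s in submitted if not any(overlaps(s, r) for r in refseq)]
    return list(refseq) + kept


refseq_gff = [
    "##gff-version 3\n",
    "chr1\tRefSeq\tgene\t100\t500\t.\t+\t.\tID=gene-A\n",
]
submitted_gff = [
    "chr1\tsubmitted\tgene\t450\t800\t.\t-\t.\tID=sub-1\n",   # overlaps gene-A
    "chr1\tsubmitted\tgene\t900\t1200\t.\t+\t.\tID=sub-2\n",  # no overlap
]
merged = add_non_overlapping(
    list(parse_gene_features(refseq_gff)),
    list(parse_gene_features(submitted_gff)),
)
print([f.attributes for f in merged])  # ['ID=gene-A', 'ID=sub-2']
```

The pairwise comparison is quadratic and only meant to make the rule explicit; for whole-genome annotations, an interval index or sorting features per sequence would be the usual choice.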

All overviews are provided as TSV files, which can be opened in Excel, R and command-line tools. For technical details, please refer to the included README.
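As an illustration of programmatic use, the sketch below reads a tab-separated overview with Python's standard `csv` module and collects the transcripts per homology group. The column names `transcript_id` and `homology_group`, the function name `transcripts_per_group`, and the convention that an empty cell means "no homology group" are all hypothetical; the actual file names and column layout are documented in the included README.

```python
import csv
import io
from collections import defaultdict


def transcripts_per_group(handle, group_col="homology_group", id_col="transcript_id"):
    """Map each homology group to its transcripts (column names are hypothetical)."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        if row[group_col]:  # assumption: an empty cell means no homology group
            groups[row[group_col]].append(row[id_col])
    return dict(groups)


# Small in-memory stand-in for one of the TSV files.
example = io.StringIO(
    "transcript_id\thomology_group\n"
    "tx1\t101\n"
    "tx2\t101\n"
    "tx3\t\n"
)
print(transcripts_per_group(example))  # {'101': ['tx1', 'tx2']}
```

For a real file, pass a handle opened with `open(path, newline="")`, as the `csv` module documentation recommends, and adjust the column names to those listed in the README.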

history
  • 2024-12-10 first online, published, posted
publisher
4TU.ResearchData
format
TSV, GFF3
organizations
Bioinformatics Group, Wageningen University & Research
Biosystematics Group, Wageningen University & Research

DATA

files (6)