cff-version: 1.2.0
abstract: "
This data set contains easy-to-use overviews of the location, function and homologs of each transcript in the reference genomes of three Lactuca species. For L. sativa, we included both the v8 and v11 genomes of cultivar Salinas, since both are used in lettuce research. For the L. sativa v11 genome specifically, we added the submitted structural annotation to the RefSeq structural annotation wherever it did not overlap with the latter (the resulting GFF3 file is part of this data set). For L. saligna and L. virosa, we included their respective reference genomes as listed by NCBI on 25 September 2024. For the structural information, we parsed the GFF3 file of each genome annotation; for the functional annotations, we obtained protein sequences and functionally annotated them using InterProScan; for the homologs, we constructed a panproteome from a diverse set of eudicots and grouped the proteins into homology groups using PanTools.
All data has been collected in TSV files, which can be used in Excel, R and command-line applications. For technical details, please refer to the included README.
"
authors:
  - family-names: van Workum
    given-names: Dirk-Jan M.
    orcid: "https://orcid.org/0000-0001-6247-5499"
  - family-names: de Ridder
    given-names: Dick
  - family-names: Schranz
    given-names: M. Eric
    orcid: "https://orcid.org/0000-0001-6777-6565"
  - family-names: Smit
    given-names: Sandra
    orcid: "https://orcid.org/0000-0001-5239-5321"
title: "Dataset: Structural, functional and evolutionary characterisation of genes in Lactuca sp. reference genomes in the context of eudicots"
keywords:
version: 1
identifiers:
  - type: doi
    value: 10.4121/af1b751d-23a2-4954-ac01-4eb6c68d895b.v1
license: CC-BY-4.0
date-released: 2024-12-10