TY - DATA
T1 - Dataset: Structural, functional and evolutionary characterisation of genes in Lactuca sp. reference genomes in the context of eudicots
PY - 2024/12/10
AU - Dirk-Jan M. van Workum
AU - Dick de Ridder
AU - M. Eric Schranz
AU - Sandra Smit
UR -
DO - 10.4121/af1b751d-23a2-4954-ac01-4eb6c68d895b.v1
KW - Lactuca
KW - lettuce
KW - bioinformatics
KW - gene annotation
KW - functional annotation
KW - homology
KW - eudicots
N2 -

This data set contains easy-to-use overviews of the location, function and homologs of each transcript in the reference genomes of three Lactuca species. For L. sativa, we included both the v8 and v11 genomes of cultivar Salinas, since both are used in lettuce research. For the L. sativa v11 genome specifically, we added the submitted structural annotation to the RefSeq structural annotation wherever it did not overlap with the latter (the resulting GFF3 file is part of this data set). For L. saligna and L. virosa, we included their respective NCBI reference genomes (as of 25 September 2024). For the structural information, we parsed the GFF3 file of each genome annotation; for the functional annotations, we obtained protein sequences and annotated them using InterProScan; for the homologs, we constructed a panproteome from a diverse set of eudicots and grouped the proteins into homology groups using PanTools.

All data are provided as TSV files, which can be opened in Excel, R and command-line applications. For technical details, please refer to the included README.

ER -