Dataset: Structural, functional and evolutionary characterisation of genes in Lactuca sp. reference genomes in the context of eudicots
doi: 10.4121/af1b751d-23a2-4954-ac01-4eb6c68d895b
This data set contains easy-to-use overviews of the location, function and homologs of each transcript in the reference genomes of three Lactuca species. For L. sativa, we included both the v8 and v11 genomes of cultivar Salinas, since both are used in lettuce research. For the L. sativa v11 genome specifically, we added the submitted structural annotation to the RefSeq structural annotation wherever it did not overlap with the latter (the resulting GFF3 file is part of this data set). For L. saligna and L. virosa, we included their respective reference genomes as listed by NCBI on 25 September 2024. For the structural information, we parsed the GFF3 file of each genome annotation. For the functional annotations, we obtained the protein sequences and annotated them with InterProScan. For the homologs, we constructed a panproteome from a diverse set of eudicots and grouped the proteins into homology groups using PanTools.
All data has been collected in TSV files, which can be used in Excel, R and command-line applications. For technical details, please refer to the included README.
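As a minimal sketch of working with these files from a script, the Python snippet below reads the first few rows of one overview. It assumes that the first line of each TSV is a header row, which should be checked against the README. The helper name `preview_tsv` is hypothetical and not part of the data set.

```python
import csv
from itertools import islice


def preview_tsv(path, n_rows=5):
    """Return (header, first n_rows data rows) of a tab-separated file.

    Assumption: the first line of the file is a header row.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps any quote characters in annotation text literal.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader, [])
        rows = list(islice(reader, n_rows))
    return header, rows
```

Calling `preview_tsv("Lactuca_virosa.annotation_overview.tsv")` on a downloaded file returns the column names and the first five records without loading the whole file into memory.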
- 2024-12-10 first online, published, posted
Biosystematics Group, Wageningen University & Research
DATA
- README.md - 9,028 bytes - MD5: 78525be1f1b61ab8b7eb52d5f1e8504a
- GCF_002870075.4_Lsat_Salinas_v11_Fused.gff.gz - 15,349,515 bytes - MD5: 26f426bbe497319c1f3f717244dc9a76
- Lactuca_saligna.annotation_overview.tsv - 83,652,156 bytes - MD5: 9fb67490d9372dc7a990838f7b025827
- Lactuca_sativa_Salinas_V11.annotation_overview.tsv - 140,184,157 bytes - MD5: ef9b418522c4771a154b2e3c87f10cd7
- Lactuca_sativa_Salinas_V8.annotation_overview.tsv - 88,963,877 bytes - MD5: 4396052b6260975fe2e81808a17e665b
- Lactuca_virosa.annotation_overview.tsv - 74,183,277 bytes - MD5: b5e3f2218d9b41ceacb615661dcb5237
Total size of all files: 402,342,010 bytes unzipped
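The MD5 checksums in the listing can be used to check that a download is complete. The following is a sketch using Python's standard `hashlib` module. The function names and the `EXPECTED_MD5` table are hypothetical helpers, and the pairing of file names with checksums is an assumption read from the listing above, so it should be confirmed against the repository page before relying on it.

```python
import hashlib
from pathlib import Path

# Assumption: file-to-checksum pairing as read from the file listing.
EXPECTED_MD5 = {
    "README.md": "78525be1f1b61ab8b7eb52d5f1e8504a",
    "GCF_002870075.4_Lsat_Salinas_v11_Fused.gff.gz": "26f426bbe497319c1f3f717244dc9a76",
    "Lactuca_saligna.annotation_overview.tsv": "9fb67490d9372dc7a990838f7b025827",
    "Lactuca_sativa_Salinas_V11.annotation_overview.tsv": "ef9b418522c4771a154b2e3c87f10cd7",
    "Lactuca_sativa_Salinas_V8.annotation_overview.tsv": "4396052b6260975fe2e81808a17e665b",
    "Lactuca_virosa.annotation_overview.tsv": "b5e3f2218d9b41ceacb615661dcb5237",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, md5 in expected.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == md5) if path.is_file() else None
    return results
```

Running `verify_downloads(".")` in the folder holding the unzipped files returns one entry per file. A `False` value suggests a corrupted or incomplete download, or a wrong pairing in the table.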