cff-version: 1.2.0
abstract: "Vegetation distribution simulations could help to understand vegetation distribution patterns and trends, but it is difficult to accurately simulate the distribution of vegetation especially in regions that are heavily affected by human disturbance. Climate, topographic, and spectral data were used as input predictor variables of four machine learning models, including the random forest (RF), decision tree (DT), support vector machine (SVM) and maximum likelihood methods, in three vegetation classification units, including the vegetation group, vegetation type, and formation and subformation, in the Jing-Jin-Ji region, which is one of the most developed regions in China. A total of 2789 vegetation points were used for model training, and 974 vegetation points were used for model assessment. The result showed that the random forest method was the best of the four models and could simulate the distribution of the vegetation in all three classification units well. Kappa coefficients indicated that the random forest method had the highest prediction ability in regard to vegetation type, followed by vegetation group, formation and subformation. Five predictor variables, including 4 climate variables (annual mean temperature, max temperature of warmest month, min temperature of coldest month and annual precipitation) and 1 geospatial variable (elevation), were the most important for three vegetation classification levels. The winter surface albedo of band 4, the slope and the three summer spectral variables (the summer surface albedo of bands 2 and 6 and the summer brightness index) could also increase the accuracy of vegetation classification to some extent."
authors:
  - family-names: Yi
    given-names: Sangui
    orcid: "https://orcid.org/0000-0002-1407-3775"
title: "Sample data for simulation in Jing-Jin-Ji region"
keywords:
version: 1
identifiers:
  - type: doi
    value: "10.4121/uuid:1b27dc6b-b77e-4f18-b035-e8a249f595c0"
license: CC0
date-released: 2020-05-11