Scientia Agricultura Sinica, 2020, Vol. 53, Issue (3): 563-573. doi: 10.3864/j.issn.0578-1752.2020.03.009

Soil & Fertilizer · Water-Saving Irrigation · Agroecology & Environment

Digital Mapping of Soil Properties in Arid Regions by Integrating Soil-Environment Relationships and Machine Learning

ZHANG ZhenHua, DING JianLi, WANG JingZhe, GE XiangYu, WANG JinJie, TIAN MeiLing, ZHAO QiDong

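  1. College of Resources and Environmental Science, Xinjiang University / Key Laboratory of Oasis Ecology of the Ministry of Education, Xinjiang University / Key Laboratory of Smart City and Environment Modelling of Higher Education Institute, Xinjiang University, Urumqi 830046
  • Received: 2019-05-06 Accepted: 2019-09-18 Online: 2020-02-01 Published: 2020-02-13
  • Corresponding author: DING JianLi E-mail:[email protected]
  • About the author: ZHANG ZhenHua, E-mail:[email protected]
  • Funding:
    National Key Research and Development Program of China (2016YFC0402409-03); National Natural Science Foundation of China (41961059); National Natural Science Foundation of China (41771470); Natural Science Foundation of Xinjiang Uyghur Autonomous Region, Youth Fund (2018D01C067)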

Digital Soil Properties Mapping by Ensembling Soil-Environment Relationship and Machine Learning in Arid Regions

ZHANG ZhenHua, DING JianLi, WANG JingZhe, GE XiangYu, WANG JinJie, TIAN MeiLing, ZHAO QiDong

  1. College of Resources and Environmental Science, Xinjiang University / Ministry of Education Key Laboratory of Oasis Ecology, Xinjiang University / Key Laboratory of Smart City and Environment Modelling of Higher Education Institute, Xinjiang University, Urumqi 830046
  • Received:2019-05-06 Accepted:2019-09-18 Online:2020-02-01 Published:2020-02-13
  • Contact: JianLi DING E-mail:[email protected]

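Abstract (translated from the Chinese):

【Objective】The spatial distribution of soil properties is an important factor affecting agricultural productivity, land management, and ecological security. Based on the coupling between soil and environment, and within a machine learning framework, this study quantitatively predicted the spatial distribution of three soil properties in an arid region, namely soil pH, soil salt content (SSC), and soil organic matter (SOM), to provide a scientific basis for agricultural production and ecological security in arid regions. 【Method】In July 2017, 82 typical topsoil (0-20 cm) samples were collected in the arid Ugan-Kuqa River oasis. Following the soil-environment relationship, 32 environmental covariates were extracted from DEM data and Landsat 8 data, resampled to a 90 m spatial resolution by raster resampling, and converted to Grid format for modeling. A Gradient Boosting Decision Tree (GBDT) model was used to rank the importance of the 32 covariates for each of the three soil properties in turn, and the root mean square error (RMSE) was used to define the importance threshold, thereby selecting the covariates used to map each property. Three non-linear models, Random Forest (RF), Bagging, and Cubist, were then built, and a multiple linear regression (MLR) model was introduced for comparison. The best model was selected and used to produce 90 m resolution maps of pH, SSC, and SOM for the Ugan-Kuqa River oasis in Xinjiang. 【Result】GBDT effectively screened out the important covariates. Seven covariates were used in modeling all three soil properties: Elevation, Profile Curvature, Difference Vegetation Index, Extended Normalized Difference Vegetation Index, Modified Soil Adjusted Vegetation Index, Salinity Index S1, and Salinity Index S6. Fifteen covariates were selected for SSC, and 17 each for pH and SOM; remote sensing indices played a strong role in predicting the soil property maps. All three machine learning algorithms outperformed MLR, and among the three non-linear models, random forest performed best for all three soil properties. For the random forest predictions, the validation set gave R²=0.6779, RMSE=0.2182, ρc=0.6084 for soil pH; R²=0.7945, RMSE=3.1803, ρc=0.8377 for SSC; and R²=0.7472, RMSE=3.5456, ρc=0.7009 for SOM. 【Conclusion】The important covariates screened by GBDT can be used with machine learning algorithms to map soil properties in arid regions, and the random forest model showed the best predictive ability for all three soil properties. The spatial distribution of the three mapped properties was clarified from the resulting soil property maps in combination with a soil classification map.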

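Keywords: soil properties, environmental covariates, digital soil mapping, machine learning, gradient boosting decision tree model, random forest model, Bagging model, Cubist model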
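The covariate-screening step in the abstract (ranking the 32 covariates by GBDT importance, then using RMSE to locate the cut-off) can be illustrated with a minimal sketch. The abstract does not say exactly how the RMSE threshold was computed, so the procedure below is an assumption: covariates are added in order of importance and the subset with the lowest 5-fold cross-validated RMSE is kept. The data are synthetic stand-ins with the same shape as the study (82 samples, 32 covariates), and the function names `rank_covariates` and `select_by_rmse` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict


def rank_covariates(X, y, seed=0):
    """Return covariate indices sorted from most to least important."""
    gbdt = GradientBoostingRegressor(n_estimators=100, random_state=seed)
    gbdt.fit(X, y)
    return np.argsort(gbdt.feature_importances_)[::-1]


def select_by_rmse(X, y, order, seed=0):
    """Keep the top-k covariates with the lowest cross-validated RMSE.

    Assumption: the paper's exact thresholding rule is not given in the
    abstract; this forward sweep over k is one plausible reading.
    """
    rmse_by_k = []
    for k in range(1, len(order) + 1):
        model = GradientBoostingRegressor(n_estimators=100, random_state=seed)
        pred = cross_val_predict(model, X[:, order[:k]], y, cv=5)
        rmse_by_k.append(float(np.sqrt(np.mean((y - pred) ** 2))))
    best_k = int(np.argmin(rmse_by_k)) + 1
    return order[:best_k], rmse_by_k


# Synthetic stand-in for 82 samples x 32 covariates; not the study's data.
rng = np.random.default_rng(42)
X = rng.normal(size=(82, 32))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.3, size=82)

order = rank_covariates(X, y)
selected, rmse_by_k = select_by_rmse(X, y, order)
print("covariates kept:", len(selected))
```

In a real workflow, `X` would hold the covariate values sampled at the soil sampling locations and the sweep would be run once per soil property, which is consistent with the abstract reporting a different number of selected covariates for SSC than for pH and SOM.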

Abstract:

【Objective】The spatial distribution of soil properties is an important factor affecting agricultural productivity, land management, and ecological security. Using the coupling relationship between soil and environment within a machine learning framework, this study quantitatively predicted the spatial distribution of soil pH, soil salt content (SSC), and soil organic matter (SOM) to provide a scientific basis for agricultural production and ecological security in arid regions. 【Method】A total of 82 topsoil (0-20 cm) samples were collected from the Ugan-Kuqa River basin oasis in the Xinjiang Uyghur Autonomous Region in July 2017. Digital elevation model (DEM) data and Landsat 8 data were used to extract 32 environmental covariates according to the soil-environment relationship. The 32 covariates were resampled to a 90 m spatial resolution and converted to Grid format for modeling. For each of the three soil properties, the covariates were ranked by importance using the Gradient Boosting Decision Tree (GBDT) algorithm, and the root mean square error (RMSE) was used to set the importance threshold for selecting covariates. Three non-linear models, random forest (RF), Bagging, and Cubist, were then fitted, and multiple linear regression (MLR) was introduced as a linear model for comparison. The best model was used to map pH, SSC, and SOM at a resolution of 90 m in the Ugan-Kuqa River basin oasis. 【Result】GBDT screened out important covariates effectively. Seven covariates (Elevation, Profile Curvature, Difference Vegetation Index, Extended Normalized Difference Vegetation Index, Modified Soil Adjusted Vegetation Index, Salinity Index S1, and Salinity Index S6) were used in modeling all three soil properties; 15 covariates were selected for SSC and 17 each for pH and SOM. Remote sensing indices played a significant role in predicting the soil property maps. All three machine learning models were more accurate than MLR, and random forest performed best for all three soil properties. On the validation set, random forest achieved R²=0.6779, RMSE=0.2182, ρc=0.6084 for pH; R²=0.7945, RMSE=3.1803, ρc=0.8377 for SSC; and R²=0.7472, RMSE=3.5456, ρc=0.7009 for SOM. 【Conclusion】The covariates selected by GBDT, combined with machine learning algorithms, can be used to map soil properties in arid areas. Random forest showed the best predictive ability for all three soil properties. The spatial distribution of the three mapped properties was clarified by combining the soil property maps with a soil classification map.

Key words: soil property, environmental covariates, digital soil mapping, machine learning, Gradient Boosting Decision Tree, GBDT, Random Forest, RF, Bagging Model, Cubist Model
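The model comparison and the three validation statistics reported in the abstract (R², RMSE, ρc) can be sketched as follows. Several things here are assumptions rather than details from the paper: ρc is taken to be Lin's concordance correlation coefficient, as the symbol is commonly used in digital soil mapping; R² is computed as the coefficient of determination; the 70/30 train/validation split and all hyperparameters are illustrative; and the data are synthetic. Cubist is omitted because scikit-learn does not provide it. The helper name `validation_metrics` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def validation_metrics(y_true, y_pred):
    """Return R2, RMSE and Lin's concordance correlation coefficient."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    r2 = float(1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2))
    # Lin's CCC: 2*cov / (var_true + var_pred + squared mean difference)
    cov = np.mean((y_true - y_true.mean()) * (y_pred - y_pred.mean()))
    ccc = float(2.0 * cov / (y_true.var() + y_pred.var()
                             + (y_true.mean() - y_pred.mean()) ** 2))
    return {"R2": r2, "RMSE": rmse, "CCC": ccc}


# Two of the paper's non-linear models plus the linear baseline.
models = {
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    "Bagging": BaggingRegressor(n_estimators=50, random_state=0),
    "MLR": LinearRegression(),
}

# Synthetic non-linear target on 82 samples x 15 covariates; not real data.
rng = np.random.default_rng(0)
X = rng.normal(size=(82, 15))
y = 3.0 * np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.2, size=82)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    results[name] = validation_metrics(y_va, model.predict(X_va))
    print(name, results[name])
```

Unlike R², the concordance coefficient penalizes systematic bias as well as scatter, which may explain why the abstract's ρc and R² values rank the three soil properties differently.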
