Digital Soil Properties Mapping by Ensembling Soil-Environment Relationship and Machine Learning in Arid Regions
ZHANG ZhenHua, DING JianLi, WANG JingZhe, GE XiangYu, WANG JinJie, TIAN MeiLing, ZHAO QiDong
College of Resources and Environmental Science, Xinjiang University/Ministry of Education Key Laboratory of Oasis Ecology, Xinjiang University/Key Laboratory of Smart City and Environment Modelling of Higher Education Institute, Xinjiang University, Urumqi 830046
Received: 2019-05-06; Accepted: 2019-09-18; Online: 2020-02-01
About the authors: ZHANG ZhenHua, E-mail: 15099577874@163.com.
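Abstract (translated from the Chinese) 【Objective】The spatial distribution of soil properties is an important factor affecting agricultural productivity, land management and ecological security. Based on the coupling between soil and environment and within a machine learning framework, this study quantitatively predicted the spatial distribution of three soil properties in an arid region, namely soil pH, soil salt content (SSC) and soil organic matter (SOM), to provide a scientific basis for agricultural production and ecological security in arid regions. 【Method】In July 2017, 82 representative topsoil (0-20 cm) samples were collected in the arid Ugan-Kuqa (Weigan-Kuqa) River oasis. Following the soil-environment relationship, 32 environmental covariates were extracted from DEM and Landsat 8 data, resampled to 90 m spatial resolution by raster resampling, and converted to Grid format for modelling. A Gradient Boosting Decision Tree (GBDT) model was used to rank the importance of the 32 covariates for each of the three soil properties in turn, and the root mean square error (RMSE) was used to define the importance threshold, thereby selecting the covariates used to map each property. Three non-linear models, Random Forest (RF), Bagging and Cubist, were then fitted, with Multiple Linear Regression (MLR) introduced for comparison. The best model was selected and used to produce 90 m resolution maps of pH, SSC and SOM for the Ugan-Kuqa River oasis in Xinjiang. 【Result】GBDT effectively screened out the important covariates. Seven environmental variables, Elevation, Profile Curvature, Difference Vegetation Index, Extended Normalized Difference Vegetation Index, Modified Soil Adjusted Vegetation Index, Salinity Index S1 and Salinity Index S6, were used in modelling all three soil properties. Fifteen covariates were selected for SSC and 17 each for pH and SOM, and remote sensing indices played a strong role in predicting the soil property maps. All three machine learning algorithms outperformed MLR. Comparison of the three non-linear models showed that random forest performed best for all three properties. For the random forest predictions, the validation set gave R²=0.6779, RMSE=0.2182, ρc=0.6084 for soil pH; R²=0.7945, RMSE=3.1803, ρc=0.8377 for SSC; and R²=0.7472, RMSE=3.5456, ρc=0.7009 for SOM. 【Conclusion】The important factors selected by GBDT, combined with machine learning algorithms, can be used for soil property mapping in arid regions, and the random forest model showed the best predictive ability for all three properties. The spatial distribution of the three mapped properties was clarified from the resulting soil property maps in combination with the soil classification map. Keywords: soil property; environmental covariates; digital soil mapping; machine learning; Gradient Boosting Decision Tree model; Random Forest model; Bagging model; Cubist model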
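The covariate screening step (rank by GBDT importance, then use RMSE to fix a cutoff) can be sketched roughly as follows. The abstract does not state the exact cutoff rule, so this sketch assumes the simplest one: evaluate the top-k covariates for every k and keep the k with the lowest RMSE. The function names, importance scores and RMSE values are hypothetical and chosen only for illustration; in practice `rmse_for_subset` would refit a model on the given covariates and return its validation RMSE.

```python
from typing import Callable, Dict, List, Sequence


def rank_by_importance(importance: Dict[str, float]) -> List[str]:
    """Order covariate names from most to least important."""
    return sorted(importance, key=lambda name: importance[name], reverse=True)


def select_by_rmse(
    ranked: Sequence[str],
    rmse_for_subset: Callable[[Sequence[str]], float],
) -> List[str]:
    """Keep the top-k ranked covariates, with k chosen to minimise RMSE."""
    best_k, best_rmse = 0, float("inf")
    for k in range(1, len(ranked) + 1):
        err = rmse_for_subset(ranked[:k])
        if err < best_rmse:  # strict '<' keeps the smaller subset on ties
            best_k, best_rmse = k, err
    return list(ranked[:best_k])


# Hypothetical importance scores and validation RMSE values (not from the paper).
importance = {"Ele": 0.30, "S1": 0.25, "DVI": 0.20, "Slo": 0.05, "Asp": 0.02}
toy_rmse_by_size = {1: 0.90, 2: 0.70, 3: 0.50, 4: 0.55, 5: 0.60}

ranked = rank_by_importance(importance)
selected = select_by_rmse(ranked, lambda subset: toy_rmse_by_size[len(subset)])
print(selected)  # ['Ele', 'S1', 'DVI']
```

Passing the error estimate in as a callback keeps the cutoff logic independent of the model, so the same selection loop could be run once per soil property.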
Abstract 【Objective】The spatial distribution of soil properties is an important factor affecting agricultural productivity, land management and ecological security. Using the coupling relationship between soil and environment within a machine learning framework, the spatial distributions of soil pH, soil salt content (SSC) and soil organic matter (SOM) were quantitatively predicted to provide a scientific basis for ecological security and agricultural production in arid regions. 【Method】A total of 82 topsoil (0-20 cm) samples were collected from the Ugan-Kuqa River basin oasis in the Xinjiang Uyghur Autonomous Region in July 2017. Digital elevation model (DEM) data and Landsat 8 data were used to extract 32 environmental covariates according to the soil-environment relationship. The 32 covariates were resampled to 90 m spatial resolution and converted to grid format for modeling. For each of the three soil properties, the covariates were ranked by importance using the Gradient Boosting Decision Tree (GBDT) algorithm. Three non-linear algorithms, random forest, bagging and Cubist, were used to estimate the soil properties, and a classic linear model, multiple linear regression (MLR), was introduced for comparison. On this basis, pH, SSC and SOM were each mapped at a resolution of 90 m in the Ugan-Kuqa River basin oasis. 【Result】GBDT screened out important covariates effectively. Elevation, Profile Curvature, Difference Vegetation Index, Extended Normalized Difference Vegetation Index, Modified Soil Adjusted Vegetation Index, Salinity Index S1 and Salinity Index S6 were important factors used in modeling all three soil properties; 15 covariates were selected for SSC and 17 each for pH and SOM. Remote sensing indices played a significant role in predicting the soil property maps. The non-linear models were more accurate than the linear MLR model, and random forest performed best for all three soil properties. For the random forest predictions, the validation results were R²=0.6779, RMSE=0.2182, ρc=0.6084 for soil pH; R²=0.7945, RMSE=3.1803, ρc=0.8377 for SSC; and R²=0.7472, RMSE=3.5456, ρc=0.7009 for SOM. 【Conclusion】The important factors selected by GBDT, combined with machine learning algorithms, can be used to map soil properties in arid areas. Random forest showed the best predictive ability for the three soil properties. The spatial distribution of the three mapped properties can be determined by combining the maps with the soil classification map. Keywords: soil property; environmental covariates; digital soil mapping; machine learning; Gradient Boosting Decision Tree (GBDT); Random Forest (RF); Bagging model; Cubist model
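The three validation statistics reported above (R², RMSE and ρc) can be computed as in the following minimal sketch. It assumes R² is the coefficient of determination (1 minus the ratio of residual to total sum of squares) and ρc is Lin's concordance correlation coefficient with population (1/n) variances. The abstract does not say which R² variant was used, so that choice is an assumption, as are the function names and the sample values.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed variant)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def lin_ccc(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient (rho_c)."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    var_o = sum((o - mean_o) ** 2 for o in obs) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)) / n
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


# Made-up values: predictions are the observations shifted by a constant bias of 1.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(rmse(observed, predicted), r2(observed, predicted), lin_ccc(observed, predicted))
```

In this toy case the predictions are perfectly correlated with the observations, yet ρc is well below 1 because it also penalises the constant bias. This is why ρc can differ noticeably from R² for the same model.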
Cite this article: ZHANG ZhenHua, DING JianLi, WANG JingZhe, GE XiangYu, WANG JinJie, TIAN MeiLing, ZHAO QiDong. Digital Soil Properties Mapping by Ensembling Soil-Environment Relationship and Machine Learning in Arid Regions[J]. Scientia Agricultura Sinica, 2020, 53(3): 563-573 doi:10.3864/j.issn.0578-1752.2020.03.009
Table 1 Environmental covariates for digital soil mapping

| No. | Data source | Covariate | Abbreviation | Formula | Reference |
|---|---|---|---|---|---|
| 1 | DEM | Elevation | Ele | | [23] |
| 2 | DEM | Slope | Slo | | [23] |
| 3 | DEM | Aspect | Asp | | [23] |
| 4 | DEM | Curvature | Cur | | [23] |
| 5 | DEM | Profile curvature | PrCu | | [23] |
| 6 | DEM | Plan curvature | PlCu | | [23] |
| 7 | DEM | Topographic wetness index | TWI | $\ln(\alpha/\tan\beta)$ | [23] |
| 8 | Landsat 8 OLI/TIRS | Band 1 coastal | b1 | | |
| 9 | Landsat 8 OLI/TIRS | Band 2 blue | b2 | | |
| 10 | Landsat 8 OLI/TIRS | Band 3 green | b3 | | |
| 11 | Landsat 8 OLI/TIRS | Band 4 red | b4 | | |
| 12 | Landsat 8 OLI/TIRS | Band 5 near infrared | b5 | | |
| 13 | Landsat 8 OLI/TIRS | Band 6 shortwave infrared 1 | b6 | | |
| 14 | Landsat 8 OLI/TIRS | Band 7 shortwave infrared 2 | b7 | | |
| 15 | Landsat 8 OLI/TIRS | Enhanced vegetation index | EVI | $2.5\,(NIR-R)/(NIR+6R-7.5B+1)$ | [24] |
| 16 | Landsat 8 OLI/TIRS | Difference vegetation index | DVI | $NIR-R$ | [9] |
| 17 | Landsat 8 OLI/TIRS | Normalized difference vegetation index | NDVI | $(NIR-R)/(NIR+R)$ | [9] |
| 18 | Landsat 8 OLI/TIRS | Extended normalized difference vegetation index | ENDVI | $(NIR+SWIR2-R)/(NIR+SWIR2+R)$ | [25] |
| 19 | Landsat 8 OLI/TIRS | Modified soil adjusted vegetation index | MSAVI | $[2\,NIR+1-\sqrt{(2\,NIR+1)^2-8(NIR-R)}]/2$ | [9] |
| 20 | Landsat 8 OLI/TIRS | Intensity index 1 | Int1 | $(G+R)/2$ | [26] |
| 21 | Landsat 8 OLI/TIRS | Intensity index 2 | Int2 | $(G+R+NIR)/2$ | [26] |
| 22 | Landsat 8 OLI/TIRS | Salinity index S1 | S1 | $B/R$ | [27] |
| 23 | Landsat 8 OLI/TIRS | Salinity index S3 | S3 | $(G\times R)/B$ | [27] |
| 24 | Landsat 8 OLI/TIRS | Salinity index S5 | S5 | $(B\times R)/G$ | [27] |
| 25 | Landsat 8 OLI/TIRS | Salinity index S6 | S6 | $(R\times NIR)/G$ | [27] |
| 26 | Landsat 8 OLI/TIRS | Salinity index | SI | $(B\times R)^{0.5}$ | [27] |
| 27 | Landsat 8 OLI/TIRS | Salinity index 1 | SI1 | $(G\times R)^{0.5}$ | [27] |
| 28 | Landsat 8 OLI/TIRS | Salinity index 2 | SI2 | $(G^2+R^2+NIR^2)^{0.5}$ | [27] |
| 29 | Landsat 8 OLI/TIRS | Salinity index 3 | SI3 | $(R^2+G^2)^{0.5}$ | [27] |
| 30 | Landsat 8 OLI/TIRS | Normalized difference salinity index | NDSI | $(R-NIR)/(R+NIR)$ | [27] |
| 31 | Landsat 8 OLI/TIRS | Combined spectral response index | CoSRI | $(B+G)/(R+NIR)\times NDVI$ | [28] |
| 32 | Landsat 8 OLI/TIRS | Land surface temperature | LST | | [16] |

β: slope, replaced by the maximum downslope gradient; α: upslope contributing area per unit contour length; B: blue band; G: green band; R: red band; NIR: near-infrared band; SWIR1: shortwave infrared 1 (1570-1650 nm); SWIR2: shortwave infrared 2 (2100-2290 nm).
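As an illustration of how the formula column of Table 1 translates into computation, the sketch below evaluates the spectral indices for a single pixel from band reflectances, plus TWI from α and β. The function names, argument names and sample reflectances are hypothetical. EVI is written in its standard form, with the blue band in the 7.5 term. Zero-valued reflectances would cause division by zero and are not handled here.

```python
import math
from typing import Dict


def spectral_indices(b: float, g: float, r: float, nir: float, swir2: float) -> Dict[str, float]:
    """Compute the Table 1 spectral indices for one pixel of surface reflectance."""
    ndvi = (nir - r) / (nir + r)
    return {
        "EVI": 2.5 * (nir - r) / (nir + 6 * r - 7.5 * b + 1),  # standard form
        "DVI": nir - r,
        "NDVI": ndvi,
        "ENDVI": (nir + swir2 - r) / (nir + swir2 + r),
        "MSAVI": (2 * nir + 1 - math.sqrt((2 * nir + 1) ** 2 - 8 * (nir - r))) / 2,
        "Int1": (g + r) / 2,
        "Int2": (g + r + nir) / 2,
        "S1": b / r,
        "S3": g * r / b,
        "S5": b * r / g,
        "S6": r * nir / g,
        "SI": math.sqrt(b * r),
        "SI1": math.sqrt(g * r),
        "SI2": math.sqrt(g ** 2 + r ** 2 + nir ** 2),
        "SI3": math.sqrt(r ** 2 + g ** 2),
        "NDSI": (r - nir) / (r + nir),
        "CoSRI": (b + g) / (r + nir) * ndvi,
    }


def twi(alpha: float, beta_rad: float) -> float:
    """Topographic wetness index ln(alpha / tan(beta)), with beta in radians."""
    return math.log(alpha / math.tan(beta_rad))


# Made-up reflectances for a sparsely vegetated pixel (illustration only).
idx = spectral_indices(b=0.10, g=0.12, r=0.15, nir=0.30, swir2=0.20)
print(round(idx["NDVI"], 4), round(idx["S1"], 4))
```

For raster inputs the same expressions apply element-wise once the bands are loaded as arrays of equal shape.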
References
WANG YQ, BAI YR, ZHAO YP. Assessment of soil fertility and its spatial variability based on small scale in the gravel mulched field of NingXia. Scientia Agricultura Sinica, 2016, 49(23): 4566-4575. DOI: 10.3864/j.issn.0578-1752.2016.23.009. (in Chinese)
LAGACHERIE P, MCBRATNEY AB. Spatial soil information systems and spatial soil inference systems: perspectives for digital soil mapping. 2006, 31: 3-22.
ZHU AX, BAND L, VERTESSY R, DUTTON B. Derivation of soil properties using a soil land inference model (SoLIM). 1997, 61(2): 523-533.
WANG F, YANG ST, DING JL, WEI Y, GE XY, LIANG J. Environmental sensitive variable optimization and machine learning algorithm using in soil salt prediction at oasis. Transactions of the Chinese Society of Agricultural Engineering, 2018, 34(22): 102-110. DOI: 10.11975/j.issn.1002-6819.2018.22.013. (in Chinese)
ZHANG H, WU PB, YIN AJ, YANG XH, ZHANG M, GAO C. Prediction of soil organic carbon in an intensively managed reclamation zone of Eastern China: A comparison of Multiple Linear Regressions and the Random Forest model. 2017, 592: 704-713.
BODAGHABADI BM, MARTÍNEZ-CASASNOVAS J, SALEHI MH, MOHAMMADI J, BORUJENI EI, TOOMANIAN N, GANDOMKAR A. Digital soil mapping using Artificial Neural Networks and terrain-related attributes. 2015, 25(4): 580-591.
MAHMOUDABADI E, KARIMI A, HAGHNIA GH, SEPEHR A. Digital soil mapping using remote sensing indices, terrain attributes, and vegetation features in the rangelands of northeastern Iran. 2017, 189(10): 500.
LU RK. Methods for Soil Agrochemistry Analysis. Beijing: China Agricultural Science and Technology Press, 2000. (in Chinese)
ZHOU Y, HARTEMINK AE, SHI Z, LIANG ZZ, LU YL. Land use and climate change effects on soil organic carbon in North and Northeast China. 2019, 647: 1230-1238.
ABDEL-KADER FH. Digital soil mapping at pilot sites in the northwest coast of Egypt: A Multinomial Logistic Regression approach. 2011, 14(1): 29-40.
PENG J, BISWAS A, JIANG QS, ZHAO RY, HU J, HU BF, SHI Z. Estimating soil salinity from remote sensing and terrain data in southern Xinjiang Province, China. 2019, 337: 1309-1319.
ZHU AX. Model and Method of Fine Digital Soil Survey. Beijing: Science Press, 2008: 21-57. (in Chinese)
MEHNATKESH A, AYOUBI S, JALALIAN A, SAHRAWAT KL. Relationships between soil depth and terrain attributes in a semi arid hilly region in western Iran. 2013, 10(1): 163-172.
QIN ZH, KARNIELI A, BERLINER P. A mono-window algorithm for retrieving land surface temperature from Landsat TM data and its application to the Israel-Egypt border region. 2001, 22(18): 3719-3746.
LIU LF, JI M, BUCHROITHNER M. Combining partial least squares and the gradient-boosting method for soil property retrieval using visible near-infrared shortwave infrared spectra. 2017, 9(12): 1299.
GE XY, WANG JZ, DING JL, CAO XY, ZHANG ZP, LIU J, LI XH. Combining UAV-based hyperspectral imagery and machine learning algorithms for soil moisture content monitoring. 2019, 7: e6926.
DING JL, YANG AX, WANG JZ, SAGAN V, YU DL. Machine-learning-based quantitative estimation of soil organic carbon content by VIS/NIR spectroscopy. 2018, 6: e5714.
R CORE TEAM. R: A language and environment for statistical computing. 2015, 14: 12-21.
LIN L I. A concordance correlation coefficient to evaluate reproducibility. 1989, 45(1): 255-268.
WANG JZ, DING JL, ABULIMITI A, CAI LH. Quantitative estimation of soil salinity by means of different modeling methods and visible-near infrared (VIS-NIR) spectroscopy, Ebinur Lake Wetland, Northwest China. 2018, 6: e4703.
ZERAATPISHEH M, AYOUBI S, JAFARI A, TAJIK S, FINKE P. Digital mapping of soil properties using multiple machine learning in a semi-arid region, central Iran. 2019, 338: 445-452.
LOBELL D, LESCH S, CORWIN D, ULMER M, ANDERSON K, POTTS D, DOOLITTLE J, MATOS M, BALTES M. Regional-scale assessment of soil salinity in the Red River Valley using multi-year MODIS EVI and NDVI. 2010, 39(1): 35-41.
CHEN HY, ZHAO GX, CHEN JC, WANG RY, GAO MX. Remote sensing inversion of saline soil salinity based on modified vegetation index in estuary area of Yellow River. Transactions of the Chinese Society of Agricultural Engineering, 2015, 31(5): 107-114. DOI: 10.3969/j.issn.1002-6819.2015.05.016. (in Chinese)
TRIKI FOURATI H, BOUAZIZ M, BENZINA M, BOUAZIZ S. Modeling of soil salinity within a semi-arid region using spectral analysis. 2015, 8(12): 11175-11182.
ALLBED A, KUMAR L, ALDAKHEEL YY. Assessing soil salinity using soil salinity and vegetation indices derived from IKONOS high-spatial resolution imageries: Applications in a date palm dominated region. 2014, 230: 1-8.
MENG L, ZHOU SW, ZHANG H, BI XL. Estimating soil salinity in different landscapes of the Yellow River Delta through Landsat OLI/TIRS and ETM+ Data. 2016, 20(4): 271-279.
GULIBOSITAN-BATU. Analysis of soil physical and chemical properties under different land use/land cover in Weigan and Kuqa rivers delta oasis [D]. Urumqi: Xinjiang University, 2018. (in Chinese)
GU HB. Research on spatial variation of properties in irrigation area scale [D]. Urumqi: Xinjiang Agricultural University, 2011. (in Chinese)
WANG F, YANG ST, WEI Y, YANG XD, DING JL. Influence of sub-region priority modeling constructed by random forest and stochastic gradient treeboost on the accuracy of soil salinity prediction in oasis scale. Scientia Agricultura Sinica, 2018, 51(24): 4659-4676. DOI: 10.3864/j.issn.0578-1752.2018.24.007. (in Chinese)
DING JL, YU DL. Monitoring and evaluating spatial variability of soil salinity in dry and wet seasons in the Werigan-Kuqa Oasis, China, using remote sensing and electromagnetic induction instruments. 2014, 235: 316-322.
BRUBAKER S, JONES A, LEWIS D, FRANK K. Soil properties associated with landscape position. 1993, 57(1): 235-239.
FALAHATKAR S, HOSSEINI SM, AYOUBI S, SALMANMAHINY A. Predicting soil organic carbon density using auxiliary environmental variables in Northern Iran. 2016, 62(3): 375-393.
GE XY, DING JL, WANG JZ, WANG F, CAI LH, SUN HL. Estimation of soil moisture based on CARS algorithm coupled with machine learning. Acta Optica Sinica, 2018, 38(10): 393-400. DOI: 10.3788/AOS201838.1030001. (in Chinese)
CHEN SC, LIANG ZZ, WEBSTER R, ZHANG GL, ZHOU Y, TENG HF, HU BF, ARROUAYS D, SHI Z. A high-resolution map of soil pH in China made by hybrid modelling of sparse soil data and environmental covariates and its implications for pollution. 2019, 655: 273-283.
PENG J, LIU HJ, SHI Z, XIANG HY, CHI CM. Regional heterogeneity of hyperspectral characteristics of salt-affected soil and salinity inversion. Transactions of the Chinese Society of Agricultural Engineering, 2014, 30(17): 167-174. DOI: 10.3969/j.issn.1002-6819.2014.17.022. (in Chinese)
WANG HF, CHEN YW, ZHANG ZT, CHEN HR, LI XW, WANG MX, CHAI HY. Quantitatively estimating main soil water-soluble salt ions content based on visible-near infrared wavelength selected using GC, SR and VIP. 2019, 7: e6310.
MOSLEH Z, SALEHI MH, JAFARI A, BORUJENI IE, MEHNATKESH A. The effectiveness of digital soil mapping to predict soil properties over low-relief areas. 2016, 188(3): 195.