Application of random forest algorithm in suitability evaluation of rural residential land
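XU Feng, WANG Zhanqi, ZHANG Hongwei, CHAI Ji
School of Public Administration, China University of Geosciences, Wuhan 430074, China
Corresponding author: WANG Zhanqi, E-mail: zhqwang@cug.edu.cn
Received: 2018-04-27; Revised: 2018-08-28; Published online: 2018-10-25
Copyright: 2018 Editorial Office of Resources Science
Funding: National Natural Science Foundation of China (71673258)
About the author: XU Feng, male, from Wuhan, Hubei, PhD candidate, mainly engaged in research on land economics and land use planning. E-mail: whcugxf@163.com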
Abstract
Suitability evaluation of rural residential land is a foundation for optimizing the layout of rural land use. Solving the existing problems in this research area and improving the rationality of evaluation results are essential to better serve the transition and optimization of rural land use and to support implementation of the rural development strategy. Machine learning was introduced in this study to classify large-scale unknown data with higher precision by learning from small-scale known samples. Using the random forest algorithm, an empirical study on the suitability of rural residential land use in Fang County, Hubei Province explored the feasibility of the method, the combination of data and model, optimization strategies, and the implementation of prediction. The results indicated that: (1) The suitability of rural residential land in Fang County was mainly influenced by accessibility, altitude, land area, and local agricultural production; villagers' living and income levels were also closely related to rural residence. (2) Model performance was little affected by changes in the size of the feature set (evaluation factors), and a model with a 16-dimensional feature set reached a test accuracy as high as 83.54%. (3) The prediction results of different models showed that the potential amount of locally unsuitable residential land was extremely large, that the algorithm was robust, and that the results were stable and reliable. This research illustrated that machine learning can better support the suitability evaluation of rural residential land. The evaluation results could provide a data foundation for land use optimization and grassroots governance in rural areas.
Keywords: machine learning technique; random forest algorithm; suitability evaluation; rural residential land; classification and prediction; Fang County
Cite this article: XU Feng, WANG Zhanqi, ZHANG Hongwei, CHAI Ji. Application of random forest algorithm in suitability evaluation of rural residential land[J]. Resources Science, 2018, 40(10): 2085-2098 https://doi.org/10.18402/resci.2018.10.16
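Whether rural residential land and the houses on it remain in use depends mainly on farmers' willingness to live there. That willingness is influenced and constrained by many factors, including the natural endowment of the residential land, production and living conditions, and the social and geographical characteristics specific to mountainous areas. To cover the relevant factors as fully as possible, and drawing on existing experience and cases in building indicator systems for residential land suitability evaluation [6,7,9,11,14], this study defines four objective layers: land location conditions, agricultural production conditions, development and living conditions, and geographical conditions. Taking data availability into account, 18 evaluation factors (variables) were selected at the indicator layer as the candidate factor set for machine learning. The selection of factors is briefly explained as follows:
(1) The relative position of a residential plot determines how conveniently it can reach other land use types or special areas, which affects both the daily life and the production of farmers, while plot size reflects the residential pattern. For land location conditions, this study selects 5 indicators: the distance from the residential plot to the nearest road, to the nearest other residential plot, to the nearest cultivated land, and to the township center, plus the plot area.
(2) Agricultural production conditions affect farmers' cultivation conditions and land output, and bear directly on their production inputs and farming income; changes in willingness to farm in turn affect households' willingness to live locally. Because cultivated land ownership data are difficult to obtain, this study takes the cultivated plot nearest to each residential plot, assumes it to be the production site of that plot's household, and selects 5 indicators of cultivated land quality: irrigation capacity, yield per mu, soil pH, organic matter content, and topsoil texture.
(3) Development and living conditions affect the sustainability of rural residence at both the socioeconomic level and the individual farmer level. Considering 4 aspects together (willingness to move to town, level of production activity, non-farm income and policy subsidies, and living standards), 6 indicators are selected: urbanization rate, sown area, labor income, policy subsidies, electricity consumption, and livestock inventory.
(4) Because the study area lies in a mountainous environment, geographical conditions differ markedly between residential plots and may affect the sustainability of living in place. Two indicators related to geographical factors are therefore selected: slope and altitude.
The factors and their corresponding variables, data descriptions and units, and data sources are shown in Table 1.
Table 1 Influencing factors and their characteristics for evaluating the suitability of residential land use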
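To analyze the importance of each feature variable and to measure the performance of models trained at different feature dimensions, the feature set was reduced in dimension following the feature variable selection strategy in Section 2.4.1. When the feature set had been reduced from 18 to 7 dimensions, only one indicator remained in the agricultural production criterion layer, and the reduction process was terminated. The feature sets and the importance of the remaining variables are shown in Figure 2. The variable ranked last in importance in each feature set is the next one to be removed, so importance rises with the order of removal. The reasons behind this pattern are analyzed qualitatively.
Figure 2 Ranking of factor importance in different feature sets during gradual dimensionality reduction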
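The stepwise reduction described above can be sketched as a backward-elimination loop. This is only an illustrative sketch, not the authors' code: the function name `backward_eliminate`, the `importance_fn` callback, the toy factor names, and the toy scores are all hypothetical. The stopping rule (stop as soon as any criterion layer is down to a single factor) is an assumption generalized from the statement that the process ended when the agricultural production layer had one indicator left. In a real workflow `importance_fn` would refit a random forest on the current feature set and return its variable importances.

```python
from typing import Callable, Dict, List

def backward_eliminate(
    groups: Dict[str, List[str]],
    importance_fn: Callable[[List[str]], Dict[str, float]],
) -> List[List[str]]:
    """Return the sequence of feature sets produced by dropping the
    least important feature each round.

    Assumed stopping rule: stop once any criterion layer (group)
    has only one feature left.
    """
    features = [f for members in groups.values() for f in members]
    group_of = {f: g for g, members in groups.items() for f in members}
    history = [list(features)]
    while True:
        # Count how many features each criterion layer still holds.
        counts: Dict[str, int] = {}
        for f in features:
            counts[group_of[f]] = counts.get(group_of[f], 0) + 1
        if min(counts.values()) <= 1:
            break
        # Re-rank the current feature set and drop the last-ranked one.
        scores = importance_fn(features)
        weakest = min(features, key=lambda f: scores[f])
        features.remove(weakest)
        history.append(list(features))
    return history

# Toy illustration with made-up factor names and scores (not from the paper).
TOY_GROUPS = {
    "location": ["road", "area", "town"],
    "agriculture": ["irrigation", "yield"],
}
TOY_SCORES = {"road": 0.9, "area": 0.8, "town": 0.3, "irrigation": 0.2, "yield": 0.5}

def toy_importance(features: List[str]) -> Dict[str, float]:
    # A real callback would retrain the model; here the scores are fixed.
    return {f: TOY_SCORES[f] for f in features}

toy_history = backward_eliminate(TOY_GROUPS, toy_importance)
```

Each entry of `toy_history` is one feature set, so the study's 18-to-7 reduction would yield 12 sets, one per dimension.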
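Models were trained and validated under different combinations of the ntree and mtry parameters for the 12 feature sets obtained in Section 3.4. Given the number of training samples and the feature dimensions, ntree was set between 500 and 1500, and mtry was taken from the integers adjacent to the square root of the dimension: 3 to 6 for the 16- to 18-dimensional feature sets, 2 to 5 for the 9- to 15-dimensional feature sets, and 1 to 4 for the 7- to 8-dimensional feature sets. The test accuracy is shown in Figure 3.
Figure 3 Prediction accuracy of the model with feature sets from 7 to 18 dimensions under combinations of multiple 'ntree' and 'mtry' parameters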
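The mtry ranges above can be reproduced by a single rule: take the integer part of the square root of the dimension, then extend one step below and two steps above it. The sketch below builds the resulting search grid under that reading. The helper names `mtry_candidates` and `parameter_grid` are hypothetical, and the ntree step of 500 is an assumption, since the text gives only the 500 to 1500 range.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

def mtry_candidates(n_features: int) -> List[int]:
    """Integers around sqrt(n_features): floor(sqrt) - 1 up to floor(sqrt) + 2."""
    root = math.isqrt(n_features)
    return [m for m in range(root - 1, root + 3) if 1 <= m <= n_features]

# Assumed ntree values; only the 500-1500 range is stated in the text.
NTREE_VALUES = (500, 1000, 1500)

def parameter_grid(
    dims: Sequence[int] = tuple(range(7, 19)),
    ntree_values: Sequence[int] = NTREE_VALUES,
) -> List[Tuple[int, int, int]]:
    """All (dimension, ntree, mtry) combinations to train and validate."""
    return [
        (d, ntree, mtry)
        for d in dims
        for ntree, mtry in product(ntree_values, mtry_candidates(d))
    ]

grid = parameter_grid()
```

With these assumed ntree values the grid holds 12 feature sets × 3 ntree values × 4 mtry values = 144 combinations.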
Based on the parameter tuning results above, the test sample dataset was fed into the best-performing model for each feature set and the accuracy was analyzed. The results are shown in Figure 4.
Figure 4 Prediction results of suitability of residential land use with different feature sets but the same testing data
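As a rough illustration of this step, the sketch below assumes that test accuracy means the share of test samples whose predicted class matches the known class, and that the best model for a feature set is the parameter combination with the highest such accuracy. Both function names and all values in the toy data are hypothetical placeholders, not results from the study.

```python
from typing import Dict, Sequence, Tuple

def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Share of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def best_per_feature_set(
    results: Dict[Tuple[int, int, int], float],
) -> Dict[int, Tuple[Tuple[int, int], float]]:
    """Map each dimension to its best (ntree, mtry) pair and accuracy.

    `results` maps (dimension, ntree, mtry) to a test accuracy.
    """
    best: Dict[int, Tuple[Tuple[int, int], float]] = {}
    for (dims, ntree, mtry), acc in results.items():
        if dims not in best or acc > best[dims][1]:
            best[dims] = ((ntree, mtry), acc)
    return best

# Placeholder labels and scores for illustration only.
toy_acc = accuracy(
    ["suitable", "unsuitable", "suitable", "suitable"],
    ["suitable", "unsuitable", "unsuitable", "suitable"],
)
toy_best = best_per_feature_set({
    (7, 500, 1): 0.5,
    (7, 500, 2): 0.75,
    (8, 1000, 3): 0.25,
})
```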
The models for the 7-, 13-, and 16-dimensional feature sets, which had the highest test accuracy, were used to evaluate the suitability of all rural residential land in Fang County. The results are shown in Figure 5.
Figure 5 Prediction results of suitability of residential land use with three different feature sets in Fang County
Liu Y S, Long H L, Chen Y F, et al. China Rural Development Research Report: Rural Hollowing and Its Remediation Strategy [M]. Beijing: Science Press, 2011.
Zhao M Y, Wang Y L, Hu Z C, et al. Comprehensive consolidation of hollowing village oriented rural land resource allocation [J]. Progress in Geography, 2016, 35(10): 1237-1248.
Meng L, Guo J, Ou M H. Zoning regulation of rural settlement consolidation based on suitability and potential analysis in Xuzhou City [J]. Resources Science, 2014, 36(11): 2291-2298.
Su H Z, Liu X L, Zhao Y N, et al. A research on suitability evaluation of rural residential consolidation in Yumen City [J]. Chinese Agricultural Science Bulletin, 2015, 31(17): 272-278.
Zhu L, Wu B F, Zhang L. Research on the landscape of rural residential areas and human settlement environment suitability evaluation in Three Gorges typical regions [J]. Resources and Environment in the Yangtze Basin, 2011, 20(3): 325-331.
Wen B, Liu Y Z, Xia M, et al. Suitability evaluation and regulation of rural residential land in Yixing City based on the Grey Target Model [J]. Areal Research and Development, 2016, 35(5): 153-157.
Qin T T, Qi W, Li Y Q, et al. Suitability evaluation of rural residential land based on niche theory in mountainous area [J]. Acta Ecologica Sinica, 2012, 32(16): 5175-5183.
Liu J J, Xia M, Liu Y Z, et al. Layout suitability evaluation and classification of rural residential based on matter element [J]. Chinese Journal of Soil Science, 2016, 47(2): 308-313.
Cheng W S, Qiao H Q, Chen Y. Land use suitability evaluation and optimization of rural residential land in hilly and mountainous areas of South-west China [J]. Bulletin of Soil and Water Conservation, 2014, 34(5): 322-327.
Guo J, Bao Q, Ou M H, et al. Suitability evaluation and partition regulation research of rural settlements consolidation based on households' willingness [J]. China Population, Resources and Environment, 2015, 25(4): 52-58.
Shuang W Y, Hao J M, Ai D, et al. Suitability evaluation, subarea control and regulation of rural residential land based on AVC theory [J]. Soil, 2014, 46(1): 126-133.
Qu Y B, Zhang F R, Jiang G H, et al. Suitability evaluation and subarea control and regulation of rural residential land based on niche [J]. Transactions of the CSAE, 2010, 26(11): 290-296.
Shuang W Y, Hao J M, Yu S Q, et al. Suitability evaluation and spatial structure optimization model based on the pressure on rural residential land [J]. Journal of China Agricultural University, 2013, 18(5): 146-155.
Wen B, Liu Y Z, Xia M, et al. Suitability evaluation of rural residential land from perspective of ecological environment protection: a case study on Yixing City of Jiangsu Province [J]. Bulletin of Soil and Water Conservation, 2016, 36(4): 280-285.
Wu C H, Hu Y M, Huang P Q, et al. Suitability evaluation of cities and rural settlements in Fuxin based on the model of least resistance [J]. Resources Science, 2013, 35(12): 2405-2411.
[18] Alpaydin E. Introduction to Machine Learning [M]. Cambridge: MIT Press, 2014.
[19] Novack T, Esch T, Kux H, et al. Machine learning comparison between WorldView-2 and QuickBird-2-simulated imagery regarding object-based urban land cover classification [J]. Remote Sensing, 2011, 3(10): 2263-2282.
[20] Heung B, Ho H C, Zhang J, et al. An overview and comparison of machine-learning techniques for classification purposes in digital soil mapping [J]. Geoderma, 2016, 265: 62-77.
Cai L L, Yan L J, Xu H. A soil erosion model built on machine learning theory [J]. Chinese Journal of Eco-Agriculture, 2014, 22(9): 1122-1128.
Lai H S, Wu C F. Productivity evaluation of standard cultivated land based on rough set and support vector machine [J]. Journal of Natural Resources, 2011, 26(12): 2141-2154.
[23] Löw F, Fliemann E, Abdullaev I, et al. Mapping abandoned agricultural land in Kyzyl-Orda, Kazakhstan using satellite remote sensing [J]. Applied Geography, 2015, 62: 377-390.
[24] Nemmour H, Chibani Y. Multiple support vector machines for land cover change detection: an application for mapping urban extensions [J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2006, 61(2): 125-133.
Kang Y, Chen Y F, Gu S H, et al. Assessment of sustainable utilization of regional water resources based on random forest [J]. Water Resources and Power, 2014, 32(3): 34-38.
Zhang Z Y, Liu X Y. Research on land resource eco-security assessment based on support vector machines [J]. Computer Engineering and Applications, 2009, 45(10): 245-248.
[27] Vapnik V. The Nature of Statistical Learning Theory [M]. Berlin: Springer Science & Business Media, 2013.
[28] Arlot S, Celisse A. A survey of cross-validation procedures for model selection [J]. Statistics Surveys, 2010, 4: 40-79.
Liaw A, Wiener M. Classification and regression by random forest [J]. R News, 2002, 2(3): 18-22.
[31] Cawley G C, Talbot N L C. On over-fitting in model selection and subsequent selection bias in performance evaluation [J]. Journal of Machine Learning Research, 2010, 11(1): 2079-2107.
[32] R Core Team. R Language Definition [M]. Vienna: R Foundation for Statistical Computing, 2000.
[33] Kuhn M. Building predictive models in R using the caret package [J]. Journal of Statistical Software, 2008, 28(5): 1-26.
[34] Blaschke T. Object based image analysis for remote sensing [J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2010, 65(1): 2-16.
[35] Liu Y, Jiao L, Liu Y, et al. A self-adapting fuzzy inference system for the evaluation of agricultural land [J]. Environmental Modelling & Software, 2013, 40: 226-234.
[36] Desclée B, Bogaert P, Defourny P. Forest change detection by statistical object-based method [J]. Remote Sensing of Environment, 2006, 102(1): 1-11.
[37] Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection [J]. International Joint Conference on Artificial Intelligence, 1995, 14(2): 1137-1145.
Zhang B L, Gao J B, Gao Y, et al. Land use transition of mountainous rural areas in China [J]. Acta Geographica Sinica, 2018, 73(3): 503-517.
[40] United States Geological Survey (USGS). Satellite Data Sets of Landsat 8 OLI/TIRS C1 [EB/OL]. (2017-01-05)[2017-07-15]. URL
Fang County Bureau of Land. 2016 Annual Updated Data of Cultivated Land Quality in Shiyan City (Fang County Section) [R]. Fang County: Fang County Bureau of Land, 2017.
Resource and Environment Data Cloud Platform. Spatial Digital Elevation Model Data of China's Altitude (SRTM 30m) [EB/OL]. (2017-01-05)[2017-12-19]. URL