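Abstract: Traditional machine learning algorithms are widely used in mineral prediction, but they still lack effective ways to process and analyze geological big data, which is high-dimensional, sparse, and characterized by small, imbalanced samples. Designing machine learning algorithms suited to these characteristics is a new problem that intelligent mineral prediction urgently needs to solve. Taking lead-zinc polymetallic mineral prediction in the Haobugao district, Inner Mongolia, as an example, this paper proposes a semi-supervised co-training mineral prediction model for geological big data. First, the geological prospecting information and geochemical anomaly information of the study area were analyzed quantitatively, and nine prospecting factors were extracted: fault structures, Permian strata, Yanshanian intrusive rocks, the contact zone between strata and intrusions, wall-rock alteration, and Pb, Zn, Sn, and Cu geochemical anomalies. Next, recursive feature elimination was used to select the best combination of prospecting factors; the combination of eight factors that excludes the Sn anomaly was chosen as optimal. Finally, support vector machine and random forest algorithms served as base classifiers for semi-supervised co-training mineral prediction, and a metallogenic probability map was produced. ROC curve and predictability curve analyses show that the semi-supervised co-training model achieves a higher AUC and higher prediction efficiency than the random forest and support vector machine models. The results also offer a new approach to intelligent mineral prediction in a big data environment.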
Keywords: Geological big data/
Machine learning/
Semi-supervised co-training/
Mineral prediction/
Haobugao district
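The abstract names support vector machine and random forest base classifiers in a semi-supervised co-training scheme but does not spell out the procedure. The following is a minimal sketch of one common variant, assuming scikit-learn, binary labels (1 = deposit, 0 = non-deposit), and a single shared feature set. The function names `co_train` and `mineral_probability`, the `rounds` and `per_round` parameters, and the averaging of the two models' probabilities are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC


def co_train(X_lab, y_lab, X_unlab, rounds=5, per_round=10, seed=0):
    """Co-train an SVM and a random forest (hypothetical sketch)."""
    models = {
        "svm": SVC(probability=True, random_state=seed),
        "rf": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    # Each classifier keeps its own training set, seeded with the labeled data.
    train = {name: (X_lab.copy(), y_lab.copy()) for name in models}
    pool = X_unlab.copy()

    for _ in range(rounds):
        if len(pool) == 0:
            break
        for name, model in models.items():
            model.fit(*train[name])
        used = set()
        # Each model pseudo-labels its most confident pool samples
        # and hands them to the *other* model's training set.
        for teacher, student in (("svm", "rf"), ("rf", "svm")):
            proba = models[teacher].predict_proba(pool)
            picks = np.argsort(proba.max(axis=1))[::-1][:per_round]
            pseudo = models[teacher].classes_[proba[picks].argmax(axis=1)]
            X_s, y_s = train[student]
            train[student] = (np.vstack([X_s, pool[picks]]),
                              np.concatenate([y_s, pseudo]))
            used.update(picks.tolist())
        # Remove every sample that was pseudo-labeled this round.
        keep = np.setdiff1d(np.arange(len(pool)), sorted(used))
        pool = pool[keep]

    # Final refit so both models see the last batch of pseudo-labels.
    for name, model in models.items():
        model.fit(*train[name])
    return models["svm"], models["rf"]


def mineral_probability(svm, rf, X):
    """Average the two models' class-1 probabilities (an assumed combination rule)."""
    def positive(model):
        col = np.flatnonzero(model.classes_ == 1)[0]
        return model.predict_proba(X)[:, col]
    return (positive(svm) + positive(rf)) / 2.0
```

In a mapping workflow, `X` would hold one row per grid cell and one column per prospecting factor, and the output of `mineral_probability` would be rasterized into a probability map. Classic co-training uses two independent feature views; this sketch instead relies on the two different learners to supply the disagreement.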
Abstract: Machine learning algorithms have been widely used for mineral prediction. However, these algorithms struggle with geological data that are high-dimensional, sparse, and drawn from unbalanced samples. It is therefore important to develop new mineral prediction models suited to geological big data. In this paper, a semi-supervised co-training model is proposed for mineral prediction in the Haobugao district, Inner Mongolia. First, nine prospecting factors were extracted: faults, the Permian formation, Yanshanian intrusions, the contact zone between the Yanshanian intrusions and the Permian formation, zones of skarn alteration, and Pb, Zn, Sn, and Cu geochemical anomalies. Second, a feature selection method based on RF-RFE was used to optimize the combination of factors. Eight factors, all except the Sn geochemical anomaly, were selected as the final prospecting factors. Then SVM and RF models were used as the base classifiers of the co-training model to predict mineralization probability. Analysis of the ROC curves and predictability curves showed that the semi-supervised co-training model was more accurate than either the SVM or the RF model alone. The results suggest that this method is feasible for mineral prediction in a big data environment.
Key words:Big data/
Machine learning/
Semi-supervised co-training model/
Mineral prediction/
Haobugao district
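The abstract compares models by the area under the ROC curve (AUC). As a small illustration of what that number measures, the sketch below computes AUC as the probability that a randomly chosen deposit cell scores higher than a randomly chosen non-deposit cell, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are made up for illustration and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count a win when the positive outranks the negative, half a win on ties.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (invented values): two deposit cells, three non-deposit cells.
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.3, 0.1]
print(roc_auc(toy_labels, toy_scores))  # 5 of 6 pairs ranked correctly
```

The pairwise form is quadratic in the number of samples, which is fine for a sanity check; a rank-based formula or a library routine would be the usual choice for large grids.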
Full-text PDF download:
http://www.dzkx.org/data/article/export-pdf?id=geology_11496