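Abstract (translated from Chinese): To objectively determine the main region occupied by data points once they are plotted, this paper proposes a method that determines the main distribution region of the data from its density. The method gives a more intuitive picture of the data distribution and can serve as a preprocessing step for data cleaning. Using big data from GEOROC, the method is analyzed and verified with the total alkali versus silica (TAS) diagram as an example. SiO2, Na2O, K2O and loss on ignition (LOI) contents were extracted for the rock samples in the GEOROC database that are relevant to the TAS diagram; after routine data cleaning and recalculation, about 133,000 valid records covering 24 rock types were obtained. Plotting the data, compiling statistics by field, and extracting the region containing 80% of the data were used to verify how well the 24 rock types agree with the TAS diagram. The analysis shows that for 6 rock types the data distribution is basically consistent with the fields defined by the TAS diagram, whereas for 18 rock types it deviates systematically from them. The big data study demonstrates the shortcomings of the TAS diagram: with total alkali and SiO2 as the indicators, it is difficult to improve the overall accuracy of classification.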
Keywords (translated from Chinese): Big data / Distribution region extraction / Density distribution / TAS diagram
Abstract: To objectively determine the main distribution region of data points after plotting, this paper proposes an automatic method for determining the distribution region based on data density. The method gives a more intuitive view of the data distribution and can serve as a preprocessing step for data cleaning. Using big data from the GEOROC database, the method is analyzed and verified with the total alkali versus silica (TAS) diagram as an example. SiO2, Na2O, K2O and LOI (loss on ignition) values were extracted for rock samples related to the TAS diagram in the GEOROC database, and about 133 thousand valid records covering 24 rock types were obtained after routine data cleaning and recalculation. The agreement between the 24 rock types and the TAS diagram was verified by plotting the data points, compiling statistics by field, and extracting the region containing 80% of the data. The analysis shows that the data distribution of 9 rock types is basically consistent with the fields defined by the TAS diagram, while the data distribution of 15 rock types deviates systematically from them. The big data study demonstrates the shortcomings of the TAS diagram: with total alkali and SiO2 as the indicators, it is difficult to improve the overall classification accuracy.
Key words: Big data / Determination of distribution region / Density distribution / TAS diagram
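The abstract does not say how the density is estimated or how the 80% region is delineated, so the following is only a minimal sketch of one plausible reading: bin the plotted points on a regular grid, rank the cells from densest to sparsest, and keep the smallest set of densest cells that together hold at least 80% of the points. The function name `dense_region_mask`, the grid resolution, and the synthetic SiO2 and total alkali values are all assumptions made for illustration; none of them come from the paper or from GEOROC.

```python
import numpy as np


def dense_region_mask(x, y, bins=50, fraction=0.8, extent=None):
    """Mark the densest grid cells that together hold >= `fraction` of the points.

    Hypothetical helper, not the paper's implementation. Returns
    (mask, xedges, yedges), where mask[i, j] is True for selected cells
    (i indexes x bins, j indexes y bins, as in numpy.histogram2d).
    Points outside `extent`, if given, are ignored.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    counts, xedges, yedges = np.histogram2d(x, y, bins=bins, range=extent)
    flat = counts.ravel()
    total = flat.sum()
    if total == 0:
        raise ValueError("no data points inside the histogram extent")
    order = np.argsort(flat)[::-1]            # cell indices, densest first
    cumulative = np.cumsum(flat[order])       # points covered by the first k cells
    # Smallest number of cells whose cumulative count reaches the target.
    n_cells = int(np.searchsorted(cumulative, fraction * total)) + 1
    mask = np.zeros(flat.shape, dtype=bool)
    mask[order[:n_cells]] = True
    return mask.reshape(counts.shape), xedges, yedges


# Synthetic stand-in for one rock type (assumed values, not GEOROC data):
# x = SiO2 (wt%), y = Na2O + K2O (wt%).
rng = np.random.default_rng(0)
sio2 = rng.normal(50.0, 2.0, size=5000)
total_alkali = rng.normal(3.5, 0.8, size=5000)

mask, xedges, yedges = dense_region_mask(sio2, total_alkali, bins=40, fraction=0.8)
counts, _, _ = np.histogram2d(sio2, total_alkali, bins=[xedges, yedges])
print("cells selected:", int(mask.sum()), "of", mask.size)
print("share of points inside region:", counts[mask].sum() / counts.sum())
```

The selected cells can then be compared with the field that the TAS diagram assigns to the rock type, for example by counting how many selected cells fall inside that field. A kernel density estimate could replace the histogram; the grid is used here only because it keeps the ranking step easy to follow.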
Full-text PDF download:
http://www.dzkx.org/data/article/export-pdf?id=geology_11489