支持时序数据聚合函数的索引 |
黄向东1, 郑亮帆1, 邱明明1, 张金瑞1, 王建民1,2 |
1. 清华大学软件学院, 北京 100084; 2. 清华信息科学与技术国家实验室(筹), 北京 100084 |
Time-series data aggregation index |
HUANG Xiangdong1, ZHENG Liangfan1, QIU Mingming1, ZHANG Jinrui1, WANG Jianmin1,2 |
1. School of Software, Tsinghua University, Beijing 100084, China; 2. Tsinghua National Laboratory for Information Science and Technology(TNList), Beijing 100084, China |
摘要:
| |||
摘要时序数据是工业新发展的关键, 其中针对时序数据的聚合操作成为主要的应用场景之一。传统关系型数据库不足以支撑海量的时序数据, 而现有的NoSQL数据库对时序数据的聚合操作显得低效耗时。该文提出了一种结合概要表和线段树思想的支持时序数据聚合操作的高效索引机制, 并实现了基于这种索引机制的查询算法。该查询算法将概要表的思想引入NoSQL中, 缩小了待查询数据集, 并通过在概要表上建立概要森林的形式, 将最坏情况下的待查询数据集进一步缩小为索引个数的lbn倍。此外, 该算法通过计算直接定位出待查询的一系列索引数据, 有效避免了一般树形结构的递归遍历操作, 减少了大量的磁盘开销。最后, 通过与一般索引机制的查询对比实验, 验证了该索引机制的可用性和高效性。 | |||
关键词 :索引,聚合操作,时序数据,概要表,线段树 | |||
Abstract:Time-series data is the key to industrial development, with the aggregation of the data an important step in practice. However, traditional relational databases fail to support vast amounts of time-series data. The NoSQL databases are inefficient and require time-consuming calculation to aggregate of time-series data. This paper presents an efficient index mechanism that supports time-series data aggregation by combining a synopsis table and a segment tree. A query algorithm based on this mechanism introduces the synopsis table into the NoSQL database and builds a segment tree from the synopsis table for archiving that is lbn the size of the original query set. This query algorithm can directly locate a series of index data to be queried without the recursive operations in traditional trees and effectively reduces I/O overhead. This study shows the efficiency of this index mechanism by comparisons with general index mechanisms. | |||
Key words:indexaggregate operationtime-series datasynopsis tablesegment tree | |||
收稿日期: 2015-09-28 出版日期: 2016-04-01 | |||
| |||
通讯作者:王建民,教授,E-mail:jimwang@tsinghua.edu.cnE-mail: jimwang@tsinghua.edu.cn |
引用本文: |
黄向东, 郑亮帆, 邱明明, 张金瑞, 王建民. 支持时序数据聚合函数的索引[J]. 清华大学学报(自然科学版), 2016, 56(3): 229-236,245. HUANG Xiangdong, ZHENG Liangfan, QIU Mingming, ZHANG Jinrui, WANG Jianmin. Time-series data aggregation index. Journal of Tsinghua University(Science and Technology), 2016, 56(3): 229-236,245. |
链接本文: |
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2016.21.032或 http://jst.tsinghuajournals.com/CN/Y2016/V56/I3/229 |
图表:
参考文献:
[1] Goldstein J, Larson P A. Optimizing queries using materialized views:a practical, scalable solution[J]. Special Interest Group on Management Of Data, 2001, 30(2):331-342. [2] Lehner W, Cochrane B, Pirahesh H, et al. Applying mass query optimization to speed up automatic summary table refresh[C]//In proceedings of the international conference on data engineering. Heidelberg, Germany:IEEE Computer Society, 2001:1-22. [3] Chen Y, Chen M, Liu X, et al. MapReduce based aggregate-join query algorithms[J]. Journal of Computer Research & Development, 2013, 50(z1):306-311. [4] Li Y, Kim G B, Wen L R, et al. MHB-Tree:A distributed spatial index method for document based NoSQL database system[J]. Lecture Notes in Electrical Engineering, 2013, 214:489-497. [5] Dean J, Ghemawat S. MapReduce:Simplified data processing on large clusters[J]. Operating Systems Design & Implementation, 2004, 51(1):147-152. [6] Ross K A, Srivastava D, Sudarshan S. Materialized view maintenance and integrity constraint checking:Trading space for time[J]. AcmSigmod Record, 199625(2):447-458. [7] Li C, Chen J, Jin C, et al. MR-tree:An efficient index for MapReduce[J]. International Journal of Communication Systems, 2014, 27(6):828-838. [8] Abramova V, Bernardino J. NoSQL databases:MongoDB vs Cassandra[C]//Proceedings of the International C* Conference on Computer Science and Software Engineering. Porto, Portual:ACM, 2013:14-22. [9] Ibrahim S, Jin H, Lu L, et al. Adaptive disk I/O scheduling for MapReduce in virtualized environment[C]//IEEE 42nd International Conference on Parallel Processing. Taipei, Taiwan, China:IEEE Computer Society, 2011:335-344. [10] Zilio D C, Zuzarte C, Lightstone S. Recommending materialized views and indexes with IBM DB2 design advisor[C]//In Proceedings of the International Conference on Autonomic Computing. New York, NY, USA:IEEE Computer Society, 2004:180-187. [11] Qu Z C, Guo T L. A maintenance strategy of materialized views in distributed environment[J].Applied Mechanics and Materials, 2012, 182-183:2123-2126. [12] Gal A. Obsolescent materialized views in query processing of enterprise information systems[C]//In Proc Eighth International Conference on Information and Knowledge Management. Kansas City, MO, USA:ACM, 1999:367-374. [13] Arman N. A materialized view for the same generation query in deductive databases[J].Computer & Information Science, 2012,5(6):1-5. |
相关文章:
|