|
摘要top-k查询要求查找出最符合需求的前k个结果,是对等网络中的重要数据处理技术。该文研究了结构化对等网络中数据在各节点上垂直划分的精确top-k查询处理,在3通信回合的三阶段阈值(TPUT)算法基础上提出了4回合阈值算法4R-TPUT。它由下界估计、剪枝和结果查找3个阶段组成,通过在TPUT的下界估计阶段增加一个通信回合来获取更多的数据信息以得到更准确的top-k下界估计和剪枝阈值,从而减少查询处理过程中的数据访问和传输量。实验表明: 4R-TPUT相比于TPUT较大幅度降低了数据传输量,减小了查询响应时间,是一种更高效的top-k查询算法。
|
关键词 :对等网络,top-k查询,TPUT,4R-TPUT |
Abstract:The Top-k query returns to users the k best match results and is an important data processing technique in peer-to-peer systems. This paper focuses on accurate top-k query processing in structured peer-to-peer systems in which the data is vertically partitioned among peers. The three communication round-trip algorithm TPUT is expanded into a 4R-TPUT threshold algorithm involving 4 round-trip communications. The algorithm has lower bound estimation, pruning and results lookup phases. TPUT introduces an additional round-trip communication in the first phase to get more information on the data to obtain a better top-k lower bound estimation and pruning threshold which can reduce data accesses and transmissions in the query processing. Tests show that 4R-TPUT greatly reduces data transmissions compared with TPUT and, thus, requires less query response time, so 4R-TPUT is a more efficient top-k query algorithm.
|
Key words:peer-to-peer systemstop-k queryTPUT4R-TPUT |
收稿日期: 2013-01-10 出版日期: 2015-04-17 |
基金资助:国家 “八六三” 高技术项目 (2010AA012400);国家自然科学基金面上项目 (61272539);浙江省自然科学基金项目 (LQ14F020013);浙江省重点科技创新团队项目(2009R50046) |
top-k算法 | 对等网络 | 数据划分 | 查询结果 | DTA[2] | 结构化 | 垂直 | 精确 | TPUT[3] | 结构化 | 垂直 | 精确 | DHTop[4] | 结构化 | 垂直 | 精确 | KLEE[5] | 结构化 | 垂直 | 近似 | TJA[6] | 非结构化 | 垂直 | 精确 | BRANCA[7] | 非结构化 | 水平 | 精确 | Hose's[8] | 非结构化 | 水平 | 近似 | FD[9] | 非结构化 | 水平 | 近似 | HPJT[10] | 非结构化 | 垂直 | 精确 | ASAP[11] | 非结构化 | 水平 | 精确 | SPEERTO[12] | 超级节点 | 水平 | 精确 | Balke's[13] | 超级节点 | 水平 | 精确 | HT-p2p plus[14] | 超级节点 | 垂直 | 精确 |
|
现有对等网络中的top-k查询算法及其分类 |
|
4R-TPUT算法 |
|
4R-TPUT对TPUT算法的改进 |
参数 | 取值范围 | 默认值 | 节点数N | 16~64 | 16 | 数据对象个数n | 1 000~1 000000 | 100 000 | 查询属性个数m | 2~10 | 4 | 数据分布 | Uniform、 Gauss | Uniform | k | 1~100 | 10 |
|
实验中的参数设置 |
|
4R-TPUT和TPUT在k变化时的性能对比(N=16, n=100 000, m=4) |
|
4R-TPUT和TPUT在n变化时的性能对比(数据集为Uniform, N=16, k=10, m=4) |
|
4R-TPUT和TPUT在m变化时的性能对比(数据集为Uniform, N=16, k=10, n=100 000) |
|
4R-TPUT在N变化时的性能(数据集为Uniform, n=100 000, m=4) |
[1] | SUNYongjiao, YUAN Ye, WANG Guoren. Top-k query processing over uncertain data in distributed environments [J]. J World Wide Web, 2012, 15(4): 429-446. |
[2] | ZHANG Jiangong, Suel T. Efficient query evaluation on large textual collections in a peer-to-peer environment [C]// Proc 5th IEEE Int Conf Peer-to-Peer Computing. Konstanz, Germany: IEEE Computer Society, 2005: 225-233. |
[3] | CAO Pei, WANG Zhe. Efficient top-k query calculation in distributed networks [C]// Proc 23rd Annual ACM Symp Principles of Distributed Computing. Newfoundland, Canada: ACM, 2004: 206-215. |
[4] | Akbarinia R, Pacitti E, Valduriez P. Processing top-k queries in distributed hash tables [C]// Proc 13th European Int Conf Parallel Processing. Rennes, France: Springer, 2007: 489-502. |
[5] | Michel S,Triantafillou P, Weikum G. KLEE: A framework for distributed top-k query algorithms [C]// Proc Int Conf Very Large Data Bases. Trondheim, Norway: ACM, 2005: 637-648. |
[6] | Zeinalipour-Yazti D, Vagena Z, Kalogeraki V, et al.Finding the k highest-ranked answers in a distributed network[J]. Computer Networks, 2009, 53(9): 1431-1449. |
[7] | ZHAO Keping, TAO Yufei, ZHOU Shuigeng. Efficient top-k processing in large-scaled distributed environments[J]. Data and Knowledge Engineering, 2007, 63(2): 315-335. |
[8] | Hose K,Karnstedt M, Sattler K U, et al.Processing top-n queries in P2P-based web integration systems with probabilistic guarantees [C]// Proc 8th Int Workshop Web and Databases. Baltimore, USA: ACM, 2005: 109-114. |
[9] | Akbarinia R, Pacitti E, Valduriez P. Reducing network traffic in unstructured P2P systems using top-k queries[J]. Distributed and Parallel Databases, 2006, 19(2-3): 67-86. |
[10] | GUAN Zhitao, YAN Guangwei, HUANG Heqing. A novel top-k query scheme in unstructured p2p networks [C]// Proc 9th IEEE Int Conf Computer and Information Technology. Xiamen, China: IEEE Computer Society, 2009: 16-21. |
[11] | Dedzoe W K, Lamarre P, Akbarinia R, et al.ASAP top-k query processing in unstructured p2p systems [C]// Proc 10th Int Conf Peer-to-Peer Computing. Delft, Netherlands: IEEE, 2010: 1-10. |
[12] | Vlachou A, Doulkeridis C, Nørvåg K, et al.On efficient top-k query processing in highly distributed environments [C]// Proc 2008 ACM SIGMOD Int Conf Management of Data. Vancouver, Canada: ACM, 2008: 753-764. |
[13] | Balke W T, Nejdl W, Siberski W, et al.Progressive distributed top-k retrieval in peer-to-peer networks [C]// Proc 21st Int Conf Data Engineering. Tokyo, Japan: IEEE Computer Society, 2005: 174-185. |
[14] | Chrysakis I, Chalkidis C, Plexousakis D. Evaluation of top-k queries in peer-to-peer networks using threshold algorithms [C]// Proc 19th ACM Int Conf Information and Knowledge Management. Toronto, Canada: ACM, 2010: 1305-1308. |
[15] | YU Hailing, LI Huagang, WU Ping, et al.Efficient processing of distributed top-k queries [C]// Proc 16th Int Conf Database and Expert Systems Applications. Copenhagen, Denmark: Springer, 2005: 65-74. |
[16] | Chen B, Liang W, Yu J X. Energy-efficient top-k query evaluation and maintenance in wireless sensor networks [Z/OL].[2013-08-07]. http://link.springer.com/article/10.1007/s11276-013-0625-6. |