Distributed Top-k subgraph matching
LAN Chao, ZHANG Yong, XING Chunxiao
National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Abstract: Top-k subgraph matching is a key graph search operation, widely used in applications such as social networks and knowledge graphs. It is more challenging in distributed environments than on a single machine because it involves transferring data and tasks between nodes, so existing single-machine Top-k subgraph matching methods do not work well there. This paper analyzes the problems of existing methods in distributed environments and proposes an algorithm with three steps: query decomposition, query execution, and result join. Query decomposition completely avoids data transfer while intermediate results are generated. The optimized query execution and result join steps avoid generating unnecessary intermediate results, which reduces the computation on each node and improves overall efficiency. The Top-k join strategy for distributed environments is then optimized further. Experiments on real graph data show that the algorithm effectively solves the distributed Top-k subgraph matching problem and scales well, and that the version with the optimized join strategy is clearly more efficient than the basic algorithm.
Key words: graph search, subgraph matching, Top-k subgraph matching, distributed systems
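The result join step described in the abstract can be illustrated with a threshold-based rank join, a standard technique in top-k query processing. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the function name `rank_join_top_k`, the tuple layout `(join_key, score)`, and the additive scoring function are all hypothetical. Each input list stands for the partial matches of one subquery, sorted by score in descending order, and `join_key` stands for the data vertex bound to the query vertex that the two subqueries share.

```python
import heapq
from collections import defaultdict


def rank_join_top_k(left, right, k):
    """Join two score-sorted lists of partial matches and return the k best.

    Assumptions (not taken from the paper): each input is a list of
    (join_key, score) pairs sorted by score in descending order, and the
    score of a joined match is the sum of its two partial scores.
    Returns (total, join_key, left_score, right_score) tuples, best first.
    """
    if k <= 0 or not left or not right:
        return []

    inputs = (left, right)
    pos = [0, 0]  # index of the next unread tuple in each input
    seen = (defaultdict(list), defaultdict(list))  # join_key -> scores read
    results = []  # min-heap that keeps only the k best joined matches
    side = 0

    while pos[0] < len(left) or pos[1] < len(right):
        if pos[side] >= len(inputs[side]):
            side = 1 - side  # this input is exhausted, read the other one
        key, score = inputs[side][pos[side]]
        pos[side] += 1
        seen[side][key].append(score)

        # Join the new tuple with every tuple already read from the other input.
        for other in seen[1 - side][key]:
            ls, rs = (score, other) if side == 0 else (other, score)
            item = (ls + rs, key, ls, rs)
            if len(results) < k:
                heapq.heappush(results, item)
            elif item > results[0]:
                heapq.heapreplace(results, item)

        # Upper bound on any match not produced yet: it must use at least
        # one unread tuple, paired at best with the top tuple of the other input.
        bound = float("-inf")
        for s in (0, 1):
            if pos[s] < len(inputs[s]):
                bound = max(bound, inputs[s][pos[s]][1] + inputs[1 - s][0][1])

        if len(results) == k and results[0][0] >= bound:
            break  # no unread tuple can improve the current top-k
        side = 1 - side  # alternate between the two inputs

    return sorted(results, reverse=True)


if __name__ == "__main__":
    left = [("a", 9), ("b", 8), ("c", 3)]
    right = [("b", 7), ("a", 5), ("c", 4)]
    print(rank_join_top_k(left, right, 2))
```

With these sample inputs and k = 2, the loop stops after reading two tuples from each list, so the low-scoring matches on `c` are never joined. This early termination is what the abstract means by avoiding unnecessary intermediate results; how the paper's optimized join strategy tightens the bound is not specified here.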
Received: 2016-01-29 Published: 2016-08-23
Corresponding author: XING Chunxiao, Research Fellow, E-mail: xingcx@tsinghua.edu.cn
Cite this article:
LAN Chao, ZHANG Yong, XING Chunxiao. Distributed Top-k subgraph matching[J]. Journal of Tsinghua University (Science and Technology), 2016, 56(8): 871-877.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2016.25.026 or http://jst.tsinghuajournals.com/CN/Y2016/V56/I8/871
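Figures and tables:
Fig. 1 Example of a distributed Top-k subgraph matching query
Fig. 2 Example of query decomposition
Fig. 3 Example of an optimal match
Fig. 4 Example of joining subqueries
Fig. 5 Subgraph matching join using the optimized scoring function
Table 1 DBPedia data statistics
Fig. 6 Comparison of experimental results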