删除或更新信息,请邮件至freekaoyan#163.com(#换成@)

分布式Top-k子图匹配技术

清华大学 辅仁网/2017-07-07

分布式Top-k子图匹配技术
兰超, 张勇, 邢春晓
清华大学 计算机科学与技术系, 信息科学与技术国家实验室, 北京 100084
Distributed Top-k subgraph matching
LAN Chao, ZHANG Yong, XING Chunxiao
National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

摘要:

输出: BibTeX | EndNote (RIS)
摘要Top-k子图匹配是一种应用广泛的图搜索技术。相比于单机环境,分布式环境下的Top-k子图匹配问题具有更大的挑战性。该文分析了已有方法在分布式环境下存在的问题,提出了包括查询拆分、查询执行、结果连接3个步骤的算法。算法通过查询拆分,彻底避免了生成中间结果过程中的数据传输,同时通过优化查询执行和结果连接步骤,避免不必要的中间结果生成,降低单个节点的计算量,提升整体效率。在此基础上,该文对分布式环境下Top-k连接策略进行了进一步优化。在真实图数据上进行的实验测试表明:该文提出的算法能够有效解决分布式环境下Top-k子图匹配问题,具有很好的扩展性,而且使用优化连接策略的算法性能较基础算法的效率有明显的提升。
关键词 图搜索,子图匹配,Top-k子图匹配,分布式
Abstract:Top-k subgraph matching is a key operation in graph queries which is widely used in all kinds of applications such as social networks and knowledge graphs. Top-k subgraph matching is more challenging in distributed environments since it involves data and task transfers. Thus, existing local Top-k subgraph matching methods do not work well in distributed environments. An algorithm was developed to address this issue by dividing the problem into query decomposition, query execution, and ranked join stages. The query decomposition avoids unnecessary data transfers during the querying stage. The ranked join technique avoids generating unnecessary temporal results that reduces the overall latency of the algorithm. The algorithm effectiveness and efficiency were tested using real data and the results indicate that the algorithm, especially the optimized version, effectively solves the distributed Top-k subgraph matching problem.
Key wordsgraph searchsubgraph matchingTop-k subgraph matchingdistributed systems
收稿日期: 2016-01-29 出版日期: 2016-08-23
ZTFLH:TP319
通讯作者:邢春晓,研究员,E-mail:xingcx@tsinghua.edu.cnE-mail: xingcx@tsinghua.edu.cn
引用本文:
兰超, 张勇, 邢春晓. 分布式Top-k子图匹配技术[J]. 清华大学学报(自然科学版), 2016, 56(8): 871-877.
LAN Chao, ZHANG Yong, XING Chunxiao. Distributed Top-k subgraph matching. Journal of Tsinghua University(Science and Technology), 2016, 56(8): 871-877.
链接本文:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2016.25.026 http://jst.tsinghuajournals.com/CN/Y2016/V56/I8/871


图表:
图1 分布式Top-k 子图匹配查询示例
图2 查询划分示例
图3 最优匹配示例
图4 子查询连接示例
图5 采用优化评价函数的子图匹配连接
表1 DBPedia数据统计
图6 实验结果对比


参考文献:
[1] Gupta M,Gao J,Yan X,et al.Top-k interesting subgraph discovery in information networks[C]//ICDE2014.Chicago,IL,USA:IEEE,2014:820-831.
[2] Huang J,Abadi D J,Ren K.Scalable SPARQL querying of large RDF graphs[J].Proc VLDB Endow,2011,4(11):1123-1134.
[3] Yang S,Wu Y,Sun H,et al.Schemaless and structureless graph querying[J].Proc VLDB Endow,2014,7(7):565-576.
[4] Khan A,Wu Y,Aggarwal C C,et al.NeMa:Fast graph search with label similarity[J].Proc VLDB Endow,2013,6(3):181-192.
[5] Gonzalez J E,Low Y,Gu H,et al.PowerGraph:Distributed graph-parallel computation on natural graphs[C]//USENIX 2012.Hollywood,CA,USA:USENIX Association,2012:17-30.
[6] Albert R,Jeong H,Barabási A.Error and attack tolerance of complex networks[J].Nature,2000,406(6794):378-382.
[7] Abou-Rjeili A,Karypis G.Multilevel algorithms for partitioning power-law graphs[C]//Proceedings of the 20th International Conference on Parallel and Distributed Processing.Rhodes Island,Greece:IEEE Computer Society,2006:124-124.
[8] Xin R S,Gonzalez J E,Franklin M J,et al.GraphX:A resilient distributed graph system on spark[C]//First International Workshop on Graph Data Management Experiences and Systems.New York,USA:ACM,2013:1-6.
[9] Ullmann J R.An algorithm for subgraph isomorphism[J].J ACM,1976,23(1):31-42.
[10] He H,Wang H,Yang J,et al.BLINKS:Ranked keyword searches on graphs[C]//SIGMOD2007.New York,USA:ACM,2007:305-316.
[11] Sun Z,Wang H,Wang H,et al.Efficient subgraph matching on billion node graphs[J].Proc VLDB Endow,2012,5(9):788-799.
[12] Gou G,Chirkova R.Efficient algorithms for exact ranked twig-pattern matching over graphs[C]//SIGMOD2008.New York,USA:ACM,2008:581-594.
[13] Ilyas I F,Beskales G,Soliman M A.A survey of top-k query processing techniques in relational database systems[J].ACM Comput Surv,2008,40(4):1-58.
[14] Dbpedia.Dbpedia[Z/OL].(2015-01-01).http://wiki.dbpedia.org/.


相关文章:
[1]赵俊韬, 冯伟, 赵明, 王京. 频谱共享系统中基于大尺度信道状态信息的资源优化[J]. 清华大学学报(自然科学版), 2016, 56(7): 692-695.
[2]王璟, 王燕敏, 冯伟, 肖立民, 周世东. 分布式天线系统中基于大尺度信道信息的功耗优化[J]. 清华大学学报(自然科学版), 2016, 56(7): 696-699,706.
[3]杨德亮, 谢旭东, 李春文, 牛小铁. 基于分布式视频网络的交叉口车辆精确定位方法[J]. 清华大学学报(自然科学版), 2016, 56(3): 281-286,293.
[4]马亚萍, 吴楠, 高远, 张辉, 李丽华. 基于分布式建筑控制策略的人员疏散系统[J]. 清华大学学报(自然科学版), 2015, 55(8): 927-932.
[5]沈婧楠, 王凌, 王圣尧. 求解分布式置换流水线调度问题的化学反应优化算法[J]. 清华大学学报(自然科学版), 2015, 55(11): 1184-1189,1196.
[6]刘海蛟, 华楠, 李艳和, 郑小平. 面向动态业务的光网络信息失真机理及其影响分析[J]. 清华大学学报(自然科学版), 2015, 55(11): 1235-1240.
[7]罗剑,罗禹贡,褚文博,张书玮,李克强. 分布式电动车制/驱动力协调行驶稳定性控制[J]. 清华大学学报(自然科学版), 2014, 54(6): 729-733.
[8]孙培林,汤俊,张宁. 分布式相参雷达相参性能的两种监控算法[J]. 清华大学学报(自然科学版), 2014, 54(4): 419-424.
[9]韩心慧, 肖祥全, 张建宇, 刘丙双, 张缘. 基于社交关系的DHT网络Sybil攻击防御[J]. 清华大学学报(自然科学版), 2014, 54(1): 1-7.

相关话题/优化 环境 技术 系统 信息