
Time Series Clustering Method for User Behavior Based on Symmetric KL Distance


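Wenjing LI,
Xiangjian ZENG,
Meng LI,
Peng YU
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
Funding: Science and Technology Project of State Grid Corporation of China (52010116000W)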

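Details
About the authors: Wenjing LI: female, born in 1973, professor; research interests: network management and communication software, intelligent management of future networks
Xiangjian ZENG: male, born in 1993, master's student; research interests: network management and intelligent information processing
Meng LI: female, born in 1993, master's student; research interests: network management and intelligent information processing
Peng YU: male, born in 1986, associate professor; research interests: network management based on artificial intelligence
Corresponding author: Xiangjian ZENG, zeng_fsh@163.com
Chinese Library Classification number: TN915.07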

Metrics

Article views: 1443
HTML full-text views: 372
PDF downloads: 60
Citations: 0
Publication history

Received: 2018-01-04
Revised: 2018-06-27
Published online: 2018-07-30
Issue date: 2018-10-01

Time Series Method Clustering in User Behavior Based on Symmetric Kullback-Leibler Distance

Wenjing LI,
Xiangjian ZENG,
Meng LI,
Peng YU
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
Funds: The Project of Science and Technology of State Grid Corporation of China (52010116000W)


Abstract
Analyzing how the behavior of network users changes over time has become a focus of user behavior analysis in recent years, and users are usually clustered in order to discover the characteristics of their behavior. Existing methods for clustering user time series data suffer from poor computational performance and inaccurate distance measures, and cannot handle large-scale data. To address these problems, this paper proposes a time series clustering method for user behavior based on the symmetric Kullback-Leibler (KL) distance. The time series data is first transformed into probability models; then, from the perspective of partition-based clustering, the KL distance is introduced as the distance measure to quantify the difference between the time distributions of different users. Because data collected from real networks is large in scale, every stage of the clustering is optimized according to the properties of the KL distance, and an efficient method for computing the cluster centroids is proved. Experimental results show that the proposed algorithm improves accuracy by 4% compared with clustering algorithms that use the Euclidean or Dynamic Time Warping (DTW) distance, and that its computation time is an order of magnitude lower than that of clustering algorithms that use medoids as centroids. Applying the algorithm to user traffic data collected from a real network demonstrates its practical value.
Key words: Time series clustering/
User analysis/
Kullback-Leibler distance
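
The distance measure named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the series are converted to probabilities, how empty time slots are handled, or whether the symmetric distance is the sum or the half-sum of the two directed divergences. The sketch assumes simple normalization of non-negative per-slot volumes, a hypothetical smoothing constant `eps`, and the plain sum `D(p||q) + D(q||p)`. The function names and the sample data are invented for illustration, and the paper's centroid computation is not reproduced.

```python
import math


def to_distribution(series, eps=1e-9):
    """Normalize a non-negative time series into a probability distribution.

    eps is a hypothetical smoothing constant (an assumption, not from the
    paper) that keeps empty time slots from producing log(0) or a division
    by zero in the KL terms.
    """
    if any(x < 0 for x in series):
        raise ValueError("series values must be non-negative")
    smoothed = [x + eps for x in series]
    total = sum(smoothed)
    return [x / total for x in smoothed]


def kl_divergence(p, q):
    """Directed KL divergence D(p || q) for discrete distributions.

    Assumes p and q have equal length and that q is strictly positive,
    which to_distribution guarantees through smoothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Symmetrized KL distance, taken here as D(p || q) + D(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical traffic volumes for three users over six time slots.
user_a = [0, 1, 5, 9, 4, 1]
user_b = [0, 2, 10, 18, 8, 2]  # same temporal shape as user_a, twice the volume
user_c = [9, 5, 1, 0, 1, 4]    # activity concentrated in different slots

pa, pb, pc = (to_distribution(u) for u in (user_a, user_b, user_c))
print(symmetric_kl(pa, pb))  # near zero: same time distribution
print(symmetric_kl(pa, pc))  # much larger: different time distribution
```

Because each series is normalized before comparison, two users with the same temporal profile but different total volumes end up close together, which is the kind of time-distribution difference the abstract says the KL distance is meant to capture.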



Full-text PDF download:

https://jeit.ac.cn/article/exportPdf?id=f804164a-8401-420b-ab36-fc191957ea5f