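Jiong YU
School of Information Science and Engineering, Xinjiang University, Urumqi 830046
Funds: The National Natural Science Foundation of China (61862060, 61462079, 61562086, 61562078); the Doctoral Science and Technology Innovation Project of Xinjiang University (XJUBSCX-201901)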
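Author biographies: Zheng CHU: male, born in 1991, Ph.D. candidate; research interests include distributed computing, in-memory computing and machine learning
Jiong YU: male, born in 1966, professor; research interests include distributed computing, in-memory computing and green computing
Corresponding author: Jiong YU, yujiong@xju.edu.cn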
CLC number: TN919; TP311
Publication history
Received: 2019-07-23
Revised: 2020-02-17
Published online: 2020-03-10
Published: 2020-06-22
Performance Prediction Based on Random Forest for the Stream Processing Checkpoint
Zheng CHU, Jiong YU
School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
Funds: The National Natural Science Foundation of China (61862060, 61462079, 61562086, 61562078); the Doctoral Science and Technology Innovation Project of Xinjiang University (XJUBSCX-201901)
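Abstract
Abstract: The development of the Internet of Things (IoT) has caused streaming data to grow continuously in both volume and variety. Because real-time processing scenarios keep increasing and configuration strategies based on empirical knowledge have shortcomings, checkpoint configuration for stream processing faces major challenges: it is time-consuming and labor-intensive, and it can easily cause system anomalies. To address these challenges, this paper proposes a regression-based method for predicting checkpoint performance. The method first analyzes six features that affect checkpoint performance, then feeds the feature vectors of the training set into a random forest regression algorithm for training, and finally uses the trained model to make predictions on the test set. Experimental results show that, compared with other machine learning algorithms, random forest regression predicts checkpoint performance with lower error, higher accuracy and more efficient execution on CPU-intensive, memory-intensive and network-intensive benchmarks.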
Keywords: Stream processing; Prediction method; Checkpoint performance; Random forest; Regression algorithm
Abstract: The development of the Internet of Things (IoT) has led to ever-increasing volumes and types of streaming data, and the number of real-time processing scenarios keeps growing. Because checkpoint configuration strategies based on empirical knowledge are deficient, checkpoint configuration faces major challenges: it is time-consuming and labor-intensive, and it can cause system anomalies. To address these challenges, a regression-based method for predicting checkpoint performance is proposed. First, six features that strongly influence checkpoint performance are analyzed; then the feature vectors of the training set are fed into regression algorithms for training; finally, the trained models predict checkpoint performance on the test sets. Experimental results show that, compared with other machine learning algorithms, Random Forest (RF) achieves lower error, higher accuracy and faster execution on CPU-intensive, memory-intensive and network-intensive benchmarks.
Key words: Stream processing; Prediction method; Checkpoint performance; Random Forest (RF); Regression algorithm
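The abstracts describe a three-step pipeline: build feature vectors, train a random forest regressor on the training set, then predict checkpoint performance on held-out data. The Python sketch below illustrates that pipeline under stated assumptions and is not the authors' implementation. The abstract does not name the six features here, so the six features and the synthetic "checkpoint duration" target are hypothetical. `SimpleRandomForest` is a hypothetical pure-Python helper, not a library API. Mean absolute error compared against a predict-the-mean baseline stands in for the paper's unspecified error and accuracy metrics.

```python
import random


def _mean(values):
    # Plain arithmetic mean (statistics.mean is exact but slow).
    return sum(values) / len(values)


def _sse(ys):
    # Sum of squared errors around the mean: the regression-tree impurity.
    if not ys:
        return 0.0
    m = _mean(ys)
    return sum((v - m) ** 2 for v in ys)


def _build_tree(X, y, depth, max_depth, min_leaf, n_feats, rng):
    """Grow one CART-style regression tree; leaves store the mean target."""
    if depth >= max_depth or len(y) <= min_leaf or len(set(y)) == 1:
        return _mean(y)
    best = None  # (sse, feature index, threshold)
    # Random-forest twist: only a random subset of features is tried per split.
    for f in rng.sample(range(len(X[0])), n_feats):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint, so both sides are non-empty
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # every sampled feature was constant in this node
        return _mean(y)
    _, f, t = best

    def grow(idx):
        return _build_tree([X[i] for i in idx], [y[i] for i in idx],
                           depth + 1, max_depth, min_leaf, n_feats, rng)

    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t, grow(left_idx), grow(right_idx))


def _predict_tree(node, row):
    # Internal nodes are (feature, threshold, left, right); leaves are floats.
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


class SimpleRandomForest:
    """Hypothetical helper: bagged regression trees with per-split feature sampling."""

    def __init__(self, n_trees=20, max_depth=6, min_leaf=2, seed=0):
        self.n_trees, self.max_depth, self.min_leaf = n_trees, max_depth, min_leaf
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        n, d = len(X), len(X[0])
        n_feats = max(1, d // 3)  # common regression default: d/3 features per split
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap sample
            self.trees.append(_build_tree([X[i] for i in idx], [y[i] for i in idx],
                                          0, self.max_depth, self.min_leaf,
                                          n_feats, self.rng))
        return self

    def predict(self, X):
        # Forest prediction = average of the individual tree predictions.
        return [_mean([_predict_tree(t, row) for t in self.trees]) for row in X]


def mae(y_true, y_pred):
    return _mean([abs(a - b) for a, b in zip(y_true, y_pred)])


# Hypothetical data: six made-up features in [0, 1] and a synthetic
# "checkpoint duration" target; only features 0-3 actually matter.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(6)] for _ in range(160)]
y = [3 * r[0] + 2 * r[1] + r[2] * r[3] + data_rng.gauss(0, 0.05) for r in X]
X_train, y_train, X_test, y_test = X[:120], y[:120], X[120:], y[120:]

forest = SimpleRandomForest().fit(X_train, y_train)
rf_mae = mae(y_test, forest.predict(X_test))
baseline_mae = mae(y_test, [_mean(y_train)] * len(y_test))  # predict-the-mean baseline
print(f"RF MAE: {rf_mae:.3f}  mean-baseline MAE: {baseline_mae:.3f}")
```

In practice a library implementation such as scikit-learn's `RandomForestRegressor` would be the more likely choice. This hand-rolled version exists only to make the bootstrap sampling, the per-split feature sampling and the averaging over trees visible.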
Full-text PDF download:
https://jeit.ac.cn/article/exportPdf?id=1a749da7-1eb1-4b9c-851a-e1e5be7fb447