
Deployment Algorithm of Service Function Chain Based on Transfer Actor-Critic Learning


Lun TANG,
Xiaoyu HE,
Xiao WANG,
Qianbin CHEN
1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2. Key Laboratory of Mobile Communication, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Funds: The National Natural Science Foundation of China (61571073), The Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-M20180601)

Details
Author biographies: Lun TANG: male, born in 1973, professor and Ph.D. supervisor; his main research interests include next-generation wireless communication networks and heterogeneous cellular networks
Xiaoyu HE: female, born in 1995, master's student; her research interests include network slicing resource allocation and reinforcement learning
Xiao WANG: male, born in 1995, master's student; his research interests include network slicing resource optimization and machine learning
Qianbin CHEN: male, born in 1967, professor and Ph.D. supervisor; his main research interests include personal communications, multimedia information processing and transmission, next-generation mobile communication networks, and heterogeneous cellular networks
Corresponding author: Xiaoyu HE, Hexy1995@163.com
CLC number: TN915

Publication history

Received: 2019-07-18
Revised: 2020-03-07
Available online: 2020-04-08
Published: 2020-11-16



Abstract: To solve the problem of high system delay caused by unreasonable resource allocation under the randomness and unpredictability of service requests in 5G network slicing, this paper proposes a Service Function Chain (SFC) deployment algorithm based on Transfer Actor-Critic (A-C) learning (TACA). First, an end-to-end delay minimization model is built based on Virtual Network Function (VNF) placement and the joint allocation of computing resources, link bandwidth resources, and fronthaul bandwidth resources; the model is then transformed into a discrete-time Markov Decision Process (MDP). Next, an A-C learning algorithm is applied to this MDP to dynamically adjust the SFC deployment strategy through continuous interaction with the environment, thereby optimizing the end-to-end delay. Furthermore, to achieve and accelerate convergence of the A-C algorithm on similar target tasks (e.g., tasks whose service request arrival rates are generally higher), a transfer A-C algorithm is adopted that reuses the SFC deployment knowledge learned on source tasks to quickly find deployment strategies for target tasks. Simulation results show that the proposed algorithm can reduce and stabilize the queue backlog of SFC packets, optimize the system end-to-end delay, and improve resource utilization.
Key words: Network slicing; Service Function Chain (SFC) deployment; Markov Decision Process (MDP); Actor-Critic (A-C) learning; Transfer learning
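The abstract describes a one-step Actor-Critic loop (critic evaluates a TD error, actor follows the policy gradient) plus a transfer step that warm-starts learning on a similar target task from source-task parameters. The following is a minimal illustrative sketch of that pattern on a toy tabular MDP; the states, actions, reward ("negative delay"), load parameter, and hyperparameters are all hypothetical stand-ins for the paper's SFC-deployment MDP, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 5, 3  # toy stand-ins for queue states / placement actions

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def step(state, action, load):
    # Toy dynamics: the action closest to a load-dependent "best placement"
    # incurs the least delay, i.e. the highest (least negative) reward.
    best = load % N_ACTIONS
    reward = -abs(action - best)               # negative "end-to-end delay"
    next_state = (state + action + 1) % N_STATES
    return next_state, reward

def actor_critic(load, episodes=300, theta=None, v=None,
                 alpha=0.1, beta=0.1, gamma=0.9):
    # theta: actor preferences, v: critic state values. Passing in parameters
    # learned on a source task is the "transfer" warm start; otherwise zeros.
    theta = np.zeros((N_STATES, N_ACTIONS)) if theta is None else theta.copy()
    v = np.zeros(N_STATES) if v is None else v.copy()
    returns = []
    for _ in range(episodes):
        s, total = 0, 0.0
        for _t in range(20):
            probs = softmax(theta[s])
            a = rng.choice(N_ACTIONS, p=probs)
            s2, r = step(s, a, load)
            td_error = r + gamma * v[s2] - v[s]   # critic's evaluation
            v[s] += beta * td_error               # critic update
            grad = -probs
            grad[a] += 1.0                        # d log pi(a|s) / d theta[s]
            theta[s] += alpha * td_error * grad   # actor update
            total += r
            s = s2
        returns.append(total)
    return theta, v, returns

# Source task: lower-load regime; target task: a similar higher-load regime.
# The target run starts from the source parameters instead of from scratch.
theta_src, v_src, ret_src = actor_critic(load=1)
theta_tgt, v_tgt, ret_tgt = actor_critic(load=4, episodes=50,
                                         theta=theta_src, v=v_src)
```

The warm-started target run begins with a policy that already prefers low-delay actions, which is the mechanism by which transfer shortens convergence in the paper's setting.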



Full-text PDF download:

https://jeit.ac.cn/article/exportPdf?id=4e4331f8-de79-4c48-a91d-184d96c7f30f