
Decision model for automatic vehicle driving based on WGAIL-DDPG(λ)

Posted by the site editor, Free考研考试, 2024-01-16

Zhang Mingheng, Lü Xinfei, Wan Xing, Wu Zengwen. Decision model for automatic vehicle driving based on WGAIL-DDPG(λ)[J]. , 2022, 62(1): 77-84
Decision model for automatic vehicle driving based on WGAIL-DDPG(λ)
DOI:10.7511/dllgxb202201010
Keywords: automatic driving decision; deep reinforcement learning; imitation learning; deep deterministic policy gradient algorithm
Funding: National Natural Science Foundation of China (51675077); China Postdoctoral Science Foundation (2015M581329, 2017T100178).
Authors: Zhang Mingheng (张明恒), Lü Xinfei (吕新飞), Wan Xing (万星), Wu Zengwen (吴增文)
Abstract:
Reliability, learning efficiency, and model generalization are basic requirements for research on automatic vehicle driving systems. A WGAIL-DDPG(λ) (Wasserstein generative adversarial nets deep deterministic policy gradient(λ)) model for automatic vehicle driving decisions is therefore proposed within the theoretical framework of deep reinforcement learning. The reward function of the reinforcement learning model is designed specifically around the driving-performance requirements of safety and stability. Imitation learning is introduced to improve learning efficiency during reinforcement learning, and a suitably designed gain scheduler ensures a smooth transition from imitation learning to reinforcement learning. Experimental results show that, in terms of stability, the agent's deviation from the road center line fluctuates within 30% throughout; in terms of safety, the distance between the agent and surrounding vehicles generally remains above 10 m; and in terms of generalization, the agent completes safe and stable driving on many complex curves not seen during training. Compared with the original DDPG (deep deterministic policy gradient) algorithm, the model improves learning speed by about 3.4 times, indicating that it improves reinforcement learning efficiency while keeping the driving system's decisions reliable. Further experiments show that it is applicable to different driving conditions.
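The abstract does not say how the gain scheduler is built, so the following is only a minimal sketch of one way a scheduled gain could shift the training signal from an imitation reward to an environment reward. The linear decay, the function names `imitation_gain` and `blended_reward`, and the `decay_episodes` parameter are all assumptions made for illustration and are not taken from the paper.

```python
def imitation_gain(episode: int, decay_episodes: int = 100) -> float:
    """Hypothetical linear schedule: 1.0 at episode 0, 0.0 from decay_episodes onward."""
    if decay_episodes <= 0:
        raise ValueError("decay_episodes must be positive")
    return max(0.0, 1.0 - episode / decay_episodes)


def blended_reward(imitation_reward: float, env_reward: float,
                   episode: int, decay_episodes: int = 100) -> float:
    """Weight the imitation signal by the gain and the environment reward by the rest.

    Assumption: a convex combination of the two signals; the paper may
    combine them differently.
    """
    gain = imitation_gain(episode, decay_episodes)
    return gain * imitation_reward + (1.0 - gain) * env_reward
```

Under these assumptions, early episodes are driven almost entirely by the imitation signal, and the environment reward takes over gradually rather than at a hard switch, which is the kind of smooth hand-over the abstract describes.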