张明恒, 吕新飞, 万星, 吴增文. Decision model for automatic vehicle driving based on WGAIL-DDPG(λ)[J]., 2022, 62(1): 77-84
Decision model for automatic vehicle driving based on WGAIL-DDPG(λ)
DOI: 10.7511/dllgxb202201010
Keywords: automatic driving decision; deep reinforcement learning; imitation learning; deep deterministic policy gradient algorithm
Funding: National Natural Science Foundation of China (51675077); China Postdoctoral Science Foundation (2015M581329, 2017T100178).
Abstract:
Good reliability, learning efficiency and model generalization are basic requirements for research on automatic vehicle driving systems. Within the deep reinforcement learning framework, a WGAIL-DDPG(λ) (Wasserstein generative adversarial nets deep deterministic policy gradient (λ)) model is proposed for automatic vehicle driving decisions. The reward function of the reinforcement learning model is designed specifically around the safety and stability requirements of vehicle driving; imitation learning is introduced to improve learning efficiency during reinforcement learning; and a gain scheduler is designed to ensure a smooth transition from the imitation learning phase to the reinforcement learning phase. Experimental results show that, in terms of stability, the agent's deviation from the road center line always fluctuates within 30%; in terms of safety, the distance between the agent and surrounding vehicles is generally kept above 10 m; and in terms of generalization, the agent completes safe and stable driving tasks on many complicated curves it was not trained on. Compared with the original DDPG (deep deterministic policy gradient) algorithm, the model improves the learning speed by about 3.4 times. These results indicate that the proposed model ensures reliable decisions by the automatic driving system while effectively improving the efficiency of reinforcement learning, and further experiments show that it adapts well to different driving conditions.
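The abstract names three mechanisms (a reward shaped around safety and stability, imitation learning, and a gain scheduler that hands training over from imitation to reinforcement learning) but does not give their formulas. The Python sketch below shows one possible way such pieces could fit together, under stated assumptions; it is not the authors' implementation. The observation fields (`speed`, `angle`, `track_pos`, `gap`), the reward terms, the 10 m `safe_gap` parameter, the cosine schedule in `imitation_gain`, and the convex blend in `blended_actor_objective` are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical driving observation; field names are assumptions."""
    speed: float      # speed along the car's heading, m/s
    angle: float      # angle between car heading and road axis, rad
    track_pos: float  # lateral offset from the center line, normalized to [-1, 1]
    gap: float        # distance to the nearest surrounding vehicle, m


def reward(obs: Observation, safe_gap: float = 10.0, gap_weight: float = 1.0) -> float:
    """Hypothetical reward with stability and safety terms.

    Stability: reward progress along the road axis, penalize lateral
    velocity and distance from the center line.
    Safety: linear penalty whenever the gap drops below safe_gap.
    """
    progress = obs.speed * math.cos(obs.angle)
    lateral = obs.speed * abs(math.sin(obs.angle))
    off_center = obs.speed * abs(obs.track_pos)
    safety_penalty = gap_weight * max(0.0, safe_gap - obs.gap)
    return progress - lateral - off_center - safety_penalty


def imitation_gain(step: int, start: int, end: int) -> float:
    """Hypothetical gain schedule for the imitation term.

    Returns 1.0 up to `start`, 0.0 from `end` on, and a cosine ramp in
    between, so the weight moves from imitation to RL without a jump.
    """
    if step <= start:
        return 1.0
    if step >= end:
        return 0.0
    frac = (step - start) / (end - start)
    return 0.5 * (1.0 + math.cos(math.pi * frac))


def blended_actor_objective(q_value: float, imitation_score: float, lam: float) -> float:
    """Hypothetical quantity the actor maximizes: a convex mix of an
    imitation signal (e.g. a Wasserstein critic's expert-likeness score)
    and the DDPG critic's Q-value, weighted by the scheduled gain `lam`."""
    return lam * imitation_score + (1.0 - lam) * q_value


if __name__ == "__main__":
    obs = Observation(speed=20.0, angle=0.0, track_pos=0.1, gap=15.0)
    print(reward(obs))
    for step in (0, 100, 150, 200, 300):
        print(step, round(imitation_gain(step, start=100, end=200), 3))
```

A cosine ramp is used here because its slope is zero at both ends, so the gain leaves 1 and reaches 0 without an abrupt change in the actor's objective; a linear ramp would also be continuous but has kinks at `start` and `end`. The abstract does not state which schedule the paper actually uses.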