
Research on a Fully Convolutional Network Based on Spatial-Temporal Features for Video Eye Fixation Prediction

Posted by the site editor, Free考研考试 / 2022-01-16

Authors: Shi Jiuchen (史久琛) 1, Sun Meijun (孙美君) 2, Wang Zheng (王 征) 2, Zhang Dong (张 冬) 3
Affiliations:
1. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China;
2. College of Intelligence and Computing, Tianjin University, Tianjin 300072, China;
3. Institute of Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin 300193, China

Abstract: Video eye fixation prediction marks the salient regions of a video that attract human visual attention, and it is important for automatically extracting semantic information from large volumes of video. Starting from the limitations of the fully convolutional network, currently the mainstream approach to saliency processing, this study proposes a deep learning model based on spatial-temporal features to predict eye fixation regions in video. First, a fully convolutional network extracts the spatial features of each video frame, and an optical flow method extracts the temporal motion features between adjacent frames. A long short-term memory network then processes the spatial and temporal features of the current frame and its six preceding frames to produce the final eye fixation prediction map. The model is evaluated on two eye fixation video databases, INB and IVB. On four performance criteria (earth mover's distance, area under the receiver operating characteristic curve, normalized scanpath saliency, and linear correlation coefficient), the method obtains two sets of results, one per database: 0.3751, 0.8186, 2.0241, 0.7457 and 0.4137, 0.7856, 1.9645, 0.7349. Its prediction performance is better than that of five comparison algorithms, indicating that the proposed method predicts video eye fixation fairly accurately.
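
As a rough illustration of two of the four criteria named in the abstract, the sketch below computes normalized scanpath saliency (NSS) and the linear correlation coefficient (CC) using only the Python standard library. The function names, the list-of-lists map representation, and the (row, col) fixation format are assumptions made for illustration, not the authors' evaluation code. Returning 0.0 for a constant map is likewise a convention chosen here.

```python
from math import sqrt
from statistics import fmean, pstdev


def flatten(grid):
    """Flatten a 2-D list of numbers (rows of a map) into one list."""
    return [value for row in grid for value in row]


def nss(saliency, fixations):
    """Normalized scanpath saliency.

    The predicted map is z-scored (zero mean, unit standard deviation),
    then averaged over the fixated pixels. Higher is better.

    saliency  -- 2-D list of predicted saliency values
    fixations -- list of (row, col) fixated pixels (hypothetical format)
    """
    values = flatten(saliency)
    mean, std = fmean(values), pstdev(values)
    if std == 0 or not fixations:
        return 0.0  # assumed convention: a constant map carries no information
    return fmean([(saliency[r][c] - mean) / std for r, c in fixations])


def cc(predicted, ground_truth):
    """Pearson linear correlation between two maps of equal size."""
    p, g = flatten(predicted), flatten(ground_truth)
    if len(p) != len(g):
        raise ValueError("maps must have the same number of pixels")
    mean_p, mean_g = fmean(p), fmean(g)
    cov = sum((a - mean_p) * (b - mean_g) for a, b in zip(p, g))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_g = sum((b - mean_g) ** 2 for b in g)
    if var_p == 0 or var_g == 0:
        return 0.0  # assumed convention, as in nss()
    return cov / sqrt(var_p * var_g)


# Toy 2x2 example: the prediction peaks exactly where the viewer fixated.
prediction = [[0.0, 0.0], [0.0, 1.0]]
inverted = [[1.0, 1.0], [1.0, 0.0]]
hit_score = nss(prediction, [(1, 1)])    # fixation on the predicted peak
miss_score = nss(prediction, [(0, 0)])   # fixation on a low-saliency pixel
same_map = cc(prediction, prediction)    # identical maps
opposite_map = cc(prediction, inverted)  # perfectly anti-correlated maps
```

In this toy case a fixation on the predicted peak gives a positive NSS and a fixation elsewhere gives a negative one, while CC ranges from -1 (inverted map) to 1 (identical map). Earth mover's distance and the area under the ROC curve need more machinery and are not sketched here.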

Keywords: video; eye fixation; spatial-temporal feature; fully convolutional network; optical flow; long short-term memory network


Full-text PDF download: http://xbzrb.tju.edu.cn/#/digest?ArticleID=6326