
MSAM: A Multi-Stage Attention Model for Video Question Answering

Posted by the site editor, Free考研考试, 2024-10-07

Authors: LIANG Li-li (梁丽丽), LIU Xin-yu (刘昕雨), SUN Guang-lu (孙广路), ZHU Su-xia (朱素霞)

Abstract: The video question answering (VideoQA) task requires understanding the semantic information in both the video and the question in order to generate an answer. Current attention-based VideoQA methods struggle to fully understand, and accurately locate, the video information that is relevant to the question. To address this problem, a multi-stage attention model network (MSAMN) is proposed. The network extracts multi-modal features (video, audio and text) and feeds them into a multi-stage attention model (MSAM), which locates the video information relevant to the question stage by stage and uses it to generate the answer. To make feature fusion more effective, a triple-modal compact concat bilinear (TCCB) algorithm is proposed to compute the correlations between features of different modalities. MSAMN is evaluated on the ZJL dataset, where it reaches an average accuracy of 54.3%, nearly 15% higher than traditional methods and nearly 7% higher than existing methods.
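The abstract does not spell out how TCCB computes cross-modal correlations. As a rough illustration only, the sketch below assumes TCCB behaves like Multimodal Compact Bilinear (MCB) pooling extended to three inputs. Each modality's feature vector is projected with its own Count Sketch, and the three sketches are combined by circular convolution. By the Tensor Sketch identity, this equals a Count Sketch of the full video ⊗ audio ⊗ text outer product without ever materializing that product. All names (`CountSketch`, `compact_trilinear`, `explicit_sketch_of_outer_product`) and sizes are hypothetical. The "concat" part of TCCB is not modeled, and the O(d²) convolution loop is kept only for readability.

```python
import random
from typing import List, Sequence, Tuple


class CountSketch:
    """Count Sketch projection R^n -> R^d using a random hash h and random sign s.

    Hypothetical helper; not taken from the MSAMN paper.
    """

    def __init__(self, n: int, d: int, rng: random.Random) -> None:
        self.d = d
        self.h = [rng.randrange(d) for _ in range(n)]          # bucket for each input index
        self.s = [rng.choice((-1.0, 1.0)) for _ in range(n)]   # sign for each input index

    def __call__(self, x: Sequence[float]) -> List[float]:
        out = [0.0] * self.d
        for i, xi in enumerate(x):
            out[self.h[i]] += self.s[i] * xi
        return out


def circular_convolve(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Direct circular convolution: out[k] = sum_j a[j] * b[(k - j) mod d]."""
    d = len(a)
    assert len(b) == d, "sketches must share the same output size d"
    return [sum(a[j] * b[(k - j) % d] for j in range(d)) for k in range(d)]


def compact_trilinear(v: Sequence[float], a: Sequence[float], q: Sequence[float],
                      sketches: Tuple[CountSketch, CountSketch, CountSketch]) -> List[float]:
    """Assumed TCCB-style fusion: sketch each modality, then convolve the three sketches."""
    sv, sa, sq = (cs(x) for cs, x in zip(sketches, (v, a, q)))
    return circular_convolve(circular_convolve(sv, sa), sq)


def explicit_sketch_of_outer_product(v: Sequence[float], a: Sequence[float], q: Sequence[float],
                                     sketches: Tuple[CountSketch, CountSketch, CountSketch]) -> List[float]:
    """Reference: Count-sketch the full n_v * n_a * n_q outer product with the combined hash/sign."""
    cv, ca, cq = sketches
    d = cv.d
    out = [0.0] * d
    for i, vi in enumerate(v):
        for j, aj in enumerate(a):
            for k, qk in enumerate(q):
                idx = (cv.h[i] + ca.h[j] + cq.h[k]) % d
                sign = cv.s[i] * ca.s[j] * cq.s[k]
                out[idx] += sign * vi * aj * qk
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    n_v, n_a, n_q, d = 6, 4, 5, 8   # toy feature sizes for video, audio, text
    sketches = (CountSketch(n_v, d, rng), CountSketch(n_a, d, rng), CountSketch(n_q, d, rng))
    v = [rng.uniform(-1, 1) for _ in range(n_v)]
    a = [rng.uniform(-1, 1) for _ in range(n_a)]
    q = [rng.uniform(-1, 1) for _ in range(n_q)]
    fused = compact_trilinear(v, a, q, sketches)
    reference = explicit_sketch_of_outer_product(v, a, q, sketches)
    print("max abs difference:", max(abs(x - y) for x, y in zip(fused, reference)))
```

Replacing `circular_convolve` with an FFT-based product (multiply the three FFTs element-wise, then invert) gives the same result in O(d log d) time, which is how MCB-style layers are usually implemented in practice.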
