Dr. Jingdong Wang has served as an Associate Editor of several international journals, including IEEE TPAMI, IJCV, IEEE TMM, and IEEE TCSVT, and as an area chair of leading international conferences in computer vision, multimedia, and artificial intelligence, including CVPR, ICCV, ECCV, ACM MM, IJCAI, and AAAI. For his outstanding contributions to computer vision, he was elected an ACM Distinguished Member, an IEEE Fellow, and an IAPR Fellow. His representative works include the high-resolution network (HRNet), discriminative regional feature integration (DRFI), and neighborhood graph search methods (NGS, SPTAG), all of which have had substantial impact in both academia and industry. HRNet, upon its release, achieved remarkable results on three COCO tasks, namely keypoint detection, pose estimation, and multi-person pose estimation, and it remains a widely used backbone network for such tasks today.
In this talk, Dr. Wang will present a novel masked-image-modeling approach to self-supervised representation learning, and will further discuss other valuable problems in self-supervised representation learning.
Title: Context Autoencoder for Scalable Self-Supervised Representation Pretraining
Speaker: Jingdong Wang, Chief Scientist for Computer Vision, Baidu AI
Host: Di Hu, Tenure-Track Assistant Professor, Gaoling School of Artificial Intelligence, Renmin University of China
Speaker Bio: Jingdong Wang is a Chief Scientist for computer vision with the Artificial Intelligence Group at Baidu. His team focuses on conducting product-driven and cutting-edge computer vision, deep learning, and AI research and on developing practical computer vision applications. Before joining Baidu, he was a Senior Principal Researcher at Microsoft Research Asia. His areas of interest are computer vision, deep learning, and multimedia search. His representative works include the deep high-resolution network (HRNet), discriminative regional feature integration (DRFI) for supervised saliency detection, and neighborhood graph search (NGS, SPTAG) for large-scale similarity search. He serves or has served as an Associate Editor of IEEE TPAMI, IJCV, IEEE TMM, and IEEE TCSVT, and as an area chair of leading conferences in vision, multimedia, and AI, such as CVPR, ICCV, ECCV, ACM MM, IJCAI, and AAAI. He was elected an ACM Distinguished Member, a Fellow of IAPR, and a Fellow of IEEE for his contributions to visual content understanding and retrieval.
Abstract: Self-supervised representation pretraining aims to learn an encoder from unlabeled images, such that the encoded representations capture semantics and benefit downstream tasks. In this talk, I present a novel masked image modeling approach, the context autoencoder (CAE), for scalable self-supervised representation pretraining. The core ideas are that predictions are made in the latent representation space, from visible patches to masked patches, and that the encoder is used only for representation learning while representation learning is done only by the encoder. I also discuss why masked image modeling potentially outperforms contrastive pretraining (e.g., SimCLR, MoCo) and why contrastive learning performs on par with supervised pretraining on ImageNet. In addition, I show that linear probing and its extended version, attentive probing, are more suitable than fine-tuning for pretraining evaluation on ImageNet.
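The recipe the abstract describes (encode only the visible patches, then predict the latent representations of the masked patches and align them in representation space) can be sketched roughly as below. This is a toy illustration, not the actual CAE architecture: the linear "encoder", the mean-pooled "regressor", and all dimensions are illustrative assumptions.

```python
# Toy sketch of CAE-style masked image modeling (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

patch_dim, latent_dim, num_patches = 16, 8, 10
W_enc = rng.standard_normal((patch_dim, latent_dim))   # toy linear "encoder"
W_reg = rng.standard_normal((latent_dim, latent_dim))  # toy latent "regressor"

patches = rng.standard_normal((num_patches, patch_dim))  # flattened image patches
mask = np.zeros(num_patches, dtype=bool)
mask[rng.choice(num_patches, size=4, replace=False)] = True  # mask 4 patches

# The encoder sees ONLY visible patches: it is used purely for representation learning.
z_visible = patches[~mask] @ W_enc

# Predict the latents of the masked patches from the visible latents.
# Crucially, prediction happens in the latent space, not in pixel space.
z_pred = np.repeat(z_visible.mean(axis=0, keepdims=True) @ W_reg, mask.sum(), axis=0)

# Targets: latents of the masked patches from the same encoder
# (in practice computed with a stop-gradient / momentum copy).
z_target = patches[mask] @ W_enc

# Alignment loss in latent space, minimized during pretraining.
loss = float(np.mean((z_pred - z_target) ** 2))
```

The separation in the sketch mirrors the abstract's design principle: because the encoder never processes mask tokens, its capacity is spent entirely on representing visible content, and the latent regression task is handled by a separate module.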
Time: Friday, March 18, 2022, 15:40-17:00
Tencent Meeting ID: 610-614-378