Invited Talks (YouTube Recordings)

Speaker | Topic | Recordings
Zhengyou Zhang, Tencent | Transcending Space Through Immersive Telecommunications | [YouTube]
Ira Kemelmacher-Shlizerman, U Washington | Future of Communication | [YouTube] [Bilibili]
Ming-Yu Liu, NVIDIA | Face-VID2VID: Neural Talking Head Synthesis For Video Conf | [YouTube]
Chuo-Ling Chang & Tingbo Hou, Google | Cross-Platform ML for Video Conf with MediaPipe | [YouTube] [Bilibili]
Catherine Qi Zhao, U Minnesota | Attention in AI Tasks | [YouTube] [Bilibili]
Sergi Caelles, Google Research | Video Object Segmentation for Video Conferencing | [YouTube] [Bilibili]
Lexing Xie, ANU | Image Captioning with Knowledge and Style | [YouTube] [Bilibili]

Session 1 Challenge Results (Live Zoom Link)

US Western | US Eastern | UK | Beijing | Speaker | Topic
6:20 - 6:30 | 9:20 - 9:30 | 14:20 - 14:30 | 21:20 - 21:30 | Chairs | Opening remarks
6:30 - 6:45 | 9:30 - 9:45 | 14:30 - 14:45 | 21:30 - 21:45 | Alibaba DAMO Academy | Track 1: Challenge Winner
6:45 - 7:00 | 9:45 - 10:00 | 14:45 - 15:00 | 21:45 - 22:00 | Bytedance | Track 2: Challenge Winner

Session 2 Invited Speakers Q&A and Panel (Live Zoom Link)

US Western | US Eastern | UK | Beijing | Participants | Topic
7:00 - 9:00 | 10:00 - 12:00 | 15:00 - 17:00 | 22:00 - 24:00 | Invited Speakers & Chairs | Q&A and Panel Discussion


CV/AI techniques are quickly taking a central role in driving this growth, enabling video conferencing applications that deliver more natural, contextual, and relevant meeting experiences. For example, high-quality video matting and synthesis are crucial to the now-essential virtual background feature; gaze correction and gesture tracking can deepen interactive user engagement; and automatic color and light correction can improve the user's visual appearance and self-image. All of these must be backed by efficient video compression/transmission and edge processing, which can likewise benefit from recent AI advances. These challenges have drawn increasing R&D attention; for example, NVIDIA recently released a fully accelerated platform for building video conferencing services with many advanced AI features.
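To make the virtual-background use case above concrete, here is a minimal sketch of the final compositing step, assuming a matting model has already produced a per-pixel alpha matte in [0, 1] (the model itself is outside this sketch; all arrays here are synthetic placeholders):

```python
import numpy as np

def composite(frame, background, alpha):
    """Blend a camera frame over a virtual background using an alpha matte.

    frame, background: (H, W, 3) float arrays; alpha: (H, W) in [0, 1].
    """
    alpha = alpha[..., np.newaxis]  # broadcast the matte over RGB channels
    return alpha * frame + (1.0 - alpha) * background

# Toy 2x2 example: a white "foreground" over a black virtual background,
# with one soft-edge pixel (alpha = 0.5) as matting models typically produce.
frame = np.ones((2, 2, 3))
background = np.zeros((2, 2, 3))
alpha = np.array([[1.0, 0.5],
                  [0.0, 1.0]])

out = composite(frame, background, alpha)
```

The soft (fractional) alpha values are what distinguish matting from hard segmentation and are what make hair and motion-blurred edges look natural in the composited output.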
While AI-based video collaboration appears to be entering mainstream adoption, we recognize that building the next-generation video conferencing system poses manifold interdisciplinary challenges and faces many technical gaps to close. Centered on this theme, this workshop aims to provide the first comprehensive forum for CVPR researchers to systematically discuss the relevant techniques that we, as a community, can contribute to. Examples include, but are not limited to:

  • Image display and quality enhancement for teleconferencing
  • Video compression and transmission for teleconferencing
  • Video object segmentation, matting and synthesis (for virtual background, etc.)
  • HCI (gesture recognition, head tracking, gaze tracking, etc.), AR and VR applications in video conferencing
  • Efficient video processing on the edge and IoT camera devices
  • Multi-modal information processing and fusion in video conferencing (audio transcription, image to text, video captioning, etc.)
  • Societal and ethical aspects: privacy intrusion & protection, attention engagement, fatigue avoidance, etc.
  • Emerging Applications where video conferencing would be the cornerstone: remote education, telemedicine, etc.
... and many more interesting features.
We aim to collectively address a core question: which CV techniques are, or will be, ready for the next-generation video conference, and how will they fundamentally change the experience of remote work, education, and more? We will bring together experts from interdisciplinary fields to discuss recent advances on these topics and to explore new directions. As one expected outcome, the workshop will produce a joint report defining the key CV problems, characterizing the technical demands and barriers, and discussing potential solutions.