Nashville (GMT-5) Greenwich (GMT+1) Event
8:30 AM 14:30 Chairs’ opening remarks
8:45 AM 14:45 Al Bovik, UT Austin
9:30 AM 15:30 Zhengyou Zhang, Tencent
10:15 AM 16:15 Break 1
10:30 AM 16:30 Ira Kemelmacher-Shlizerman, U Washington
11:15 AM 17:15 Ming-Yu Liu, NVIDIA
12:00 PM 18:00 Break 2 & Poster Sessions
1:00 PM 19:00 Chuo-Ling Chang, Google
1:45 PM 19:45 Lexing Xie, ANU
2:30 PM 20:30 Break 3
2:45 PM 20:45 Catherine Qi Zhao, University of Minnesota
3:30 PM 21:30 Sergi Caelles, Google Research
4:15 PM 22:15 Break 4 & Poster Sessions
4:30 PM 22:30 Overview of the Challenge Competitions
4:45 PM 22:45 Track 1: Challenge Winners’ Oral Presentation
5:00 PM 23:00 Track 2: Challenge Winners’ Oral Presentation
5:15 PM 23:15 Panel Discussion
6:00 PM 00:00 Award ceremony and concluding remarks


CV/AI techniques are quickly taking a central role in driving this growth by enabling video conferencing applications that deliver more natural, contextual, and relevant meeting experiences. For example, high-quality video matting and synthesis are crucial to the now-essential virtual-background feature; gaze correction and gesture tracking can increase interactive user engagement; and automatic color and light correction can improve the user’s visual appearance and self-image. All of these must be backed by high-efficiency video compression/transmission and efficient edge processing, both of which can also benefit from recent AI advances. These challenges have drawn increasing R&D attention; for example, NVIDIA recently released a fully accelerated platform for building video conferencing services with many advanced AI features.
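At its core, the virtual-background feature mentioned above comes down to alpha matting and compositing: a matting model predicts a per-pixel alpha matte, which is then used to blend the camera frame over the chosen background. A minimal sketch of the compositing step in NumPy (the frames and the crude hand-built matte here are synthetic placeholders, standing in for real webcam input and a learned matting network):

```python
import numpy as np

def composite(frame, background, alpha):
    """Alpha-composite a foreground frame over a virtual background.

    Per pixel: C = alpha * F + (1 - alpha) * B, where alpha in [0, 1]
    is the matte that a matting/segmentation model would predict.
    """
    alpha = alpha[..., None]  # add channel axis so it broadcasts over RGB
    blended = alpha * frame + (1.0 - alpha) * background
    return blended.astype(frame.dtype)

# Synthetic 4x4 RGB images stand in for the webcam frame and the backdrop.
frame = np.full((4, 4, 3), 200, dtype=np.uint8)      # "person" pixels
background = np.full((4, 4, 3), 50, dtype=np.uint8)  # virtual backdrop
alpha = np.zeros((4, 4), dtype=np.float32)
alpha[1:3, 1:3] = 1.0                                # crude matte: center is foreground

out = composite(frame, background, alpha)
# Center pixels keep the foreground value; borders take the backdrop value.
```

In a real system the matte comes from a trained network and takes fractional values along hair and object boundaries, which is exactly why high-quality matting (rather than hard segmentation) matters for this use case.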
While mainstream adoption of AI-based video collaboration appears to be underway, we recognize that building the next-generation video conferencing system involves multi-fold interdisciplinary challenges and faces many technical gaps to close. Centered on this theme, this workshop aims to provide the first comprehensive forum for CVPR researchers to systematically discuss the relevant techniques that we, as a community, can contribute to. Examples include but are not limited to:

  • Image display and quality enhancement for teleconferencing
  • Video compression and transmission for teleconferencing
  • Video object segmentation, matting and synthesis (for virtual background, etc.)
  • HCI (gesture recognition, head tracking, gaze tracking, etc.), AR and VR applications in video conferencing
  • Efficient video processing on the edge and IoT camera devices
  • Multi-modal information processing and fusion in video conferencing (audio transcription, image to text, video captioning, etc.)
  • Societal and ethical aspects: privacy intrusion & protection, attention engagement, fatigue avoidance, etc.
  • Emerging Applications where video conferencing would be the cornerstone: remote education, telemedicine, etc.
... and many more interesting features.
We aim to collectively address a core question: which CV techniques are, or will be, ready for next-generation video conferencing, and how will they fundamentally change the experience of remote work, education, and more? We will bring together experts from interdisciplinary fields to discuss recent advances on these topics and to explore new directions. As one expected outcome of the workshop, we plan to produce a joint report defining the key CV problems, characterizing the technical demands and barriers, and discussing potential solutions.

Important Dates

Description Date
Paper Submission Deadline April 2, 2021 (11:59 pm PDT)
Notification of Paper Acceptance April 16, 2021 (11:59 pm PDT)
Paper Camera Ready April 18, 2021 (11:59 pm PDT)

NOTE: All papers should be prepared using the CVPR submission format; the content should be between 4 and 8 pages.