All datasets have been prepared. Both tracks will be hosted on CodaLab.


Human-centric video matting

This challenge aims at efficient and accurate portrait matting in videos, which can be applied to real video conferencing scenarios such as virtual background replacement. To this end, we collect 320 videos from real-world video conferencing scenarios using VooV Meeting and annotate the alpha matte at 5~6 fps. This portrait matting dataset covers a wide range of scenarios, including both indoor scenes (e.g., office, bedroom) and outdoor scenes (e.g., park, street). As in image matting, several popular metrics such as MSE and SAD are used to evaluate performance, with MSE serving as the key metric for ranking. The entire dataset is divided into training (220), validation (50) and testing (50) sets. The dataset can be downloaded from both Google Drive and Baidu Cloud (extract code: f1o7). More details can be found at

Split        Clips   Annotated Frames
Train          220   19,449
Validation      50   4,377
Test            50   4,183
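The MSE and SAD evaluation metrics mentioned above can be sketched as follows. Note that the normalization choices here (MSE averaged per pixel, SAD reported in thousands as is common in the matting literature) are assumptions, not the official evaluation protocol.

```python
import numpy as np

def matting_mse(pred, gt):
    """Mean squared error between predicted and ground-truth alpha mattes.

    Both inputs are float arrays with alpha values in [0, 1].
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    return float(np.mean((pred - gt) ** 2))

def matting_sad(pred, gt):
    """Sum of absolute differences, divided by 1,000 (a convention from
    the matting literature; the challenge scaling may differ).
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    return float(np.abs(pred - gt).sum() / 1000.0)
```

For video, these per-frame scores would typically be averaged over all annotated frames of a clip.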


Human-centric video coding for action analysis

This challenge aims to develop new image/video pre-editing methods for human-centric frame reconstruction jointly with the related analytics, providing the technical basis for highly efficient video compression and transmission in conferencing scenarios. To this end, PKU-MMD is adopted as the evaluation dataset. The dataset features human-centric videos captured in indoor scenes. In this challenge, we select the video sequences of 10 action categories for training and testing. In the evaluation, we will calculate two kinds of metrics on the decoded results of the pre-edited videos: human-region frame reconstruction error and human analytics accuracy. For the former, we will calculate per-frame PSNR and SSIM over the regions in the bounding boxes that tightly enclose persons. For the latter, we will evaluate human analytics performance using state-of-the-art methods on the reconstructed videos.
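The human-region reconstruction error can be sketched as PSNR computed only inside a person bounding box. The box format `(x0, y0, x1, y1)` is an assumption here; the challenge may use a different convention, and SSIM would be computed over the same cropped region.

```python
import numpy as np

def bbox_psnr(recon, ref, bbox, max_val=255.0):
    """PSNR over the region enclosed by a person bounding box.

    recon, ref: arrays of identical shape (H, W) or (H, W, C).
    bbox: (x0, y0, x1, y1) pixel coordinates (an assumed format).
    """
    x0, y0, x1, y1 = bbox
    a = np.asarray(recon, dtype=np.float64)[y0:y1, x0:x1]
    b = np.asarray(ref, dtype=np.float64)[y0:y1, x0:x1]
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float("inf")  # identical crops: PSNR is unbounded
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

When a frame contains several persons, the per-box scores would be aggregated (e.g., averaged) per frame before averaging over the clip.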

We have released the training and validation splits of the dataset, both of which can be downloaded from Google Drive and Microsoft OneDrive. All videos and images are 512×512 in resolution. The properties of all splits are summarized below:

Split        Clips   Frames    Codec
Train        1,128   109,624   HEVC QP=1
Validation     125   13,411    HEVC QP=1
Test           176   16,861    JPEG QF=2

The training and validation sets are encoded with x265 at QP=1, which guarantees that all videos are visually lossless. The testing split is provided as individual frames, encoded as JPEG with QF=2 for the same reason. Participants are required to submit exactly the same number of frames, in either JPEG or PNG format, as the testing set provides. Competitors are allowed to pre-edit the video frames. We will then compress these frames under the same bit-rate constraint with existing codecs, e.g., HEVC, and calculate the evaluation metrics on the decoded frames. More details can be found at
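Before submitting, it is worth verifying locally that a submission folder satisfies the frame-count and format constraints described above. This is a hypothetical helper, not an official checker; the expected count would be taken from the released testing split (e.g., 16,861 frames).

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png"}  # JPEG or PNG, per the rules

def validate_submission(submission_dir, expected_count):
    """Check that a submission folder contains exactly expected_count
    frames, all in JPEG or PNG format. Returns the sorted frame paths.
    """
    root = Path(submission_dir)
    frames, bad = [], []
    for p in sorted(root.iterdir()):
        if not p.is_file():
            continue
        (frames if p.suffix.lower() in ALLOWED_SUFFIXES else bad).append(p)
    if bad:
        raise ValueError(f"unsupported file types: {[p.name for p in bad]}")
    if len(frames) != expected_count:
        raise ValueError(
            f"expected {expected_count} frames, found {len(frames)}"
        )
    return frames
```

A mismatched frame count or a stray file type fails fast with a descriptive error, which is cheaper than discovering the problem after upload.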