Creating high-fidelity human avatars is an important problem for metaverse, AR, and VR applications. It requires robust algorithms for 3D human body and pose reconstruction, generation, and realistic rendering. While at Microsoft Research Asia, the PI contributed to multiple projects on these topics (Gao et al., 2022; Kim et al., 2023; Kim et al., 2015).
Human pose tracking in sign language videos. Accurate human body and hand pose tracking from weakly synchronized multi-view videos.
Dance motion generation conditioned on various music sequences.
MPS-NeRF. Model- and pose-agnostic NeRF-based human rendering.
MPS-NeRF demo videos
References
2023
MNET++: Music-Driven Pluralistic Dancing Toward Multiple Dance Genre Synthesis
Jinwoo Kim, Beom Kwon, Jongyoo Kim, and Sanghoon Lee
IEEE Transactions on Pattern Analysis and Machine Intelligence, Dec 2023
Numerous task-specific variants of autoregressive networks have been developed for dance generation. Nonetheless, a severe limitation remains in that all existing algorithms can return repeated patterns for a given initial pose, which may be inferior. We examine and analyze several key challenges of previous works, and propose variations in both model architecture (namely MNET++) and training methods to address these. In particular, we devise the beat synchronizer and dance synthesizer. First, generated dance should be locally and globally consistent with given music beats, circumvent repetitive patterns, and look realistic. To achieve this, the beat synchronizer implicitly catches the rhythm enabling it to stay in sync with the music as it dances. Then, the dance synthesizer infers the dance motions in a seamless patch-by-patch manner conditioned by music. Second, to generate diverse dance lines, adversarial learning is performed by leveraging the transformer architecture. Furthermore, MNET++ learns a dance genre-aware latent representation that is scalable for multiple domains to provide fine-grained user control according to the dance genre. Compared with the state-of-the-art methods, our method synthesizes plausible and diverse outputs according to multiple dance genres as well as generates remarkable dance sequences qualitatively and quantitatively.
2022
MPS-NeRF: Generalizable 3D Human Rendering From Multiview Images
Xiangjun Gao, Jiaolong Yang, Jongyoo Kim, Sida Peng, Zicheng Liu, and Xin Tong
IEEE Transactions on Pattern Analysis and Machine Intelligence, Dec 2022
There has been rapid progress recently on 3D human rendering, including novel view synthesis and pose animation, based on the advances of neural radiance fields (NeRF). However, most existing methods focus on person-specific training and their training typically requires multi-view videos. This paper deals with a new challenging task – rendering novel views and novel poses for a person unseen in training, using only multiview still images as input without videos. For this task, we propose a simple yet surprisingly effective method to train a generalizable NeRF with multiview images as conditional input. The key ingredient is a dedicated representation combining a canonical NeRF and a volume deformation scheme. Using a canonical space enables our method to learn shared properties of human and easily generalize to different people. Volume deformation is used to connect the canonical space with input and target images and query image features for radiance and density prediction. We leverage the parametric 3D human model fitted on the input images to derive the deformation, which works quite well in practice when combined with our canonical NeRF. The experiments on both real and synthetic data with the novel view synthesis and pose animation tasks collectively demonstrate the efficacy of our method.
2015
Implementation of an Omnidirectional Human Motion Capture System Using Multiple Kinect Sensors
Junghwan Kim, Inwoong Lee, Jongyoo Kim, and Sanghoon Lee
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Dec 2015