Kun Li 李坤


Associate Professor

PhD Advisor

College of Intelligence and Computing,
Tianjin University (Peiyang University), Tianjin 300350, China

Email: lik@tju.edu.cn


Recruiting Excellent Postdoctoral Fellows in the Field of Human 3D Reconstruction

Selected Publications

High-Fidelity Human Avatars from a Single RGB Camera

CVPR, 2022 Code

We propose a coarse-to-fine framework to reconstruct a personalized high-fidelity human avatar from a monocular video. Our framework also enables photo-realistic novel view/pose synthesis and shape editing applications.

STATE: Learning Structure and Texture Representations for Novel View Synthesis

CVM, 2022 Code

We propose STATE, an end-to-end deep neural network, for sparse view synthesis by learning STructure And TExture representations. Our method also enables texture and structure editing applications benefitting from implicit disentanglement of structures and textures.

High Quality Rendered Dataset and Non-local Graph Convolutional Network for Intrinsic Image Decomposition

Journal of Image and Graphics (Chinese), 2022 Dataset

We propose an intrinsic image decomposition framework and a new photorealistic dataset, rendered from large-scale 3D indoor scene models with high-quality textures and lighting to simulate real-world environments. Chromatic shading components are implemented for the first time.

Implicit Transformer Network for Screen Content Image Continuous Super-Resolution

NeurIPS, 2021 Code

We propose a novel Implicit Transformer Super-Resolution Network (ITSRN) for screen content image super-resolution at arbitrary scales. We also construct a benchmark dataset with various screen contents.

Geometry-guided Dense Perspective Network for Speech-Driven Facial Animation

IEEE TVCG, 2021 Code

Realistic speech-driven 3D facial animation is a challenging problem due to the complex relationship between speech and face. In this paper, we propose a deep architecture, called Geometry-guided Dense Perspective Network (GDPnet), to achieve speaker-independent realistic 3D facial animation.

Image-Guided Human Reconstruction via Multi-Scale Graph Transformation Networks

IEEE TIP, 2021 Code Dataset

To reconstruct topology-consistent deformed human models, this paper proposes a novel deep learning framework with cascaded multi-scale graph transformation networks. The D2Human (Dynamic Detailed Human) dataset is also presented and made available.

Deep Social Grouping Network for Large Scenes with Multiple Subjects

SCIENTIA SINICA Informationis (Chinese), 2021 Code

This paper proposes a fine-grained social grouping framework for gigapixel large scene images based on deep learning.

PISE: Person Image Synthesis and Editing with Decoupled GAN

CVPR, 2021 Code

This paper proposes a novel two-stage generative model for Person Image Synthesis and Editing, which is able to generate realistic person images with desired poses, textures, or semantic layouts.

Cross-MPI: Cross-scale Stereo for Image Super-Resolution using Multiplane Images

CVPR, 2021 Code

This paper proposes an end-to-end reference-based super-resolution network composed of a novel plane-aware attention-based MPI mechanism, a multiscale guided upsampling module, and a super-resolution synthesis and fusion module.

GPS-Net: Graph-based Photometric Stereo Network

NeurIPS, 2020 Code

This paper proposes a Graph-based Photometric Stereo Network, which unifies per-pixel and all-pixel processing to explore both inter-image and intra-image information.

PoNA: Pose-guided Non-local Attention for Human Pose Transfer

IEEE TIP, 2020 Code

This paper proposes a new human pose transfer method using a generative adversarial network (GAN) with simplified cascaded blocks. Furthermore, our generated images can help to alleviate data insufficiency for person re-identification.

Human Pose Transfer by Adaptive Hierarchical Deformation

Computer Graphics Forum, 2020 (PG2020) Code

This paper proposes an adaptive human pose transfer network with two hierarchical deformation levels. Our model has very few parameters and is fast to converge. Furthermore, our method can be applied to clothing texture transfer.

Learning to Reconstruct and Understand Indoor Scenes from Sparse Views

IEEE TIP, 2020 Code Dataset

This paper proposes a new method for simultaneous 3D reconstruction and semantic segmentation of indoor scenes. Our method needs only a small number of (e.g., 3-5) color images from uncalibrated sparse views, which significantly simplifies data acquisition and broadens applicable scenarios. We also make available a new indoor synthetic dataset, containing photorealistic high-resolution RGB images, accurate depth maps and pixel-level semantic labels for thousands of complex layouts.

4D Association Graph for Realtime Multi-person Motion Capture Using Multiple Video Cameras

CVPR, 2020 Code Dataset

This paper contributes a novel realtime multi-person motion capture algorithm using multiview video inputs.

Full-Body Motion Capture for Multiple Closely Interacting Persons

Graphical Models, 2020

In this paper, we present a fully automatic and fast method to capture the total human performance including body poses, facial expression, hand gestures, and feet orientations for closely interacting multiple persons.
Discern Depth Under Foul Weather: Estimate PM2.5 for Depth Inference
IEEE Transactions on Industrial Informatics, 2020 Code Dataset
We propose an image-based method for PM2.5 estimation and a method for depth estimation from a single color image.
Generating 3D Faces using Multi-column Graph Convolutional Networks
Computer Graphics Forum, 2019 (PG2019) Code
In this work, we introduce multi-column graph convolutional networks (MGCNs), a deep generative model for 3D mesh surfaces that effectively learns a non-linear facial representation. Moreover, with the help of variational inference, our model has excellent generative ability.
CDnet: CNN-Based Cloud Detection for Remote Sensing Imagery
IEEE Transactions on Geoscience and Remote Sensing, 2019
Cloud detection is one of the important tasks in remote sensing image (RSI) preprocessing. In this paper, we utilize the thumbnail (i.e., preview image) of an RSI, which contains the information of the original multispectral or panchromatic imagery, to extract cloud masks efficiently. We also propose a cloud detection neural network (CDnet) with an encoder-decoder structure, a feature pyramid module (FPM), and a boundary refinement (BR) block.
3D Face Representation and Reconstruction with Multi-scale Graph Convolutional Autoencoders
We propose a multi-scale graph convolutional autoencoder for face representation and reconstruction. Our autoencoder uses graph convolution, which is easy to train on graph-structured data and can be used for other deformable models. Our model can also be trained variationally to generate high-quality face shapes.
Global As-Conformal-As-Possible Non-Rigid Registration of Multi-View Scans
IEEE ICME, 2019 Code
We present a novel framework for global non-rigid registration of multi-view scans captured using consumer-level depth cameras. Scans from different viewpoints are allowed to undergo large non-rigid deformations and are finally fused into a complete high-quality model.
Global 3D Non-Rigid Registration of Deformable Objects Using a Single RGB-D Camera
IEEE TIP, 2019
We present a novel global non-rigid registration method for dynamic 3D objects. Our method allows objects to undergo large non-rigid deformations, and achieves high-quality results even with substantial pose change or camera motion between views. In addition, our method does not require a template prior and uses less raw data than tracking-based methods, since only a sparse set of scans is needed.
Robust Non-Rigid Registration with Reweighted Position and Transformation Sparsity
IEEE TVCG, 2019 Winner in the SHREC 2019 Contest
We propose a robust non-rigid registration method using reweighted sparsities on position and transformation to estimate the deformations between 3-D shapes.
Spatio-Temporal Reconstruction for 3D Motion Recovery
We address the challenge of 3D motion recovery by exploiting the spatio-temporal correlations of corrupted 3D skeleton sequences.
Tensor Completion From Structurally-Missing Entries by Low-TT-rankness and Fiber-wise Sparsity
JSTSP, 2018
Most tensor completion methods assume that missing entries are randomly distributed in incomplete tensors, but this could be violated in practical applications where missing entries are not only randomly but also structurally distributed. To remedy this, we propose a novel tensor completion method equipped with double priors on the latent tensor, named tensor completion from structurally-missing entries by low tensor train (TT) rankness and fiber-wise sparsity.
Shape and Pose Estimation for Closely Interacting Persons Using Multi-view Images
Computer Graphics Forum, 2018 (PG2018)
We propose a fully-automatic markerless motion capture method to simultaneously estimate 3D poses and shapes of closely interacting people from multi-view sequences.
Intrinsic Image Decomposition With Sparse and Non-local Priors
ICME, 2017 Code World’s FIRST 10K Best Paper Award – Platinum
We propose a new intrinsic image decomposition method that decomposes a single RGB-D image into reflectance and shading components.
SPA: Sparse Photorealistic Animation Using a Single RGB-D Camera
We propose a marker-less performance capture method using sparse deformation to obtain the geometry and pose of the actor for each time instance in the database.
Video Super-resolution Using an Adaptive Superpixel-guided Auto-Regressive Model
Pattern Recognition, 2016 Code
We propose a video super-resolution method based on an adaptive superpixel-guided auto-regressive (AR) model.
Foreground-Background Separation From Video Clips via Motion-assisted Matrix Restoration
We propose a motion-assisted matrix restoration (MAMR) model for foreground-background separation from video clips.
Non-Rigid Structure from Motion via Sparse Representation
IEEE Transactions on Cybernetics, 2015
We propose a new approach for non-rigid structure from motion with occlusion, based on sparse representation.
Graph-based Segmentation for RGB-D Data Using 3-D Geometry Enhanced Superpixels
IEEE Transactions on Cybernetics, 2015
We propose a two-stage segmentation method for RGB-D data: 1) oversegmentation by 3-D geometry enhanced superpixels; and 2) graph-based merging with label cost from superpixels.
Color-Guided Depth Recovery From RGB-D Data Using an Adaptive Autoregressive Model
ECCV, 2012/IEEE TIP, 2014 Code
We propose an adaptive color-guided autoregressive (AR) model for high quality depth recovery from low quality measurements captured by depth cameras.
Temporal-Dense Dynamic 3D Reconstruction with Low Frame Rate Cameras
We propose a new method for temporally dense capture and reconstruction of dynamic scenes with low frame rate cameras, which consists of spatio-temporal sampling, spatio-temporal interpolation, and spatio-temporal fusion.
Three-Dimensional Motion Estimation via Matrix Completion
We propose a new 3D motion estimation method based on matrix completion.
Markerless Shape and Motion Capture from Multi-view Video Sequences
Multi-Camera and Multi-Lighting Dome
We construct a dome to record the geometry, texture and motion of human actors in a dedicated multiple-camera studio with controlled lighting and a chromakey background. The dome is 6 meters in diameter, which provides enough space for the actors to perform. Forty PointGrey Flea2 cameras are arranged in a ring on the dome, and 320 LEDs are evenly spaced over its hemisphere.
