Publications

Model-Based Deep Portrait Relighting

F.-D. Schreiber, A. Hilsmann, P. Eisert

European Conference on Visual Media Production (CVMP)

Publication year: 2022

Abstract

Like most computer vision problems, the relighting of portrait face images is increasingly formulated entirely as a deep learning problem. However, data-driven approaches need a detailed and exhaustive database to work on, and the creation of ground-truth data is tedious and often technically complex. At the same time, networks get bigger and deeper, while knowledge about the problem statement, scene structure, and physical laws is often neglected. In this paper, we propose to incorporate prior knowledge about relighting directly into the network learning process, adding model-based building blocks to the training. Thereby, we improve the learning speed and effectiveness of the network, so that it performs better even with a restricted dataset. We demonstrate through an ablation study that the proposed model-based building blocks improve the network's training and enhance the generated images compared with the naive approach.


Recovering Fine Details for Neural Implicit Surface Reconstruction

D. Chen, P. Zhang, I. Feldmann, O. Schreer & P. Eisert

arXiv

Publication year: 2022

Abstract

Recent works on implicit neural representations have made significant strides. Learning implicit neural surfaces using volume rendering has gained popularity in multi-view reconstruction without 3D supervision. However, accurately recovering fine details is still challenging, due to the underlying ambiguity of geometry and appearance representation. In this paper, we present D-NeuS, a volume-rendering-based neural implicit surface reconstruction method capable of recovering fine geometry details, which extends NeuS with two additional loss functions targeting enhanced reconstruction quality. First, we encourage the rendered surface points from alpha compositing to have zero signed distance values, alleviating the geometry bias arising from transforming SDF to density for volume rendering. Second, we impose multi-view feature consistency on the surface points, derived by interpolating SDF zero-crossings from sampled points along rays. Extensive quantitative and qualitative results demonstrate that our method reconstructs high-accuracy surfaces with fine details and outperforms the state of the art.
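
For illustration only (not the authors' code), the two extra losses could look roughly like the following PyTorch-style sketch, where `sdf_net`, the ray tensors, and the feature `projector` are assumed placeholders:

```python
# Hypothetical sketch of the two additional losses described above;
# all names, shapes, and the feature projector are assumptions.
import torch
import torch.nn.functional as F

def geometry_bias_loss(sdf_net, rays_o, rays_d, weights, t_vals):
    """Drive the SDF towards zero at each ray's alpha-composited surface point."""
    t_surf = (weights * t_vals).sum(dim=-1, keepdim=True)   # expected ray depth, (R, 1)
    x_surf = rays_o + t_surf * rays_d                        # 3D surface points, (R, 3)
    return sdf_net(x_surf).abs().mean()

def feature_consistency_loss(feature_maps, projector, x_zero):
    """Multi-view feature consistency at SDF zero-crossing points x_zero, (R, 3)."""
    feats = [projector(fm, x_zero) for fm in feature_maps]   # each (R, C)
    ref = feats[0]
    return sum(1.0 - F.cosine_similarity(ref, f, dim=-1).mean() for f in feats[1:])
```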


Imposing temporal consistency on deep monocular body shape and pose estimation

A. Zimmer, A. Hilsmann, W. Morgenstern, P. Eisert

Computational Visual Media

Publication year: 2023

Abstract

Accurate and temporally consistent modeling of human bodies is essential for a wide range of applications, including character animation, understanding human social behavior, and AR/VR interfaces. Capturing human motion accurately from a monocular image sequence remains challenging; modeling quality is strongly influenced by the temporal consistency of the captured body motion. Our work presents an elegant solution for integrating temporal constraints during fitting, which increases both temporal consistency and robustness of the optimization. In detail, we derive the parameters of a sequence of body models, representing the shape and motion of a person. We optimize these parameters over the complete image sequence, fitting a single consistent body shape while imposing temporal consistency on the body motion, assuming body joint trajectories to be linear over short time spans. Our approach enables the derivation of realistic 3D body models from image sequences, including jaw pose, facial expression, and articulated hands. Our experiments show that our approach accurately estimates body shape and motion, even for challenging movements and poses. Further, we apply it to the particular application of sign language analysis, where accurate and temporally consistent motion modeling is essential, and show that the approach is well suited to this kind of application.
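
The locally linear joint-trajectory assumption can be pictured with a small, hypothetical penalty on the second finite difference of the joint positions (a sketch only; the tensor layout is assumed):

```python
# Hypothetical sketch: if joint trajectories are roughly linear over short
# time windows, their second finite difference (acceleration) should vanish.
import torch

def linear_motion_loss(joints: torch.Tensor) -> torch.Tensor:
    """joints: (T, J, 3) tensor of 3D body joint positions over T frames."""
    accel = joints[2:] - 2.0 * joints[1:-1] + joints[:-2]   # second difference
    return accel.square().sum(dim=-1).mean()
```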


Temporal Shape Transfer Network for 3D Human Motion

J. Regateiro and E. Boyer

International Conference on 3D Vision 2022 (3DV 2022)

Publication year: 2022

Abstract

This paper presents a learning-based approach to perform human shape transfer between an arbitrary 3D identity mesh and a temporal motion sequence of 3D meshes. Recent approaches tackle human shape and pose transfer on a per-frame basis and do not yet consider the valuable information about motion dynamics, e.g., body or clothing dynamics, inherently present in motion sequences. Recent datasets provide such sequences of 3D meshes, and this work investigates how to leverage the associated intrinsic temporal features in order to improve learning-based approaches to human shape transfer. These features are expected to help preserve temporal motion and identity consistency over motion sequences. To this aim, we introduce a new network architecture that takes as input successive 3D mesh frames in a motion sequence and whose decoder is conditioned on the target shape identity. Training losses are designed to enforce temporal consistency between poses as well as shape preservation over the input frames. Experiments demonstrate substantial qualitative and quantitative improvements from using temporal features compared to optimization-based and recent learning-based methods.
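
A minimal sketch of the two kinds of training losses named above (temporal consistency and shape preservation) might look as follows; the tensor layouts, the edge list, and the use of edge lengths as a shape proxy are assumptions for illustration, not the paper's definitions:

```python
# Hypothetical sketch of temporal-consistency and shape-preservation terms;
# tensor layouts and the edge-length shape proxy are assumptions.
import torch

def temporal_consistency_loss(pred_verts):
    """Penalize frame-to-frame jitter; pred_verts: (T, V, 3) predicted meshes."""
    accel = pred_verts[2:] - 2.0 * pred_verts[1:-1] + pred_verts[:-2]
    return accel.square().sum(dim=-1).mean()

def shape_preservation_loss(pred_verts, identity_verts, edges):
    """Keep edge lengths close to those of the target identity mesh; edges: (E, 2)."""
    def edge_len(v):                         # v: (..., V, 3)
        return (v[..., edges[:, 0], :] - v[..., edges[:, 1], :]).norm(dim=-1)
    return (edge_len(pred_verts) - edge_len(identity_verts)).abs().mean()
```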


Detailed Eye Region Capture and Animation

G. Kerbiriou, Q. Avril, F. Danieau and M. Marchal

ACM SIGGRAPH / Eurographics Symposium on Computer Animation

Publication year: 2022

Abstract

Even though the appearance and geometry of the human eye have been extensively studied during the last decade, the geometrical correlation between gaze direction, eyelid aperture, and eyelid shape has not been empirically modeled. In this paper, we propose a data-driven approach for capturing and modeling the subtle features of the human eye region, such as the inner eye corner and the skin bulging effect due to globe orientation. Our approach consists of an original experimental setup to capture the eye region geometry variations, combined with a 3D reconstruction method. For the eye region capture, we scanned 55 participants performing 36 eye poses. To animate a participant's eye region, we register the different poses to a vertex-wise correspondence before blending them in a trilinear fashion. We show that our 3D animation results are visually pleasing and realistic while bringing novel eye features compared to state-of-the-art models.
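
For illustration only, trilinear blending of registered poses can be sketched as interpolation on a 3D grid of vertex sets (e.g. horizontal gaze, vertical gaze, eyelid aperture); the grid layout is an assumption, not the paper's parameterization:

```python
# Hypothetical sketch of trilinear blending of registered eye-region scans
# arranged on a 3D grid; the grid axes are assumptions for illustration.
import numpy as np

def trilinear_blend(grid, u, v, w):
    """grid: (Nu, Nv, Nw, V, 3) registered vertex sets; u, v, w in [0, 1]."""
    nu, nv, nw = grid.shape[:3]
    x, y, z = u * (nu - 1), v * (nv - 1), w * (nw - 1)
    i, j, k = int(x), int(y), int(z)
    i1, j1, k1 = min(i + 1, nu - 1), min(j + 1, nv - 1), min(k + 1, nw - 1)
    fx, fy, fz = x - i, y - j, z - k
    c = grid
    front = ((c[i, j, k]  * (1 - fx) + c[i1, j, k]  * fx) * (1 - fy) +
             (c[i, j1, k] * (1 - fx) + c[i1, j1, k] * fx) * fy)
    back  = ((c[i, j, k1]  * (1 - fx) + c[i1, j, k1]  * fx) * (1 - fy) +
             (c[i, j1, k1] * (1 - fx) + c[i1, j1, k1] * fx) * fy)
    return front * (1 - fz) + back * fz      # blended vertices, (V, 3)
```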


Multi-View Mesh Reconstruction with Neural Deferred Shading

M. Worchel, R. Diaz, W. Hu, O. Schreer, I. Feldmann, P. Eisert

CVPR 2022

Publication year: 2022

Abstract

We propose an analysis-by-synthesis method for fast multi-view 3D reconstruction of opaque objects with arbitrary materials and illumination. State-of-the-art methods use both neural surface representations and neural rendering. While flexible, neural surface representations are a significant bottleneck in optimization runtime. Instead, we represent surfaces as triangle meshes and build a differentiable rendering pipeline around triangle rasterization and neural shading. The renderer is used in a gradient descent optimization where both a triangle mesh and a neural shader are jointly optimized to reproduce the multi-view images. We evaluate our method on a public 3D reconstruction dataset and show that it can match the reconstruction accuracy of traditional baselines and neural approaches while surpassing them in optimization runtime. Additionally, we investigate the shader and find that it learns an interpretable representation of appearance, enabling applications such as 3D material editing.
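
The joint optimization loop can be pictured with a short, heavily simplified sketch; `rasterize`, the view objects, the g-buffer attributes, and the tiny MLP shader are placeholders, not the authors' implementation:

```python
# Hypothetical sketch of jointly optimizing mesh vertices and a neural shader
# against multi-view photos; the rasterizer, views, and g-buffer are assumed.
import torch

def reconstruct(initial_verts, faces, views, rasterize, iters=2000):
    """views: sequence of objects with .camera and .image; rasterize: assumed
    differentiable rasterizer returning per-pixel position/normal g-buffers."""
    verts = initial_verts.clone().requires_grad_(True)       # (V, 3) coarse start mesh
    shader = torch.nn.Sequential(                             # (position, normal) -> RGB
        torch.nn.Linear(6, 128), torch.nn.ReLU(),
        torch.nn.Linear(128, 3), torch.nn.Sigmoid(),
    )
    opt = torch.optim.Adam([verts, *shader.parameters()], lr=1e-3)
    for it in range(iters):
        view = views[it % len(views)]
        gbuf = rasterize(verts, faces, view.camera)           # differentiable rasterization
        rgb = shader(torch.cat([gbuf.position, gbuf.normal], dim=-1))
        loss = (rgb - view.image).abs().mean()                # photometric L1 loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return verts, shader
```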


Study on Automatic 3D Facial Caricaturization: From Rules to Deep Learning

N. Olivier, G. Kerbiriou, F. Argelaguet, Q. Avril, F. Danieau, P. Guillotel, L. Hoyet, F. Multon

Frontiers in Virtual Reality

Publication year: 2022

Abstract

Facial caricature is the art of drawing faces in an exaggerated way to convey emotions such as humor or sarcasm. Automatic caricaturization has been explored in both the 2D and 3D domains. In this paper, we propose two novel approaches to automatically caricaturize input facial scans, filling gaps in the literature in terms of user control, caricature style transfer, and the use of deep learning for 3D mesh caricaturization. The first approach is a gradient-based differential deformation approach with data-driven stylization; it combines two deformation processes, exaggerating facial curvature and facial proportions. The second approach is a GAN for unpaired face-scan-to-3D-caricature translation. We leverage existing facial and caricature datasets, along with recent domain-to-domain translation methods and 3D convolutional operators, to learn to caricaturize 3D facial scans in an unsupervised way. To evaluate and compare these two novel approaches with the state of the art, we conducted the first user study of facial mesh caricaturization techniques, with 49 participants. It highlights the subjectivity of caricature perception and the complementarity of the methods. Finally, we provide insights for automatically generating caricaturized 3D facial meshes.


Volograms & V-SENSE Volumetric Video Dataset

R. Pagés, K. Amplianitis, J. Ondrej, E. Zerman and A. Smolic

Preprint

Publication year: 2022

Abstract

Volumetric video is a new form of visual media that enables novel ways of immersive visualisation and interaction. Currently, volumetric video technologies receive a lot of attention in research and standardisation, leading to an increasing need for related test data. This paper describes the Volograms & V-SENSE Volumetric Video Dataset, which is made publicly available to support these research and standardisation efforts.


Neural Human Deformation Transfer

J. Basset, A. Boukhayma, S. Wuhrer, F. Multon, E. Boyer

2021 International Conference on 3D Vision (3DV)

Publication year: 2021

Abstract

We consider the problem of human deformation transfer, where the goal is to retarget poses between different characters. Traditional methods that tackle this problem assume a human pose model to be available and transfer poses between characters using this model. In this work, we take a different approach and transform the identity of a character into a new identity without modifying the character's pose. This offers the advantage of not having to define equivalences between 3D human poses, which is not straightforward as poses tend to change depending on the identity of the character performing them, and as their meaning is highly contextual. To achieve the deformation transfer, we propose a neural encoder-decoder architecture where only identity information is encoded and where the decoder is conditioned on the pose. We use pose-independent representations, such as isometry-invariant shape characteristics, to represent identity features. Our model uses these features to supervise the prediction of offsets from the deformed pose to the result of the transfer. We show experimentally that our method outperforms state-of-the-art methods both quantitatively and qualitatively, and generalises better to poses not seen during training. We also introduce a fine-tuning step that allows us to obtain competitive results for extreme identities and to transfer simple clothing.
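
A toy sketch of the encoder-decoder idea (identity-only encoding, pose-conditioned decoding of per-vertex offsets) is given below; layer sizes, feature dimensions, and the pose code are assumptions, not the paper's model:

```python
# Hypothetical sketch: encode only identity features, condition the decoder on
# a pose code, and predict per-vertex offsets from the pose-source mesh.
import torch
import torch.nn as nn

class DeformationTransfer(nn.Module):
    def __init__(self, id_feat_dim, pose_dim, n_verts, latent=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(id_feat_dim, latent), nn.ReLU(),
                                     nn.Linear(latent, latent))
        self.decoder = nn.Sequential(nn.Linear(latent + pose_dim, latent), nn.ReLU(),
                                     nn.Linear(latent, n_verts * 3))

    def forward(self, identity_feats, pose_code, pose_source_verts):
        z_id = self.encoder(identity_feats)                         # identity only
        offsets = self.decoder(torch.cat([z_id, pose_code], dim=-1))
        return pose_source_verts + offsets.view(*pose_source_verts.shape)
```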

Preserving Memories of Contemporary Witnesses Using Volumetric Video

O. Schreer, M. Worchel, R. Diaz, S. Renault, W. Morgenstern, I. Feldmann, M. Zepp, A. Hilsmann, P. Eisert

ACM Conference on Culture and Computer Science, Physical and Virtual Spaces

Publication year: 2021

Abstract

Volumetric video is a novel technology that enables the creation of dynamic 3D models of persons, which can then be integrated into any 3D environment. In contrast to classical character animation, volumetric video is authentic and much more realistic, and therefore ideal for the transfer of emotions, facial expressions, and gestures, which is highly relevant in the context of preserving the memories of contemporary witnesses and survivors of the Holocaust. Fraunhofer Heinrich Hertz Institute (HHI) is working on two projects in this cultural heritage context. In a recent project between UFA and Fraunhofer HHI, a VR documentary about the last German survivor of the Holocaust, Ernst Grube, has been produced. A second project has started in collaboration with the University of Munich, Faculty of Languages and Literature and the Geschwister Scholl Institute for Political Science, creating a concept for a VR experience together with Dr. Eva Umlauf, the youngest Jewish survivor of the Auschwitz concentration camp. This paper presents key aspects of volumetric video and details of both projects, including a discussion of the user perspective in such a VR experience.


Example-Based Facial Animation of Virtual Reality Avatars using Auto-Regressive Neural Networks

Wolfgang Paier, Anna Hilsmann, Peter Eisert

IEEE Computer Graphics and Applications

Publication year: 2021

Abstract

This paper presents a hybrid animation approach that combines example-based and neural animation methods to create a simple yet powerful animation regime for human faces. Example-based methods usually employ a database of pre-recorded sequences that are concatenated or looped in order to synthesize novel animations. In contrast to this traditional example-based approach, we introduce a lightweight auto-regressive network to transform our animation database into a parametric model. During training, our network learns the dynamics of facial expressions, which enables the replay of annotated sequences from our animation database as well as their seamless concatenation in a new order. This representation is especially useful for the synthesis of visual speech, where co-articulation creates inter-dependencies between adjacent visemes, which affects their appearance. Instead of creating an exhaustive database that contains all viseme variants, we use our animation network to predict the correct appearance. This allows realistic synthesis of novel facial animation sequences such as visual speech, but also of general facial expressions, in an example-based manner.
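
The auto-regressive idea can be pictured with a small sketch that predicts the next latent expression vector from a short history and a sequence label (e.g. a viseme annotation); the architecture and dimensions are assumptions, not the paper's network:

```python
# Hypothetical sketch of a lightweight auto-regressive predictor: given the
# last few latent expression vectors and a label embedding, predict the next
# latent vector. Dimensions and layers are assumptions for illustration.
import torch
import torch.nn as nn

class AutoRegressiveFace(nn.Module):
    def __init__(self, latent_dim=64, label_dim=16, hidden=256, context=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(context * latent_dim + label_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, prev_latents, label):
        """prev_latents: (B, context, latent_dim); label: (B, label_dim)."""
        x = torch.cat([prev_latents.flatten(1), label], dim=-1)
        return self.net(x)          # predicted next latent expression vector
```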


Neural Face Models for Example-Based Visual Speech Synthesis

Wolfgang Paier, Anna Hilsmann, Peter Eisert

CVMP ’20: European Conference on Visual Media Production

Publication year: 2020

Abstract

Creating realistic animations of human faces with computer graphics models is still a challenging task. It is often solved either with tedious manual work or with motion-capture-based techniques that require specialised and costly hardware.

Example-based animation approaches circumvent these problems by re-using captured data of real people. This data is split into short motion samples that can be looped or concatenated in order to create novel motion sequences. The obvious advantages of this approach are the simplicity of use and the high realism, since the data exhibits only real deformations. Rather than tuning the weights of a complex face rig, the animation task is performed on a higher level by arranging typical motion samples such that the desired facial performance is achieved. Two difficulties with example-based approaches, however, are high memory requirements and the creation of artefact-free and realistic transitions between motion samples. We solve these problems by combining the realism and simplicity of example-based animation with the advantages of neural face models.

Our neural face model is capable of synthesising high-quality 3D face geometry and texture according to a compact latent parameter vector. This latent representation reduces memory requirements by a factor of 100 and helps create seamless transitions between concatenated motion samples. In this paper, we present a marker-less approach for facial motion capture based on multi-view video. From the captured data, we learn a neural representation of facial expressions, which is used to seamlessly concatenate facial performances during the animation procedure. We demonstrate the effectiveness of our approach by synthesising mouthings for Swiss German Sign Language based on viseme query sequences.


Split Rendering for Mixed Reality: Interactive Volumetric Video in Action

J. Son, S. Gül, G. Singh Bhullar, G. Hege, W. Morgenstern, A. Hilsmann, T. Ebner, S. Bliedung, P. Eisert, T. Schierl, T. Buchholz, C. Hellge

SIGGRAPH Asia, Demos

Publication year: 2020 

Abstract

This demo presents a mixed reality (MR) application that enables free-viewpoint rendering of interactive high-quality volumetric video (VV) content on Nreal Light MR glasses, web browsers via WebXR and Android devices via ARCore. The application uses a novel technique for animation of VV content of humans and a split rendering framework for real-time streaming of volumetric content over 5G edge-cloud servers. The presented interactive XR experience showcases photorealistic volumetric representations of two humans. As the user moves in the scene, one of the virtual humans follows the user with his head, conveying the impression of a true conversation.


Ernst Grube: A Contemporary Witness and His Memories Preserved with Volumetric Video

M. Worchel, M. Zepp, W. Hu, O. Schreer, I. Feldmann, P. Eisert

Eurographics Workshop on Graphics and Cultural Heritage (GCH 2020)

Publication year: 2020 

Abstract

"Ernst Grube - The Legacy" is an immersive Virtual Reality documentary about the life of Ernst Grube, one of the last German Holocaust survivors. From interviews conducted inside a volumetric capture studio, dynamic full-body reconstructions of both the contemporary witness and his interviewer are recovered. The documentary places them in virtual recreations of historical sites, and viewers experience the interviews with unconstrained motion. As a step towards the documentary's production, prior work presents reconstruction results for one interview. However, the quality is unsatisfactory and does not meet the requirements of the historical context. In this paper, we take the next step and revise the volumetric reconstruction pipeline used. We show that our improvements to depth estimation and a new depth map fusion method lead to a more robust reconstruction process, and that our revised pipeline produces high-quality volumetric assets. By integrating one of our assets into a virtual scene, we provide a first impression of the documentary's look and the convincing appearance of the protagonists in the virtual environment.


The impact of stylization on face recognition

N. Olivier, L. Hoyet, F. Argelaguet, F. Danieau, Q. Avril, A. Lecuyer, P. Guillotel, F. Multon

SAP 2020 – ACM Symposium on Applied Perception

Publication year: 2020 

Abstract

While digital humans are key aspects of the rapidly evolving areas of virtual reality, gaming, and online communications, many applications would benefit from using personalized (stylized) digital representations of users, as such representations have been shown to greatly increase immersion, presence, and emotional response. In particular, depending on the target application, one may want to look like a dwarf or an elf in a heroic fantasy world, or like an alien on another planet, in accordance with the style of the narrative. While creating such virtual replicas requires stylization of the user's features onto the virtual character, no formal study has been conducted to assess the ability to recognize stylized characters. In this paper, we present a perceptual study investigating the effect of the degree of stylization on the ability to recognize an actor, and the subjective acceptability of stylizations. Results show that recognition rates decrease when the degree of stylization increases, while acceptability of the stylization increases. These results provide recommendations for achieving good compromises between stylization and recognition, and pave the way to new stylization methods providing a trade-off between stylization and recognition of the actor.
