28C3 - Version 2.3.5

28th Chaos Communication Congress
Behind Enemy Lines

Speakers
David Kim
Schedule
Day Day 3 - 2011-12-29
Room Saal 1
Start time 15:15
Duration 01:00
Info
ID 4928
Event type Lecture
Track Science
Language used for presentation English

KinectFusion

Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera

This project investigates techniques to track the 6DOF pose of handheld depth-sensing cameras, such as Kinect, as they move through space, and to perform high-quality 3D surface reconstruction for interaction.

While depth cameras are not conceptually new, Kinect has made such sensors accessible to all. The quality of the depth sensing, given the low-cost and real-time nature of the device, is compelling, and has made the sensor instantly popular with researchers and enthusiasts alike. The Kinect camera uses a structured light technique to generate real-time depth maps containing discrete range measurements of the physical scene. This data can be reprojected as a set of discrete 3D points (or point cloud).

Even though the Kinect depth data is compelling, particularly compared to other commercially available depth cameras, it is still inherently noisy. Depth measurements often fluctuate, and depth maps contain numerous ‘holes’ where no readings were obtained. To generate 3D models for use in applications such as gaming, physics, or CAD, higher-level surface geometry needs to be inferred from this noisy point-based data. One simple approach makes strong assumptions about the connectivity of neighboring points within the Kinect depth map to generate a mesh representation. This, however, leads to noisy and low-quality meshes. Just as importantly, this approach creates an incomplete mesh from only a single, fixed viewpoint. To create a complete (or even watertight) 3D model, different viewpoints of the physical scene must be captured and fused into a single representation.

This talk presents a novel interactive reconstruction system called KinectFusion. The system takes live depth data from a moving Kinect camera and, in real time, creates a single high-quality, geometrically accurate 3D model. A user holding a standard Kinect camera can move within any indoor space and reconstruct a 3D model of the physical scene within seconds. The system continuously tracks the 6 degrees-of-freedom (DOF) pose of the camera and fuses new viewpoints of the scene into a global surface-based representation.
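The reprojection step mentioned above is a standard pinhole-camera back-projection. The following is an illustrative sketch (not the talk's code) of turning a Kinect-style depth map into a point cloud; the intrinsic parameters `fx`, `fy`, `cx`, `cy` are placeholder values, not calibrated Kinect intrinsics.

```python
import numpy as np

def depth_to_point_cloud(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Back-project a depth map (metres) into an N x 3 point cloud.

    Pixels with depth 0 are the 'holes' where the sensor returned no
    reading; they are dropped from the cloud. Intrinsics are placeholders.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    valid = z > 0                                   # drop sensor holes
    x = (u - cx) * z / fx                           # pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x[valid], y[valid], z[valid]], axis=-1)

# Tiny synthetic 2x2 depth map with one hole (the 0.0 reading).
depth = np.array([[1.0, 0.0],
                  [2.0, 1.5]])
cloud = depth_to_point_cloud(depth)
print(cloud.shape)  # (3, 3): three valid pixels, three coordinates each
```

Because each pixel is back-projected independently, the resulting cloud carries the sensor's per-pixel noise directly, which is why the mesh-from-connectivity shortcut described above produces such low-quality surfaces.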
A novel GPU pipeline allows for accurate camera tracking and surface reconstruction at interactive real-time rates. We demonstrate core uses of KinectFusion as a low-cost handheld scanner, and present novel interactive methods for segmenting physical objects of interest from the reconstructed scene. We show how a real-time 3D model can be leveraged for geometry-aware augmented reality (AR) and physics-based interactions, where virtual worlds more realistically merge and interact with the real.

Placing such systems into an interaction context, where users need to dynamically interact in front of the sensor, reveals a fundamental challenge: we can no longer assume a static scene for camera tracking or reconstruction. We illustrate failure cases caused by a user moving in front of the sensor. We describe new methods to overcome these limitations, allowing camera tracking and reconstruction of a static background scene while simultaneously segmenting, reconstructing, and tracking foreground objects, including the user. We use this approach to demonstrate real-time multi-touch interactions anywhere, allowing a user to appropriate any physical surface, be it planar or non-planar, for touch.
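The "global surface-based representation" in KinectFusion's published pipeline is a volumetric truncated signed distance function (TSDF), into which each new depth frame is folded with a per-voxel weighted running average. The sketch below is a deliberately simplified single-voxel illustration of that fusion step, not the GPU implementation demonstrated in the talk; the truncation distance is an arbitrary illustrative value.

```python
import numpy as np

TRUNC = 0.1  # truncation distance in metres (illustrative value)

def fuse(tsdf, weight, new_sdf, new_weight=1.0):
    """Fold one new signed-distance observation into a voxel's TSDF.

    The running weighted average means many noisy per-frame readings
    converge toward the true surface distance.
    """
    d = float(np.clip(new_sdf, -TRUNC, TRUNC))  # truncate far-field values
    fused = (tsdf * weight + d * new_weight) / (weight + new_weight)
    return fused, weight + new_weight

# Two noisy observations of the same surface average toward the truth,
# which is how fusing many viewpoints suppresses per-frame sensor noise.
tsdf, w = 0.0, 0.0
tsdf, w = fuse(tsdf, w, 0.04)
tsdf, w = fuse(tsdf, w, 0.06)
print(round(tsdf, 3))  # 0.05
```

The same averaging is what makes a moving user problematic, as the talk describes: a dynamic foreground violates the assumption that repeated observations of a voxel measure the same static surface.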