2020, 196 pp., illustrations, 210 mm, softcover
KIT Scientific Publishing
Computing three-dimensional reconstructions of dynamic scenes is one of the fundamental problems in computer vision. For many applications this task can be reduced to determining three-dimensional object motion trajectories w.r.t. mainly static environment structures. This approach simplifies the reconstruction problem by constraining the projective ambiguities of different scene components. Image-based reconstruction approaches such as Multibody Structure from Motion (MSfM) are an appealing choice for reconstructing dynamic scenes under suitable conditions, such as sufficiently textured surfaces and non-degenerate camera trajectories. The underlying assumption of MSfM is that the scene may be represented by a multibody system, i.e., that the scene consists of multiple non-deformable components, which may undergo independent translational and rotational displacements. Existing MSfM approaches use epipolar constraints or motion segmentation to determine component-specific feature correspondences and thereby reconstruct independently moving components. Such methods are agnostic to semantics and fail in certain scenarios, such as stationary objects or objects moving in parallel. It is difficult to identify the capabilities and limitations of existing approaches, because baseline algorithms and benchmark datasets for image-based dynamic object reconstruction are lacking.

We propose a novel MSfM algorithm for moving object reconstruction that incorporates (instance-aware) semantic segmentation and multiple view geometry methods. The proposed MSfM pipeline includes a Multiple Object Tracking (MOT) algorithm that tracks two-dimensional object shapes at pixel level to determine object-specific feature correspondences. We consider non-object structures for the environment reconstruction. The proposed MSfM method allows the reconstruction of three-dimensional object shapes and object motion trajectories. We leverage camera poses w.r.t. object reconstructions and the corresponding instance-aware semantic segmentations to determine object points that are consistent with the image observations. The generated point clouds are suitable for object mesh computations. To compute a three-dimensional object trajectory, we combine corresponding camera poses in the object reconstruction and in the background reconstruction. We present different algorithms to reconstruct object motion trajectories in monocular and stereo image sequences. In the monocular case, three-dimensional object trajectories are defined only up to scale. To resolve this ambiguity, we propose two different constraints to estimate the scale ratio between object and environment reconstructions.

To facilitate the benchmarking of new and existing approaches, we additionally created two publicly available datasets for moving object reconstruction. The first dataset comprises real-world image sequences of a moving vehicle and a corresponding vehicle laser scan suitable for the evaluation of object shape reconstructions. The second dataset contains synthetic sequences of different vehicles in an urban environment. Its ground truth includes vehicle shapes as well as vehicle and camera poses per frame. This dataset allows a quantitative evaluation of shape and trajectory reconstructions of moving objects.

Using the created datasets, we evaluate our algorithms on outdoor scenarios of driving vehicles with challenging properties such as small object sizes, reflecting surfaces, and illumination- and view-dependent appearance changes. We show that the proposed semantic constraint for object shape reconstruction produces meshes that are robust w.r.t. reflections and appearance changes. The quantitative evaluation of the trajectory reconstruction algorithms shows that the scale ambiguity of (monocular) image-based reconstructions poses a challenging problem. Using stereo image sequences resolves this ambiguity and results in more accurate and robust reconstructions. By quantitatively evaluating the proposed algorithms on our datasets, we provide a reference for future research in the area of moving object reconstruction.
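The combination of object and background camera poses mentioned in the abstract can be illustrated with a minimal sketch. It is not the book's implementation. It assumes that both reconstructions store world-to-camera poses (x_cam = R x + t), that the two reconstructions share the same camera per frame, and that a scale ratio between object and background reconstruction has already been estimated. All function and parameter names are hypothetical.

```python
import numpy as np


def object_point_in_world(p_obj, R_obj, t_obj, R_bg, t_bg, scale_ratio=1.0):
    """Map a point from object coordinates to background (world) coordinates.

    Assumed convention (not taken from the book): each pose maps points of
    its reconstruction into the camera frame, x_cam = R @ x + t.
    scale_ratio converts object reconstruction units to background units.
    """
    # Point in the camera frame, expressed in object reconstruction units.
    p_cam_obj_units = R_obj @ p_obj + t_obj
    # Same point in the camera frame, expressed in background units.
    p_cam = scale_ratio * p_cam_obj_units
    # Invert the background pose to move from camera to world coordinates.
    return R_bg.T @ (p_cam - t_bg)


def object_trajectory(p_obj, obj_poses, bg_poses, scale_ratio=1.0):
    """Trajectory of one object point over all frames.

    obj_poses and bg_poses are lists of (R, t) tuples, one per frame.
    """
    return np.array([
        object_point_in_world(p_obj, R_o, t_o, R_b, t_b, scale_ratio)
        for (R_o, t_o), (R_b, t_b) in zip(obj_poses, bg_poses)
    ])
```

In the stereo case the scale ratio can be left at 1.0 under these assumptions, since both reconstructions are metric. In the monocular case the resulting trajectory is only as accurate as the estimated scale ratio.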