View-invariant 3D Human Pose Tracking in Monocular Surveillance Videos


Abstract. Exemplar-based approaches have been very successful for human motion analysis, but their accuracy depends strongly on the similarity between the viewpoints of the testing and training images. In practice, roof-top cameras are widely used for video surveillance and are usually placed at a significant angle from the floor, which differs from typical training viewpoints. We present a methodology for viewpoint-invariant monocular 3D human pose tracking in man-made environments, in which we exploit some properties of projective geometry and assume that observed people move on a known ground plane. First, we model 3D body poses and camera viewpoints with a low-dimensional manifold and learn a generative model of the silhouette from this manifold to a reduced set of training views. During the online stage, 3D body poses are tracked using recursive Bayesian sampling conducted jointly over the scene’s ground plane and the pose-viewpoint manifold. For each sample, the homography that relates the corresponding training plane to the image points is calculated using the dominant 3D directions of the scene, the sampled location on the ground plane, and the sampled camera view. Each regressed silhouette shape is projected using this homographic transformation and matched in the image to estimate its likelihood. Our framework is able to track 3D human walking poses in a 3D environment while exploring only a four-dimensional state space. In our experimental evaluation, we demonstrate the significant improvements of the homographic alignment over a commonly used similarity transformation and provide quantitative pose tracking results for the monocular sequences with strong perspective effects from the CAVIAR dataset.
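The projection step described in the abstract, where each regressed silhouette is mapped into the image through a homography, can be illustrated with a minimal sketch. The function name `apply_homography`, the matrix `H`, and the rectangular silhouette are hypothetical values chosen for illustration. They are not taken from the paper, which computes the homography from the scene's dominant 3D directions, the sampled ground-plane location, and the sampled camera view.

```python
def apply_homography(H, points):
    """Map 2D points through a 3x3 homography H given as nested lists.

    Each point (x, y) is lifted to homogeneous coordinates (x, y, 1),
    multiplied by H, and divided by the resulting third coordinate.
    """
    projected = []
    for x, y in points:
        u = H[0][0] * x + H[0][1] * y + H[0][2]
        v = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        if abs(w) < 1e-12:
            raise ValueError("point maps to infinity under H")
        projected.append((u / w, v / w))
    return projected


# Hypothetical homography: identity plus one perspective term in the last row.
# A similarity transformation would always have a last row of [0, 0, 1].
H = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 1.0],
]

# Hypothetical silhouette contour: a 1 x 2 rectangle in the training plane.
silhouette = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (0.0, 2.0)]

warped = apply_homography(H, silhouette)
```

In this toy case the top edge of the rectangle shrinks relative to the bottom edge, a foreshortening effect that a similarity transformation (rotation, uniform scale, and translation only) cannot reproduce.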

Keywords. Monocular Human Pose Tracking, View-invariance, Projective Geometry, Video-Surveillance.

Papers.

Dataset.