Action recognition from videos: some recent results
The amount of digital video content available on sites such as YouTube is growing daily. Recent statistics on the YouTube website show that around 48 hours of video are uploaded every minute. This massive data production calls for automatic analysis. In this talk we present some recent results for action recognition in videos.
Bag-of-features representations have shown very good performance for action recognition in videos. We briefly review the underlying principles and introduce trajectory-based video features, which have been shown to outperform the state of the art. These trajectory features are obtained by dense point sampling and tracking based on displacement information from a dense optical flow field. Trajectory descriptors are obtained with motion boundary histograms, which are robust to camera motion.
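The two ideas in this paragraph, tracking densely sampled points through a flow field and describing motion by the spatial derivative of the flow, can be illustrated with a minimal sketch. It is not the implementation presented in the talk: the function names, the nested-list flow representation, the nearest-neighbour flow lookup and the forward difference are all assumptions chosen to keep the example short.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# Assumed representation: flow[y][x] is the (dx, dy) displacement at pixel (x, y).
FlowField = List[List[Tuple[float, float]]]


def sample_dense_points(width: int, height: int, step: int) -> List[Point]:
    """Sample points on a regular grid with the given spacing (hypothetical helper)."""
    return [(float(x), float(y))
            for y in range(0, height, step)
            for x in range(0, width, step)]


def track_points(points: List[Point], flows: List[FlowField]) -> List[List[Point]]:
    """Move each point along the flow of successive frames, recording its trajectory."""
    trajectories = [[p] for p in points]
    for flow in flows:
        h, w = len(flow), len(flow[0])
        for traj in trajectories:
            x, y = traj[-1]
            # Nearest-neighbour lookup, clamped to the image borders (a simplification).
            xi = min(max(int(round(x)), 0), w - 1)
            yi = min(max(int(round(y)), 0), h - 1)
            dx, dy = flow[yi][xi]
            traj.append((x + dx, y + dy))
    return trajectories


def flow_gradient_x(flow: FlowField, component: int) -> List[List[float]]:
    """Horizontal forward difference of one flow component (0 = dx, 1 = dy).

    Motion boundary histograms are built from derivatives of this kind;
    the histogram quantisation itself is omitted here.
    """
    return [[row[x + 1][component] - row[x][component]
             for x in range(len(row) - 1)]
            for row in flow]
```

Because the descriptor is based on differences of the flow, a constant displacement added to every pixel, as a translating camera would roughly produce, cancels out. This is the intuition behind the robustness to camera motion mentioned above.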
We then show how to move towards more structured representations by explicitly modelling human-object interactions. We learn how to represent human actions as interactions between persons and objects. We localize in space and track over time both the object and the person, and represent an action as the trajectory of the object with respect to the person's position; that is, our human-object interaction features capture the relative trajectory of the object with respect to the human.
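As a rough illustration of the relative-trajectory idea, the sketch below subtracts the person's position from the object's position frame by frame. It assumes both tracks are already available as per-frame (x, y) centres of equal length; the function name and this input format are hypothetical, and any normalisation the actual features may use is left out.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def relative_trajectory(object_track: List[Point],
                        person_track: List[Point]) -> List[Point]:
    """Return the object's position relative to the person in every frame."""
    if len(object_track) != len(person_track):
        raise ValueError("tracks must cover the same frames")
    return [(ox - px, oy - py)
            for (ox, oy), (px, py) in zip(object_track, person_track)]


# Example: the person walks to the right while the object is lifted in the last frame.
person = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
obj = [(1.0, 1.0), (2.0, 1.0), (3.0, 2.0)]
rel = relative_trajectory(obj, person)
```

In this example the person's own motion drops out of `rel`, so only the change in the object's position relative to the person remains.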
Cordelia Schmid holds an M.S. degree in Computer Science from the University of Karlsruhe and a Doctorate, also in Computer Science, from the Institut National Polytechnique de Grenoble (INPG). Her doctoral thesis on “Local Greyvalue Invariants for Image Matching and Retrieval” received the best thesis award from INPG in 1996. She received the Habilitation degree in 2001 for her thesis entitled “From Image Matching to Learning Visual Models”. Dr. Schmid was a post-doctoral research assistant in the Robotics Research Group of Oxford University in 1996–1997. Since 1997 she has held a permanent research position at INRIA Rhône-Alpes, where she is a research director and directs the INRIA team LEAR (LEArning and Recognition in Vision). Dr. Schmid is the author of over a hundred technical publications. She has been an Associate Editor for IEEE PAMI (2001–2005) and for IJCV (since 2004), a program chair of IEEE CVPR 2005 and ECCV 2012, as well as a general chair of IEEE CVPR 2015. In 2006, she was awarded the Longuet-Higgins prize for fundamental contributions in computer vision that have withstood the test of time. She is a fellow of IEEE. In 2012, she was awarded an ERC advanced grant.