CVPR 2012 Workshop on Gesture Recognition June 17


Our Industry sponsors:

Alex Kipman and Laura Massey (Microsoft), Branislav Kisacanin (Texas Instruments)

Our challenge winners:

ChaLearn one-shot learning challenge

Alfonso Nieto-Castanon (1st place), David Weiss (2nd place), Eric Jackson (3rd place)

Demonstration competition:

Cem Keskin (1st place), Ilaria Gori (2nd place), Gabriele Fanelli (3rd place)

Our best paper award winners:

Yui Man Lui (Best paper award), Kenneth Funes (Best student paper award)

Our speakers:

Fernando de la Torre, Takeo Kanade, Deva Ramanan, Adam Kendon

Thad Starner, Sebastian Nowozin, Jeffrey Cohn, Lu Wang

The organizers:

Isabelle Guyon, Vassilis Athitsos

Invited speakers

Jeffrey Cohn, University of Pittsburgh, Pennsylvania, USA. Non-verbal communication and facial expression. Jeffrey Cohn is Professor of Psychology at the University of Pittsburgh and Adjunct Faculty at the Robotics Institute at Carnegie Mellon University. He has led interdisciplinary and inter-institutional efforts to develop advanced methods of automatic analysis of facial expression and prosody, and has applied those tools to research in human emotion, social development, non-verbal communication, psychopathology, and biomedicine. He co-chaired the 2008 IEEE International Conference on Automatic Face and Gesture Recognition (FG2008) and the 2009 International Conference on Affective Computing and Intelligent Interaction (ACII2009). He has co-edited two recent special issues of the Journal of Image and Vision Computing. His research has been supported by grants from the National Institutes of Health, National Science Foundation, Autism Foundation, Office of Naval Research, Defense Advanced Research Projects Agency, and the Technical Support Working Group.

Takeo Kanade, Carnegie Mellon University, Pennsylvania, USA. Body motion detection and understanding using both 2D and 3D setups. Takeo Kanade is the U. A. and Helen Whitaker University Professor of Computer Science and Robotics and the director of the Quality of Life Technology Engineering Research Center at Carnegie Mellon University. He works in multiple areas of robotics: computer vision, multi-media, manipulators, autonomous mobile robots, medical robotics and sensors. He has written more than 300 technical papers and reports in these areas, and holds more than 20 patents. He has been the principal investigator of more than a dozen major vision and robotics projects at Carnegie Mellon.

Adam Kendon, University of Pennsylvania, USA. Semiotic analysis of gestures. Adam Kendon is a leading authority on the study of gesture but has also published pioneering studies on the organization of behaviour in face-to-face interaction. In a scholarly career that extends over more than forty years, Kendon has published over one hundred articles and several books, all of which deal, in various ways, with the role of the body in face-to-face interaction. He is, at present, a visiting scholar at the Institute for Research in Cognitive Science at the University of Pennsylvania. He is Editor of the international journal Gesture, which has been published by John Benjamins of Amsterdam since 2001, and editor of the companion book series Gesture Studies. He was made an Honorary President of the International Society of Gesture Studies in 2005.

Deva Ramanan, UC Irvine, California, USA. Statistical models for activity recognition. Deva Ramanan graduated with a PhD from UC Berkeley, where he was advised by David Forsyth. Before arriving at UCI, he spent two years as a research professor at TTI-Chicago. He was a visiting researcher at the Visual Geometry Group at Oxford, the Robotics Institute at CMU, and the Interactive Visual Media Group at Microsoft Research. His work is primarily in computer vision, but he is also interested in machine learning and computer graphics. His awards include a PASCAL VOC Lifetime Achievement Prize for Object Recognition (invited talk at ICML 2010) and the 2009 Marr Prize for Discriminative Models in Vision.

Sebastian Nowozin, Microsoft Research, Cambridge, UK. Getting quality labeled data for gesture recognition. Sebastian Nowozin's research interest is in developing optimization and machine learning techniques suitable for solving high-level computer vision tasks, such as image classification and object and gesture recognition. High-level computer vision tasks are a unique source of hard machine learning problems for three reasons. First, in contrast to physics-based processes, we do not know the correct model (model uncertainty). Second, humans excel at all high-level vision tasks and therefore can provide data and assess model performance (ground truth oracle). Third, image and video data is available for free at an enormous scale (data availability). Sebastian Nowozin uses mathematical optimization as a tool to solve computer vision machine learning tasks. He has recently conducted a comparison of HMMs and decision forests for gesture recognition. In this talk, he will present an in-depth investigation of the intricacies of getting high quality labeled data.

Thad Starner, Georgia Tech, Georgia, USA. Gesture recognition and human computer interaction. Thad Starner is the director of the Contextual Computing Group. His group creates computational interfaces and agents for use in everyday mobile environments. They combine wearable and ubiquitous computing technologies with techniques from the fields of artificial intelligence (AI), pattern recognition, and human computer interaction (HCI). Recently, they have been designing assistive technology with the deaf community. One of their main projects is CopyCat, a game which uses American Sign Language recognition to help young deaf children acquire language skills. They continually develop new interfaces for mobile computing (and mobile phones) with an emphasis on gesture. Currently, they are exploring mobile interfaces that are fast to access, like wristwatches.

Fernando de la Torre, Carnegie Mellon Univ., Pennsylvania, USA. Unsupervised and weakly supervised discovery of events for human sensing. Fernando de la Torre is an Associate Research Professor in the Robotics Institute at Carnegie Mellon University. He received his B.Sc. degree in Telecommunications, as well as his M.Sc. and Ph.D. degrees in Electronic Engineering, from La Salle School of Engineering at Ramon Llull University, Barcelona, Spain, in 1994, 1996, and 2002, respectively. His research interests are in the fields of Computer Vision and Machine Learning. Specifically, he is interested in modeling and recognizing human behavior, with a focus on understanding human behavior from multimodal sensors (e.g., video, body sensors). He has done extensive work on facial image analysis (e.g., facial expression recognition, facial feature tracking). In machine learning, his interest centers on developing efficient and robust supervised and unsupervised methods to model high-dimensional data. Currently, he is directing the Component Analysis Laboratory and the Human Sensing Laboratory at Carnegie Mellon University. He has over 100 publications in refereed journals and conferences. He has organized and co-organized several workshops and has given tutorials at international conferences on the use and extensions of Component Analysis.

Call for participation

We are organizing a workshop on gesture and sign language recognition from 2D and 3D video data and still images. Last year's workshop on gesture recognition at CVPR 2011 was a big success, with over 300 participants. This new workshop is coupled with a gesture recognition challenge, offering the opportunity to work on a large database of videos of hand gestures recorded with Kinect™. The best entrants will present their work at the workshop.

The scope of the workshop is broader than that of the challenge since gestures originate from any body motion and there is a wide variety of application settings in gaming, marketing, computer interfaces, interpretation of sign language for the deaf, and video surveillance. We invited keynote speakers in diverse areas of gesture research, including sign language recognition, body posture analysis, action and activity recognition, image or video indexing and retrieval, and facial expression or emotion recognition. The workshop aims at gathering researchers from different application domains working on gesture recognition to share algorithms and techniques.

It is possible to register for the workshop only, rather than for the full conference, but participants must register via the CVPR 2012 website. The calls for papers and for demonstration competitions are closed.


Demonstration competition (June 16, 2012)

June 17, 2012, workshop:

-----To download the preprints use login CVPR2012 and passwd papers -----

Morning: Gesture recognition -- from theory to practice

7:30 am Breakfast

8:00 am Alex Kipman, Microsoft. Welcome and introduction.

8:20 am Isabelle Guyon, ChaLearn. Results of the ChaLearn gesture challenge [Slides].

8:40 am Vassilis Athitsos, UTA. Results of the demonstration competition.

9:00 am INVITED TALK: Jeffrey Cohn, U. Pittsburgh. Non-verbal communication and facial expression.

9:30 am Challenge paper 1 (2nd place) Pennect team, U. of Pennsylvania [Slides].

9:50 am Challenge paper 2 (3rd place) Eric Jackson [One Million Monkeys], Menlo Park, California. An HMM-Based Approach For Gesture Recognition Using Edge Features [Slides].

10:10 am Coffee break

10:20 am INVITED TALK: Adam Kendon, U. of Pennsylvania. Semiotic analysis of gestures.

10:50 am Challenge paper 3. Di Wu, Fan Zhu, Ling Shao, U. of Sheffield. One Shot Learning Gesture Recognition from RGBD Images. [Slides]

11:10 am Challenge paper 4 [BEST PAPER AWARD]. Yui Man Lui, Colorado State U. A Least Squares Regression Framework on Manifolds and its Application to Gesture Recognition [Slides].

11:30 am INVITED TALK: Fernando de la Torre, CMU. Unsupervised and weakly supervised discovery of events for human sensing.

12:00 - 1:30 pm: Lunch and poster session; demonstrations can also be shown

(lunch will be served from 12:30 to 1:30)

Afternoon: Gesture recognition -- from practice to applications

1:30 pm INVITED TALK: Takeo Kanade, CMU. Body motion detection and understanding using both 2D and 3D setups.

2:00 pm Demonstration competition, 1st place. Cem Keskin (1), Eray Berger (2), and Lale Akarun (1). (1) Bogazici University Computer Engineering Department, Istanbul, Turkey; (2) Sigma Research and Devel., Istanbul, Turkey. A Unified Framework for Concurrent Usage of Hand Gesture, Shape and Pose [Slides].

2:20 pm Demonstration competition, 2nd place. Ilaria Gori, Sean Ryan Fanello, Giorgio Metta, Francesca Odone. Istituto Italiano di Tecnologia, Italy. All Gestures you Can: a Memory Game [Slides].

2:40 pm INVITED TALK: Deva Ramanan, UCI. Estimating human poses in images and videos.

3:10 pm Coffee break

3:20 pm INVITED TALK: Thad Starner, Georgia Tech. Gesture recognition and human computer interaction [Slides].

3:50 pm Lu Wang, Ryan Villamil, Supun Samarasekera, and Rakesh Kumar, SRI, Magic Mirror: A Virtual Handbag Shopping System [Slides].

4:10 pm Kenneth Funes and Jean-Marc Odobez, IDIAP [BEST STUDENT PAPER AWARD], Gaze Estimation from Multimodal Kinect Data [Slides].

4:30 pm INVITED TALK: Sebastian Nowozin, Microsoft, Machine learning for low-latency gesture recognition: Issues in data acquisition and labeling [Slides]

5:00 pm Cem Keskin, Bogaziçi University, Istanbul, Randomized Decision Forests for Static and Dynamic Hand Shape Classification.

5:20 pm Wrap-up. Advertisement for next challenge.

5:30 pm Discussion

6:00 pm Adjourn


Bernhard Kohn, Austrian Institute of Technology. Real-time Gesture Recognition using bio inspired 3D Vision Sensor.

Manavender Malgireddy, Ifeoma Nwogu, and Venu Govindaraju, Univ. Buffalo. A Temporal Bayesian Model for Classifying, Detecting and Localizing Activities in Video Sequences [Poster]

Rizwan Ahmed Khan*, Alexandre Meyer*, Hubert Konik+, Saida Bouakaz*; *U. Lyon 1, LIRIS and +U. Jean Monnet. Exploring human visual system: study to aid the development of automatic facial expression recognition framework [Poster]

Yannick L. Gweth, RWTH Aachen University. Enhanced Continuous Sign Language Recognition using PCA and Neural Network Features [Poster]

Ilaria Gori, Sean Ryan Fanello, Giorgio Metta, Francesca Odone. Istituto Italiano di Tecnologia, Italy. All Gestures you Can: a Memory Game [Video][Poster]

Poster printing near the convention center:


Michelle Fournier

20 Westminster Street



FedEx Kinko

FedEx Office Print & Ship Center

1020 Bald Hill Rd

Warwick, RI 02886

(401) 826-0808

Program committee

Vassilis Athitsos, University of Texas at Arlington, Texas, USA

Richard Bowden, University of Surrey, UK

Chris Bregler, New York University, New York, USA

Jeffrey Cohn, University of Pittsburgh, Pennsylvania, USA

Philippe Dreuw, Robert Bosch GmbH, Hildesheim, Germany

Hugo Jair Escalante Balderas, INAOE, Mexico

Isabelle Guyon, Clopinet, Berkeley, California, USA

Ben Hamner, Kaggle, USA

Ivan Laptev, INRIA, France

Quoc Le, Stanford University, California, USA

Dimitris Metaxas, Rutgers, New Jersey, USA

Carol Neidle, Boston University, Massachusetts, USA

Andrew Ng, Stanford University, Palo Alto, California, USA

Andrew Saxe, Stanford University, California, USA

Jamie Shotton, Microsoft Research, Cambridge, UK

Thad Starner, Georgia Tech, Georgia, USA

Fernando de la Torre, CMU, Pennsylvania, USA

Matthew Turk, UCSB, California, USA

Workshop chairs

Isabelle Guyon, Clopinet, Berkeley, California, USA

Vassilis Athitsos, University of Texas at Arlington, Texas, USA

Alex Kipman, Microsoft, Redmond, Washington, USA