Aims and scope: Research in computer vision and pattern recognition has made tremendous progress across many problems and applications. As a result, several visual analysis problems can be considered solved (e.g., face recognition), at least in certain scenarios and under specific circumstances. Despite these advances, many open problems still receive considerable attention from the community because of their potential applications. We are organizing a workshop and contest at ICPR 2016 that aims to compile research progress on four of these problems, which require not only effective visual analysis but also the handling of multimodal information (e.g., audio, RGB-D video, etc.) in order to be solved. In addition, we focus on open problems in which the aim is to recognize patterns that are not visually evident (e.g., apparent personality traits). The workshop and contest are supported by three organizations with vast experience in organizing academic events, namely ChaLearn, MediaEval and ImageCLEF. The event is also supported by the IAPR TC 12 on Visual and Multimedia Information Systems.

Topics and guidelines: We invite workshop papers making fundamental or practical contributions on all aspects of computer vision problems that require multimodal and non-visual information in order to be solved appropriately, with emphasis on topics relevant to the four tracks of the associated challenge. The scope of the workshop includes, but is not limited to:

·       Gesture and action recognition with multimodal information

o   Multimodal action/gesture recognition.

o   Multimodal action/gesture spotting.

·       Context-based video indexing and retrieval

o   Video recommendation incorporating contextual information.

o   User-context aware video indexing and retrieval.

·       Apparent personality analysis

o   Personality trait analysis.

o   Personality profiling.

·       Applications: health, security, forensics, criminology, job interviews, dating, marketing.


Papers should not exceed 6 pages in IEEE ICPR format and should be submitted through the CMT system: link TBA. Accepted papers will be published in the ICPR 2016 companion proceedings. Additionally, authors of papers on personality trait analysis will be invited to submit revised and extended versions to be considered for publication in a Special Issue on Personality Analysis in the IEEE Transactions on Affective Computing.


Important dates (Workshop)

  • August 10th, 2016: Workshop paper submission deadline (for non-participants in the contest)
  • August 31st, 2016: Workshop paper submission deadline (for contest participants). Extended deadline!
  • September 10th, 2016: Notification of paper acceptance.
  • September 14th, 2016: Camera-ready versions of workshop papers due.
  • December 2016: ICPR 2016 Joint Contest and Workshop on Multimedia Challenges Beyond Visual Analysis: challenge results and award ceremony.

08:45h Opening: Presentation of the workshop, Hugo Jair Escalante (INAOE, Mexico) & Jun Wan (CAS, China)

09:00h Invited Speaker I: Towards machine interpretation of the real world. Alberto del Bimbo (University of Florence, Italy)

09:45h Session I: Challenge results presentation and award ceremony, Hugo Jair Escalante (INAOE, Mexico) & Jun Wan (CAS, China)

  • ChaLearn Joint Contest on Multimedia Challenges Beyond Visual Analysis: An overview. Hugo Jair Escalante, Víctor Ponce, Jun Wan, Michael Riegler, Albert Clapes, Sergio Escalera, Isabelle Guyon, Xavier Baro, Paal Halvorsen, Henning Muller, and Martha Larson

10:00h Coffee Break

10:30h Session II: Winners First Impressions Challenge (Challenge Track 1)

  • First impressions.

    • 1st: Multimodal Fusion of Audio, Scene, and Face Features for First Impression Estimation. Furkan Gürpınar, Heysem Kaya, Ali Salah

    • 4th: Automatic Personality Prediction from Audiovisual Data using Random Forest Regression. Berkay Aydın, Ahmet Alp Kindiroglu, Lale Akarun

11:10h Session III: Winners Isolated Gesture Recognition Challenge (Challenge Track 2)

  • 1st: Large-scale Gesture Recognition with a Fusion of RGB-D Data Based on C3D Model. Yunan Li, Qiguang Miao, Kuan Tian, Yingying Fan, Xin Xu, Rui Li, Jianfeng Song

  • 2nd: Large-scale Isolated Gesture Recognition Using Convolutional Neural Networks. Pichao Wang, Wanqing Li, Song Liu, Zhimin Gao, Chang Tang, Philip Ogunbona

  • 3rd: Large-scale Isolated Gesture Recognition using Pyramidal 3D Convolutional Networks. Guangming Zhu, Liang Zhang, Lin Mei, Jie Shao, Juan Song, Peiyi Shen

12:10h Lunch break (on your own)

13:30h Invited Speaker II: Multimodal Deep Learning. Fabio A. González O. (National University of Colombia, Colombia)

14:15h Session IV: Affective computing session

  • Predicting and visualizing psychological attributions with a deep neural network. Mariam Zabihi, Edward Grant, Stephan Sahm, Marcel Van Gerven

  • Fusion of Classifier Predictions for Audio-Visual Emotion Recognition. Fatemeh Noroozi, Marina Marjanovic, Angelina Njegus, Sergio Escalera, Gholamreza Anbarjafari

15:00h Coffee Break

15:30h Invited Speaker III: Bayesian Classification: Applications in Computer Vision. L. Enrique Sucar (National Institute of Astrophysics, Optics and Electronics, Mexico)

16:15h Session V: Winners Continuous Gesture Recognition Challenge (Challenge Track 3)

    • 1st: Two Streams Recurrent Neural Networks for Large-Scale Continuous Gesture Recognition. Xiujuan Chai, Zhipeng Liu, Fang Yin, Zhuang Liu, Xilin Chen

    • 2nd: Using Convolutional 3D Neural Networks for User-Independent Continuous Gesture Recognition. Necati Camgoz, Simon Hadfield, Oscar Koller, Richard Bowden

    • 3rd: Large-scale Continuous Gesture Recognition Using Convolutional Neural Networks. Pichao Wang, Wanqing Li, Song Liu, Yuyao Zhang, Zhimin Gao, Philip Ogunbona

17:15h Closing: Closing ceremony & announcements, Hugo Jair Escalante (INAOE, Mexico) & Jun Wan (CAS, China)

Confirmed Invited Speakers:


Alberto del Bimbo, University of Florence

Alberto del Bimbo's research interests address the analysis and interpretation of images and video and their applications, with particular interest in content-based retrieval in visual and multimedia digital archives, advanced video surveillance and target tracking, and natural man-machine interaction assisted by computer vision. He is the author of over 200 scientific publications that have appeared in international journals and conference proceedings. He has been the coordinator of many European and national projects in the areas of multimedia and image and video analysis. He was also the coordinator of the Audio-Visual and Non-traditional Object Digital Libraries cluster of DELOS, the Network of Excellence on Digital Libraries of the Information Society Technologies Program of the European Commission, from 2004 to 2007.

Luis Enrique Sucar, INAOE

L. Enrique Sucar is a Senior Research Scientist at INAOE, Puebla, Mexico. He has a Ph.D. in Computing from Imperial College and an M.Sc. in Electrical Engineering from Stanford University, and has been an invited professor at the University of British Columbia, INRIA and CREATE-NET. Dr. Sucar is a member of the National Research System and the Mexican Science Academy, a Senior Member of the IEEE, and has more than 250 publications in journals and conference proceedings. He has served as president of the Mexican AI Society, has been a member of the Advisory Board of IJCAI, and has directed several international projects. His main research interests are in probabilistic graphical models and their applications in computer vision, robotics and biomedicine.

Fabio A. Gonzalez, Universidad Nacional de Colombia

Fabio A. Gonzalez is a Full Professor at the Department of Computing Systems and Industrial Engineering at the National University of Colombia, where he leads the Machine Learning, Perception and Discovery Lab (MindLab). He earned a Computing Systems Engineer degree and an MSc degree in Mathematics from the National University of Colombia in 1993 and 1998, respectively, and MSc and PhD degrees in Computer Science from the University of Memphis, USA, in 2003. His research revolves around machine learning, information retrieval and computer vision, with a particular focus on the representation, indexing and automatic analysis of multimodal data (data encompassing different types of information: textual, visual, signals, etc.).