CGD 2011 Data

ChaLearn Gesture Dataset (CGD 2011)

DATASET OF THE 2011/2012 ONE-SHOT-LEARNING GESTURE CHALLENGE

[download the technical report]

THESE DATA AND SOFTWARE ARE PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

WARNING: Even though this dataset includes people of different genders and ethnic backgrounds, it was not designed to be balanced or representative of any given population.

DOWNLOAD THE DATASET CGD2011 of 50,000 gestures from one of our data mirrors.

To cite the data please use:

"ChaLearn Gesture Dataset (CGD2011), ChaLearn, California, 2011"

Copyright (c) ChaLearn - 2011

Setting

We portray a single user in front of a fixed camera, interacting with a computer by performing gestures to

- play a game,

- remotely control appliances or robots, or

- learn to perform gestures from educational software.

Kinect™ data

Kinect™ has revolutionized the field of gesture recognition because it is an affordable device (a high-end webcam) providing both RGB and depth images. Depth images facilitate image segmentation considerably. We have collected a large dataset of 50,000 gestures with Kinect™. We provide Matlab™ code to browse through the data and process it to create a sample submission. The data can also be viewed with most video viewers; see the README file for details.

We provide both the RGB image and the depth image as in the example below. View more examples.

[Example frames: RGB image and gray-scale rendering of the corresponding depth image]
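Since the recordings can be opened with standard video tools, one way to browse an RGB/depth pair outside of Matlab is sketched below in Python with OpenCV. This is only an illustration: the file names are hypothetical placeholders, and the actual naming scheme of the recordings inside each batch is documented in the README.

    # Minimal sketch for browsing one RGB/depth recording pair with OpenCV.
    # The file names are hypothetical placeholders; the README shipped with
    # the data documents the actual layout inside each batch.
    import cv2

    rgb = cv2.VideoCapture("devel01/rgb_sequence_01.avi")      # hypothetical path
    depth = cv2.VideoCapture("devel01/depth_sequence_01.avi")  # hypothetical path

    while True:
        ok_rgb, rgb_frame = rgb.read()
        ok_depth, depth_frame = depth.read()
        if not (ok_rgb and ok_depth):
            break  # end of either stream
        # The depth channel is distributed as a gray-scale rendering, so it
        # can be displayed directly next to the RGB frame.
        cv2.imshow("RGB", rgb_frame)
        cv2.imshow("Depth (gray-scale rendering)", depth_frame)
        if cv2.waitKey(30) & 0xFF == 27:  # press Esc to stop
            break

    rgb.release()
    depth.release()
    cv2.destroyAllWindows()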

The data are organized in batches:

devel01

...

devel480

[Initially only 20 development batches were released. All the data are now available]

valid01

...

valid20

final01 (final evaluation data for round 1)

...

final20

final21 (final evaluation data for round 2, not published yet)

...

final40
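Only the batch naming pattern above (develXX, validXX, finalXX) and the split sizes are needed to enumerate the batches programmatically, as in this short Python sketch; how the batches are laid out on disk after download is documented in the README.

    # Sketch: enumerate the batch names listed above.
    SPLITS = {
        "devel": 480,  # fully labeled development batches
        "valid": 20,   # validation batches (online feedback during development)
        "final": 40,   # final evaluation batches (rounds 1 and 2)
    }

    def batch_names(split):
        """Yield batch names such as 'devel01', ..., 'devel480'."""
        for i in range(1, SPLITS[split] + 1):
            yield f"{split}{i:02d}"

    print(list(batch_names("valid")))  # ['valid01', ..., 'valid20']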

Each batch includes 100 recorded gestures grouped in sequences of 1 to 5 gestures performed by the same user. The gestures are drawn from a small vocabulary of 8 to 15 unique gestures, which we call a "lexicon" (see a few examples of the lexicons we used).

We selected lexicons from nine categories corresponding to various settings or application domains; they include (1) body language gestures (like scratching your head or crossing your arms), (2) gesticulations performed to accompany speech, (3) illustrators (like Italian gestures), (4) emblems (like Indian Mudras), (5) signs (from sign languages for the deaf), (6) signals (like referee signals, diving signals, or marshalling signals to guide machinery or vehicles), (7) actions (like drinking or writing), (8) pantomimes (gestures made to mimic actions), and (9) dance postures.

Many Faces

During the challenge, we do not disclose the identities of the lexicons or of the users. They will be revealed (after user anonymization) at the end of the challenge. Although the gesture classes are different from batch to batch, we represent the class labels within each batch by numbers between 1 and 15.

Goal of the challenge: one-shot-learning

For the develXX batches, we provide all the labels. For the validXX and finalXX batches, we provide labels for only one example of each class. The goal is to predict the gesture class labels for the remaining gesture sequences.

During the development period, performance feedback will be provided online on the validXX batches. The final evaluation will be carried out on the finalXX batches, and the final results will be revealed only when the challenge is over.
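To make the one-shot protocol concrete, here is a deliberately naive baseline in Python (not a method used or endorsed by the challenge): the single labeled example of each class is kept as a template, and every unlabeled gesture is assigned to the nearest template. The feature extractor is a placeholder; any fixed-length descriptor of a gesture clip could be substituted.

    # Illustration of the one-shot protocol: one labeled template per class,
    # nearest-template classification for everything else.
    import numpy as np

    def features(gesture_clip):
        """Placeholder descriptor: mean frame of the clip, flattened."""
        return np.asarray(gesture_clip, dtype=float).mean(axis=0).ravel()

    def one_shot_classifier(labeled_examples):
        """labeled_examples: {class_label: gesture_clip}, exactly one per class,
        as provided for the validXX and finalXX batches."""
        templates = {label: features(clip) for label, clip in labeled_examples.items()}

        def predict(gesture_clip):
            x = features(gesture_clip)
            return min(templates, key=lambda label: np.linalg.norm(x - templates[label]))

        return predict

    # Tiny synthetic usage example: two "classes" of random 8x8 frames.
    rng = np.random.default_rng(0)
    train = {1: rng.random((3, 8, 8)), 2: rng.random((3, 8, 8)) + 5.0}
    predict = one_shot_classifier(train)
    print(predict(rng.random((3, 8, 8)) + 5.0))  # prints 2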

What is easy about the data:

- Fixed camera

- Availability of depth data

- Single user within a batch

- Homogeneous recording conditions within a batch

- Small vocabulary within a batch

- Gestures separated by returning to a resting position

- Gestures performed mostly by arms and hands

- Camera framing mostly the upper body (some exceptions)

What is hard about the data:

- Only one labeled example of each unique gesture

- Variations in recording conditions (various backgrounds, clothing, skin colors, lighting, temperature, resolution)

- Some parts of the body may be occluded

- Some users are less skilled than others

- Some users made errors or omissions in performing the gestures

Data annotations:

We provide some data annotations, including temporal segmentation into isolated gestures and body part annotations (head, shoulders, elbows, and hands).
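As a rough sketch only (the field names and coordinate convention below are assumptions, not the distributed file format), such annotations could be held in memory as simple records:

    # Hedged sketch of an in-memory record for the annotations described
    # above; see the dataset documentation for the actual file format.
    from dataclasses import dataclass, field
    from typing import Dict, Tuple

    @dataclass
    class GestureAnnotation:
        batch: str         # e.g. "devel01"
        sequence: int      # index of the recorded sequence within the batch
        label: int         # class label, a number between 1 and 15 within the batch
        start_frame: int   # temporal segmentation: first frame of the isolated gesture
        end_frame: int     # temporal segmentation: last frame of the isolated gesture
        # Body-part positions per annotated frame: frame index -> part name
        # ("head", "left_shoulder", "right_elbow", "left_hand", ...) -> (x, y) pixels.
        body_parts: Dict[int, Dict[str, Tuple[float, float]]] = field(default_factory=dict)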