cff-version: 1.2.0
abstract: "<p>This file contains the annotations for the ConfLab dataset, including actions (speaking status), pose, and F-formations. </p>
<p>------------------</p>
<p>./actions/speaking_status:</p>
<p>./processed: the processed speaking status files, aggregated into a single data frame per segment. Skipped rows in the raw data (see https://josedvq.github.io/covfee/docs/output for details) have been imputed using the code at:  https://github.com/TUDelft-SPC-Lab/conflab/tree/master/preprocessing/speaking_status</p>
<p>    The processed annotations consist of:</p>
<p>        ./speaking: The first row contains person IDs matching the sensor IDs.</p>
<p>        The remaining rows contain binary speaking status annotations at 60 fps for the corresponding 2-min video segment (7200 frames).</p>
<p>        ./confidence: Same layout as above. These annotations contain the annotators' continuous-valued confidence ratings for their speaking annotations.</p>
<p>To load these files with pandas: pd.read_csv(p, index_col=False)</p>
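<p>A minimal loading sketch for the ./processed files. The sample below is synthetic (the person IDs and frame values are illustrative, not from the dataset); a real file is loaded the same way from its path:</p>

```python
import io
import pandas as pd

# Synthetic two-person sample in the same layout as ./processed/speaking:
# header row = person IDs (matching sensor IDs), then one row per frame.
# A full 2-min segment at 60 fps has 7200 data rows.
sample = io.StringIO('1,2\n0,1\n1,1\n0,0\n')
df = pd.read_csv(sample, index_col=False)

ids = df.columns.tolist()   # person IDs as column names
frames, persons = df.shape  # (number of frames, number of persons)
```

<p>For the ./confidence files the layout is identical, with continuous confidence values in place of the binary speaking status.</p>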
<p><br></p>
<p>./raw-covfee.zip: the raw outputs from speaking status annotation for each of the eight annotated 2-min video segments. These were output by the covfee annotation tool (https://github.com/josedvq/covfee).</p>
<p>Annotations were done at 60 fps.</p>
<p>--------------------</p>
<p>./pose:</p>
<p>./coco: the processed pose files in coco JSON format, aggregated into a single data frame per video segment. These files have been generated from the raw files using the code at: https://github.com/TUDelft-SPC-Lab/conflab-keypoints</p>
<p>    To load in Python: f = json.load(open('/path/to/cam2_vid3_seg1_coco.json'))</p>
<p>    The skeleton structure (limbs) is contained within each file in:</p>
<p>        f['categories'][0]['skeleton']</p>
<p>    and keypoint names at:</p>
<p>        f['categories'][0]['keypoints']</p>
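<p>The lookups above can be sketched as follows. The in-memory document here is a hand-made stand-in with an illustrative keypoint subset; the real cam*_vid*_seg1_coco.json files follow the full COCO layout:</p>

```python
import json

# Stand-in for a ConfLab COCO file (illustrative content, not real data).
doc = {
    'categories': [{
        'id': 1,
        'name': 'person',
        'keypoints': ['head', 'nose', 'neck'],  # illustrative subset
        'skeleton': [[1, 2], [2, 3]],           # limbs as keypoint-index pairs
    }],
    'images': [],
    'annotations': [],
}
f = json.loads(json.dumps(doc))  # on disk: f = json.load(open(path))

limbs = f['categories'][0]['skeleton']
names = f['categories'][0]['keypoints']
```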
<p>./raw-covfee.zip: the raw outputs from continuous pose annotation. These were output by the covfee annotation tool (https://github.com/josedvq/covfee).</p>
<p>    Annotations were done at 60 fps.</p>
<p>---------------------</p>
<p>./f_formations:</p>
<p>seg 2: 14:00 onwards, for videos of the form x2xxx.MP4 in /video/raw/ for the relevant cameras (2,4,6,8,10). </p>
<p>seg 3: for videos of the form x3xxx.MP4 in /video/raw/ for the relevant cameras (2,4,6,8,10). </p>
<p>Note that camera 10 does not capture any meaningful subject information or body parts beyond what is already covered by camera 8.</p>
<p>First column: time stamp</p>
<p>Second column: "()" delineates groups, "&lt;&gt;" delineates subjects, and "cam X" indicates the best camera view in which a particular group is visible.</p>
<p><br></p>
<p>phone.csv: time stamp (pertaining to seg3), corresponding group, ID of person using the phone</p>"
authors:
  - family-names: Raman
    given-names: Chirag
    orcid: "https://orcid.org/0000-0003-4894-4206"
  - family-names: Vargas Quiros
    given-names: Jose
  - family-names: Tan
    given-names: Stephanie
  - family-names: Islam
    given-names: Ashraful
  - family-names: Gedik
    given-names: Ekin
  - family-names: Hung
    given-names: Hayley
title: "Annotations for ConfLab: A Data Collection Concept, Dataset, and Benchmark for Machine Analysis of Free-Standing Social Interactions in the Wild"
keywords:
version: 3
identifiers:
  - type: doi
    value: 10.4121/20017664.v3
license: Restrictive Licence
date-released: 2022-10-10