BoxLab Inter-Rater Reliabilities

Background

BoxLab is a platform for collecting sensor data on everyday activities in the homes of volunteer participants. We are making the data collected by BoxLab “kiosks” freely available for download by researchers. In addition to sensor values for different types of sensing, we provide audio/visual records of home activity and annotations of what a third-party observer believes is happening in the video recordings.

More information about this project is available at http://boxlab.wikispaces.com. Refer to the links under “Annotation” in the left-hand column to learn more about our annotation procedure.

Exercise

In order to quantify the accuracy and validity of labels provided by human observers, two independent annotators are double-coding portions of the activities in the datasets. Your job is to provide a statistical analysis of the degree to which the two coders correspond in their labels.

Step 1:

Grab the two files of annotations for a 24-hour period in XML format (BoxLabRater1.annotation.xml, BoxLabRater2.annotation.xml). For each of the annotation nodes in these files, you can ignore all of the information except the LABEL, START_DT, and STOP_DT values as indicated in the following sample:

    <LABEL>washing dishes</LABEL>
    <START_DT>2009-06-21 20:10:33.167</START_DT>
    <STOP_DT>2009-06-21 20:22:18.561</STOP_DT>

Your application should interpret this sample to mean that the activity “washing dishes” occurred from 8:10pm to 8:22pm on June 21, 2009.
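As a starting point, here is a minimal Python sketch of reading the three fields. It assumes that LABEL, START_DT, and STOP_DT are direct children of the same annotation node and that the files use no XML namespaces; neither is confirmed by the sample above. The wrapper element names in SAMPLE and the names Annotation and parse_annotations are hypothetical and chosen for illustration only.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import List, NamedTuple

TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # matches e.g. 2009-06-21 20:10:33.167


class Annotation(NamedTuple):
    label: str
    start: datetime
    stop: datetime


def parse_annotations(xml_text: str) -> List[Annotation]:
    """Collect (label, start, stop) from every element that has all three children."""
    root = ET.fromstring(xml_text)
    annotations = []
    for node in root.iter():
        label = node.findtext("LABEL")
        start = node.findtext("START_DT")
        stop = node.findtext("STOP_DT")
        if label is None or start is None or stop is None:
            continue  # not an annotation node (assumption: all three are siblings)
        annotations.append(Annotation(
            label.strip(),
            datetime.strptime(start.strip(), TIME_FORMAT),
            datetime.strptime(stop.strip(), TIME_FORMAT),
        ))
    return annotations


# Hypothetical wrapper elements; only the three inner tags come from the sample.
SAMPLE = """<ANNOTATIONS>
  <ANNOTATION>
    <LABEL>washing dishes</LABEL>
    <START_DT>2009-06-21 20:10:33.167</START_DT>
    <STOP_DT>2009-06-21 20:22:18.561</STOP_DT>
  </ANNOTATION>
</ANNOTATIONS>"""

sample_annotations = parse_annotations(SAMPLE)
first = sample_annotations[0]
duration_s = (first.stop - first.start).total_seconds()  # length of the activity in seconds
```

To read a real file, pass its contents to parse_annotations, for example via open(path, encoding="utf-8").read(). If the actual files nest the three tags differently, the findtext lookups would need to be adjusted.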

For this exercise, you should write an application (or function) that is called as follows:
BoxLabInterRater(file1, file2)

Have the code compute the following information and output it in files with the same format as the examples:

Provide a histogram of the labels occurring in each file, ordered by frequency from highest to lowest; include a column with the mean duration of each label’s occurrence in the source files (BoxLabInterRaterHistogram.csv)
List only those labels that appear in both files; provide the number of occurrences in each file, and then compute the number of intersections where these labels occur simultaneously in both files. Organize this information in columns. (BoxLabInterRaterIntersection.csv)
Provide a statistical worksheet describing the data in each source file. At minimum, you should include the following: (1) total number of labels per file, (2) amount of time for which at least one label is applied, (3) amount of time for which no labels are applied, (4) longest activity (name + duration), (5) shortest activity (name + duration), (6) median duration of labeled activities, (7) any additional summary statistics you think might be of interest to users of this dataset (BoxLabInterRaterStatistics.csv)
Compute Cohen’s Kappa (you can learn about it on the Internet) for the two data files, using intervals of 1s, 10s, and 60s. If you can think of a better measure of inter-rater reliability, you may substitute that for Cohen’s Kappa. (BoxLabInterRaterKappa.csv)
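
One possible way to approach the Cohen's Kappa item is sketched below in Python. It is an illustration under stated assumptions, not a required method: each time bin receives a single category, namely the first label whose interval covers the start of the bin, or a placeholder category when no label applies. Overlapping labels from one rater are therefore collapsed, which you may want to handle differently. Times are given as seconds from a shared origin, and the names bin_labels, cohens_kappa, and UNLABELED are hypothetical. Kappa is computed as (p_o - p_e) / (1 - p_e), where p_o is the observed fraction of bins on which the raters agree and p_e is the agreement expected by chance from each rater's category frequencies.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Interval = Tuple[str, float, float]  # (label, start_s, stop_s), seconds from a shared origin

UNLABELED = "(none)"  # placeholder category for bins with no label (an assumption)


def bin_labels(intervals: Sequence[Interval], total_s: float, step_s: float) -> List[str]:
    """Assign each bin the first label whose interval covers the bin's start time."""
    n_bins = int(total_s // step_s)
    bins = []
    for i in range(n_bins):
        t = i * step_s
        bins.append(next((lab for lab, a, b in intervals if a <= t < b), UNLABELED))
    return bins


def cohens_kappa(r1: Sequence[str], r2: Sequence[str]) -> float:
    """Cohen's kappa for two equal-length sequences of categorical ratings."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n      # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[c] * c2[c] for c in c1) / (n * n)     # chance agreement
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when both raters use one category only
    return (p_o - p_e) / (1.0 - p_e)


# Toy data: one minute of activity, binned at 10 s.
rater1 = [("washing dishes", 0.0, 30.0), ("cooking", 30.0, 60.0)]
rater2 = [("washing dishes", 0.0, 20.0), ("cooking", 20.0, 60.0)]

bins1 = bin_labels(rater1, total_s=60.0, step_s=10.0)
bins2 = bin_labels(rater2, total_s=60.0, step_s=10.0)
kappa_10s = cohens_kappa(bins1, bins2)  # the raters disagree on one of six bins
```

For the exercise, the same two functions could be called with step_s set to 1, 10, and 60 and total_s set to the length of the 24-hour period. The linear scan inside bin_labels is simple rather than fast; it should be adequate for a single day of annotations but could be replaced by a sorted sweep if needed.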

Step 2:

Email Jason Nawyn (nawyn@mit.edu) your output files. If you wish to pursue further work on this data, we will describe a second exercise in which the Rating times and values are taken into consideration.

Questions?

Contact Jason Nawyn (nawyn@mit.edu).