BoxLab is a platform for collecting sensor data on everyday activities in the homes of volunteer participants. We are making the data collected by BoxLab “kiosks” freely available for download by researchers. In addition to sensor values for different types of sensing, we provide audio/visual records of home activity and annotations of what a third-party observer believes is happening in the video recordings.
More information about this project is available at http://boxlab.wikispaces.com. Refer to the links under “Annotation” in the left-hand column to learn more about our annotation procedure.
In order to quantify the accuracy and validity of labels provided by human observers, two independent annotators are double-coding portions of the activities in the datasets. Your job is to provide a statistical analysis of the degree to which the two coders correspond in their labels.
Grab the two files of annotations for a 24-hour period in XML format (BoxLabRater1.annotation.xml, BoxLabRater2.annotation.xml). For each of the annotation nodes in these files, you can ignore all of the information except the LABEL, START_DT, and STOP_DT values as indicated in the following sample:
<LABEL>washing dishes</LABEL>
<START_DT>2009-06-21 20:10:33.167</START_DT>
<STOP_DT>2009-06-21 20:22:18.561</STOP_DT>
Your application should interpret this sample to mean that the activity “washing dishes” occurred from approximately 8:10 pm to 8:22 pm on June 21, 2009.
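As an illustration of reading these three values, here is a minimal Python sketch. The helper name parse_annotations is hypothetical, and the sketch assumes only that LABEL, START_DT, and STOP_DT appear as direct children of some enclosing annotation element; the actual nesting and element names in the BoxLab files may differ and should be checked against the downloaded data.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Timestamp layout taken from the sample above, e.g. "2009-06-21 20:10:33.167".
DT_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_annotations(xml_text):
    """Return a list of (label, start, stop) tuples found in an XML string.

    Assumption: any element that has LABEL, START_DT and STOP_DT as direct
    children is treated as one annotation node; everything else is ignored.
    """
    root = ET.fromstring(xml_text)
    records = []
    for node in root.iter():
        label = node.findtext("LABEL")
        start = node.findtext("START_DT")
        stop = node.findtext("STOP_DT")
        if not (label and start and stop):
            continue  # not an annotation node, or a field is missing
        records.append((
            label.strip(),
            datetime.strptime(start.strip(), DT_FORMAT),
            datetime.strptime(stop.strip(), DT_FORMAT),
        ))
    return records
```

Each tuple carries the label together with its start and stop as datetime objects, so durations and overlaps between the two raters can be computed with ordinary datetime arithmetic.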
For this exercise, you should write an application (or function) that is called as follows:
BoxLabInterRater(file1, file2)
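One possible shape for such an entry point is sketched below in Python. It is only an illustration under stated assumptions: the choice of one-second time slices, the "(none)" placeholder for unlabeled time, the helper names, and the use of percent agreement and Cohen's kappa as the agreement measures are all assumptions of this sketch, not requirements stated here. It also returns the results rather than writing the output files, since the expected file format is not reproduced in this description.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from datetime import datetime, timedelta

DT_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
NO_LABEL = "(none)"  # hypothetical placeholder for time with no annotation


def load_intervals(path):
    """Read (label, start, stop) triples from one annotation file."""
    intervals = []
    for node in ET.parse(path).getroot().iter():
        label, start, stop = (node.findtext(t)
                              for t in ("LABEL", "START_DT", "STOP_DT"))
        if label and start and stop:
            intervals.append((label.strip(),
                              datetime.strptime(start.strip(), DT_FORMAT),
                              datetime.strptime(stop.strip(), DT_FORMAT)))
    return intervals


def timeline(intervals, t0, n_slices, step):
    """One label per time slice; an interval labels every slice it touches,
    and later intervals overwrite earlier ones where they overlap."""
    labels = [NO_LABEL] * n_slices
    for label, start, stop in intervals:
        first = max(0, (start - t0) // step)
        last = min(n_slices, (stop - t0) // step + 1)
        for i in range(first, last):
            labels[i] = label
    return labels


def cohen_kappa(a, b):
    """Return (observed agreement, Cohen's kappa) for two label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return observed, float("nan")  # kappa is undefined in this case
    return observed, (observed - expected) / (1.0 - expected)


def BoxLabInterRater(file1, file2, step=timedelta(seconds=1)):
    """Compare two raters' annotation files slice by slice."""
    r1, r2 = load_intervals(file1), load_intervals(file2)
    t0 = min(start for _, start, _ in r1 + r2)
    t1 = max(stop for _, _, stop in r1 + r2)
    n_slices = (t1 - t0) // step + 1
    a = timeline(r1, t0, n_slices, step)
    b = timeline(r2, t0, n_slices, step)
    agreement, kappa = cohen_kappa(a, b)
    return {"slices": n_slices, "agreement": agreement, "kappa": kappa}
```

Discretizing time into slices turns the comparison into a standard two-rater categorical agreement problem. A finer step tracks label boundaries more closely at the cost of more slices, and the sketch does not handle empty files or a rater assigning several simultaneous labels beyond letting the later one win.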
Have the code compute the following information and output it in files with the same format as the examples:
Email Jason Nawyn (nawyn@mit.edu) your output files. If you wish to pursue further work on this data, we will describe a second exercise in which the Rating times and values are taken into consideration.
Contact Jason Nawyn (nawyn@mit.edu).