CS6140 Machine Learning
HW4B Ensembles: Gradient Boosting, Bagging, VC dimension
Make sure you check the syllabus for the due date. Please
use the notations adopted in class, even if the problem is stated
in the book using a different notation.
PROBLEM 5 [optional, no credit]
What is the VC dimension of each of the following hypothesis classes?
(You may choose the convention that the plus side must be inside the
shape, or that either side may be the inside.)
a) unions of two rectangles with edges vertical/horizontal (not
angled)
b) circles
c) triangles
d) the multidimensional "sphere" given by f(x) = sign[(x-c)·(x-c) - b]
in m-dimensional Euclidean space. Justify your answers!
For this problem we care more about the existential side of VC
dimension: if K is the answer, show that there is a set of K points
that can be shattered. It is not required to formally prove
that no set of K+1 points can be shattered.
PROBLEM 6 [50 points] Bagging
Bagging setup:
Training: run your Decision Tree classifier (implemented from
scratch) T=50 separate times. Each Decision Tree is trained on a
sample-with-replacement set drawn from the training dataset (every
Decision Tree has its own sampled training set). You can limit the
depth of the trees to simplify computation.
Sampling with replacement: say there are N datapoints in the
training set. Sampling with replacement, uniformly, for a sample
of size N works as follows: in sequence, and independently of each
other, select a datapoint uniformly at random from the training
set, N times. Once a datapoint is selected it remains available for
further sampling (hence the "with replacement" methodology). Each
sampled training set will have N datapoints; some points will be
sampled more than once (duplicates) while other datapoints will not
be sampled at all.
Testing: for a test datapoint, run all T Decision Trees and
average their predictions to obtain the final prediction.
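A minimal sketch of the bagging loop, assuming X and y are numpy
arrays and assuming a DecisionTree class with fit(X, y) and
predict(X) methods standing in for your own implementation (the
class name and interface are placeholders, not part of the
assignment):

import numpy as np

def bagging_train(X, y, T=50, max_depth=None):
    # train T decision trees, each on its own bootstrap sample of (X, y)
    N = len(y)
    trees = []
    for _ in range(T):
        idx = np.random.randint(0, N, size=N)     # N draws with replacement
        tree = DecisionTree(max_depth=max_depth)  # placeholder for your decision tree
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def bagging_predict(trees, X):
    # average the individual predictions; threshold at 0.5 for 0/1 labels
    avg = np.mean([tree.predict(X) for tree in trees], axis=0)
    return (avg >= 0.5).astype(int)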
Run bagging on the Spambase dataset. Compare the results with
boosting.
PROBLEM 7 [50 points] Gradient Boosted Trees for Regression
Run gradient boosting with regression trees on the housing dataset.
Essentially, repeat the following procedure for i = 1:10 rounds on the
training set. Start in round i = 1 with the labels Y_x being the
original labels.
- Train a regression tree T_i of depth 2 on data X, labels
Y_x (use your HW1 regression tree code as a procedure).
- Update labels for each datapoint x : Y_x = Y_x - T_i(x).
No need to update a distribution over datapoints like in
Adaboost.
- Repeat
The overall prediction function is the sum of the trees. Report
training and testing error.
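A minimal sketch of this residual-fitting loop, assuming X and y are
numpy arrays and assuming a RegressionTree class with fit(X, y) and
predict(X) methods standing in for your HW1 regression tree (the
class name and interface are assumptions):

import numpy as np

def gradient_boost_train(X, y, rounds=10, depth=2):
    # fit one depth-2 regression tree per round to the current residual labels
    residuals = np.array(y, dtype=float)          # round i=1 uses the original labels
    trees = []
    for _ in range(rounds):
        tree = RegressionTree(max_depth=depth)    # placeholder for your HW1 tree
        tree.fit(X, residuals)
        residuals = residuals - tree.predict(X)   # Y_x = Y_x - T_i(x)
        trees.append(tree)
    return trees

def gradient_boost_predict(trees, X):
    # overall prediction is the sum of the individual trees
    return np.sum([tree.predict(X) for tree in trees], axis=0)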
PROBLEM 8 [Extra Credit]
Run Boosting with ECOC
functions on the Letter
Recognition Data Set (also a multiclass dataset).
PROBLEM 9 [Extra Credit] Boosted Decision Trees
Do PB1 with weak learners being full decision trees
instead of stumps. The final classifier is referred to as "boosted
trees". Compare the results. Hint: there are two possible ways of
doing this.
- Option 1. The weights are taken into account when we decide
the best split, like in Adaboost. This requires you to change
the decision tree training: when looking for the best split at each
node, the split criterion has to account for the current datapoint
weights assigned by the boosting.
- Option 2. We can simulate the weights by sampling. In each
round, when we want to train the decision tree, we construct a
new data set based on the old one. We sample from the old data
set k times with replacement; in each draw, each data point x_i in
the old data set has probability w_i of being chosen into the new
data set. k can be as large as the size of the old data set, or
smaller; we only need to make sure enough data points are sampled
for a decision tree to be trained reliably. Then we train a
decision tree on the new data set without considering weights,
since the weights were already accounted for in the sampling. This
way you do not need to modify the decision tree training algorithm;
more generally, any weak learner, whether or not it handles weights
naturally, can be trained like this. Once the decision tree is
trained, the new data set is discarded: its only use is in building
the tree, and any other computation is based on the original data
set. A sketch of this sampling scheme is given below.
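A minimal sketch of the Option 2 resampling step, assuming the
boosting weights w form a probability distribution over the N
training points and assuming a DecisionTree class with fit(X, y)
and predict(X) methods (the class name and interface are
placeholders for your own code):

import numpy as np

def train_weak_learner_by_sampling(X, y, w, k=None):
    # w: current boosting weights, normalized to sum to 1
    # k: size of the resampled set; defaults to N, may be smaller
    N = len(y)
    if k is None:
        k = N
    # draw k indices with replacement, proportionally to the boosting weights
    idx = np.random.choice(N, size=k, replace=True, p=w)
    tree = DecisionTree()        # placeholder: your unmodified decision tree
    tree.fit(X[idx], y[idx])     # trained without weights; weights acted through sampling
    return tree                  # the resampled set is discarded after training

The weighted error, alpha, and weight updates in each boosting round
are still computed on the original data set, as described above.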
PROBLEM 10 [Extra Credit] Rankboost
- Implement the RankBoost algorithm following the RankBoost
paper and run it on TREC queries.
PROBLEM 11 [Extra Credit] Gradient Boosted Trees
Run gradient boosting with regression stumps/trees on the 20 Newsgroups dataset.