CS6140 Machine Learning
HW4B Ensembles: Gradient Boosting, Bagging, VC dimension
Make sure you check the syllabus for the due date. Please
use the notations adopted in class, even if the problem is stated
in the book using a different notation.
PROBLEM 5 [optional, no credit]
What is the VC dimension of each of the following hypothesis classes?
(You may choose the convention that the plus side must be inside the
shape, or that either side may be the inside.)
a) unions of two rectangles with edges vertical/horizontal (not
angled)
b) circles
c) triangles
d) the multidimensional "sphere" given by f(x) = sign[(x-c)·(x-c) - b]
in m-dimensional Euclidean space. Justify your answers!
For this problem we care more about the existential side of VC
dimension: if K is the answer, show that there is a set of K points
that can be shattered. It is not required to formally prove
that no set of K+1 points can be shattered.
PROBLEM 6 [50 points] Bagging
Bagging setup:
Training: run your Decision Tree classifier (implemented from
scratch) T=50 separate times. Each Decision Tree is trained on a
sample-with-replacement set drawn from the training dataset (every
Decision Tree has its own sampled training set). You can limit the
depth of the trees to simplify computation.
Sampling with replacement: say there are N datapoints in the
training set. Sampling with replacement, uniformly, for a sample
of size N works as follows: in sequence, and independently of each
other, select a datapoint uniformly at random from the training
set, N times. Once a datapoint is selected it remains available for
further sampling (hence the "with replacement" methodology). Each
sampled training set will have N datapoints; some points will be
sampled more than once (duplicates) while other datapoints will not
be sampled at all.
Testing: for a test datapoint, run all T Decision Trees and
average their predictions to obtain the final prediction.
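A minimal sketch of the bagging loop, assuming X and y are numpy
arrays and assuming a DecisionTree class with fit(X, y) and
predict(X) methods standing in for your own implementation (the
class name and interface are placeholders, not part of the
assignment):

import numpy as np

def bagging_train(X, y, T=50, max_depth=None):
    # train T decision trees, each on its own bootstrap sample of (X, y)
    N = len(y)
    trees = []
    for _ in range(T):
        idx = np.random.randint(0, N, size=N)     # N draws with replacement
        tree = DecisionTree(max_depth=max_depth)  # placeholder for your decision tree
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def bagging_predict(trees, X):
    # average the individual predictions; threshold at 0.5 for 0/1 labels
    avg = np.mean([tree.predict(X) for tree in trees], axis=0)
    return (avg >= 0.5).astype(int)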
Run bagging on the Spambase dataset. Compare the results with
boosting.
PROBLEM 7 [50 points] Gradient Boosted Trees for Regression
Run gradient boosting with regression trees on the housing dataset.
Essentially, repeat the following procedure for i = 1:10 rounds on the
training set. Start in round i = 1 with the labels Y_x being the
original labels.
- Train a regression tree T_i of depth 2 on data X, labels
Y_x (use your HW1 regression tree code as a procedure).
- Update labels for each datapoint x : Y_x = Y_x - T_i(x).
No need to update a distribution over datapoints like in
Adaboost.
- Repeat
The overall prediction function is the sum of the trees. Report
training and testing error.
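A minimal sketch of this residual-fitting loop, assuming X and y are
numpy arrays and assuming a RegressionTree class with fit(X, y) and
predict(X) methods standing in for your HW1 regression tree (the
class name and interface are assumptions):

import numpy as np

def gradient_boost_train(X, y, rounds=10, depth=2):
    # fit one depth-2 regression tree per round to the current residual labels
    residuals = np.array(y, dtype=float)          # round i=1 uses the original labels
    trees = []
    for _ in range(rounds):
        tree = RegressionTree(max_depth=depth)    # placeholder for your HW1 tree
        tree.fit(X, residuals)
        residuals = residuals - tree.predict(X)   # Y_x = Y_x - T_i(x)
        trees.append(tree)
    return trees

def gradient_boost_predict(trees, X):
    # overall prediction is the sum of the individual trees
    return np.sum([tree.predict(X) for tree in trees], axis=0)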
PROBLEM 8 [Extra Credit]
Run Boosting with ECOC
functions on the Letter
Recognition Data Set (also a multiclass dataset).
PROBLEM 9 [Extra Credit] Boosted Decision Trees
Do PB1 with weak learners being full decision trees
instead of stumps. The final classifier is referred to as "boosted
trees". Compare the results. Hint: there are two possible ways of
doing this.
- Option 1. The weights are taken into account when we decide
the best split, like in Adaboost. This requires you to change
the decision tree training: when looking for the best split at each
node, the split criterion has to account for the current datapoint
weights assigned by the boosting.
- Option 2. We can simulate the weights by sampling. In each
round, when we want to train the decision tree, we construct a
new data set based on the old one. We sample from the old data
set k times with replacement; in each draw, each data point x_i in
the old data set has probability w_i of being chosen into the new
data set. k can be as large as the size of the old data set, or
smaller; we only need to make sure enough data points are sampled
for a decision tree to be trained reliably. Then we train a
decision tree on the new data set without considering weights,
since the weights were already accounted for in the sampling. This
way you do not need to modify the decision tree training algorithm;
more generally, any weak learner, whether or not it handles weights
naturally, can be trained like this. Once the decision tree is
trained, the new data set is discarded: its only use is in building
the tree, and any other computation is based on the original data
set. A sketch of this sampling scheme is given below.
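A minimal sketch of the Option 2 resampling step, assuming the
boosting weights w form a probability distribution over the N
training points and assuming a DecisionTree class with fit(X, y)
and predict(X) methods (the class name and interface are
placeholders for your own code):

import numpy as np

def train_weak_learner_by_sampling(X, y, w, k=None):
    # w: current boosting weights, normalized to sum to 1
    # k: size of the resampled set; defaults to N, may be smaller
    N = len(y)
    if k is None:
        k = N
    # draw k indices with replacement, proportionally to the boosting weights
    idx = np.random.choice(N, size=k, replace=True, p=w)
    tree = DecisionTree()        # placeholder: your unmodified decision tree
    tree.fit(X[idx], y[idx])     # trained without weights; weights acted through sampling
    return tree                  # the resampled set is discarded after training

The weighted error, alpha, and weight updates in each boosting round
are still computed on the original data set, as described above.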
PROBLEM 10 [Extra Credit] Rankboost
- Implement the RankBoost algorithm following the RankBoost
paper and run it on TREC queries.
PROBLEM 11 [Extra Credit] Gradient Boosted Trees
Run gradient boosting with regression stumps/trees on the 20 Newsgroups dataset.