Explore the underlying principles of distributed processing of large data sets. Gain an understanding of the performance and usability tradeoffs of various data analytics infrastructures. Work with large data sets and conduct practical experiments with machine learning techniques. Gain a working knowledge of technologies such as Hadoop and Spark, along with insight into how they are implemented. The class builds on familiar principles such as the design recipe, testing, and code reviews.
Name | Office | Office Hours | Email |
---|---|---|---|
Nat Tuck | WVH 314 | Thursday, 2-4pm | ntuck ⚓ ccs.neu.edu |
Mirek Riedewald | WVH 332 | TBA | |
Joe Sackett | WVH 462 | Tuesday, 3:30 - 4:30pm | jsackett ⚓ ccs.neu.edu |
Ankur Shanbhag | WVH First Floor | Wednesday, 4-5pm | ankurs ⚓ ccs.neu.edu |
Swapnil Mahajan | WVH First Floor | Friday, 4-5pm | swapm31 ⚓ ccs.neu.edu |
Every week or two you will be given a homework assignment to complete. Homework assignments will be posted to Bottlenose, and work should be submitted there as well.
The last assignment will be a small project, worth the same number of points as two homework assignments.
Assignments are due at 11pm on the specified day. Late submissions will receive an automatic 50% point deduction. Submissions more than a day late will receive a 100% point deduction.
NU Online Blackboard: http://nuonline.neu.edu
Each week you are expected to review the lesson material and complete the online quiz for that week's module on NU Online Blackboard. Complete these before class so that you are prepared for the lecture and any in-class questions.
There will be in-class coding assignments approximately weekly, so make sure to bring your laptop to class. These are due at the end of class and should be submitted online through Bottlenose.
You are expected to use the online discussion forum and to answer questions asked by your classmates. Forum participation will be graded on the total number of posts and the number of good answers.
Questions will occasionally be asked of a random student in class. Being present and answering will contribute slightly to your in-class coding grade; not being present will hurt it slightly.
Category | Weight |
---|---|
Homework | 40% |
Participation & In-Class Coding | 10% |
Blackboard Modules | 10% |
Exam | 40% |
Grades will be assigned on the following scale:
Score | Grade |
---|---|
93+ | A |
90+ | A- |
87+ | B+ |
83+ | B |
80+ | B- |
75+ | C+ |
70+ | C |
60+ | D |
Here's how the semester is likely to play out. Details subject to change.
Dates | Topics | BB Module | Work Due |
---|---|---|---|
Sep 9 | Intro | | |
Sep 13, Sep 16 | Parallel Programs, HDFS | | |
Sep 20, Sep 23 | Map-Reduce | | |
Sep 27, Sep 30 | M-R Fundamental Techniques | | |
Oct 4, Oct 7 | M-R Basic Algorithms | | |
Oct 11, Oct 14 | M-R Graph Algorithms | | |
Oct 18, Oct 21 | Advanced Algorithms | | |
Oct 25, Oct 28 | Partitioning | | |
Nov 1, Nov 4 | Data Mining I | | |
Nov 8 (no class Friday) | Data Mining II, Matrix Multiplication | | |
Nov 15, Nov 18 | Pig Latin | | |
Nov 22 (no class Friday) | Exam | | |
Nov 29, Dec 2 | SQL Databases | | |
Dec 6, Dec 9 | HBase, CAP | | |
Dec 12 - 16 | Data Mining Presentations | | |
If you want to contest a grade once you've received it, this class uses a variant of the "coach's challenge" system to resolve such challenges. Here are the rules:
Deadline Extensions, Makeup Assignments, and Extra Credit Assignments will not be given on request. Exceptions may be made for major emergencies. Examples of non-emergencies include heavy load in other courses, interviews, and job fairs.