©2010 Felleisen, Proulx, et al.
Practice problems help you get started if some of the lab and lecture
material is not clear. You are not required to do these problems, but
make sure you understand how you would solve them. Solving them on
paper is great preparation for the exams.
Read through the documentation for the Java Collections Framework library. Find out how you can use the various implementations of the Map interface defined there. Write a simple program that tests these implementations and measures their timing in a manner similar to the previous lab.
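For the Map practice problem, a timing test might look roughly like the sketch below. It is only a starting point under stated assumptions: the class name MapTiming and the method timeInsertAndLookup are made up for illustration, and the workload (integer keys, one insert and one lookup per key) is an arbitrary choice, not the measurement used in the previous lab.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class; it is not part of the assignment files.
class MapTiming {
    // Inserts the keys 0..n-1 into the given map, looks each one up again,
    // and returns the elapsed time in nanoseconds.
    static long timeInsertAndLookup(Map<Integer, Integer> map, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            map.put(i, i * 2);
        }
        for (int i = 0; i < n; i++) {
            if (map.get(i) != i * 2) {
                throw new IllegalStateException("wrong value for key " + i);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 100000;
        // The same workload runs against three Map implementations.
        System.out.println("HashMap:       "
                + timeInsertAndLookup(new HashMap<Integer, Integer>(), n) + " ns");
        System.out.println("LinkedHashMap: "
                + timeInsertAndLookup(new LinkedHashMap<Integer, Integer>(), n) + " ns");
        System.out.println("TreeMap:       "
                + timeInsertAndLookup(new TreeMap<Integer, Integer>(), n) + " ns");
    }
}
```

A single run like this is noisy, because the JVM compiles and optimizes code while it runs. Repeating each measurement several times and comparing the later runs gives a fairer picture.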
Finish all the work for Lab 11. (See below)
Finish all parts of Lab 11 and hand in the completed work with your partner.
The Application
Have you ever wondered about the size of Shakespeare’s vocabulary? For
this assignment you will write a program that reads its input from a
text file and lists the words that occur most frequently, together
with a count of how many different words occur in the file. If this
program were to run on a file that contains all of Shakespeare’s
works, it would tell you the approximate size of his vocabulary, and
how often he uses the most common words.
Hamlet, for example, contains about 4542 distinct words, and
the word "king" occurs 202 times.
The Problem
Start by downloading the file Assignment10.zip and creating an Eclipse project that contains these files. Run the project to make sure you have all the pieces in place. The Examples class uses the tester package as we have done before.
You are given the file Hamlet.txt that contains the entire text
of Hamlet and a file InFileReader.java that contains the
code that generates the words from the file Hamlet.txt one at a
time, via an iterator. Save the file Hamlet.txt in the
Eclipse project directory (where you find the subdirectories
src and bin).
Note: Here you will use the imperative
Iterator interface that is part of the Java standard library. Make
sure to look up the documentation for this interface and
understand how it works.
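As a reminder of how the imperative Iterator protocol works, here is a small sketch. It does not use InFileReader. The iterator comes from an ordinary List, and the names IteratorDemo and totalLength are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; the word source is a plain List, not InFileReader.
class IteratorDemo {
    // Consumes the iterator and returns the total number of characters
    // in all the words it produced.
    static int totalLength(Iterator<String> it) {
        int total = 0;
        while (it.hasNext()) {        // is there another item?
            String word = it.next();  // fetch it and advance the iterator
            total += word.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "not", "to", "be");
        System.out.println(totalLength(words.iterator())); // prints 13
    }
}
```

Note that the loop uses the iterator up: once hasNext() returns false, the same iterator cannot be rewound, so a second traversal needs a fresh iterator.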
Your tasks are the following:
Design the class Word to represent one word of Shakespeare’s vocabulary, together with its frequency counter. The constructor takes only one String (for example the word "king") and starts the counter at one. We consider one Word instance to be equal to another, if they represent the same word, regardless of the value of the frequency counter. That means that you have to override the method equals() as well as the method hashCode().
Design a class that implements the Comparator interface, so that the words can be sorted by frequency. (Be careful!) When you are done, place this class definition as the last part of the definition of the class Word. This is called an inner class.
Note: In this program there will be two ways of comparing the instances of the Word class: by the String that an instance represents, and by the counter for the word that the instance represents.
Include in the class Word the method that allows you to increment the counter (using mutation), and a method toString that prints one line with the word and its frequency.
Design the class WordCounter that keeps track of all the words we have seen so far. It should include the following methods:
// Records the Word objects generated by the given Iterator;
// for each word, records the number of occurrences.
void countWords(Iterator it) { ... }

// How many different Words has this WordCounter recorded?
int words() { ... }

// Prints the n most common words and their frequencies.
void printWords(int n) { ... }
Here are additional details:
countWords consumes an iterator that generates the words and builds the collection of the appropriate Word instances, with the correct frequencies. This collection is then used by the next two methods to show the results of our text analysis.
words produces the total count of different words that have been consumed.
printWords consumes an integer n and prints the top n words with the highest frequencies (using the toString method defined in the class Word).
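One way the pieces described above could fit together is sketched below. Treat it as an illustration under several assumptions, not as the expected solution. The field names text and count, the comparator name ByFrequency, and the helper method top are made up. The iterator is assumed to produce Word objects. The comparator is written as a static nested class so it can be created without a Word instance. The HashMap storage is one possible choice of collection. The comparator uses Integer.compare rather than subtraction and orders from most to least frequent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// One word plus its frequency counter (field names are assumptions).
class Word {
    final String text;
    int count = 1; // a new Word has been seen once

    Word(String text) {
        this.text = text;
    }

    // Mutates this Word: one more occurrence was seen.
    void increment() {
        this.count++;
    }

    // Equality looks only at the text and ignores the counter.
    @Override
    public boolean equals(Object other) {
        return other instanceof Word && this.text.equals(((Word) other).text);
    }

    // Must agree with equals, so it also uses only the text.
    @Override
    public int hashCode() {
        return this.text.hashCode();
    }

    @Override
    public String toString() {
        return this.text + ": " + this.count;
    }

    // Orders Words from most frequent to least frequent.
    static class ByFrequency implements Comparator<Word> {
        @Override
        public int compare(Word a, Word b) {
            return Integer.compare(b.count, a.count);
        }
    }
}

class WordCounter {
    // Maps each distinct Word to the one instance that holds its counter.
    private final Map<Word, Word> seen = new HashMap<Word, Word>();

    void countWords(Iterator<Word> it) {
        while (it.hasNext()) {
            Word next = it.next();
            Word known = this.seen.get(next); // lookup relies on equals/hashCode
            if (known == null) {
                this.seen.put(next, next);
            } else {
                known.increment();
            }
        }
    }

    int words() {
        return this.seen.size();
    }

    // The n most frequent Words, most frequent first (hypothetical helper).
    List<Word> top(int n) {
        List<Word> sorted = new ArrayList<Word>(this.seen.values());
        Collections.sort(sorted, new Word.ByFrequency());
        return sorted.subList(0, Math.min(n, sorted.size()));
    }

    void printWords(int n) {
        for (Word w : this.top(n)) {
            System.out.println(w);
        }
    }

    public static void main(String[] args) {
        List<Word> input = new ArrayList<Word>();
        for (String s : "the king the queen the king".split(" ")) {
            input.add(new Word(s));
        }
        WordCounter counter = new WordCounter();
        counter.countWords(input.iterator());
        System.out.println(counter.words() + " distinct words");
        counter.printWords(2);
    }
}
```

Because equals and hashCode ignore the counter, incrementing a Word that is already stored in the HashMap does not disturb the lookup. Words with equal frequencies may appear in any order, since a HashMap does not guarantee an iteration order.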
Note: The given code expects that you implement the classes as given, with the same names and methods. It will then check whether your program works correctly. That does not mean you do not need to design tests.
Of course, you need to test all methods as you are designing them. Design the tests in two stages:
For the class Word and the class WordCounter use a technique similar to what was done in the past assignments, i.e. design a class Examples with the necessary sample data and all tests.
Convert all tests into JUnit tests. Hand in both versions.
The project should contain complete Javadoc comments, and generating the documentation pages from them should produce no warnings. You do not need to submit the documentation pages.