CSU213 Lecture Notes Spring 2007

Lecture 1: January 8, 2007

This class is not about Java. It's about how to design programs. We will use Java as the programming language to develop programs.
All you need to know prior to this class you have already learned in kindergarden (i.e. in Fundamentals of Computing 1).
In this lecture we focus on recalling how to design programs with accumulators in the Beginner HtDP Language - and in the process recall the design recipe for data definitions and for functions.

Find the max number in a list

Lecture 2: January 10, 2007

One can only understand computing if one understands the connection between the information provided by the user and the way the information is represented as data that the program manipulates.
We will look at what we already know about data definitions and learn that the same information can be represented as data in a different language.
However, the most important lesson is that the data no matter how represented corresponds to the same information, and that the structure of the data representation is identical in two different programming languages.

Data Definition in Scheme and Java

Lecture 3: January 11, 2007

Unions of data combine several related classes of data under the same name.
While in HtDP this common name did not have any meaning within the language, class-based programming languages not only provide for a language-based definition of commonality among several classes of data, but lster leverages this commonality in program design.
The most important lesson here is that a piece of data can be an instance of some class of data, and at the same time be known to represent data of a given type (super-type). So, for example, an instance of a Book is also an object of the type LibraryItem.

Data Definition for unions of data

Lecture 4: January 17, 2007

Unions of data combined with a self-referential containment represent quite complex structures of data, such as ancestor trees and river systems.
We look at how the data definitions we have seen in HtDP translate in a systematic way to class-based data definitions.
The most important lesson here is that following the DESIGN RECIPE for data definitions we learned in HtDP helps us in identifying the classes of data we need to define and the relationships between the classes and interfaces.

Data Definition for self-referential unions of data

Lecture 5: January 18, 2007

Designing methods for a class-based language is very similar to designing function in functional languages.
The main difference is that in a class-based language a method is invoked by an object of the class in which the method is defined. This object than becomes an implicit argument to the method.
The most important thing to remember is that the program designer need to think very carefully which class should be responsible for defining a method that may involve instances of more than one class.

Designing methods for simple classes and classes with containment

Lecture 6: January 22, 2007

Designing methods for a unions in a class-based language leverages the fact that the interface that represents the union has a meaning in the language.
In a class-based language when an instance of a class invokes a method, the method definition lookup is dispatched first to the class to which the instance belongs.
The most important thing to remember is that the method dispatch eliminates the need for the conditional we used in HtDP to distinguish between the variants of a union. However, we are now responsible for defining the method with the same signature in all classes that implement the common interface.

Designing methods for unions

Lecture 7: January 24, 2007

Designing methods for self-referential unions typically leads to a self-referential method invocation by a field in the class that is itself an instance of the class.
Including the method declaration (method signature) in the common interface defines a contract between the interface and all of the implementing classes; requiring that each of them implements the specified method.
The most important thing to remember is that as in HtDP, learning to read the purpose statement in the context of this self-referential field, typically provides a deeper understanding of the program behavior and helps greatly in the method design.

Designing methods for self-referential unions

Lecture 8: January 25, 2007

The rule of one task, one function we learned in HtDP is just as relevant in the context of class-based program design.
In complex class hierarchies, following the arrows in the class diagram often serves as a guide where the method that solves the next task should be defined.
The most important thing to remember is the rule delegate each task to the class that should be responsible for providing the answer.

More methods for self-referential unions

Lecture 9: January 29, 2007

Typical data for a large project contains a number of mutually referential dependencies. The need for delegating tasks to the classes that are responsible for providing the answer becomes even more obvious.
Before we start designing methods, we have to make sure that we thoroughly understand the nature of the problem, the relevant information, and how the information is represented as data.
The most important thing to remember is to make sure it is clear what information each piece of data represents (interpreting data as information) and how is the given information encoded as data (representing information as data).

Understanding mutually-referential unions

Lecture 10: January 31, 2007

We continue designing methods for more complex class hierarches.
Following the DESIGN RECIPE faithfully, designing helper method when needed, guides us seamlesly through this complex process.
The most important thing to remember is to let the dynamic dispatch of the method invocations be the mechanism for delegating tasks.

Managing mutually-referential unions

Lecture 11: February 1, 2007

The methods that process lists of data that we have designed (that follow the structure of the data) defer the computation until the series of recursive calls reduces the problem to the trivial case. The result is then accumulated as the nested recursive method invocations return the computed value.
It is possible to rewrite these methods is such a way that the known values and partial results are accumulated during the initial method invocation and the recursive method uses the accumulated value to produce the final result - eliminating the need for retracing the recursion steps.
The most important thing to remember is that the mechanism for performing the transformation of structural recursion to recursion with accumulators follows a pattern that can be described precisely, regardless of the specific details of the computation.

Designing methods with accumulators

Lecture 12: February 5, 2007

In the previous course we used the DESIGN RECIPE FOR ABSTRACTIONS to eliminate repetition in the code. This is not just an aesthetic issue - it also help in following the single point of control principle.
We start by identifying the situations where the current data definitions (with interfaces representing unions of data) do not give us enough power to design the programs we need to design. We then introduce several solutions to our problem.
The most important thing to remember is that while an abstract class provides a common base for several classes that extend it (its subclasses or derived classes) we cannot construct an instance of an abstract class.

Comparing Union Objects for Equality; Abstracting over common fields and methods

Lecture 13: February 7, 2007

We can now eliminate repetition from the method definitions as well --- continuing with the theme of the single point of control principle.
We look for methods that have a common implementation in several classes. At times, we can lift the entire implementation into the abstract class --- defining concrete methods in an abstract class. In other cases the implementations in different classes may be completely distinct, and there is no abstraction possible. The third situation occurs when several classes share the implementation of a method, but one class may differ. In that case we may implement a concrete method in the abstract class and let the one class that needs a different implementation override this method.
The most important thing to remember is that abstract class is a perfectly good place for defining methods. However, every class that extends this super class must provide tests for each such concrete method.

Lifting methods to the abstract class or a super class

Lecture 14: February 8, 2007

We now look down. At times, the class we are working with seems to represent several different situations. In that case, it may be helpful to derive a subclass that represents one variant of the behavior, while the original class retains the other.
An additional reason for deriving subclasses is when we want to add new behavior to an existing library class. We cannot modify the library class, but our class that extends it can contain new methods relevant for our application.
The most important thing to remember is that each class should be designed to represent a common collection of objects, and if the behavior can be clearly divided into two cases, we should consider deriving subclasses.

Deriving subclasses; overriding methods.

Lecture 15: February 12, 2007

So far every time we invoked the constructor to create a new instance of a class we provided the full collection of values needed to initialize the fields. However, the class designer may choose to provide several constructors for one class. A class can define more than one constructor as long as no two constructors have the same sequence of argument types. This is called overloading.
Methods can also be overloaded, but they all must have the same return type.
We see several reasons for overloading constructors. The first is a convenience for the user when one of more of the fields is typically initialized to a known value, and the user provides it only if the value differs from the default value. The second reason is to provide an alternative way of supplying the initial values (for example temperature in degrees Celsius instead of degrees Fahrenheit).
The most important thing to remember is that a method or a constructor that oveloads existing methods or constructors must have a different sequence of argument types than any existing method or constructor with the same name.

Designing constructors; overloading constructors and methods.

Lectures 16-17: February 14-15, 2007

We get a preview of how constructors can be used to assure integrity of data - but defer more discussion of this topic to later.
We explore the possibility of data definitions forming a circularity that cannot be avoided. We run into problems trying to define examples of data. This is the first time we see a need for changing the values that an object represents after the object has been constructed. This is called mutation.
The most important thing to remember is that when we introduce mutation, the values that an object represents change as the program runs. This requires that tests verify the state of the program both before and after the changes have been made.

Circularly referential data; Methods for circularly referential data.

Lecture 18: February 21, 2007

We learn how constructors can be used to assure integrity of data. We also learn about the need for information hiding through the use of private visibility modifier.
We learn three new concepts in this context. First we see how Java reports runtime errors by raising an exception. We learn about class methods in contrast to the instance methods we have seen so far. The static modifier indicated that a method belongs to the class and is invoked by specifying the name of the class where it is defined. Finally, we learn about of private visibility modifier that prevents the client from seeing and using some of the class code directly.
The most important thing to remember is that the class code can provide many services to the client while protecting the integrity of the data represented by an instance.

Assuring integrity of data; Information hiding.

Lecture 19: February 22, 2007

Touching the Void

We explore further the need for mutation i.e. making changes in the data values represented by an instance of a class after it has been constructed.
We first look at the need for modifying fields in a simple class. We then attempt to modify the structure of a complex piece of data ... leaving the solution till the next time.
The most important thing to remember is that mutation introduces time dependency into a program and requires that the tests check the state of the program before and after the occurrence of the mutation.

The need for mutation.

Lecture 20: February 26, 2007

Understanding mutation

We explore further the need for mutation. We now want to change the contents of a list - either remove items from a list or add new elements to the list. The problem is that some other objects (for example fields in a database) may refer to our list. If our method were to produce a new list, all fields that refer to this list would have to be notified. And, of course, we really do not know what all these fields are.
We therefore need to find a way to change the contents of the object that represents a list of items. To be able to do so we introduce a wrapper class that refers to the current list. That way the wrapper class is the only reference to the list and all other fields that need to know about the list refer to the instance of the wrapper class.
The most important thing to remember is that when we need to change the contents of a structure such as a list or a tree, we have to think carefully not to produce a new object, so that other places in the program that refer to it will not be compromised.

Mutating the contents of a list.

Lecture 21: February 28, 2007

Abstracting over the datatype

We now look how we can avoid defining new and new classes to represent a list of objects. The Design Recipe for Abstractions should help us in replacing the repetitive code with parameters that represent the common behavior.
We define an interface ISame to represent the fact that the class implements the same method to compare two instances of this class. That makes it possible to not only define a list of objects of the type ISame, but also to design comparison of two lists that can then be used to design tests.
The most important thing to remember is that each time we get more power, we also end up having more responsibilities. When we define a list of ISame objects, we can mix Bookss and Songs, something we did not have to worry about earlier. We will see soon that this can be avoided using Java generics.

Abstracting over the datatype.

Lecture 22: March 1, 2007

Abstracting over the behavior

We continue with designing abstractions. Suppose we want to know whether there are oldbooks in our list of books. We also want to know whether our list of songs contains a long song. And, finally, we also want to know whether our list has any books written by GBS (George Bernard Shaw).
We define an interface ISelect to represent a predicate, a method that determines for a given object whether it satisfies our criterion. This method does not belong to any of the classes we have designed --- the object in question is given to it as an argument. For each predicate we need for our program we need to define a new class that implements the ISelect interface, and then construc an instance of it to use as a function argument. Read on --- this is quite complex, but used extensively for defining actions for GUI objects.

A note to the wise: A similar problem was handled easily in Scheme, where functions can be passed as arguments to other functions. Just recall the Scheme loops and their contracts.
The most important thing to remember is that for each new predicate we need to first define a class that implements the desired method, then use an instance of this class as an argument to our general method. Do not forget to run the tests for each use of the general method.

Abstracting over the behavior.

Lecture 23: March 12, 2007

Abstracting over Traversals - Part 1

We first review the use of function objects from the previous lecture. We see, that as we design new and new algorithms that consume list data, we keep changing the classes that represent lists. If we wanted to build a library of data, we would no longer want to change these classes. Instead, we want to observe the contents of the list in some orderly fashion, and process the data using methods defined outside of the list classes.
We introduce a new interface that allows us to traverse over the elements of a list. We then show how this allows us to design algorithms outside the classes that represent lists. Later, we will see that other classes the represent a dataset of items can be traversed (and processed) the same way, as long as they implement the same Traversal interface.
The most important thing to remember is that when we design libraries, we should encapsulate the internal structure of the dataset and provide the outside users with appropriate interfaces that allow them to observe (and possibly, later, mutate) the contents of our dataset.

Abstracting over traversals - Part 1.

Lecture 24: March 14, 2007

Abstracting over datatype: Generics

We get back to looking at lists of data and the algorithms we designed. We see that we now can build a mixed list of songs and books and persons, and have to work very hard to make sure we do not make a mistake. The Design Recipe for Abstraction tells us to use a parameter to ditinguish between the different variants of the same code. Here we want to use a paramter to specify the type of data that can be used to construct a list, or the type of data a selector function object consumes.
Java generics (type parameters) allows us to do this. When we specify that a list contains only objects of the type Book, Java will throw an exception if someone tries to include a book in the list. When designing the function object that consumes an element of some type given by the type parameter, the concrete implementing class specifies the desired type in the declaration and know that the argument will have the properties of an object of that type - its fields and methods.
The most important thing to remember is that using generice we produce better and safer programs, as well as more readable.

Abstracting over datatype: Generics.

Lecture 25: March 15, 2007

Abstracting over traversals: Part 2

We are now ready to look at how some of the Java libraries are built. We understand how to design classes, how to design abstract classes, use functon objects, implement the behavior specified by an interface, hide the implementation with visibility modifiers, etc. We also understand well the different ways how two pieces of data can be considred the same, which is extremely important for understanding the effects of mutation.
We explore further the use of the Traversal interface and see that it can work not only for lists, but also for binary search trees and for ArrayList data type defined in the Java Collections Framework library. Some of the deails are left for the next time...
The most important thing to remember is that corerctly designed interfaces allow us to implement the behavior in several ways, depending on what seems to be the most appropriate for the given problem.

Abstracting over traversals: Part 2.

Lecture 26: March 19, 2007

ArrayList: Direct access data structure

We introduce a new class from the Java libraries: ArrayList. It allows us to store a collection of data of the same type in a manner similar to our lists. However, the class also contains methods that allow us to both see and mutate the data item at a specific location within the list.
While it is possible to design a method for our Cons lists that would produce the fifth item in a list (or the ninth item) --- such operations are difficult and involve traversing over all data items in the list that preceede it. The direct access property of the ArrayList allows us to design new ways of processing data (new algorithms).
The most important thing to remember is that each item in the ArrayList is just a reference to the instance of that data, just like in our Cons lists. If we mutate it, the contents of the list changes, if we replace it with another item, the original item is still around, and may possibly be still used.
Start using Java documentation to learn about the libraries available to you.

ArrayList: Direct access data structure.

Code for this lecture.

Lecture 27: March 21, 2007

From Accumulators to Java Loops

We have seen loops where accumulators were used to keep track of the knowledge we have gained as we traversed over the early parts of the list of data. Java language contains several statements that represent similar loops. Each of them specifies how to travers over the data in a dataset, and allows the programmer to define action to take as each new item in the dataset is seen.
We see that the Java loops consist of the same components as the loops with accumulators we have used for our Cons lists. By following the template, we can design loops as we did before, and modify our original solution to use Java loops.
The most important thing to remember is that Java is not well suited for working with recursively defined data. Therefore, the use of ArrayList with Java loops produces a more efficient solution.
Make sure you always design a method that wraps the loop, so you can reason about it, design it, and test it.

From Accumulators to Java Loops.

Code for this lecture.

Lecture 28: March 22, 2007

Java Loops; Mutation: Selection Sort

We recall how to translate recursive loops with accumulator into mutating while loops and for loops. We translate these loops into index-based loops and eliminate the need for an iterator when dealing with direct access data structures.
We then practice the use of mutating loops by implementing the Selection Sort.
We see that index-based access simplifies the code needed to implement the loop. However, the programmer now has the responsibility to program the traversal as well as the body of the loop.
The most important thing to remember is to design each loop carefully as a method. If the task inside of the loop body involves further traversal it should be implemented as a seperate method.
The old rule one task one method still applies.

Java Loops; Mutation: Selection Sort.

Code for this lecture.

Lecture 29: March 28, 2007

Exploring Java Collections Framework; Iterator

Java Collections Framework is a library of classes that represent different ways of organizing data, traversing over the data, as well as a standard set of the most common algorithms needed for processing data.
We explore the structure of this class hierarchy, the design choices, and especially the design of the Iterator interface. This also leads us to the new form of for loop, commonly referred to as for-each loop.
The most important thing to remember is that a programmer should become familiar with the libraries that are available and use them whenever appropriate. This makes code easier for others to understand and maintain.

Exploring Java Collections Framework; Iterator.

Lecture 30: March 29, 2007

Exploring Data Structures: Stacks and Queues

We recall that both the recursively built lists and the ArrayLists handled addition of new items and the removal of an item from the dataset. The Cons lists always placed the new item at the front, and it was the only item we could remove. The ArrayList added the items at the end (if we did not ask for a specific place), but could remove from anywhere. However, the cheapest removal was from the end --- removing the item that was added last.
This data organization Last In First Out or LIFO is called a stack.
We somtimes need to organize data differently. If the data represents requests that have to be processed in the order in which they were received, we need to represent the data as a queue. We compare the definitions and implementations of stacks versus queues.
The most important thing to remember is that there is a distinction between the interface that represents how the data can be accessed and manipulated and the implementation that is typically hidden from the user. However, the implementation should explain which operations are costly and give the estimate of their expected running times.

Exploring Data Structures: Stacks and Queues.

Code for this lecture.

Lecture 31: March 30, 2007

Data Organization and Algorithm Complexity

Over the past several weeks we have seen that the way we organize the representation of information as data affects the design of the methods that access and manipulate this data. One of the concerns a program designer must address is the efficiency of the algorithms used to solve the problem.
We compare the different data structures we have seen and explore the cost of simple operations, such as finding an item to the dataset represented by the data structure. We define the algorithm (time) complexity as an approximate function of the size of the dataset.
The most important thing to remember is that we need to consider not only the average time it takes to perform the task, but also the worst case scenario.

Data Organization and Algorithm Complexity.

Lecture 32: April 2, 2007

Key-Based Data Organization: Hash Map

We have seen that finding an item at the given index in a direct-access data structure is very simple and fast. That means that an ArrayList can represent a look-up table of data, as long as we look up the values based on a numeric index, and as long as all locations in the table are filled.
We explore how we can leverage the direct table lookup by mapping the value of the unique key for each item into numerical value that can then be used as an index to a hash table of data items. We then learn how we can design our own hashCode method.
The most important thing to remember is that the hashCode method and the equals method go hand-in-hand and if we override one of them we must override the other one in a way that maintains their relationship.

Key-Based Data Organization: Hash Map.

Lecture 32: April 4, 2007

BigOh and Sorting Algorithms

We explore further the time complexity of different algorithms. We use sorting algorithms to illustrate the difference between the running times that are logarithmic, linear, n log n and quadratic.
We see that at times one algorithm may be a better choice, because the data organization corresponds to the best time, not the average time; at other times, we may rejact a faster algorithm if the data organization causes us to expect the worst running time, not the average time.
The most important thing to remember is that the selection of a suitable algorithm and data organization involves tradeoff and judgement. The designer needs to consider the whole problem when selecting the data organization and the algorithm.

BigOh and Sorting Algorithms.