Object oriented programming (OOP) gets a lot of hype. This lecture explores what OOP is, why it generates so much excitement, where it works well, and where it does not.
Practitioners associate the term Object Oriented Programming with a wide variety of concepts. Part of the reason that OOP gets so much hype is that some developers think that you need object-oriented programming to reap certain benefits. Here are some concepts associated with OOP:
this
in Java or C#).
Then each procedure definition can choose whether it is appropriate to
use methods from the bundle to accomplish subtasks.
I will try address all of the ideas above during this lecture.
You can find a similar list (with some differences in terminology and perspective) at the start of Jonathan Rees's "Object-Oriented" article.
Stephen Kell has his own list of what composes OOP. (I may add more commentary as I reflect further.)
Lets set the stage by reviewing our usual development methodology: "Applicative Programming"; then I will discuss how that contrasts with development in an object-oriented language.
An applicative programming style has two main stages: we first determine what the data definition is, and then we write functions that manipulate that data.
This strategy makes sense for a class on programming languages, where most functions process abstract-syntax trees for a particular language: interpreters, type-checkers, and various program transformers. Often programming language developers will be working with a fixed grammar for the language in question; the language is fixed, but the set of operations we want to perform on terms in the language is growing over time.
Note that this strategy makes it difficult to change the data definitions later; every change to the data definition may require review of all the functions that process instances of that data.
A data type consists of two parts:
For an abstract data type, the interface consists of a set of operations that clients are allowed to use when manipulating values of the abstract data type. With abstract data types, clients do not need to know how the data is represented. That means the representation can be changed without breaking any client code. In other words, client code is representation-independent.
A queue represents a sequence of values that are delivered in "first-in first-out" (FIFO) order; the first element enqueued will be the first element removed, the second element enqueued will be the second element removed, and so on.
We will write a sequence of values mathematically
using the notation
[
a, b, c, …, x, y, z]
;
thus we will denote an implementation's
representation of such a sequence
using the notation
⌈[
a, b, c, …, x, y, z]
⌉.
(Note that the ellipses notation is somewhat informal;
one important detail is that
we use [
a, …, k]
to denote a
potentially empty sequence,
but [
a, b, …, k]
and [
a, …, k, l]
both denote non-empty sequences.)
empty : → Queue
snoc : Queue × Val → Queue
isEmpty : Queue → Boolean
head : Queue → Val
tail : Queue → Queue
(empty)
= ⌈[]
⌉
(snoc
⌈[
a, …, w]
⌉ x)
= ⌈[
a, …, w, x]
⌉
(isEmpty
⌈[]
⌉)
=#t
(isEmpty
⌈[
a, b, …, w]
⌉)
=#f
(head
⌈[
a, b, …, w]
⌉)
= a
(tail
⌈[
a, b, …, w]
⌉)
= ⌈[
b, …, w]
⌉
Here is a one queue implementation: queue1 ("simple"). What are its features? What are its drawbacks?
Features:
Drawbacks:
snoc
takes O(n) time!Here is another queue implementation: queue2 ("fast"). What are its features? What are its drawbacks?
Features:
snoc
and head
.Drawbacks:
tail
.tail
, then this queue representation does not have
amortized O(1) time for all operations.
There do exist (even more complex) queue implementations that can achieve:
But, should the client care how queues are implemented?
Can we write our client in a manner so that it does not care how queues are implemented? (See for example these black-box queue tests.)
(We can edit the first import
of the script
to change which queue implementation we want to use in the
test program.)
Here is the output from that test suite on the first and second queue implementations (after installing them as libraries).
% plt-r6rs queue-tests.sps success success success success success success
I have structured the code above in a manner such that I can run the tests importing either queue implementation.
However, I took special care when writing my test suite code to make sure that it never attempted to look at the internal representation of queues.
Here is another ("white-box") test suite that was not written to the abstract specification. In particular, it assumes that queues have been implemented using the first representation. Compare: a run using the first implementation:
% plt-r6rs queue-tests-rep-exposed.sps success success success success success success
but if we edit the testing script to use the second implementation,
by changing the import to start with:
(import (obj-lecture queue1-encapsulated) ...)
:
% plt-r6rs queue-tests-rep-exposed.sps FAILURE (empty) should be: () but got: (() ()) FAILURE (snoc (empty) 'a) should be: (a) but got: ((a) ()) success FAILURE (snoc (snoc (empty) 'a) 'b) should be: (a b) but got: ((a) (b)) FAILURE (tail (snoc (snoc (empty) 'a) 'b)) should be: (b) but got: ((b) ()) FAILURE (tail (tail (tail (snoc (snoc (snoc (snoc (snoc (empty) 'a) 'b) 'c) 'd) 'e)))) should be: (d e) but got: ((d e) ())
Note that these failures are not particularly illuminating; it is not obvious from the failure messages why the actual and expected values do not match.
The problem is that the client (in this case, the test suite) has violated the Queue abstraction; it has relied on the particular representation used for queues, but a proper client should only interact with the data by using the appropriate procedures defined in the abstraction.
One can enforce an abstraction by properly modularizing the code. Most modern languages provide facilities for defining modules as collections of related procedures, and rules for what procedures have access to the internal representation of an abstraction.
Here is a revision, queue1 (encapsulated), of the first (slow but simple) queue implementation that illustrates one way to control access to a representation, and thus enforces modularity.
This is only meant to illustrate one way to achieve this effect; as stated above, most modern languages provide more convenient facilities for defining access rules.
If we now change our script to import the encapsulated library:
% plt-r6rs queue-tests.sps success success success success success success % plt-r6rs queue-tests-rep-exposed.sps FAILURE (empty) should be: () but got: #<procedure:an-abstract-queue> FAILURE (snoc (empty) 'a) should be: (a) but got: #<procedure:an-abstract-queue> success FAILURE (snoc (snoc (empty) 'a) 'b) should be: (a b) but got: #<procedure:an-abstract-queue> FAILURE (tail (snoc (snoc (empty) 'a) 'b)) should be: (b) but got: #<procedure:an-abstract-queue> FAILURE (tail (tail (tail (snoc (snoc (snoc (snoc (snoc (empty) 'a) 'b) 'c) 'd) 'e)))) should be: (d e) but got: #<procedure:an-abstract-queue>
Now the failure messages are a bit clearer: the tests are failing because
the client (the test writer) wrote down that the code expected values such as
the list (a)
but the actual values we receive are abstract queues.
Even if the client wanted to break the abstraction by trying to apply the returned procedure, they would be foiled (illustrated below).
% larceny -err5rs -path .. Larceny v0.963 "Fluoridation" (Jul 29 2008 20:26:38, precise:Posix Unix:unified) larceny.heap, built on Tue Jul 29 20:28:40 EDT 2008 ERR5RS mode (no libraries have been imported) > (import (rnrs)) Autoloading (rnrs) Autoloading (rnrs enums) Autoloading (rnrs lists) Autoloading (rnrs syntax-case) Autoloading (rnrs conditions) Autoloading (err5rs records procedural) Autoloading (rnrs exceptions) Autoloading (rnrs hashtables) Autoloading (rnrs arithmetic bitwise) Autoloading (rnrs programs) Autoloading (rnrs files) Autoloading (rnrs io ports) Autoloading (larceny deprecated) Autoloading (rnrs records syntactic) Autoloading (rnrs records procedural) Autoloading (rnrs control) Autoloading (rnrs sorting) Autoloading (rnrs bytevectors) Autoloading (rnrs unicode) > (import (obj-lecture queue1-encapsulated)) Autoloading (obj-lecture queue1-encapsulated) > (define my-queue (snoc (snoc (empty) 'a) 'b)) > my-queue #<PROCEDURE> > (my-queue 'attempting-to-get-inside!) Error: queue1-encapsulated: unauthorized-access-attempt Entering debugger; type "?" for help. debug>
This encapsulation of the internal representation, exposing only operations that know how to properly manipulate the data and preserve invariants of the representation, is often considered a key benefit of object-oriented programming.
Furthermore, with a sufficiently expressive language, the most common attempts to violate encapsulation can be detected statically; the program can be rejected before you run it. (The technique illustrated above is detecting the encapsulation violation dynamically, so the system does not signal an error until we run the code for the white-box test suite that violates the encapsulation.)
However, encapsulation is not a benefit that is provided by only object-oriented languages. Many non-object-oriented languages do support modular program definitions where representations are hidden; after all, I just demonstrated one way to accomplish the task in Scheme!
On top of that, you can write
code in Java or C# where the internal representation is
exposed as a set of public
fields; it is up
to the programmer to decide how to use the features of the
language to achieve their desired system design.
Still, many practitioners will list encapsulation as a reason that they use object-oriented languages for their systems development.
In the above demonstrations, I chose between the queue1 ("simple"), queue2 ("fast"), and queue1 ("encapsulated") implementations; it is not legal with the code above to mix-and-match queue implementations.
That is, the client code can be ignorant of which queue abstraction it is using, but it still needs to be linked against a single queue implementation, or else there are likely to be serious consequences.
We can see some of the bad effects of trying to link against several of the above queue implementations with this script, queue-mixing.sps.
% plt-r6rs queue-mixing.sps B? b mcar: expects argument of type <mutable-pair>; given a A?
Maybe that's a PLT bug; let's try Larceny.
% larceny -path .. -r6rs -program queue-mixing.sps B? b A? Error: no handler for exception #<record &compound-condition> Compound condition has these components: #<record &assertion> #<record &who> who : "car" #<record &message> message : " a is not a pair.\n" Terminating program execution.
No, Larceny agrees with PLT Scheme on this point.
Something is definitely wrong with how queue-mixing.sps attempts to use the queues.
You can see other "fun" results if you avoid the runtime
error by commenting out the three lines grouped with the
(display "A? ")
expression and try
running the script again.
If you want to be able to use different implementations of the same interface at the same time, you need a more sophisticated way of mapping the operations you want to perform with the actual code that performs those operations.
By representing our data in a different way, we can interact with it by passing a message asking it to perform a particular operation.
Here are translations of the first, queue1 ("dispatch"), and second, queue2 ("dispatch"), implementations into a message-passing style.
As a quick aside: the R6RS library system actually helped my presentation here, since I was able to layer the dispatching implementations on top of the core implementations of
queue1
andqueue2
.When I gave this lecture previously, I based the code on R5RS Scheme; but the R5RS does not provide a library system. So I had to choose between using
load
+define
tricks to get the above effect, or copying the implementations ofqueue1
andqueue2
into the dispatching implementations.In that presentation, I chose to copy the implementations, but that meant that the details of the individual queue implementations were distracting the reader of the code from the core ideas of dispatch. In this version, the
library
system lets me focus on the relevant details to dispatch alone.
Now we can load one version, use it, load another version, use that as well, and the values we constructing with the first version continue to work, as in this script:
% plt-r6rs queue-mixing-dispatch.sps B? b A? a Y? y X? x
The code for queue1-dispatch and queue2-dispatch illustrates dynamic dispatch (or sometimes single dispatch, or sometimes just dispatch).
Dispatch can provide a strong separation between interface and implementation, because one typically defines the interface when one is developing the set of messages that will be passed around. The actual code that implements the desired behavior associated with the messages can be developed long after the interface has been conceived.
But there is a piece missing, and understanding what the missing piece is requires that we take another step back.
A queue abstraction can be useful for certain algorithms that require a FIFO order on element delivery and do not require any other sort of order of fast access to enqueued elements.
But some clients may not require a strict FIFO order; some clients will be happy to consume an element from anywhere in the queue.
And other clients may require fast access to both the most-recently inserted and least-recently inserted elements.
It would be useful to categorize our implementations according to what capabilities they have; that is, what operations they can support. If we were to classify our various interfaces into a hierarchy, then our clients could clearly state their requirements by choosing the interface appropriate to their needs.
For my examples in this lecture, I will use a simple hierarchy. Here is its interface (not implementation):
Collection supports methods isEmpty : Collection -> Bool addElem : Collection * Value -> Collection anyElem : Collection -> (list Value Collection) ;; only non-empty works! addAll : Collection * Collection -> Collection toList : Collection -> Listof[Value] Queue extends Collection and adds methods snoc : Queue * Value -> Queue head : Queue -> Value ;; only non-empty works! tail : Queue -> Queue ;; only non-empty works! Tree extends Collection and adds methods isLeaf : Tree -> Bool nodeValue : Tree -> Value ;; only non-leaf works! left : Tree -> Tree ;; only non-leaf works! right : Tree -> Tree ;; only non-leaf works! The base Queue constructor is empty : -> Queue The base Tree constructors are leaf : -> Tree node : Tree * Tree * Value -> Tree
There may be a lot of potential code sharing amongst the different
implementations. A Collection
class might implement
a method, addAll
, where one collection c1
consumes another collection c2
of elements and produces the union of the two by iteratively
invoking the c1.add
method on each element
it can get out of c2.
This is a common code pattern that we would like to put into one place,
into the Collection
class. Extensions of the
Collection
will be responsible for properly implementing
its add
method, but once that is in place, then
all of the extensions immediately support the addAll
method. (At least in principle; there are caveats here.)
The crucial idea to support implementation inheritance: pass yourself around as another parameter! Then, when you need to invoke a method, pass a message to yourself!
(This may sound absurd; why would this accomplish anything?)
This is the heart of delegation; pass a special self parameter around; for any method that you want to allow your subclasses to handle in their implementation, you perform the invocation by passing a message to self.
(Felix finds it fascinating that this notion of self-reference, the key to delegation, is also the key for implementing recursive functions in languages like PROC. But that is just an aside.)
Here is the relevant code that illustrates delegation, using Scheme as the implementation language so that the self parameter is explicit in the code.
And here is a script that demonstrates it:% plt-r6rs demo-delegation.sps {}: () {a b c}: (a b c) {b c}: (b c) {}: () {p q r s t}: (p t q s r) t2 a leaf? #f t2.rgt leaf? #f t2.lft leaf? #f t2.lft.value: p t2.rgt as list: (q s r) q2 and t2 as list: (a b c p t q s r) t2 and q2 as list: (c b a p t q s r)
And here is the big punch-line: the above bits of Scheme code, but encoded
as Java classes! (plus a Main
program that tests them).
% javac Collection.java Queue.java Tree.java Main.java % java Main > q1.toString(): ( ) > q2.toString(): ( a b c ) > q2.tail().toString(): ( b c ) > t1.toString(): ( ) > t2.toString(): ( p t q s r ) > t2.isLeaf(): false > t2.right().isLeaf(): false > t2.left().isLeaf(): false > t2.left().nodeValue(): p > t2.right().toString(): ( q s r ) > q2.addAll(t2).toString(): ( a b c p t q s r )
(Does the output of Main
look familiar?)
class A { /* specification of four: must always return 4. */ public int four() { return 4; } /* specification of five: must always return 5. */ public int five() { return 5; } } class B { public int four() { return five() + 2; } }
Above is subclassing
(since we are inheriting the implementation
of the five()
method and
using it within the definition of the B
class)
but it is not subtyping.
Consider the Object.equals
method
in Java.
In particular, consider the constraint that specfication
puts on its subclasses with respect to whether they
need to implement the
Object.hashCode
method.
Is this class, C
, a subtype of Object
?
class C { int hashCode() { return 42; } }
How about D
?
class D { boolean equals(Object d) { return (d instanceof D); } }
.sls
files)
.sps
files)
import
at the top)
.java
files)
Last updated 22 October 2008.