Lab 7h Fun With Higher-Order Functions
Purpose The purpose of this lab is to induce headaches with λ, apply some of our newly-found skills to a practical problem, and cut down on unneeded computation.
When working with λ and higher order operations, signatures are very important. They are there to keep you sane and give you a hint as to how to use them. Heed their wisdom, as to go forward you must go back, and to touch the light you must pass beneath the shadow.
Functions on Functions
Exercise 1 Design the function two that takes a function f : [X -> X] as input and returns another function that applies f twice in a row. That is, two returns a function which first applies f to its input, then applies f again to the output of the first application (all within one function call).
Exercise 2 Design the function three, similarly, that applies f three times in a row.
Exercise 3 Design the functions one and zero. Writing one is easy, but what about zero? What does it mean to apply a function to its argument zero times?
; A Repeater is a [X -> X] -> [X -> X] ; that, given a unary function f, outputs a function that ; will repeatedly apply f
Exercise 4 Design a function rep->nat which consumes a Repeater as input and produces the number of times it repeats its argument. So:
(check-expect (rep->nat zero) 0) (check-expect (rep->nat one) 1) (check-expect (rep->nat two) 2) (check-expect (rep->nat three) 3) (check-expect (rep->nat (λ (f) (λ (x) ((three f) ((two f) x))))) 5) Hint: If you have a Repeater, all you can do is give it some inputs and see what it gives you back. So your task here is simply to devise some inputs that will force it to tell you which number it represents:
; rep->nat : Repeater -> Nat ; Discover how many times the given Repeater repeats its input (define (rep->nat rep) ((rep ?) ?))
Exercise 5 Design the function rep-add1 that increments a Repeater by 1.
Exercise 6 Design the function nat->rep that converts a natural number n to the Repeater that repeats its input n times. Use the following data definition to process natural numbers recursively.
; A Nat (natural number) is one of: ; - 0 ; - (add1 Nat)
Repeaters give us an alternative representation of natural numbers, and the cool part is that we can build them with λ. Can we also do arithmetic with them using nothing more than λ?
Exercise 7 Before moving on to the next exercises, update your definition of nat->rep to not use zero or rep-add1 to get some good practice of using just λ.
Exercise 8 Design the function rep+ that takes two Repeaters and returns the Repeater that represents their sum. Don’t use rep->nat, nat->rep, or built-in arithmetic.
Exercise 9 Design the function rep* that takes two Repeaters and returns the Repeater that represents their product. (Again, no rep->nat, nat->rep, or built-in arithmetic.)
Hint It’s shorter than rep+.
Exercise 10 Design the function rep-expt that takes two Repeaters and returns the Repeater that represents their exponentiation: (rep-expt n m) = nm. (Don’t use rep->nat, nat->rep, or built-in arithmetic!)
Hint It’s shorter than rep*.
Exercise 11 Can we use repeaters to compute functions we care about?
The Fibonacci sequence is defined as:
F(0) = 0
F(1) = 1
F(n) = F(n-1) + F(n-1)
Making Google
This is a highly simplified account of what search engines do, and is one of the simplest metrics by which search engines rank their results. If this lab has whetted your appetite for more sophisticated techniques, consider taking an upper-level class in Information Retrieval!
Exercise 12 Design the function count that will take a function p of type [X -> Boolean] and a [Listof X] lox and will output how many elements x in the list return true when p is applied to them. This function should not use length or filter, as that would require building up a list, when all we really want is the length of that list.
Exercise 13 Design the function unique that will take a [Listof X] and return the same list, except if an element in the list appears later in the list, remove its first apperance. Use foldr.
; A Corpus is a [Listof Document] ; A Document is a [Listof String] where every element is an English word, all lower-case
Let’s think what makes a document relevant when someone searches the word "lackadaisical". How often it appears in the document is a good indicator of how relevant it is. In a document about the etymology of the word, for example, the word would appear quite often. In a novel, it may appear once or twice, but someone interested in the word "lackadaisical" would almost definitely not be interested in finding that novel. Thus, term frequency is a good indicator of how relevant a document is to that term.
Exercise 14 Design the function term-frequency, that takes a Document and a String and returns what percentage of the words in the document are equal to the given string. Use count.
; A Query is a [Listof String] where every element is an English word, all lower-case
Now let’s think about what makes a document irrelevant when someone searches for the query (list "a" "storm" "of" "swords"). "storm" and "swords" are certainly unique words and won’t appear in just any document, but "a" and "of" aren’t and would. Therefore, how many documents a word appears in is inversely proportional to how relevant it is to a query.
Exercise 15 Design the function idf, which is short for inverse-document-frequency, that takes a String and a Corpus and outputs the total number of documents divided by the number of documents the given word appears in... plus 1, to avoid division by zero in case the word never appears at all.
Exercise 16 Design the function tf-idf, that takes a Query, a Document and the Corpus it belongs to and outputs the sum of the product of each search term’s term-frequency in that document and its inverse-document-frequency.
Exercise 17 Design the function search-engine, that given a Corpus, will output a function that takes a Query and outputs the documents sorted in descending order by the tf-idf applied that query, document, and corpus. Use sort.
Yay! You are Google!
Exercise 18 Challenge: sort uses a function that compares two elements to each other. It has been proven that all comparison-based sorting functions will compare at least one element x to other elements more than once.
In your search-engine function, documents were compared to each other based on their tf-idf value. Therefore, for at least some documents, this value was computed more than once. That’s bad!
The problem we’re going to solve is this: sometimes, we want to sort a list of values based on some (possibly slow) computation applied to each one. We definitely don’t want to do this computation on the same element more than once.
Design sort/compute-once, which takes a [List X] lox, an [X -> Y] f, and a [Y Y -> Boolean] y-comp, and sorts lox by y-comp after f is applied to its elements. f should only be computed once on each element, the function should still return a list identical to lox (but with the order changed), and it should use sort.