Lab 11h Type Inference

home work!

Purpose In this lab, we will automate the process of code-detective with the Hindley-Milner algorithm.

Exercise 1 Copy and paste the following familiar pieces of code and data definitions into your program.

; A [Pair X Y] is a (make-pair X Y)
(define-struct pair (fst snd))
 
; remove-duplicates : [List X] -> [List X]
; removes all but the last copy of any duplicates in a list
(define (remove-duplicates lst)
  (cond
    [(empty? lst) lst]
    [(member (first lst) (rest lst)) (remove-duplicates (rest lst))]
    [else (cons (first lst) (remove-duplicates (rest lst)))]))
 
 
; A Husky expression (HExp) is one of:
;  Boolean                            booleans
;  Number                             numbers
;  Ident                              identifiers
;  (list 'if HExp HExp HExp)          if expressions
;  (list 'fun Ident HExp)             lambdas
;  (list HExp HExp)                   function applications
; An Ident is a Symbol other than 'if, or 'fun

Take note that this definition of a HExp is slightly different than what you’re used to. Here, functions can only take one argument, and as such, there is only one Ident in a function, and function application uses a list of fixed size. Also, we discard arbitrary constants. This language only operates over booleans, numbers, and functions.

Exercise 2 Copy the following data definitions and code into your program. If you feel it will help, write their templates.
; A Type is one of
;  (make-const Symbol) named types, like Number or Boolean
;  (make-arrow Type Type) a one-argument arrow type, like [X -> Y]
;  (make-tyvar Symbol) a type variable, like X
(define-struct const (name))
(define-struct arrow (arg ret))
(define-struct tyvar (name))
 
; type->string : Type -> String
; Outputs the type as an easily-readable string
(define (type->string type)
  (cond
    [(const? type) (format "~a" (const-name type))]
    [(tyvar? type) (format "~a" (tyvar-name type))]
    [(arrow? type) (format "[~a -> ~a]" (type->string (arrow-arg type)) (type->string (arrow-ret type)))]))
 
; A TypeScheme is a (make-scheme [Listof Symbol] Type]
; and is a polymorphic type of the form "Forall X, Y, Z, <some type>"
; Technically, the "type" [X -> X] is not well-formed, because X isn't defined
; but the type scheme "Forall X, [X -> X]" is well-formed.
(define-struct scheme (vars type))

We will use type->string much later, but you should understand how it works.

; A TypeEnv is a [List (list Symbol Scheme)]
; A type environment records the (polymorphic) types for predefined functions, or
; the types of function parameters available in the bodies of functions
 
 
; A Subst is a [List (list Symbol Type)]
; This is where we keep track of all the type equations we build up while running
; type inference (aka "code detective")
A TypeEnv is just like the environment that our evaluator recorded the definitions of builtin functions, or the parameters of functions while applying them.

We use the associations in a Subst to we use these substitutions to rewrite the types in progress and update them as we build up more information (like in code detective!).

We will now write short helpers that will allow us to write our type-inferencing algorithm.

Exercise 3 Design the function apply-subst, which has the following header:
; apply-subst/type : Subst Type -> Type
; Rewrites all types in the given type based on the definitions we have in the substitution
(define (apply-subst/type subst type) ...)

Use the type-template to guide your code. No substitutions can happen on constant types, and arrow types should build a new arrow type and recur on the left and right. In type variables is where we do the heavy lifting, i.e. actual substitution. Use the Subst given to the function to substitute with the proper type, or keep it the same if no substitution exists. assq will be of use.

Exercise 4 Design the function apply-subst/scheme, which has the following header:
; apply-subst/scheme : Subst TypeScheme -> TypeScheme
; Rewrites all the types inside the given type scheme based on the substitution
; but does *not* rewrite the type variables that are bound by the scheme itself
(define (apply-subst/scheme subst scm) ...)

What function have you written that will help you with this? Note that the Subst you really want to work with here is one with all of the type variables in the Scheme removed.

Exercise 5 Design the following two functions using the functions you have just written. Use the signatures on those functions and the data definitions of Subst and TypeEnv to guide you.
; apply-subst/env : Subst TypeEnv -> TypeEnv
; Rewrites all the types in the type environment
(define (apply-subst/env subst tenv) ...)
; apply-subst/subst : Subst Subst -> Subst
; Rewrites all the types in the second substitution, using definitions from the first substitution
(define (apply-subst/subst subst1 subst2) ...)

Exercise 6 Design the function compose-subst, which has the following header:
; compose-subst : Subst Subst -> Subst
; Combines two substitutions into one:
; first by applying the first substitution to the second,
; and then just appending that to the first
(define (compose-subst sub1 sub2) ...)

Exercise 7 Design the following four related functions, and use remove-duplicates when appropriate.

  1. ftv/type, which will take in a Type and output all of the free type variables in it.

  2. occurs?, which will take in a Symbol and a Type and determine if it occurs in the free type variables of the type.

  3. ftv/scheme, which will take in a TypeScheme and output all of the free type variables in its type that are not bound by the scheme’s variables.

  4. ftv/env, which will compute all of the free type variables in a TypeEnv. Note that the symbols in the TypeEnv just give types to named terms, and those names have nothing to do with the free type variables in an environment.

Exercise 8 Design the function bind, which will take in a Symbol and Type, and will make a Subst binding the type that symbol. One of the three outcomes may occur:
  • The type is a variable, and it is the same as the symbol given, so an empty substitution environment can be returned, as no substitution is needed.

  • The symbol appears in the free type variables of the type, and it does not fall into the above case. This would create an impossible (infinite) type. For example, imagine if the type were an [X -> Y] function and we were binding it to X. That would be bad! If this happens, we will throw an error, of the form:

    (error (format "Infinite types: ~a and ~a" binding (type->string type)))

  • Otherwise, we make a Subst with just the one binding of the type to the symbol.

Exercise 9 Design the function unify, which will take in two Types and will determine whether or not the types are compatible. In other words, it will output a Subst, if possible, that will bind the free type variables inside the types so that they become identical. It will work as follows:
  • If the first type is a type variable, create a Subst that will bind the second type to that variable.

  • If the second type is a type variable, create a Subst that will bind the first type to that variable.

  • If both are constant types and are the same, no substitution is needed, so we will output an empty Subst. Otherwise, if they are not the same, we will throw an error of the following form:

    (error (format "Incompatible type constants: ~a and ~a" (type->string t1) (type->string t2)))

  • If both are arrow types, we will unify their argument types. Using this Subst, we will use apply-subst/type to apply the substition to both of their return types. We will then unify these. Finally, we will compost-subst these unifications together, and we will use the unification of their substituted return types as the first parameter to compose-subst, and the unification of their argument type as the second.

  • Otherwise, the two types are incompatible, and we will throw an error as above.

The following check-expects should pass. Use these to help you understand how this function works.
(check-error (unify (make-const 'Number) (make-const 'Boolean))
             "Incompatible type constants: Number and Boolean")
(check-error (unify (make-arrow (make-tyvar 'A) (make-const 'Number))
                    (make-arrow (make-const 'Boolean) (make-tyvar 'A)))
             "Incompatible type constants: Number and Boolean")
(check-error (unify (make-const 'Oops) (make-arrow (make-tyvar 'X) (make-tyvar 'Y)))
             "Incompatible types: Oops and [X -> Y]")
(check-expect (unify (make-arrow (make-arrow (make-tyvar 'A) (make-tyvar 'B)) (make-tyvar 'C))
                     (make-arrow (make-tyvar 'D) (make-tyvar 'E)))
              (list (list 'D (make-arrow (make-tyvar 'A) (make-tyvar 'B)))
                    (list 'C (make-tyvar 'E))))

Exercise 10 At the top of your file, add the following line:

(require lang/htdp-advanced)   ; gives us gensym

gensym is a very exciting function that returns a symbol that is only equal to itself. Experiment with it and its output in the interactions window.
> (define x (genysm 'X))
> x
> (symbol=? x x)
> (symbol=? x <the copy/pasted symbol from when you enter just x>)

Exercise 11 Design the function instantiate, which has the following header:
; instantiate : TypeScheme -> Type
; Given a polymorphic type scheme, create some instantiation of it
; by making up brand new (gensym'ed) type variables for it, and substituting them where needed
(define (instantiate scm) ...)
This function will make a local Subst, by mapping over the variables in scm, and will then call to apply-subst/type

.

Exercise 12 Design the function generalize, which has the following header:
; generalize : TypeEnv Type -> TypeScheme
; produce a TypeScheme that binds the free variables in the type
; (that do not appear in the TypeEnv)
(define (generalize env type) ...)
Here, the type given to the output TypeScheme will be just the input type, and only free variables in the type that do not appear in the type environment will be given to the output TypeScheme.

Exercise 13 Design the function lookup-env, which will take a TypeEnv and a Symbol and return a [Pair Subst Type]. The Subst in the pair it returns will always be the empty list, but including it will be convinient later. This lookup function works similarly to the ones we have seen before, and if the symbol is not found, it will throw the following error:

(error (format "Unbound identifier: ~a" ident))

If it is found, the type in the returned [Pair Subst Type] will be instantiated by the associated TypeEnv.

Exercise 14 And now for our main function! First, add the following constants to your program:
(define NUM (make-const 'Number))
(define BOOL (make-const 'Bool))

infer will take a TypeEnv and a HExp and will output a [Pair Subst Type]. It works as follows:
  • If the expression is a symbol, lookup its type in the environment.

  • If the expression is a boolean or number, return the empty Subst and the appropriate base type.

  • If it is an 'if expression, recur on its three components.

    Check to make sure the type of its first expression is of type BOOL with unify. Then, make sure the other two expressions have a compatible type with unify. Keep these three results around, as their substitution environments will be of use.

    To finish, the substitution environment returned should combine all of the substitutions of the unifications and the ones returned by the recursive calls. It should do so using foldr, and compose-subst. The base case should be the substitution returned by recurring on the conditional expression, and the list should begin with the results of the unification and end with the other recursive calls.

  • If it is a 'fun expression, generate a new type variable for its argument with

    (make-tyvar (gensym 'X))

    Extend the environment with the function’s argument and a TypeScheme, which parametrizes no variables and whose type is the newly generated type for the argument.

    Using this new environment, recur on the body of the function. The result of recursive call will contain the Subst we are ultimately interested in returning.

    For the type, however, we will return an arrow type. The return type will come straight from the recursive call. The argument type will need to use apply-subst/type, the Subst returned from the recursive call, and the newly generated type.

  • Otherwise, we have function application. As we are unsure of the return type, we will make a new type variable for it, just as we did for functions.

    We will then recursively call on the function being applied. We will also recur on the argument being passed to the function, but we will need to modify the environment with apply-subst/env by using the result on the first recursive call.

    We will now clean up the type of function. We will unify the following two types:
    • apply-subst/type called with the Subst from the recursive call on the argument with the type from the recursive call on the function.

    • A new arrow type, which goes from the type of the argument (that we inferred) to the result type (the type variable we just invented).

    The result of this unifcation is very important. In the pair we return, we will use foldr an compose-subst, like we did in the 'if case, and here, the base Subst will be from the recursive call on the function, the one at the end of the list will be from the recursive call, and the one at the beginning will be the result of this unification. The type returned will be computed by apply-subst/type, passed the unification and the result type that we generated.

Exercise 15 Take a deep breath, you’re done! All that’s left is to test. Use the following code to test your infer.

; infer-type : TypeEnv HExp -> TypeScheme
; Computes the most general type scheme for the given Husky expression in the given environment
(define (infer-type env exp)
  (local ((define result (infer env exp)))
    (generalize empty (apply-subst/type (pair-fst result) (pair-snd result)))))
 
; zip : [List X] [List Y] -> [List (list X Y)]
; zip combines two lists into a list of 2-item lists
(define (zip l1 l2)
  (cond
    [(or (empty? l1) (empty? l2)) empty]
    [else (cons (list (first l1) (first l2))
                (zip (rest l1) (rest l2)))]))
 
; cleanup-vars/scheme : TypeScheme -> TypeScheme
; takes the relatively ugly type names we get from gensym, and cleans them all up
(define (cleanup-vars/scheme scheme)
  (local ((define ftvs (ftv/type (scheme-type scheme)))
          (define new-vars (build-list (length ftvs) (λ(n) (make-tyvar (string->symbol (format "X~a" n))))))
          (define subst (zip ftvs new-vars)))
    (make-scheme (map tyvar-name new-vars)
                 (apply-subst/type subst (scheme-type scheme)))))
 
; scheme->string : TypeScheme -> String
; Outputs a readable version of a TypeScheme
(define (scheme->string scm)
  (format "Forall ~a, ~a"
          (foldr (λ (s1 s2) (if (string=? s2 "")
                                s1
                                (string-append s1 ", " s2)))
                 ""
                 (map symbol->string (scheme-vars scm)))
          (type->string (scheme-type scm))))
 
; test : TypeScheme -> String
; Outputs a nice string to test our code inference
(define test (compose scheme->string cleanup-vars/scheme))
 
(define TYENV0 (list (list 'and (make-scheme '() (make-arrow BOOL (make-arrow BOOL BOOL))))
                     (list '+ (make-scheme '() (make-arrow NUM (make-arrow NUM NUM))))
                     (list '- (make-scheme '() (make-arrow NUM (make-arrow NUM NUM))))
                     (list '< (make-scheme '() (make-arrow NUM (make-arrow NUM BOOL))))
                     (list 'add1 (make-scheme '() (make-arrow NUM NUM)))
                     (list 'zero? (make-scheme '() (make-arrow NUM BOOL)))
                     (list 'compose (make-scheme '(X Y Z)
                                                 (make-arrow (make-arrow (make-tyvar 'Y) (make-tyvar 'Z))
                                                             (make-arrow (make-arrow (make-tyvar 'X) (make-tyvar 'Y))
                                                                         (make-arrow (make-tyvar 'X) (make-tyvar 'Z))))))))
 
(check-expect (test (infer-type '() IDENTITY))
              "Forall X0, [X0 -> X0]")
(check-expect (test (infer-type '() APP-TO-5))
              "Forall X0, [[Number -> X0] -> X0]")
(check-expect (test (infer-type '() COMPOSE))
              "Forall X0, X1, X2, [[X0 -> X2] -> [[X1 -> X0] -> [X1 -> X2]]]")
(check-expect (test (infer-type TYENV0 '(fun f (fun g ((< f) ((+ g) 5))))))
              "Forall , [Number -> [Number -> Bool]]")
(check-expect (test (infer-type TYENV0 ABS))
              "Forall , [Number -> Number]")
(check-expect (test (infer-type TYENV0 NOT))
              "Forall , [Bool -> Bool]")
(check-expect (test (infer-type TYENV0 '(compose zero?)))
              "Forall X0, [[X0 -> Number] -> [X0 -> Bool]]")
(check-error (test (infer-type empty 'x)) "Unbound identifier: x")
 
 
(define MYSTERY '(fun foo (fun bar (fun baz (((foo bar) (baz bar)) bar)))))
(check-expect (scheme->string (cleanup-vars/scheme (infer-type '() MYSTERY)))
              "Forall X0, X1, X2, [[X0 -> [X1 -> [X0 -> X2]]] -> [X0 -> [[X0 -> X1] -> X2]]]")

Exercise 16 What can’t this language type check? What does that say about this language?

Hint:

(infer-type '() '(fun x (x x)))

Exercise 17 Celebrate!* You’re almost done with fundies (assuming you’ve handed in the homework), only one lecture to go!

*Make sure you celebrate responsibly.