Learn to program in Haskell, lesson 3

Ari Rahikkala · Post by **Ari Rahikkala** » Fri Aug 17, 2007 9:03 am

It seems that, at least for the next week or so, I'll have way too much free time, so you should be able to catch me at irc.psigenix.net:#jm3u at basically any time except when I'm sleeping, which is approximately 0:00 - 7:00 UTC. Also, Haskell surprises me with its depth again, so to keep things sane, the Thorough Explanation of Polymorphism will... have to wait until the next lesson again :(. The subject of this lesson is simply too important to be left for later.

Parts of this lesson involve things you can't do directly in ghci. I recommend starting up a text editor (Wordpad will do - anything will do as long as it can simply save things as text) and loading it in ghci by typing ":l foo.hs" (replacing foo.hs with the name of the file, of course). After making changes to the file, you can reload it in ghci by simply typing ":a" or even just ":". Specifically, datatype definitions can't be introduced in ghci at all, and pattern match candidates all have to be written on a single line, separated by semicolons.

--

Values in Haskell are built out of data constructors. Conceptually you could say every value is built that way - this isn't actually quite true, but you can pretend that it is. Types are equivalently built using type constructors. Both of them are introduced using the data syntax, also called datatype definition. Here's how it works, using a couple of examples from the Prelude:

Code: Select all

data Maybe a = Just a | Nothing

The Maybe is a type constructor. It takes one type argument, a. The Just is a data constructor that takes one argument. Type and data constructors are simply connected: The a in Just a means that when you construct a value with Just, the type of the a is taken and applied to the Maybe type constructor. For instance, the type of Just "hello" is Maybe String (or Maybe [Char]). The Nothing is another data constructor, one which takes no arguments. The type of an expression constructed with Nothing is Maybe a.

Code: Select all

data [a] = [] | a : [a] -- not actually legal Haskell, though you can pretend it is
data List a = Nil | Cons a (List a) -- the same thing, only substituting different names for the constructors. This one doesn't exist in Haskell by default.

Here we have another type with two constructors. An expression that is a list can be constructed either out of an empty list, or an expression consed to an expression that is a list (a list whose members are of the same type as the first expression). For instance:

Code: Select all

1 : (2 : (3 : [])) -- the parentheses are unnecessary but they clarify what's going on - first a list is constructed out of an empty list, then 3 is consed to it, then 2, etc..
Cons 1 (Cons 2 (Cons 3 Nil)) -- the same thing, using the differently named constructors
1 : 2 : 3 : [] -- equivalent to the first


1 : 2 : 3 -- not well-typed: 3 is not a list, so you can't cons a value to it. Try it and see the error the language gives.

Constructors don't have to take any arguments:

Code: Select all

data Bool = True | False

Code: Select all

data Char = 'a' | 'b' | 'c' | 'd' ... -- not the way it's actually defined, but you can pretend it is
data Int = 1 | 2 | 3 | 4 ... -- again, it's not actually defined this way

Now that we know how to make values, let's see how to take them apart.

The trick is that functions don't just take arguments: They can pattern match to assert that their arguments have a certain structure. You can match on either data constructors or formal parameters. A data constructor matches itself, a formal parameter matches anything. A function definition may have any number of pattern match choices - when a function is called, they're tried one by one, top-down, until one matches.

Code: Select all

map f [] = []
map f (x:xs) = f x : map f xs

map is pretty much the most basic example of pattern matching. It's got two choices, and the choices correspond exactly to the constructors of the list type:

If the second argument of map is the empty list, then the first pattern is matched, and the call evaluates to [].
If the second argument is a value consed to a list, the second pattern is matched. x is bound to the value that was consed to the list, xs is bound to the rest of the list (corresponding to the two arguments given to the (:) constructor earlier).

Code: Select all

map (+1) [1,2] ==
map (+1) (1 : 2 : []) == -- second argument is a cons, so the second match is taken
(+1) 1 : map (+1) (2 : []) == -- second argument is a cons again, so second match taken again
(+1) 1 : (+1) 2 : map (+1) [] == -- second argument is the empty list, so first match is taken
(+1) 1 : (+1) 2 : []

Additionally, matching on anything starting with an underscore (_) means the same thing as matching on a variable, but without actually binding the argument in that place to a value. It's very useful for showing that an argument isn't actually needed in a given case - for instance, in the generally used definition of map, the first line is actually not what I wrote, but "map _ [] = []" as the first argument isn't needed.

Matching on anything but formal parameters and data constructors is impossible. If you try to match on the name of a function, what actually happens is that that name is bound to the value given as an argument. This tends to be frowned upon as it makes code confusing.

Here's a few examples of functions whose pattern match choices correspond almost directly to the appropriate data constructors:

Code: Select all

maybe n _ Nothing  = n
maybe _ f (Just x) = f x

either f _ (Left x)     =  f x  -- corresponds to data Either a b = Left a | Right b
either _ g (Right y)    =  g y

(++) []     ys = ys
(++) (x:xs) ys = x : xs ++ ys

foldr k z xs = go xs
     where go []     = z
           go (y:ys) = y `k` go ys

head (x:_) = x
head []    = undefined
[/quote]

The last two functions here use a few concepts we haven't handled yet. foldr uses where-binding, which is just a syntactic feature - it's exactly the same thing as let-binding, except that in let-binding you write the names to define first and the expression to evaluate after them (let [i]names[/i] in [i]expression[/i]) whereas with where you write the expression first and the names later ([i]expression[/i] where [i]names[/i]). head uses [b]undefined[/b], or [b]bottom[/b], which is a very strange value that will be explained in greater depth later. 



A few final notes on syntax that you might have already noticed: Both type and data constructors are always written with a capital first letter (unless they're nonalphabetic such as (:), in which case they always start with a colon). Other names are written with a lowercase first letter.


Exercises: 

1. Take a deep breath and meditate on the connection between data constructors and pattern matching. Play around with the functions given above if you like. Realise that in the code itself, there really is very little to the language but the lambda, data constructors to create values, pattern matching to take them apart, and names.

2. Write a datatype definition for days of the week, with one constructor corresponding to one weekday.

3. A binary tree is either a leaf containing a value, or a branch containing a value and two binary trees. Write the corresponding datatype.

4. Write a function that tells if a list is exactly 1 element long. It should have type [a] -> Bool.

5. Write a function that takes a binary tree and returns its values in a list in some consistent order (if you know what a preorder traversal is, do that). It should have type BinTree a -> [a]. 

As a hint for exercises 3 and 5, here's a datatype definition for Rose trees and a preorder traversal for them. A Rose tree, aka a multi-way tree, is either a leaf containing a value, or a branch containing a list of Rose trees.

data RoseTree a = Leaf a | Branch [RoseTree a]

preorder :: RoseTree a -> [a]
preorder (Leaf a) = [a]
preorder (Branch xs) = concatMap preorder xs

(concatMap is simply map, except it concatenates all of its results - which must be lists - together into a single list. You won't need it to write preorder for binary trees, but you do need (++))

As a hint for exercise 4, note that pattern matching on data constructors can be recursive. That is, it's possible to match on (x:[]), for instance. Also remember that pattern matches are tried one by one until one matches the structure of the given arguments, and that choice is then taken and evaluated.

Shireroth

Learn to program in Haskell, lesson 3

Learn to program in Haskell, lesson 3

Who is online