Learning Haskell: Algebraic Datatypes

posted 2 years ago

Data and Type Constructors

There are two kinds of constructors in Haskell: type constructors and data constructors.

Type constructors are used only at the type level, in type signatures and typeclass declarations and instances. Types are static and resolve at compile time.

Data constructors construct the values at term level, values you can interact with at runtime.

Type and data constructors that take no arguments are called constants. They can only store a fixed type and amount of data.

For example, in the Bool datatype, Bool is a type constant because it isn't waiting for any additional information in the form of an argument to be fully realized as a type.

Bool enumerates two values that are also constants, True and False because they take no arguments.

When a constructor takes an argument, it must be applied to become a concrete type of value.

Type Constructors and Kinds

Take the list datatype:

data [] a = [] | a : [a]

This must be applied to a concrete type before you have a list.

Kinds are the types of types, or types one level up.

Kinds are represented with *.

We know something is fully applied

Compare:

Prelude> let g = not True
Prelude> :g
g :: Bool

Prelude> let g x = x > 5
Prelude> :t g
g :: (Ord a, Num a) => a -> Bool

The first g takes no arguments and is not awaiting application to anything to produce a value, so it's type signature is a concrete type. Note that there is no function arrow.

The second g is awaiting application to an x so its type signature has a function arrow. Once we apply x a value, it also has a concrete type.

We can query the kind signature of a type constructor in GHCi with :kind or :k.

Prelude> :k Bool
Bool :: *
Prelude> :k [Int]
[Int] :: *
Prelude> :k []
[] :: * -> *

Both Bool and [Int] are fully applied, so their kind signatures have no function arrows. They are not awaiting application to anything to be fully realized.

The kind of [] is * -> * because it still needs to be applied to a concrete type before it is itself a concrete type.

This is what the constructor of "type constructor" is referring to.

Data Constructor Arities

Arity refers to the number of arguments a function or constructor takes.

A function that takes no arguments is called nullary.

Data constructors that take no arguments are also called nullary.

Data constructors that take one argument are called unary.

Data constructors that take more than one argument are called products.

-- nullary
data Car =
  Car
  deriving (Eq, Show)

-- unary
data Car =
  Car Integer
  deriving (Eq, Show)
  
-- product
data Car =
  Car Integer Bool
  deriving Show

What Makes Datatypes Algebraic?

Algebraic datatypes are algebraic because we can describe the patterns of argument structures using two basic operations: sum and product.

The cardinality of a datatype is the number of possible values it defines. That number can be as large as infinity.

The cardinality of Bool is 2 because it has two inhabitants that are both nullary data constructors.

Int8 has a cardinality of 256 because its lower bound is -128 and its upper bound is 127.

data Example = Example' deriving Show

Example is the type constructor, and Example' is the only data constructor. Since Example' takes no type arguments, it is a nullary constructor.

Given that Example' is a single nullary value, the cardinality of the type Example is 1.

newtype

We can use the newtype keyword to mark types that can only ever have a single unary data constructor.

A newtype cannot be a product type, sum type, or contain nullary constructors, but it has a few advantages over a vanilla data declaration.

The primary benefit is that it has no runtime overhead, as it reuses the representation of the type it contains.

newtype Car
  = Car Int 
  deriving (Eq, Show) 

Sum Types

Bool is a sum type:

data Bool = False | True

The | represents the logical disjunction - that is, "or". This is the sum in algebraic datatypes.

To know the cardinality of sum types, we add the cardinalities of their data constructors. True and False take no type arguments and thus are nullary constructors, each with a value of 1.

Thus, the cardinality of Bool is 2.

You can check that in the REPL:

Prelude> length (enumFrom False)
2

Product Types

A product type's cardinality is the product of the cardinalities of its inhabitants.

Products are the result of multiplication.

Where a sum type expresses "or", a product type expresses "and".

A product is a way to carry multiple values around in a single data constructor.

Record Syntax

Records in Haskell are product types with additional syntax to provide convenient accessors to fields within the record.

Let's define a simple product type:

data Car =
  MkCar String Int
  deriving (Eq, Show)
  
data Car = -- record syntax
  Car { make :: String
      , year :: Int }
      deriving (Eq, Show)

The record syntax is similar to the original Car type, but defining it as a record means there are now named record field accessors. They're just functions that go from the product type to a member of product.

*Main> :t make
make :: Car -> String
*Main> :t year
year :: Car -> Int
*Main> Car "toyota" 1998
Car {make = "toyota", year = 1998}
*Main> let p = Car "toyota" 1998
*Main> make p
"toyota"
*Main> year p
1998

Constructing and Deconstructing Values

There are essentially two things we can do with a value: we can generate or construct it or we can match on it and consume it.

Construction and deconstruction of values form a duality.

Data is immutable in Haskell, so values carry with them the information about how they were created.

data OperatingSystem
  = GnuPlusLinux
  | OpenBSDPlusNevermindJustBSDStill
  | Mac
  | Windows
  deriving (Eq, Show)

data ProgLang
  = Haskell
  | Agda
  | Idris
  | Purescript
  deriving (Eq, Show)

data Programmer = Programmer
  { os :: OperatingSystem,
    lang :: ProgLang
  }
  deriving (Eq, Show)

allOperatingSystems :: [OperatingSystem]
allOperatingSystems = [GnuPlusLinux, OpenBSDPlusNevermindJustBSDStill, Mac, Windows]

allLanguages :: [ProgLang]
allLanguages = [Haskell, Agda, Idris, Purescript]

allProgrammers :: [Programmer]
allProgrammers = [Programmer {os = x, lang = y} | x <- allOperatingSystems, y <- allLanguages]
Prelude> allProgrammers
[Programmer {os = GnuPlusLinux, lang = Haskell},Programmer {os = GnuPlusLinux, lang = Agda},Programmer {os = GnuPlusLinux, lang = Idris},Programmer {os = GnuPlusLinux, lang = Purescript},Programmer {os = OpenBSDPlusNevermindJustBSDStill, lang = Haskell},Programmer {os = OpenBSDPlusNevermindJustBSDStill, lang = Agda},Programmer {os = OpenBSDPlusNevermindJustBSDStill, lang = Idris},Programmer {os = OpenBSDPlusNevermindJustBSDStill, lang = Purescript},Programmer {os = Mac, lang = Haskell},Programmer {os = Mac, lang = Agda},Programmer {os = Mac, lang = Idris},Programmer {os = Mac, lang = Purescript},Programmer {os = Windows, lang = Haskell},Programmer {os = Windows, lang = Agda},Programmer {os = Windows, lang = Idris},Programmer {os = Windows, lang = Purescript}]

Binary Tree

data BinaryTree a =
    Leaf
  | Node (BinaryTree a) a (BinaryTree a)
  deriving (Eq, Ord, Show)

Binary trees have values at each node. Each node could be a terminal node, called a leaf, or it could branch and have two subtrees.

The subtrees are also of type BinaryTree a, so this type is recursive.

Each binary tree can store another binary tree, which allows for trees of arbitrary depth.

In some cases, binary trees can be more efficient for structuring and accessing data than a list.

Inserting into Binary Trees

If a value is lower, we want to insert it somewhere on the left-hand part of the tree. If the value is greater than the current node, then it should go on the right.

It's important to remember that data is immutable in Haskell. Every time we insert a value into the tree, we build an entirely new tree.

insert' :: Ord a => a -> BinaryTree a -> BinaryTree a
insert' b Leaf = Node Leaf b Leaf -- handles inserting into an empty tree, beginning the construction of a new tree, and the case of having reached the bottom fo a much larger tree
insert' b (Node left a right)
  | b == a = Node left a right
  | b < a = Node (insert' b left) a right
  | b > a = Node left a (insert' b right)

Mapping a Binary Tree

mapTree :: (a -> b) -> BinaryTree a -> BinaryTree b
mapTree _ Leaf = Leaf
mapTree f (Node left a right) = Node (mapTree f left) (f a) (mapTree f right)

Convert Binary Trees to Lists

preorder :: BinaryTree a -> [a]
preorder Leaf = []
preorder (Node left a right) = a : preorder left ++ preorder right

inorder :: BinaryTree a -> [a]
inorder Leaf = []
inorder (Node left a right) = inorder left ++ a : inorder right

postorder :: BinaryTree a -> [a]
postorder Leaf = []
postorder (Node left a right) = postorder left ++ postorder right ++ [a]

Folding a Binary Tree

foldTree :: (a -> b -> b) -> b -> BinaryTree a -> b
foldTree f a Leaf = a
foldTree f a (Node left x right) = foldTree f (foldTree f (f x a) left) right 

Sources

Binary tree - https://en.wikipedia.org/wiki/Binary_tree#:~:text=In%20computer%20science%2C%20a%20binary,child%20and%20the%20right%20child

Algebraic data type - https://wiki.haskell.org/Algebraic_data_type

Haskell Book - https://haskellbook.com/