Type-Level Magic: A Type-Safe Stack DSL in Haskell

㕥前我們實現過一箇簡單的 printf 函數，它的參數箇數是可變的。

今天要實現的來自一箇 Codewars 上的 Challenge，下面是原始題目。

Kata#

The goal of this kata is to embed a stack based language into Haskell. Languages such as forth have a number of simple commands such as push but are written in a postfix style. Haskell only has support for prefix and infix function application, can we directly embed such langauges into Haskell?

We will only use 4 instructions.

begin - Marks the start of a program.
end - Marks the end of a program and returns the top element of the stack.
push n - Pushes an integer onto he stack.
add - Adds together the top two elements on the stack.

Here are some examples of programs written using these commands.

1
f = begin push 2 push 3 add end
2
  = 5
3

4
g = begin push 1 push 1 add push 2 add end
5
  = 4

1
f = begin_ push 2 push 3 add end_
2
  = 5
3

4
g = begin_ push 1 push 1 add push 2 add end_
5
  = 4

1
f = begin_ push 2 push 3 add end_
2
  = 5
3

4
g = begin_ push 1 push 1 add push 2 add end_
5
  = 4

There are several possible implementations. The two suggestions below are not required in order to complete this kata but further challenges if you so desire.

An invalid program should raise a type error rather than a runtime error.

Your implementation should allow programs such as..

1
bad = begin push 1 push 1 push 1 push 1 push 1 push 1 push 1 push 1 push 1
2
        add add add add add add add add end

The inferred type of a basic implementation is exponential in size to the expression. This causes type inference to take too long. Hint: Use a typeclass Hint: http://chrisdone.com/posts/haskell-constraint-trick

Template#

1
module Postfix where
2

3
begin = undefined
4
push  = undefined
5
add   = undefined
6
end   = undefined

對比 `printf`#

前回 printf 的核心難點是：參數箇數和參數類型都由格式串決定，printf 本身必須是「真正的可變參數函數」，所以需要用 typeclass 的遞歸實例來描述 a -> b -> ... -> r 這種類型。

這箇 kata 有點像、但本質上不同。它雖然也寫成鏈式調用，但命令集合是固定的：

push 衹喫一箇 Int
add 不喫外部參數、衹操作棧
begin/end 衹是程序邊界

所以一開始我們可以不引入 printf 那套 variadic typeclass 技術，先用一箇「傳遞棧狀態」的普通模型（例如 CPS/Cont）把基本功能跑通。

不過這衹是先「通過」基礎題意，還沒有解決題目附帶的兩箇進階挑戰：

非法程序（例如空棧 add）應該在類型層面報錯，而不是在運行時崩潰。
程序變長後，類型推導會快速膨脹，導致推導性能變差。

因此後面引入 typeclass，主要是爲了解決「可擴展的類型推導」和「更強的靜態約束」。

題目解讀#

因爲函數的調用結構是大致固定的，不妨攷慮棧是一箇列表，那自然而然可以用類似 Cont 的方式來實現：

begin 等待一箇回調，竝把 [] 作爲初始棧傳入。
push n 等待一箇回調，把 n 壓入棧頂得到新的回調。
add 等待一箇回調，把棧頂的兩個元素彈出相加後壓入棧頂得到新的回調。
end 本身就是一箇回調，直接返回棧頂元素。

1
type Cont s r = (s -> r) -> r
2

3
end :: [Int] -> Int
4
end (x : _) = x
5

6
begin :: Cont [Int] r
7
-- begin f = f []
8
begin = ($ [])
9

10
push :: [Int] -> Int -> Cont [Int] r
11
push s x = ($ (x : s))
12

13
add :: [Int] -> Cont [Int] r
14
-- add (x:y:s) f = f ((y+x):s)
15
add (x : y : s) = ($ ((y + x) : s))
16

17
f = begin push 1 push 2 push 3 add add end

這箇直接的實現是可行的，由於函數應用是左結合的：

begin push 爲 push 提供了 [] 作爲棧。
begin push 1 把 1 壓入棧頂得到 [1]，並爲下一個 push 提供了這個棧。
類推下去，直到 add，它把前面傳來的 [3, 2, 1] 的前兩個元素彈出相加後壓入棧頂得到 [5, 1]，並爲下一個 add 提供了這個棧。
下一箇 add 同理，最後 end 被傳入了 [6]，返回 6。

這也是一些人在抱怨講說這箇題雖然是 2kyu，但實際衹算得上 4kyu 的原因。但他们沒有解決題目給出的挑戰。

我們先來看看 hls 推導的 f 中 begin 的類型：

1
begin :: forall r. Cont [Int] r
2
_ :: Cont
3
  [Int]
4
  (Int
5
   -> ([Int]
6
       -> Int
7
       -> ([Int]
8
           -> Int
9
           -> ([Int]
10
               -> ([Int] -> ([Int] -> Int) -> Int) -> ([Int] -> Int) -> Int)
11
           -> ([Int] -> ([Int] -> Int) -> Int)
12
           -> ([Int] -> Int)
13
           -> Int)
14
       -> Int
15
       -> ([Int]
16
           -> ([Int] -> ([Int] -> Int) -> Int) -> ([Int] -> Int) -> Int)
17
       -> ([Int] -> ([Int] -> Int) -> Int)
18
       -> ([Int] -> Int)
19
       -> Int)
20
   -> Int
21
   -> ([Int]
22
       -> Int
23
       -> ([Int]
24
           -> ([Int] -> ([Int] -> Int) -> Int) -> ([Int] -> Int) -> Int)
25
       -> ([Int] -> ([Int] -> Int) -> Int)
26
       -> ([Int] -> Int)
27
       -> Int)
28
   -> Int
29
   -> ([Int]
30
       -> ([Int] -> ([Int] -> Int) -> Int) -> ([Int] -> Int) -> Int)
31
   -> ([Int] -> ([Int] -> Int) -> Int)
32
   -> ([Int] -> Int)
33
   -> Int)

Question
爲什麼類型會變成這樣複雜？

其實這是由 Cont 中的 r 的展開引起的。

一开始，begin 期待一箇 [Int] -> r 的回調，它的類型被定爲 ([Int] -> r) -> r
push 的類型是 [Int] -> Int -> ([Int] -> r) -> r，所以當 push 被傳給 begin 時，r 被替換爲 Int -> ([Int] -> r) -> r，類型變爲 ([Int] -> (Int -> ([Int] -> r) -> r)) -> (Int -> ([Int] -> r) -> r)
上面的 r 需要由後續的傳入確定，類推下去，類型就越來越複雜了

所以我們的調用看似是線性的，但類型的展開是指數級（至少是超線性）的。

使用類型類#

我們先來解決第二箇挑戰：讓類型推導回歸線性、而不是指數級，這允許我們寫出更長的程序而不至於讓 hls 崩潰。

第一步可以先把 push、add 和 end 攺爲數據而非函數，再用 typeclass 描述它們對棧的操作，由 begin 來驅動整個程序的執行。

1
{-# LANGUAGE FlexibleInstances #-}
2
{-# LANGUAGE UndecidableInstances #-}
3

4
data Push = Push
5
data Add = Add
6
data End = End
7

8
class Program t where
9
  run :: [Int] -> t
10

11
instance (Program r) => Program (Push -> Int -> r) where
12
  run stack Push x = run (x : stack)
13

14
instance (Program r) => Program (Add -> r) where
15
  run (x : y : stack) Add = run ((y + x) : stack)
16

17
instance Program (End -> Int) where
18
  run (x : _) End = x
19

20
begin :: (Program t) => t
21
begin = run []
22

23
end :: End
24
end = End
25

26
push :: Push
27
push = Push
28

29
add :: Add
30
add = Add
31

32
f = begin push 1 push 2 push 3 add add end

這樣 f 中 begin 的類型看起來會很簡單，理論上 只是

1
begin :: forall t. Program t => t
2
_ :: Push -> Int -> Push -> Int -> Push -> Int -> Add -> Add -> End -> Int

但實際上，上面的代碼仍然無法通過編譯，會得到 Ambiguous type variables。

問題在於：begin :: Program t => t 太泛了。Program 的遞歸實例雖然描述了「怎麼跑」，但沒有把中間參數和最終結果限制得足夠具體， GHC 在求解這條長約束鏈時，會留下歧義變量（尤其是數字字面量和最終返回類型）。

簡單來說，就是 GHC 知道有一箇 Program (Push -> Int -> r) 的實例，但它眼中 push 1 的 1 是一箇 Num a => a，GHC 是不能從實例反推 a 的類型的，最後就沒法應用實例了。

還記得我們在 printf 中也遇到過類似的問題嗎？當時我們的解決方案是爲 IO a 寫一箇實例，但是限定 a ~ ()，這樣就把返回類型釘死爲 IO ()，從而消除了歧義。這也是題目 hint 提到的 constraint trick：在 instance 上主動加入等式約束，讓編譯器在遞歸求解時「同步收窄」類型。

1
{-# LANGUAGE FlexibleInstances #-}
2
{-# LANGUAGE UndecidableInstances #-}
3

4
data Push = Push
5
data Add = Add
6
data End = End
7

8
class Program t where
9
  run :: [Int] -> t
10

11
instance (x ~ Int, Program r) => Program (Push -> x -> r) where
12
  run stack Push x = run (x : stack)
13

14
instance Program r => Program (Add -> r) where
15
  run (x : y : stack) Add = run ((y + x) : stack)
16

17
instance (a ~ Int) => Program (End -> a) where
18
  run (x : _) End = x
19

20
begin :: Program t => t
21
begin = run []
22

23
end :: End
24
end = End
25

26
push :: Push
27
push = Push
28

29
add :: Add
30
add = Add
31

32
f = begin push 1 push 2 push 3 add add end

這兩箇約束分別對應兩箇關鍵點：

x ~ Int：每次 push 後面的字面量不管是什么、都一定命中 Program (Push -> x -> r)，且 x 被限制推導爲 Int。
a ~ Int：程序以 end 結束時、一定會命中 Program (End -> a)，且 a 被限制推導爲 Int。

和 printf 裏的 instance (a ~ ()) => Printable (IO a) 是同一思路：不改接口形狀，只在 instance 上追加等式約束，幫 GHC 消除歧義。

到這一步，我們主要解決的是「推導可用、可以寫長程序」。

至於第一箇挑戰（把非法程序變成編譯期報錯），還需要在棧結構上再引入更強的類型信息。

強化類型類——使棧深可計算#

想解决第一箇挑戰，核心在於讓類型系統能算出棧的深度，這樣我們就能利用 TypeError 來拒絕不合法的程序。

最簡單的思路是給 Program 新增一箇表達棧深的類型參數，然後在實例中對它進行相應的增減。這里我們需要把自然數放進類型签名，爲了不引入太多依賴類型的話題，我們用標準庫提供的 Nat 來表示自然數，可以完全地像字面量一樣使用它。

依賴類型是一箇很複雜的話題，這裏衹需知道 Nat 讓我們能在類型層面計數就行。

1
{-# LANGUAGE DataKinds #-}
2
{-# LANGUAGE FlexibleContexts #-}
3
{-# LANGUAGE FlexibleInstances #-}
4
{-# LANGUAGE KindSignatures #-}
5
{-# LANGUAGE MultiParamTypeClasses #-}
6
{-# LANGUAGE TypeOperators #-}
7
{-# LANGUAGE UndecidableInstances #-}
8

9
import GHC.TypeLits (Nat, type (+), type (-))
10

11
data Push = Push
12
data Add = Add
13
data End = End
14

15
class Program (n :: Nat) t where
16
  run :: [Int] -> t
17

18
instance (Program (n + 1) r) => Program n (Push -> Int -> r) where
19
  run stack Push x = run (x : stack)
20

21
instance (Program (n - 1) r) => Program n (Add -> r) where
22
  run (x : y : stack) Add = run ((y + x) : stack)
23

24
instance Program n (End -> Int) where
25
  run (x : _) End = x

這樣，Program n t 描述了在棧深爲 n 時運行程序 t。現在上面的代碼無法編譯，一箇原因是我們的幽靈參數沒法讓 GHC 來推導，所以需要我們手動指定 n。

1
{-# LANGUAGE AllowAmbiguousTypes #-}
2
{-# LANGUAGE DataKinds #-}
3
{-# LANGUAGE FlexibleContexts #-}
4
{-# LANGUAGE FlexibleInstances #-}
5
{-# LANGUAGE KindSignatures #-}
6
{-# LANGUAGE MultiParamTypeClasses #-}
7
{-# LANGUAGE ScopedTypeVariables #-}
8
{-# LANGUAGE TypeApplications #-}
9
{-# LANGUAGE TypeOperators #-}
10
{-# LANGUAGE UndecidableInstances #-}
11

12
import GHC.TypeLits (Nat, type (+), type (-))
13

14
data Push = Push
15
data Add = Add
16
data End = End
17

18
class Program (n :: Nat) t where
19
  run :: [Int] -> t
20

21
instance (Program (n + 1) r, x ~ Int) => Program n (Push -> x -> r) where
22
  run stack Push x = run @(n + 1) (x : stack)
23

24
instance (Program (n - 1) r) => Program n (Add -> r) where
25
  run (x : y : stack) Add = run @(n - 1) ((y + x) : stack)
26

27
instance (a ~ Int) => Program n (End -> a) where
28
  run (x : _) End = x
29

30
begin :: (Program 0 t) => t
31
begin = run @0 []
32

33
end :: End
34
end = End
35

36
push :: Push
37
push = Push
38

39
add :: Add
40
add = Add
41

42
f = begin push 1 push 2 push 3 add add end

這裏的 TypeApplication 讓我們能爲 run 手動地指定 n，注意 begin 我们用了 run @0 []，這是 GHC 自動地把字面量 0 推導爲 Nat 類型，所以不用太擔心沒有接触過依賴類型。另外，ScopedTypeVariables 讓我們能在 instance 中使用 n（本來 n 是 Program n t 的參數、竝不存在於函數作用域！如果不這樣做，我們就不能拿到 n，也就無法写 @(n + 1)）， AllowAmbiguousTypes 是爲了讓 GHC 不要因爲 class Program n t 中的 run 不能確定 n 的值而報錯。

拒絕非法程序#

現在我們的類型信息中已經有了棧深，那麼我們就可以利用它來拒絕非法程序了。這裏我們需要先寫幾箇 TypeFamily。

1
{-# LANGUAGE PolyKinds #-}
2
{-# LANGUAGE TypeFamilies #-}
3

4
type family If (b :: Bool) (t :: k) (f :: k) :: k where
5
  If 'True t _ = t
6
  If 'False _ f = f

在寫更多之前，先講講這箇 TypeFamily 的語法。其實 TypeFamily 就是類型層面的函數，是用來計算類型的。它和函數的區別主要在於：

TypeFamily 語法更複雜
TypeFamily 不是Curry化的

這裏的 k 用來代表任意類型，'True 和 'False 表示類型層面的布爾值。

接下來我們來寫用來檢查棧深的 TypeFamily。

1
import GHC.TypeLits (Nat, type (<=?))
2

3
type family AtLeastTwo (n :: Nat) :: Constraint where
4
  AtLeastTwo n = (n <=? 1) ~ 'False
5

6
type family AtLeastOne (n :: Nat) :: Constraint where
7
  AtLeastOne n = (n <=? 0) ~ 'False

Constraint 是一箇特殊的 kind，表示類型約束。AtLeastTwo n 的意思是「n 至少要 2」，AtLeastOne n 的意思是「n 至少要 1」。它們可以用來約束 Add 和 End 的實例，讓它們在棧深不夠的時候報錯。這裏的 (n <=? 1) ~ 'False 的意思是對 n <=? 1 這箇比較下斷言、必須是 'False，否則會報 TypeError。

1
instance (Program (n - 1) r, AtLeastTwo n) => Program n (Add -> r) where
2
  run (x : y : stack) Add = run @(n - 1) ((y + x) : stack)
3

4
instance (a ~ Int, AtLeastOne n) => Program n (End -> a) where
5
  run (x : _) End = x

這樣我們就限定了 Add 和 End 的使用必須滿足棧深的要求，否則會在編譯期報錯，爲了不留下偏函數，可以手動補全不可達分支。另外，利用 If TypeFamily，我們還可以把錯誤信息變得更友好一些，下面是完整的代碼。

1
{-# LANGUAGE AllowAmbiguousTypes #-}
2
{-# LANGUAGE DataKinds #-}
3
{-# LANGUAGE FlexibleContexts #-}
4
{-# LANGUAGE FlexibleInstances #-}
5
{-# LANGUAGE MultiParamTypeClasses #-}
6
{-# LANGUAGE PolyKinds #-}
7
{-# LANGUAGE ScopedTypeVariables #-}
8
{-# LANGUAGE TypeApplications #-}
9
{-# LANGUAGE TypeFamilies #-}
10
{-# LANGUAGE TypeOperators #-}
11
{-# LANGUAGE UndecidableInstances #-}
12

13
import Data.Kind (Constraint)
14
import GHC.TypeLits (ErrorMessage (..), Nat, TypeError, type (+), type (-), type (<=?))
15

16
data Push = Push
17
data Add = Add
18
data End = End
19

20
type family If (b :: Bool) (t :: k) (f :: k) :: k where
21
  If 'True t _ = t
22
  If 'False _ f = f
23

24
type family AtLeastTwo (n :: Nat) :: Constraint where
25
  AtLeastTwo n =
26
    If
27
      (n <=? 1)
28
      ( TypeError
29
          ( 'Text "Illegal program: `add` needs at least 2 value on the stack, but stack depth is "
30
              ':<>: 'ShowType n
31
          )
32
      )
33
      (() :: Constraint)
34

35
type family AtLeastOne (n :: Nat) :: Constraint where
36
  AtLeastOne n =
37
    If
38
      (n <=? 0)
39
      ( TypeError
40
          ( 'Text "Illegal program: `end` needs at least 1 value on the stack, but stack depth is "
41
              ':<>: 'ShowType n
42
          )
43
      )
44
      (() :: Constraint)
45

46
class Program (n :: Nat) t where
47
  run :: [Int] -> t
48

49
instance (Program (n + 1) r, x ~ Int) => Program n (Push -> x -> r) where
50
  run stack Push x = run @(n + 1) (x : stack)
51

52
instance (Program (n - 1) r, AtLeastTwo n) => Program n (Add -> r) where
53
  run (x : y : stack) Add = run @(n - 1) ((y + x) : stack)
54
  run _ Add = error "Unreachable: AtLeastTwo n should prevent this case"
55

56
instance (a ~ Int, AtLeastOne n) => Program n (End -> a) where
57
  run (x : _) End = x
58
  run _ End = error "Unreachable: AtLeastOne n should prevent this case"
59

60
begin :: (Program 0 t) => t
61
begin = run @0 []
62

63
end :: End
64
end = End
65

66
push :: Push
67
push = Push
68

69
add :: Add
70
add = Add
71

72
f = begin push 1 push 2 push 3 add add end
73

74
g = begin push 1 push 2 push 3 add add add end

注意 g 是一箇非法程序，因爲第三次 add 時棧深不夠，我們看看 GHC 給出的信息：

1
E • Illegal program: `add` needs at least 2 value on the stack, but stack depth is 1
2
    • In the expression: begin push 1 push 2 push 3 add add add end
3
    In an equation for ‘g’:
4
        g = begin push 1 push 2 push 3 add add add end typecheck (GHC-64725) [74, 5]
5
E • Illegal program: `end` needs at least 1 value on the stack, but stack depth is 0
6
    • In the expression: begin push 1 push 2 push 3 add add add end
7
    In an equation for ‘g’:
8
        g = begin push 1 push 2 push 3 add add add end typecheck (GHC-64725) [74, 5]

完美！

Kata#

Template#

對比 printf#

題目解讀#

使用類型類#

強化類型類——使棧深可計算#

拒絕非法程序#

對比 `printf`#