sd-visualiser
: interactive hypergraph visualisation for programs as string diagramsNick Hu
Calin Tataru
Alex Rice
Dan Ghica
A program is a sequence of instructions which, when executed, performs a computation.
For example, consider the program which computes \(7 + 7 + 7 + 7 = 28\).
Computations are ordered: \[ \underline {7 + 7} + 7 + 7 \to \underline {14 + 7} + 7 \to \underline {21 + 7} \to 28. \]
This program is compiled to the following ARM assembly machine code:
MOV R0, 7 // load 7 into register 0
ADD R1, R0, R0 // add register 0 to register 0 and store in register 1
ADD R1, R1, R0 // add register 1 to register 0 and store in register 1
ADD R2, R1, R0 // add register 1 to register 0 and store in register 2
Theory of arithmetic tells us that \(x + x + x + x = x \times 4\).
An optimising compiler recognises \(7 + 7 + 7 + 7\) fits this pattern: \[ \underline {7 * 4} \to 28. \]
We get a shorter, more efficient program:
MOV R0, 7 // load 7 into register 0
MOV R1, 4 // load 4 into register 1
MUL R2, R1, R0 // multiply register 1 to register 0 and store in register 2
In general, compiler optimisations are about finding semantic-preserving transformations on programs which make them more efficient.
Programs: terms of a type theory (e.g. arithmetic expressions, λ-calculus).
Denotations: mathematical objects associated to a term (e.g. numbers, functions).
Categorical semantics: \[ \begin {aligned} \text {type theory} &\cong \text {category}, \\ \text {types} &\cong \text {objects}, \\ \text {terms} &\cong \text {morphisms}. \end {aligned} \]
Curry-Howard-Lambek correspondence: \[ \begin {aligned} \text {simply typed $\lambda $-calculus} &\cong \text {cartesian closed category}, \\ \text {types} &\cong \text {objects}, \\ \text {terms} &\cong \text {morphisms}. \end {aligned} \]
A computer is nothing more than a string processing machine.
Programs are always represented as strings.
We use encodings to represent fancier data structures, and it matters how we encode them.
Traditional abstract syntax trees are a way to represent programs as trees:
This talk: string diagrams give us a better way.
Fix a countably infinite set of variables \(\mathcal {V}\).
λ-terms \(\Lambda \) are generated inductively by:
\(\lambda \) is a binder; variables which are not bound are free: \[ \forall t \in \Lambda . \text {FV} (t) \subset \mathcal {V}. \]
α-equivalence: terms are equivalent up to renaming of bound variables. \[ \lambda x. t \equiv _\alpha \lambda y. t[y/x]. \]
String diagrams are a graphical notation for terms in different types of monoidal categories.
The term \((f \otimes \text {id}) \circ (\text {id} \otimes g)\) is represented by the string diagram:
Equations of terms arising from the monoidal structure are captured by isotopy of string diagrams.
Cartesian monoidal categories (i.e. \(\otimes = \times \) and \(I = 1\)) admit a natural copy-delete comonoid:
sd-lang
Introducing sd-lang
, a toy language for programs with let
bindings.
essentially λ-calculus with arithmetic operations and recursive let
:
e.g. bind x = v1 y = v2 … in v
.
values are variables, thunks, or operations op(v1, v2, …)
:
plus(x, y)
, eq(x, y)
, if(cond, tb, fb)
, etc.
a model of string diagrams for traced cartesian closed categories.
bind fact = lambda(x .
if(eq(x, 0),
1,
times(x,
app(fact,
minus(x, 1)
)
)
)
)
in app(fact, 5)
Factorial
as an ASTbind("fact",
lambda("x",
if(eq("x", 0),
1,
times("x",
app("fact",
minus("x", 1)
)
)
)
),
app("fact", 5)
)
Compiler optimisations are described by semantic-preserving transformations on these ASTs given by rewrite rules.
Consider the expression \((x + 1) + (x + 1)\) (where \(x\) is free).
This is represented by the sd-lang
expression:
plus(plus(x, 1), plus(x, 1))
Its AST is:
Problem: The term obtained by the α-invariant substitution \([x \mapsto y]\) is represented by a different AST.
Consequence: The optimisation \(\texttt {plus}(x_1, x_2) \to \texttt {times}(x_1, 2)\) needs to do a non-trivial computation to be valid, namely checking that \(x_1 \equiv _\alpha x_2\).
Can leverage de Bruijn indices, nominal techniques…
Our string diagrams are equipped with a natural copy-delete comonoid.
This allows for a more meaningful representation of this program as the string diagram:
Nodes represent operations, and edges represent dataflow (e.g. of values)!
Another way to write this program:
bind y = plus(x, 1) in plus(y, y)
AST:
Program | AST | String diagram |
---|---|---|
plus(plus(x, 1), plus(x, 1)) |
|
|
bind y = plus(x, 1) in plus(y, y) |
|
|
The optimisation we care about is:
For \((x + 1) + (x + 1)\), derive:
Graphs are also used in production compilers to sidestep these issues
String diagrams can be thought of as an intermediate representation between ASTs and graphs
Hypergraphs quotient monoidal categories with copy-delete.
For each hypergraph, we need to pick a representative monoidal term:
Given a monoidal term, we can construct a big linear program to determine the coordinates of each node and positioning of edges [2].
sd-visualiser
fibo.c
Consider the following (buggy) Fibonacci program:
// fibo.c #include <stdio.h> int fib(int n) { int f, f0 = 1, f1 = 1; while (n > 1) { n = n - 1; f = f0 + f1; f0 = f1; f1 = f; } return f; } int main() { int n = 9; while (n > 0) { printf("fib(%d)=%dN", n, fib(n)); n = n - 1; } return 0; }
; clang -emit-llvm fibo.c -O1 -S define i32 @fib(i32 %0) { %2 = icmp sgt i32 %0, 1 br i1 %2, label %3, label %10 3: ; preds = %1, %3 %4 = phi i32 [ %8, %3 ], [ 1, %1 ] %5 = phi i32 [ %4, %3 ], [ 1, %1 ] %6 = phi i32 [ %7, %3 ], [ %0, %1 ] %7 = add nsw i32 %6, -1 %8 = add nsw i32 %4, %5 %9 = icmp sgt i32 %6, 2 br i1 %9, label %3, label %10 10: ; preds = %3, %1 %11 = phi i32 [ undef, %1 ], [ %8, %3 ] ret i32 %11 }
Also available at https://sd-visualiser.github.io/sd-visualiser.
sd-lang
to make it a 'generic hypergraph description language'