sd-visualiser: interactive hypergraph visualisation for programs as string diagrams

Nick Hu

Calin Tataru

Alex Rice

Dan Ghica

12th March 2025

Programs and semantics

What is a program?

A program is a sequence of instructions which, when executed, performs a computation.

For example, consider the program which computes \(7 + 7 + 7 + 7 = 28\).

Computations are ordered: \[ \underline {7 + 7} + 7 + 7 \to \underline {14 + 7} + 7 \to \underline {21 + 7} \to 28. \]

This program is compiled to the following ARM assembly machine code:

MOV R0, 7     // load 7 into register 0
ADD R1, R0, R0 // add register 0 to register 0 and store in register 1
ADD R1, R1, R0 // add register 1 to register 0 and store in register 1
ADD R2, R1, R0 // add register 1 to register 0 and store in register 2

What is a compiler optimisation?

Theory of arithmetic tells us that \(x + x + x + x = x \times 4\).

An optimising compiler recognises \(7 + 7 + 7 + 7\) fits this pattern: \[ \underline {7 * 4} \to 28. \]

We get a shorter, more efficient program:

MOV R0, 7     // load 7 into register 0
MOV R1, 4     // load 4 into register 1
MUL R2, R1, R0 // multiply register 1 to register 0 and store in register 2

In general, compiler optimisations are about finding semantic-preserving transformations on programs which make them more efficient.

What does it mean?

Programs: terms of a type theory (e.g. arithmetic expressions, λ-calculus).

Denotations: mathematical objects associated to a term (e.g. numbers, functions).

Categorical semantics: \[ \begin {aligned} \text {type theory} &\cong \text {category}, \\ \text {types} &\cong \text {objects}, \\ \text {terms} &\cong \text {morphisms}. \end {aligned} \]

Curry-Howard-Lambek correspondence: \[ \begin {aligned} \text {simply typed $\lambda $-calculus} &\cong \text {cartesian closed category}, \\ \text {types} &\cong \text {objects}, \\ \text {terms} &\cong \text {morphisms}. \end {aligned} \]

What is a computer?

A computer is nothing more than a string processing machine.

Programs are always represented as strings.

We use encodings to represent fancier data structures, and it matters how we encode them.

Traditional abstract syntax trees are a way to represent programs as trees:

\[ ((7 + 7) + 7) + 7 \] \[ \leadsto \]

This talk: string diagrams give us a better way.

Crash course on λ-calculus

Fix a countably infinite set of variables \(\mathcal {V}\).

λ-terms \(\Lambda \) are generated inductively by:

\(\lambda \) is a binder; variables which are not bound are free: \[ \forall t \in \Lambda . \text {FV} (t) \subset \mathcal {V}. \]

α-equivalence: terms are equivalent up to renaming of bound variables. \[ \lambda x. t \equiv _\alpha \lambda y. t[y/x]. \]

Captures the idea that the name of a variable doesn't matter.

String diagrams

String diagrams are a graphical notation for terms in different types of monoidal categories.

The term \((f \otimes \text {id}) \circ (\text {id} \otimes g)\) is represented by the string diagram:

Equations of terms arising from the monoidal structure are captured by isotopy of string diagrams.

Cartesian monoidal categories (i.e. \(\otimes = \times \) and \(I = 1\)) admit a natural copy-delete comonoid:

\[ \forall f. \] \[ = \]

Programs represented as string diagrams

sd-lang

Introducing sd-lang, a toy language for programs with let bindings.

syntax

essentially λ-calculus with arithmetic operations and recursive let:

e.g. bind x = v1 y = v2 … in v.

values are variables, thunks, or operations op(v1, v2, …):

plus(x, y), eq(x, y), if(cond, tb, fb), etc.

semantics

a model of string diagrams for traced cartesian closed categories.

Example: factorial

bind fact = lambda(x .
    if(eq(x, 0),
      1,
      times(x,
        app(fact,
          minus(x, 1)
        )
      )
    )
  )
  in app(fact, 5)
factorial as a string diagram

Factorial as an AST

bind("fact",
    lambda("x",
      if(eq("x", 0),
        1,
        times("x",
          app("fact",
            minus("x", 1)
          )
        )
      )
    ),
    app("fact", 5)
  )

Compiler optimisations are described by semantic-preserving transformations on these ASTs given by rewrite rules.

ASTs do not support sharing, or α-equivalence I

Consider the expression \((x + 1) + (x + 1)\) (where \(x\) is free).

This is represented by the sd-lang expression: plus(plus(x, 1), plus(x, 1))

Its AST is:

ASTs do not support sharing, or α-equivalence II

Problem: The term obtained by the α-invariant substitution \([x \mapsto y]\) is represented by a different AST.

Consequence: The optimisation \(\texttt {plus}(x_1, x_2) \to \texttt {times}(x_1, 2)\) needs to do a non-trivial computation to be valid, namely checking that \(x_1 \equiv _\alpha x_2\).

Can leverage de Bruijn indices, nominal techniques…

String diagrams do support sharing, and α-equivalence

Our string diagrams are equipped with a natural copy-delete comonoid.

This allows for a more meaningful representation of this program as the string diagram:

\((x + 1) + (x + 1)\) — observe that \(x\) does not appear in the diagram!

Nodes represent operations, and edges represent dataflow (e.g. of values)!

ASTs do not support binding and shadowing

Another way to write this program: bind y = plus(x, 1) in plus(y, y)

AST:

\[ \neq \]

ASTs vs string diagrams

Program AST String diagram
plus(plus(x, 1), plus(x, 1))
bind y = plus(x, 1) in
plus(y, y)

Compiler optimisations as string diagram rewriting

The optimisation we care about is:

\[ = \]

For \((x + 1) + (x + 1)\), derive:

\[ = \]
\[ = \]
\[ = \]
\[ = \]

An aside on graphs

Graphs are also used in production compilers to sidestep these issues

  • They also convey information efficiently, and naturally support sharing
  • They are not algebraic: no inductive structure, hard to reason about and distil algorithms which preserve invariants

String diagrams can be thought of as an intermediate representation between ASTs and graphs

  • Have enough graphical structure to support sharing and α-equivalence
  • They are algebraic, as they represent terms of some kind of monoidal category
  • Support a natural theory of rewriting via double-pushout graph rewriting (corresponding to equipping the monoidal category with equations)
  • ( ?) Not very well studied, lack of tooling(!)

How to draw a string diagram

Hypergraphs quotient monoidal categories with copy-delete.

For each hypergraph, we need to pick a representative monoidal term:

  • involves (non-canonically) foliating the hypergraph into layers, and determining the order of operations (which determines how many 'swaps' are needed);
  • Aesthetically-pleasing diagram heuristic: minimise the number of layers, and the number of swaps (NP-hard).

Given a monoidal term, we can construct a big linear program to determine the coordinates of each node and positioning of edges [2].

LLVM and sd-visualiser

Anatomy of a modern compiler

Bug hunting fibo.c

Consider the following (buggy) Fibonacci program:

// fibo.c
#include <stdio.h>
int fib(int n) {
  int f, f0 = 1, f1 = 1;
  while (n > 1) {
    n = n - 1;
    f = f0 + f1;
    f0 = f1;
    f1 = f;
  }
  return f;
}
int main() {
  int n = 9;
  while (n > 0) {
    printf("fib(%d)=%dN", n, fib(n));
    n = n - 1;
  }
  return 0;
}
; clang -emit-llvm fibo.c -O1 -S
define i32 @fib(i32 %0) {
  %2 = icmp sgt i32 %0, 1
  br i1 %2, label %3, label %10

3:                    ; preds = %1, %3
  %4 = phi i32 [ %8, %3 ], [ 1, %1 ]
  %5 = phi i32 [ %4, %3 ], [ 1, %1 ]
  %6 = phi i32 [ %7, %3 ], [ %0, %1 ]
  %7 = add nsw i32 %6, -1
  %8 = add nsw i32 %4, %5
  %9 = icmp sgt i32 %6, 2
  br i1 %9, label %3, label %10

10:                   ; preds = %3, %1
  %11 = phi i32 [ undef, %1 ], [ %8, %3 ]
  ret i32 %11
}
buggy Fibonacci string diagram

Demo

Also available at https://sd-visualiser.github.io/sd-visualiser.

Future work

  • Extensions to sd-lang to make it a 'generic hypergraph description language'
  • String diagram rewriting
  • (Hyper)graph analysis, similar to compiler analyses performed by LLVM
  • C → ClangIR → MLIR pipeline

References

  1. Dan Ghica , Fabio Zanasi, String Diagrams for \lambda -calculi and Functional Computation, October 2023
  2. Calin Tataru , Jamie Vicary , A layout algorithm for higher-dimensional string diagrams, May 2023
  3. Mario Alvarez-Picallo , Dan Ghica , David Sprunger , Fabio Zanasi , Functorial String Diagrams for Reverse-Mode Automatic Differentiation, 2023
  4. Dan Ghica , Koko Muroya, Todd Waugh Ambridge, A robust graph-based approach to observational equivalence, September 2021
  5. Chris Lattner, Vikram Adve, LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation, March 2004