Proof Assistants for Typesetting Maths

I am currently trying make our exercise sheets HTML based to allow for things like drop down solutions, which start in a collapsed state. I ended up asking myself whether it might be possible yet to verify proofs submitted by students automatically. So I looked for Proof Assistants and ended up with lean. In particular I tried out the Natural Numbers Game and started reading Theorem Proving in Lean for a more comprehensive introduction.

Landmark based Maths vs Steps in Proof Assistants

The first contrast I noticed was the following. When you write down Maths for another human, you tend to write something like:

We have

Equation 1

which implies

Equation 2

using induction we get

Equation 3

In other words, you write down “Landmark” Equations and assume that the person you are talking to will be able to do the steps in-between. Sometimes you add some directional hints between those landmarks. E.g. “using induction”, but you do not actually write out precise instructions like you do in Lean.

Giving directions in lean feels like saying:

1. Go 10 steps South-East
2. Turn 30 degrees left and go 40 steps
3. Move your foot 30cm forward and then 10cm down
4. Repeat step 3 with alternating feet 5 times 
5. Turn 50 degrees right and go 100 steps

While you would usually just say:

turn South-East, you should see a Watertower there go there. From there continue
on to...

Depending on how familiar some person is to the city you would make your description more or less precise.

Why the Difference?

Step based instructions make perfect sense if you are trying to give directions to a robot. You circumvent the entire vision/identification problem, and you do not have to worry whether or not the culture shifts and what was once a prominent landmark is now hard to recognize, making your directions harder to read in the future.

But, they are absolutely garbage for a human. Because landmark based instructions provide you with information about the state you are in. And it is easy to check whether you are really at a water tower or whether you really got to the next equation. With step based instructions it is often difficult to know whether or not you followed the steps correctly so far.

In other words: you never know if you are in the correct state if you are not perfectly sure that you made the right steps beforehand. And I noticed this writing down proofs myself: I greatly depend on lean telling me the current state interactively when hovering over some step.

In this sense lean feels like the assembly of theorem proving, possibly C. It is shaped by the requirements of a computer and you need your IDE to help you navigate the code as a human.

Solving the vision problem

Is it possible to write landmark based directions and compile them into steps? Maybe. But also maybe not. There is probably a reason Lean was not designed that way. This might be difficult to do with explicit instructions.

But just like you would not attempt to solve a vision/classification problem with classic code, maybe this is not a task for a compiler but machine learning.

Now if you are aware of OpenAi Codex (GitHub Copilot), you might be guessing where this is going. GitHub Copilot attempts to autocomplete your code based on comments, function stubs or list stubs.

Now image GitHub copilot could fill in the instructions between the following comments

-- Move around the corner to your south-east
Go 10 steps South-East
Turn 30 degrees left and go 40 steps
-- walk down the set of stairs
Move your foot 30cm forward and then 10cm down
Repeat step 3 with alternating feet 5 times 
-- walk towards the water tower on your right
Turn 50 degrees right and go 100 steps

But for this to work, you need to be able to train your Copilot on existing code with comments. How could you obtain such a dataset? Well…

Typesetting Interactive Maths

Let us forget the issue we had for a second and consider a different problem. If you want to create an interactive proof, you can not really use (La)TeX. Not only because it is a legacy programming language few people dare to touch, but also because it is on the other side of the machine-interpretable vs human-interpretable side.

For example, MathML uses a unicode for invisible multiplication (U+2062) and one for function applications (U+2061) to discern between the two interpretations of

h(x+y)

One being that h is a function, the second being that h is a constant multiplied with the sum of x and y. TeX simply does not have that. Now MathML has its own problems. Browser support is abysmal. It is basically only fully supported by Firefox. This means that conversion tools like latexml suggest including a fallback script, which includes MathJax for any browser which is not Gecko based. Of course MathJax is MathML based,

MathJax uses MathML as the basis for its internal format for mathematical expressions, so MathML support is built into MathJax at a fundamental level. - https://docs.mathjax.org/en/latest/output/mathml.html

so there isn’t really a way around MathML either, if you want to typeset mathematics for the web. But writing MathML directly is no fun either since it is incredibly verbose. So if you are not converting existing .tex files using a tool like latexml. You are probably better of, with a combination of html or markdown and MathJax or KaTeX. Even though that entails marrying the different escape mechanisms of html or markdown with the TeX escapes (multiple backslashes ensue), and working around the fact that MathJax and KaTeX are far from feature complete TeX compilers.

On the whole: not a fun experience, 1/10, would recommend for the lack of better options.

Typesetting with Lean?

Imagine you could render your Lean proof. One thing would be certain: The computer understands what is going on. Therefore if you do not understand an equation, you could easily use the rewrite rules in your proof to do stuff like animate how the next equation comes to be. But you could certainly add a bunch more steps in-between. In other words: a proof typeset with Lean could be fully interactive and personalized for the reader. How much you show a person might depend on what they read before and how they collapsed or expanded steps before. Maths would become arbitrarily accessible!