Apologies for the delays in this newsletter. With the baby and the coup, the newsletter took a momentary backseat. I also wrote a few drafts (not about LaTeX) that I was unhappy with and scrapped completely.
TeX/LaTeX (and its cousins XeTeX, LuaTeX, etc.) is the de facto typesetting system for mathematicians. Basically all papers and math books are written in it, and its “math mode” syntax has been adopted by the most widely used web-based math display systems, such as MathJax.
TeX was written by Donald Knuth in 1978, and LaTeX (a system of macros written on top of TeX) was written in the early 1980s. Development has continued on LaTeX, and the underlying TeX system has effectively been marked as “complete.”
TeX/LaTeX extends far beyond math equation typesetting. It is a complete typesetting engine and can be used to typeset entire books, ready to print. I’ve personally done this twice. And I should stress this: LaTeX was designed for print. It shines for print-specific problems like page layout, and automatically tracking complex cross-section or cross-chapter references to theorems, diagrams, etc.
So what’s not to like? TeX/LaTeX is a software masterpiece. It’s a 40-year-old piece of software that still has millions of active users.
That said, even masterpieces deserve criticism. Receiving thoughtful, respectful criticism from invested stakeholders is an honor. It means they care in a world that makes it too easy to burn villages down. I endeavor to provide honorable criticism about things that matter to me.
As mentioned, TeX is the core typesetting system, and LaTeX is a system of preprocessor macros built on top of TeX. LaTeX is a state-heavy state machine. In particular, on each run, various parts of LaTeX generate auxiliary files that are used by later runs of LaTeX to compile the final document. This is why when compiling a document, most people need to run something like
pdflatex main.tex
bibtex main
pdflatex main.tex
pdflatex main.tex
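To make the need for repeated runs concrete, here is a minimal sketch of a document that cannot be typeset correctly in a single pass. The file name main.tex matches the commands above; the section titles and the label name sec:result are made up for illustration.

```latex
% main.tex: a minimal sketch of a document that needs more than one pass.
\documentclass{article}
\begin{document}

\section{Introduction}
% On the first run there is no main.aux yet, so this forward reference
% is typeset as "??" and LaTeX warns about undefined references.
The main result is stated in Section~\ref{sec:result}.

\section{Result}\label{sec:result}
% During the run, the section number attached to this label is written
% to main.aux. The next run reads main.aux at \begin{document} and can
% then resolve the \ref above.
Here is the result.

\end{document}
```

The state carried between runs lives entirely in main.aux, which is why a stale or corrupted copy of that file can change what a later run produces.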
If you use one of the monolithic LaTeX IDEs, or a tool that automates the build, this repetition step is handled for you.
Still, it can cause problems. Some steps of the typesetting chain are skipped when old auxiliary files are present on the file system. When something goes wrong (i.e., when my changes aren’t showing up in the compiled document) and I can’t figure out what happened, deleting random auxiliary files and recompiling often fixes it.
This is the “turn it off and turn it back on again” method of fixing problems, and it makes people with principles like mine angry at the tool. Based on Lamport’s writing and the time it was designed, this decision is almost certainly due to the computational constraints of the time. Those constraints don’t exist today.
The tradeoff for this choice is instability and confusion. The environment in which LaTeX runs is not hermetically sealed. As the modern software discipline emphasizes, there are more problems that can occur due to this than are dreamt of in anyone’s philosophy.
At its core, TeX is a macro preprocessor. This macro language is Turing complete, which means you can use it to write arbitrary programs. People have taken this too far. LaTeX has packages implementing a presentation language, a drawing language, a magnetic field plotter, a genealogy tree plotter, and much much more.
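As a small taste of what “arbitrary programs” means here, the following sketch computes a factorial with a loop while the document is being typeset. The names \factorial, \factn, and \factacc are hypothetical and chosen for this example only; \newcount, \loop, \multiply, and \advance are standard TeX/LaTeX primitives and macros.

```latex
\documentclass{article}

% Two integer registers: a loop counter and an accumulator.
\newcount\factn
\newcount\factacc

% \factorial{n} multiplies the accumulator by n, n-1, ..., 2
% and then prints the result into the document.
\newcommand{\factorial}[1]{%
  \factn=#1
  \factacc=1
  \loop\ifnum\factn>1
    \multiply\factacc by \factn
    \advance\factn by -1
  \repeat
  \the\factacc
}

\begin{document}
% Used in math mode so that stray spaces from the macro body are ignored.
$5! = \factorial{5}$
\end{document}
```

Nothing about this is typesetting; it is ordinary imperative code that happens to run inside the typesetter, which is the property the packages listed above build on.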
In some ways this is great, and part of TeX’s killer feature of extensibility. In some ways, this is a disaster, and none of these things should be done in TeX. It makes LaTeX monolithic.
Tools should do one thing and do it well, and be easy to compose with other tools. Plotting magnetic fields, while cool, is not within scope. It also limits the usefulness of the magnetic field plotter, since it can only be used in the context of document typesetting.
More to the point, the decision to make TeX’s macro language Turing-complete could have been made differently. Someone with more knowledge of LaTeX than me might have a broader picture of how reasonable it could be to write the most widely-used parts of LaTeX without Turing-completeness of the macro system. From experience, I suspect that it is not essential.
The consequence of this design choice is that it’s harder to write extensions for TeX, and it’s harder to analyze LaTeX source statically. By that I mean, any tool that hopes to transform a TeX file into another document format, or automatically detect problems in TeX code, cannot be correct without actually compiling the document to see how all the macros expand. This adds resistance to the wider ecosystem, and seems like a big part of why we don’t have good tools to turn a beautiful TeX document into a beautiful webpage. To the extent that we do, those tools actually ignore the macro system and cross-compile the high-level, most-used macros directly.
This can be considered a criticism of TeX’s complexity, because if we could just convert from the underlying TeX format, we could write a converter once and all LaTeX packages would be supported automatically.
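A short sketch of why static analysis is hard: in the example below, the command \greetAlice never appears in any definition in the source file. Its name is assembled at expansion time, so a tool that only reads the text cannot know the command exists without expanding the macros. The names \definegreeting and \greetAlice are hypothetical, made up for this illustration.

```latex
\documentclass{article}

% \definegreeting{X} defines a new command \greetX.
% \csname ... \endcsname builds the command name from its argument, and
% \expandafter makes sure the name is built before \newcommand sees it.
\newcommand{\definegreeting}[1]{%
  \expandafter\newcommand\csname greet#1\endcsname{Hello, #1!}%
}

\begin{document}
\definegreeting{Alice}

% A text search for a definition of \greetAlice finds nothing,
% yet this line typesets "Hello, Alice!".
\greetAlice
\end{document}
```

Many real packages generate commands in a similar way, so a converter that wants to be correct in general has little choice but to reimplement macro expansion.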
These kinds of problems were probably inconceivable to Knuth at the time he wrote TeX. “Other document formats” like Word, HTML, and PDF did not exist at the time. Nor did the concept of massively distributed collaboration, or even “lazy developers”: at the time, using TeX or LaTeX required buying and reading a book. Today most LaTeX users have to be cajoled into reading anything beyond the top answer on TeX StackExchange. Since TeX was also written for print books, most of the focus of TeX (and much of what is now less important than, say, the math mode sub-language) is on the constraints of the physical page.
The complexities of the LaTeX ecosystem promote a culture of avoiding problems in weird ways.
For instance, Tim Gowers,
world renowned mathematician and
admits in this thread
that, to avoid a typesetting problem
with the end-of-proof tombstone,
he would inject meaningless phrases like,
“which proves the theorem.”
The suggested solution to his problem
also showcases how stupid LaTeX solutions can be.
It involves violating the encapsulation of the
(which defines the theorem/proof macros)
to manipulate a stack
defined for the purpose of nesting proofs within proofs.
The general culture of LaTeX includes the following (many of which I do or used to do often):
\ \ \ \ instead of \hspace or any one of the dozens of spacing commands.[1]
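For readers who have not run into this habit, here is a minimal sketch comparing the repeated-backslash approach with two of the standard spacing commands. The letters a and b are placeholder content.

```latex
\documentclass{article}
\begin{document}

% Four control spaces: each "\ " inserts one interword space.
a\ \ \ \ b

% An explicit length states the intended width directly.
a\hspace{1em}b

% \quad is shorthand for a horizontal skip of 1em.
a\quad b

\end{document}
```

The three lines do not produce identical widths, since an interword space depends on the font while 1em is a fixed multiple of the font size, which is part of why the choice between them is confusing.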
These days most new LaTeX users use Overleaf, a web IDE for TeX that doubles as cloud document storage. One of the “helpful” features Overleaf offers is compiling a document even when there are compilation errors, if possible. This was likely done to remove roadblocks for new users, who would stop using the product if it were too cumbersome. This is a classic kind of decision made all over modern software that speaks to the difference between Knuth’s time and ours. User growth often outweighs principle, because what use is strong principle when there are no users for it to benefit? Or when the competition moves faster and snatches up all the users before they can experience your genius?
As a consequence, various researchers I follow on Twitter have complained about “kids these days.” Right before a conference deadline, their students send them TeX which can’t be submitted because it doesn’t compile, and this was detected so late because it was all done in Overleaf.
As someone who thinks people should invest heavily in the proper use of their tools, I agree with the researchers. Kids these days. At the same time, a good system allows users to grow in complexity with the system being used. LaTeX has too many inscrutable barriers to start doing simple things, and the benefit of learning more about LaTeX is not clear, except to fix bugs in LaTeX packages or find slightly better workarounds. Either way, it’s more about annoyances in the tool than learning new abilities.
The core TeX engine itself is a curious software artifact worth peeking at. Here is its source. As you can see, it’s written in a language called WEB. WEB, which compiles to Pascal, was designed by Knuth to promote his vision of literate programming.
In brief, literate programming is the idea that programs can have such nicely written comments that the program and its comments are the documentation. To that end, WEB programs can be compiled into webpages and books. To the best of my knowledge, TeX is the only serious program written in WEB. Anyone who actually uses TeX’s source first cross-compiles it to C. Meanwhile, the book that TeX is compiled to is 500 pages long, and goes unread by all TeX users.
I could expand on why I think literate programming failed, perhaps another time. In short, it’s not practical. Better design, encapsulation, and stronger safety measures are more effective in all cases, and, “think deeper and write better,” is foolish to expect of everyone.
Back to the point, the TeX source, partly because it is written in WEB, and partly due to the orientation around Pascal and the limitations of computers in the age it was written, is inscrutably tangled. There is no easy way to understand the overarching architecture of the program, nor does its documentation allow you to compartmentalize details. See this blog post for another perspective on that. This is a huge barrier to understanding, and hence to improving it.
Finally, Knuth forbids anyone to modify TeX, and considers TeX finished (as seen by the version numbers converging to π):
% This program is copyright (C) 1982 by D. E. Knuth; all rights are reserved.
% Copying of this file is authorized only if (1) you are D. E. Knuth, or if
% (2) you make absolutely no changes to your copy.
...
% Version 3.141592 fixed \xleaders, glueset, weird alignments (December 2002).
% Version 3.1415926 was a general cleanup with minor fixes (February 2008).
% Version 3.14159265 was similar (January 2014).
In interviews, Knuth has stated that any remaining bugs should be considered features. In one sense this is reasonable: Hyrum’s Law says that any observable property eventually becomes depended on, and 40-year-old programs are very sensitive to this.
That said, the fastest way to turn someone off from reading your program is to threaten legal punishment for modification, and to say, “I accept no changes in the future.” Any other benefits of reading the TeX source (such as learning its efficiency tricks) are likely obsolete due to modern computational power and abstractions, though I have not read the entire book myself to confirm.
All together, being written in WEB and surrounded by legal threats probably caused some of the complexity of the ecosystem. If it were written in C, or rewritten in a higher level language, it could at the very least be refactored so as to be organized for readability. Better, the core engine could admit extensions. I believe LuaTeX does something along these lines.
[1] Seriously, who thought it was a good idea to have so many spacing commands? I suppose it’s like how some human languages have more words for culturally important concepts, though I recently heard the “Eskimo words for snow” version of that idea is a myth.