Running the TLA+ workshop. No way I’m gonna have any brainpower after that.
No, not a waste in general. Syntax highlighting is quite useful. I’m saying it’s a waste of an information channel. Here’s a quick demonstration of what I mean. Here’s 399 squares and one circle. Where’s the circle?
Round two. Where’s the circle?
Color carries a huge amount of information. Color draws our attention. Color distinguishes things. And we just use it to distinguish syntax.
Nothing wrong with distinguishing syntax. It’s the “just” that bothers me. Highlighting syntax is not always the most important thing to us. The information we want from code depends on what we’re trying to do. I’m interested in different things if I’m writing greenfield code vs optimizing code vs debugging code vs doing a code review. I should be able to swap different highlighting rules in and out depending on what I need. I should be able to combine different rules into task-level overlays that I can toggle on and off.
I’ve listed some examples of what we could do with this. If this is something that already exists I included a link. Otherwise I included a mockup. Some of the examples have implementation issues beyond what I discussed; they’re just demonstrations of what highlighting could be. All examples are Pythonish unless otherwise noted.
This is a pretty common one. We can use different colors to mark how deeply nested a set of parentheses is. From here.
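To make the idea concrete, here is a minimal sketch of the depth computation behind rainbow parentheses. It is not how any particular plugin does it. The `PALETTE` color names and both function names are made up for illustration, and it naively counts brackets inside strings and comments too.

```python
OPENERS = "([{"
CLOSERS = ")]}"
PALETTE = ["red", "yellow", "green", "blue"]  # hypothetical color names


def bracket_depths(source: str) -> list[tuple[int, str, int]]:
    """Return (index, bracket, depth) for every bracket character.

    Sketch only: brackets inside strings and comments are counted too.
    """
    depth = 0
    result = []
    for index, char in enumerate(source):
        if char in OPENERS:
            result.append((index, char, depth))
            depth += 1
        elif char in CLOSERS:
            depth = max(depth - 1, 0)  # tolerate unbalanced closers
            result.append((index, char, depth))
    return result


def bracket_colors(source: str) -> list[tuple[int, str, str]]:
    """Map each bracket to a palette color, cycling when nesting gets deep."""
    return [
        (index, char, PALETTE[depth % len(PALETTE)])
        for index, char, depth in bracket_depths(source)
    ]
```

A matching pair always gets the same depth, so it always gets the same color.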
Highlight different levels of nesting. From here.
Highlight identifiers imported from a different file.
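A rough sketch of how a rule like this could find its targets, using Python's standard `ast` module. The function name `imported_name_uses` is my own invention, and the sketch ignores scoping: a local variable that shadows an imported name would be flagged too.

```python
import ast


def imported_name_uses(source: str) -> list[tuple[str, int, int]]:
    """Return (name, line, column) for each use of an imported identifier.

    Sketch only: ignores shadowing, scopes, and `from x import *`.
    """
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import a as x" binds "x".
                imported.add(alias.asname or alias.name.split(".")[0])
    uses = [
        (node.id, node.lineno, node.col_offset)
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and node.id in imported
    ]
    return sorted(uses, key=lambda use: (use[1], use[2]))
```

The line and column pairs are what an editor would need to color the right spans.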
Arguments passed into the function are highlighted differently from local variables or global identifiers.
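Here is one way the classification might work, again with the standard `ast` module. `classify_names` and its three labels are hypothetical, and the sketch lumps builtins in with globals and does not handle nested functions, `global`, or `nonlocal`.

```python
import ast


def classify_names(source: str, func_name: str) -> dict[str, str]:
    """Label each name used in a function as argument, local, or global.

    Sketch only: builtins count as "global", and nested scopes are ignored.
    """
    tree = ast.parse(source)
    func = next(
        node for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef) and node.name == func_name
    )
    spec = func.args
    params = {arg.arg for arg in spec.posonlyargs + spec.args + spec.kwonlyargs}
    if spec.vararg:
        params.add(spec.vararg.arg)
    if spec.kwarg:
        params.add(spec.kwarg.arg)

    names = [node for node in ast.walk(func) if isinstance(node, ast.Name)]
    # Anything assigned inside the function is treated as a local.
    assigned = {node.id for node in names if isinstance(node.ctx, ast.Store)}

    kinds = {}
    for node in names:
        if node.id in params:
            kinds[node.id] = "argument"
        elif node.id in assigned:
            kinds[node.id] = "local"
        else:
            kinds[node.id] = "global"
    return kinds
```

Each label would then map to its own color.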
Highlight all list variables and integer variables with different colors.
Highlight functions that raise errors not caught in their body.
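A very approximate version of this rule can be sketched with the `ast` module. The names below are invented for the example, and it makes big simplifying assumptions: it only looks at explicit `raise` statements, treats any `except` clause as catching everything raised in its `try` body, and knows nothing about errors raised by the functions being called.

```python
import ast

FUNCTION_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)


def _raises_uncaught(nodes, guarded=False) -> bool:
    """True if some `raise` among nodes sits outside every guarded try body."""
    for node in nodes:
        if isinstance(node, FUNCTION_TYPES):
            continue  # nested functions run later, not as part of this body
        if isinstance(node, ast.Raise):
            if not guarded:
                return True
            continue
        if isinstance(node, ast.Try):
            # Assumption: any except clause catches whatever the body raises.
            if _raises_uncaught(node.body, guarded or bool(node.handlers)):
                return True
            # Handlers, else and finally blocks are not protected by this try.
            rest = [*node.handlers, *node.orelse, *node.finalbody]
            if _raises_uncaught(rest, guarded):
                return True
            continue
        if _raises_uncaught(ast.iter_child_nodes(node), guarded):
            return True
    return False


def functions_raising_uncaught(source: str) -> list[str]:
    """Names of functions containing a raise that their own body won't catch."""
    tree = ast.parse(source)
    return sorted(
        node.name for node in ast.walk(tree)
        if isinstance(node, FUNCTION_TYPES) and _raises_uncaught(node.body)
    )


SAMPLE = '''
def parse(text):
    if not text:
        raise ValueError("empty")
    return int(text)

def safe_parse(text):
    try:
        if not text:
            raise ValueError("empty")
        return int(text)
    except ValueError:
        return None

def reraise(text):
    try:
        return int(text)
    except ValueError:
        raise RuntimeError("bad input")
'''
```

On `SAMPLE`, this flags `parse` and `reraise` but not `safe_parse`. A real version would need to compare exception types and follow calls, which is where it gets hard.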
Highlight functions that were directly called in the bodies of tests that failed in the last test run.
Why aren’t things this way? There are both essential and coincidental challenges that make fully leveraging color a lot harder than just having syntax highlighting.
First is actually implementing rules. Some of these require access to the code’s AST, some require broader knowledge of the project, some require runtime information. Some of the ideas are even infeasible; accurately tracking aliasing is an open problem for most languages. Syntax highlighting, by contrast, is usually a matter of regexes and hierarchical state machines. That’s how pygments does it. Semantic highlighting would have to be made from scratch for each language.
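For contrast, here is roughly what the regex approach looks like, shrunk down to a toy. This is not Pygments code, just a sketch in the same spirit using the standard `re` module. The `MATCHERS` table and `lex` function are made up for the example.

```python
import re

# Ordered list of (token kind, pattern). Earlier entries win, which is how
# "keyword" beats "name" for the word "def".
MATCHERS = [
    ("comment", r"#[^\n]*"),
    ("string", r'"[^"\n]*"'),
    ("keyword", r"\b(?:def|return|if|else|for|in|import|from|raise)\b"),
    ("number", r"\b\d+\b"),
    ("name", r"\b[A-Za-z_]\w*\b"),
]
LEXER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in MATCHERS))


def lex(source: str) -> list[tuple[str, str]]:
    """Return (kind, text) for every token the matchers recognize."""
    return [(match.lastgroup, match.group()) for match in LEXER.finditer(source)]
```

Running `lex` on a small function shows the limitation: an argument `x` and a local `y` both come out as plain `name` tokens, because nothing at this level knows what a scope is.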
Second is highlighting conflicts. What if something needs to be colored two things for two different reasons? In syntax highlighting this is less of a problem because you have an ordered list of matchers. But with semantic highlighting we might have dynamic priorities, where rule A is more important to us now while rule B is more important to us later. Things get even more complicated if we have multiple distinct overlays, which themselves can have priority conflicts. All of this needs a much more complex design and implementation than simple syntax highlighting does.
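As a sketch of the simplest possible conflict resolution, assume every rule emits character spans and an overlay is just a priority-ordered list of rule names. Everything here (`Highlight`, `resolve`, the rule names) is hypothetical, and a real design would also need to handle blending, partial overlaps, and overlays that conflict with each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Highlight:
    start: int  # first character offset covered
    end: int    # exclusive
    color: str
    rule: str   # name of the rule that produced this span


def resolve(highlights: list[Highlight], priority: list[str]) -> dict[int, str]:
    """Pick one color per character offset.

    `priority` lists rule names, most important first. Rules missing from
    the list are treated as toggled off.
    """
    rank = {rule: position for position, rule in enumerate(priority)}
    winners: dict[int, tuple[int, str]] = {}  # offset -> (rank, color)
    for highlight in highlights:
        if highlight.rule not in rank:
            continue
        candidate = (rank[highlight.rule], highlight.color)
        for offset in range(highlight.start, highlight.end):
            if offset not in winners or candidate[0] < winners[offset][0]:
                winners[offset] = candidate
    return {offset: color for offset, (_, color) in winners.items()}
```

Switching tasks then means passing a different `priority` list over the same spans: a debugging overlay might rank a failed-test rule first, while a review overlay leaves it out entirely.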
Finally, existing editors just aren’t well set up to handle this. Vim’s syntax highlighting is a mess of regular expressions and special cases. VSCode and (I believe) Atom use TextMate grammars, which assume a single canonical tokenization per file. VSCode recently added semantic highlighting but it seems more oriented to augment the existing syntax highlighting, not radically rethink it. I have no idea what Emacs does.
So I think this is something we’ll eventually have, because the potential advantages are too great to ignore forever. But it will take us a long time to get there. Maybe we’ll see it first with toy languages where the AST is simple enough and the expressiveness is low enough to make semantic highlighting easy.
This was a newsletter post. You can subscribe here.