No newsletter next week
Running the TLA+ workshop. No way I’m gonna have any brainpower after that.
Syntax highlighting is a waste of an information channel
No, not a waste in general. Syntax highlighting is quite useful. I’m saying it’s a waste of an information channel. Here’s a quick demonstration of what I mean. Here’s 399 squares and one circle. Where’s the circle?
Round two. Where’s the circle?
Color carries a huge amount of information. Color draws our attention. Color distinguishes things. And we just use it to distinguish syntax.
Nothing wrong with distinguishing syntax. It’s the “just” that bothers me. Highlighting syntax is not always the most important thing to us. The information we want from code depends on what we’re trying to do. I’m interested in different things if I’m writing greenfield code vs optimizing code vs debugging code vs doing a code review. I should be able to swap different highlighting rules in and out depending on what I need. I should be able to combine different rules into task-level overlays that I can toggle on and off.
I’ve listed some examples of what we could do with this. If this is something that already exists I included a link. Otherwise I included a mockup. Some of the examples have implementation issues beyond what I discussed; they’re just demonstrations of what highlighting could be. All examples are Pythonish unless otherwise noted.
Some Use Cases
This is a pretty common one. We can use different colors to mark how deeply nested a set of parentheses is. From here.
Highlight different levels of nesting. From here.
Highlight identifiers imported from a different file.
- Highlight imported functions and classes differently
- Highlight qualified imports
- Highlight imports from particular trees
Arguments passed into the function are highlighted differently from local variables or global identifiers.
- Carry it through to aliases (if we assign the argument to another value, highlight that too)
- Highlight local variables only
- Highlight values that will be assigned to something
- Highlight variables used in loops
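To give a feel for what the argument rule involves, here is a minimal sketch using Python's standard ast module. The function name argument_spans and the SAMPLE snippet are made up for illustration. It is deliberately naive: it matches on names only, so it ignores shadowing and reassignment, and it does not follow aliases.

```python
import ast

SAMPLE = """\
def scale(values, factor):
    total = 0
    for v in values:
        total += v * factor
    return total
"""

def argument_spans(source):
    """Return sorted (line, col_start, col_end, name) for each use of a function argument.

    Naive on purpose: any Name node inside the function whose identifier
    matches a parameter counts, even if it was rebound in between.
    """
    spans = set()  # a set, so nested functions cannot report a span twice
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        args = func.args
        names = {a.arg for a in args.posonlyargs + args.args + args.kwonlyargs}
        if args.vararg:
            names.add(args.vararg.arg)
        if args.kwarg:
            names.add(args.kwarg.arg)
        for node in ast.walk(func):
            # Parameters in the signature are ast.arg nodes, so only uses
            # in the body show up here as ast.Name.
            if isinstance(node, ast.Name) and node.id in names:
                spans.add((node.lineno, node.col_offset, node.end_col_offset, node.id))
    return sorted(spans)
```

An editor would turn each span into a colored range. Carrying the color through to aliases needs real dataflow analysis, which is where this gets hard.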
Highlight all list variables and integer variables with different colors.
- Highlight all iterables
- Highlight all functions returning option types
- Highlight all variables that could be one of two types
- Highlight all polymorphic types parameterized to integers
Highlight functions that raise errors not caught in their body.
- Highlight all functions with try blocks
- Highlight functions that raise user-defined exceptions
- Highlight functions that raise a specific exception
- Highlight functions that catch a specific exception
Highlight functions that were directly called in the bodies of tests that failed in the last test run.
- Highlight functions without precondition decorators
- Highlight functions that are part of a certain stacktrace
- Highlight functions which are defined in our branch but not the master branch
Random other ideas I didn’t mock up
- All functions that transitively call functions that make an http call
- All variable identifiers we assign to twice
- All classes with more than 10 user-defined methods
- All functions more than 100 lines long
- All functions without docstrings
- All lines last edited by a particular member of the team
- All identifiers marked “deprecated” in a certain design document
- All functions with a # TODO comment inside them
Why aren’t things this way? There are both essential and coincidental challenges that make fully leveraging color a lot harder than just having syntax highlighting.
First is actually implementing rules. Some of these require access to the code’s AST, some require broader knowledge of the project, some require runtime information. Some of the ideas are even infeasible; accurately tracking aliasing is an open problem for most languages. Syntax highlighting, by contrast, is usually a matter of regexes and hierarchical state machines. That’s how pygments does it. Semantic highlighting would have to be made from scratch for each language.
Second is highlighting conflicts. What if something needs to be colored two things for two different reasons? In syntax highlighting this is less of a problem because you have an ordered list of matchers. But with semantic highlighting we might have dynamic priorities, where rule A is more important to us now while rule B is more important to us later. Things get even more complicated if we have multiple distinct overlays, which themselves can have priority conflicts. All of this needs a much more complex design and implementation than simple syntax highlighting does.
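One possible way to handle dynamic priorities, sketched under my own assumptions: every rule emits colored spans, and a separate priority table decides who wins wherever spans overlap. Changing the table changes the result without touching the rules. Span, resolve, and the rule names are hypothetical, and a real editor would not want a per-character loop.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    color: str
    rule: str   # name of the rule that produced this span

def resolve(spans, priority, length):
    """Pick one color per character position.

    `priority` maps rule name -> rank, higher wins. A rule missing from
    the table is treated as toggled off. On a tie, the earlier span wins.
    """
    colors = [None] * length
    ranks = [None] * length
    for span in spans:
        rank = priority.get(span.rule)
        if rank is None:
            continue  # rule is not active in the current overlay
        for i in range(max(span.start, 0), min(span.end, length)):
            if ranks[i] is None or rank > ranks[i]:
                ranks[i] = rank
                colors[i] = span.color
    return colors
```

An overlay is then just a priority table. This still leaves open what happens when two active overlays disagree about the ranks.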
Finally, existing editors just aren’t well set up to handle this. Vim’s syntax highlighting is a mess of regular expressions and special cases. VSCode and (I believe) Atom use TextMate grammars, which assume a single canonical tokenization per file. VSCode recently added semantic highlighting but it seems more oriented to augment the existing syntax highlighting, not radically rethink it. I have no idea what Emacs does.
So I think this is something we’ll eventually have, because the potential advantages are too great to ignore forever. But it will take us a long time to get there. Maybe we’ll see it first with toy languages where the AST is simple enough and the expressiveness is low enough to make semantic highlighting easy.
Update for the influx of new readers
This was a newsletter post. You can subscribe here.