The TLA+ workshop is in just two weeks! Oct 20-22, register here, use the code COMPUTRONSTUFF for $500 off. Still three slots left for the workshop! It’s gonna be an acid trip. Metaphorically. A metaphorical acid trip. (And also a real one.)
Excel has been having a bad year. Back in August it was slammed for converting gene names into dates, and now people are ragging on it for dropping rows in COVID testing data. Everybody talks about how Excel is unauditable, error-prone, a poor substitute for real data analysis languages like Python and R. Those silly scientists, using Excel!
It’s not helpful to blame Excel. Not because it doesn’t have serious quirks that corrupt data, but because saying “Excel is bad” and stopping there leads us to ignore the core problems. Python has had high-profile mistakes, too. If scientists used Haskell instead we’d see some truly boneheaded mistakes coming from that. Science tools have problems because science tools are popular and because scientists and researchers are very bad at programming. As programmers we use a wide variety of practices and techniques to reduce errors and make more maintainable code. These techniques require a lot of training and practice to use right. Nonprogrammers don’t have this training and can’t be expected to get it.1 Of course they’re going to write worse code and make more mistakes.
Further, asking people to use Python/R/Haskell instead of Excel is missing the point of why people use Excel in the first place. Excel is not a programming language. It embeds a full programming language, but that’s not the core draw. Excel makes structured data analysis accessible to nonprogrammers. They can get started with it easily and gradually ramp up to more complex tasks. Using a programming language which forces all that complexity on them from the start would drive them away. Hell, compare importing a CSV in Python and Excel. In Python, it’s something like list(csv.reader(open(file), delimiter=sep)). In Excel, it’s a single button.
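Spelled out a little more, the Python side of that comparison might look like the sketch below. The helper name read_rows and the sample data are made up for illustration, and an in-memory string stands in for a real file so the snippet runs on its own.

```python
import csv
import io


def read_rows(source, sep=","):
    """Return every row of a delimited text stream as a list of strings."""
    # csv.reader takes the separator via the `delimiter` keyword argument.
    return list(csv.reader(source, delimiter=sep))


# Hypothetical sample data standing in for a real CSV file on disk.
sample = io.StringIO("gene,count\nMARCH1,12\nSEPT2,7\n")
rows = read_rows(sample)

for row in rows:
    print(row)
```

For a real file you would pass in something like open(path, newline="") instead of the StringIO object. The csv module hands back every field as a plain string, so any conversion to numbers or dates is a separate, explicit step. Even this short version already asks the reader to know about imports, functions, keyword arguments, and file handles, which is the complexity the button hides.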
That doesn’t excuse Excel’s errors. In addition to its many quirks, Excel spreadsheets are hard to review and version control. And its defaults encourage sloppy tables; I didn’t even know you could name cells for formulas until somebody mentioned it on Twitter. But these are all coincidental critiques, not conceptual ones. In saying “Excel implicitly converts gene names to dates” I’m implicitly imagining a spreadsheet editor which doesn’t have these quirks. It is a “coincidence” that we have these problems.2 By contrast, saying “programming languages are hard for nonprogrammers because you have to master a syntactic grammar” is a conceptual critique.3 You can’t fix it without radically rethinking what a programming language is.
This leaves us with a broken system. A group of people without the practices necessary to write high quality code. A set of tools that have been neglected by the people best suited to improve them. Financial structures that discourage additional training for scientists and researchers. We like to focus on Excel because it’s easy, it’s one thing, a single root cause we can blame for all our problems. But the world doesn’t work like that. And pretending otherwise won’t make it better.
After Friday’s SAT essay, several people wrote in to correct my misconceptions about SAT solvers. In particular, many noted that it doesn’t make sense to distinguish “using SAT” from “using SMT”. Jannis Harder wrote his clarifications into a full essay here. My apologies for the errors!
1. Imagine if you were told that, in addition to your full-time job, you now have to write everything in Pitman shorthand. No, you don’t get any relief from your current tasks to learn Pitman shorthand, you just have to start using it. That’s what we basically ask people to do with programming.
2. They can’t fix these quirks in Excel directly because companies now rely on these quirks. Hyrum’s law strikes again!
3. To be fair, you could also argue this isn’t the barrier I think it is and shouldn’t be a problem with simpler languages and better programming education.