Programming AIs worry me
For some inane reason, Github classifies me as a "major open source maintainer", which means I get a free Copilot subscription.1 I've been using it for a couple months now and I've got to say, it's a goddamn delight. It can write boilerplate like nobody's business. I find the tool works best when I'm using it as a keystroke-saving device, where it writes 1-2 lines at a time. I write
`x =` and it completes with `really[long][dict][lookup]`. It's all very easy.
And this easiness worries me. I got a lot more worried when I read What Do ChatGPT and AI-based Automatic Program Generation Mean for the Future of Software, by Bertrand Meyer. He starts by testing ChatGPT with a tricky spec:
> I like to use the example of a function that starts with explicit values: 0 for 0, 1 for 1, 4 for 2, 9 for 3, 16 for 4, 25 for 5. [...] This time I fed the above values to ChatGPT and for good measure added that the result for 6 is 35. Yes, 35, not a typo. Now, lo and behold, ChatGPT still infers the square root [sic] function!
About what I expect. After he told ChatGPT that the result for 6 was 35, ChatGPT gave back a giant if-else chain. This is technically compliant with his spec but doesn't capture the spirit of what he wants. A good example of the limits of the tool.
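To make the gap concrete, here's a rough sketch of the two answer styles. This is my own reconstruction, not Meyer's actual ChatGPT transcript: a literal if-else chain versus one closed form I cooked up that happens to fit all seven points.

```python
# Hypothetical reconstruction of the two answer styles (not Meyer's
# actual transcript). Spec: f(n) = n*n for n in 0..5, but f(6) = 35.

def f_literal(n):
    # The "giant if-else chain": technically compliant, but it just
    # restates the spec instead of generalizing it.
    if n == 0: return 0
    elif n == 1: return 1
    elif n == 2: return 4
    elif n == 3: return 9
    elif n == 4: return 16
    elif n == 5: return 25
    elif n == 6: return 35
    raise ValueError("input not covered by the spec")

def f_formula(n):
    # One closed form fitting all seven points: n squared, minus a
    # correction that only kicks in above 5.
    return n * n - max(0, n - 5)

for n, expected in [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25), (6, 35)]:
    assert f_literal(n) == expected
    assert f_formula(n) == expected
```

Both pass the spec; the question the AI can't answer for you is which one captures the intent.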
But then things take a turn for the worse.
> but things becomes amazing again, in fact more amazing than before, when I point out my dissatisfaction with the above style:
>
> The inferred function is rather impressive. What human would come up with that function in less time than it takes to say "Turing test"?
Except the inferred function is completely wrong. Not even subtly wrong. The first incorrect input is 2. Bertrand didn't notice.
Now, here's some important context: Bertrand Meyer's entire deal is software correctness. He invented Eiffel. He trademarked Design By Contract (tee em). He regularly rants about how SEs don't know about logic. He didn't notice the error. Oh, and this article had 114 comments on Hacker News and exactly one commenter (of 48) noticed.
Using AI to write code changes our work from writing code to proofreading code. And that's a problem.
Proofreading is hard
So a quick story: back in 2020 I experimented with voice-to-text. I bought a Dragon license and everything. I can speak a lot faster than I can type, after all! I'd say it was about 95% accurate. The problem was finding that 5% of errors. Most of the typos I make when writing feel wrong. I can feel my fingers slip up and type the wrong thing, and I can immediately go back and correct it. But most speakos feel normal. I say "Code matches the spec" and it transcribes "code smashes the spec". After I wrote something, I'd have to go through very carefully and correct all the speakos. It was goddamn exhausting, and many errors still fell through. Proofreading is hard!2
Over time I bent my workflow around proofreading, like putting each spoken sentence on a newline to break my reading flow. This helped find more errors but made the whole process even more miserable, and eventually I just gave up and went back to typing.
It takes longer to write a code "sentence" than a prose one, so a sentence-level generate-proofread loop is still more convenient than writing everything manually. That's why I like Copilot. But as we start using AIs to generate larger blocks of code, we're going to be faced with more and more proofreading work. And I'm worried more bugs will then slip through. If Bertrand Meyer can't proofread closely enough to catch errors, what hope do we mere mortals have?
Two reasons I could be less worried:
- People need to be proofread too, so as long as AIs eventually make fewer mistakes than the average programmer, they're a net win.
- We can make proofreading easier with better tooling. Unit tests are a means of "proofreading": we can catch AI errors automatically with tests.
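On that second point, here's a simplistic sketch of tests-as-proofreading. The `wrong_inferred` function below is a made-up stand-in for a plausible-looking AI answer, not the actual function from Meyer's article:

```python
# Catching a plausible-looking but wrong "inferred" function with a
# check built from the spec's own example values. `wrong_inferred` is
# a made-up stand-in, not the function from Meyer's article.

SPEC = {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 35}

def wrong_inferred(n):
    # Sum of the first n squares: looks mathy, agrees at 0 and 1,
    # and is wrong everywhere after that.
    return n * (n + 1) * (2 * n + 1) // 6

def check(f):
    """Return the inputs where f disagrees with the spec."""
    return [n for n in SPEC if f(n) != SPEC[n]]

print(check(wrong_inferred))  # → [2, 3, 4, 5, 6]
```

A skim-read might miss the bug; five lines of automated checking can't.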
I don't know how true either of these is. I can certainly see a future where both of these are true, and we happily use AIs without a second thought. I also see a future where we don't adapt our skillsets and tooling around using AIs "properly", and they become a net negative for a lot of people. I don't know! That's why I'm excited but also worried.
We'll also have to see what happens when Copilot (and ChatGPT, sorta) aren't the only games in town. Are there going to be AIs that specialize in certain domains? AIs that specialize in writing tests? AIs that are designed for language newcomers? I feel like I'd be a bit less worried if there was a more diverse ecosystem, for some reason.
Programming AIs I want
Might as well share a wishlist.
- A ChatGPT-style AI that can only reply with links to reference docs, libraries, or wikipedia articles
- An AI that only highlights possible issues in the code, like "this looks kinda like an N+1 query"
- An AI that takes code and generates comments, mostly so I could quickly understand new configuration formats
- AI-guided property-based testing. We already have AI-guided fuzzing, why not apply that at a more granular level?
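For that last item, here's a hand-rolled sketch of the non-AI version: a tiny property-based checker. Real tools like Hypothesis add shrinking and smarter input generation, and the buggy `sketchy_square` is mine, purely for illustration; the "AI-guided" part would be learning which inputs (or which properties) are worth trying.

```python
import random

# Minimal property-based testing by hand: state a property, hammer it
# with random inputs, report a counterexample if one turns up.

def check_property(prop, gen, trials=1000):
    """Return an input where `prop` fails, or None if none was found."""
    for _ in range(trials):
        x = gen()
        if not prop(x):
            return x
    return None

def square(n):
    return n * n

def sketchy_square(n):
    # Stand-in for AI-generated code with a lurking bug on large inputs.
    return n * n if n < 500 else n * n + 1

# Property: squaring is multiplicative, so f(3 * n) == f(3) * f(n).
random.seed(0)  # deterministic for the example
gen = lambda: random.randrange(1000)
multiplicative = lambda f: (lambda n: f(3 * n) == f(3) * f(n))

print(check_property(multiplicative(square), gen))          # → None
print(check_property(multiplicative(sketchy_square), gen))  # finds some n >= 167
```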
While I have your attention, I'll close with my favorite use of ChatGPT:
Update for the Internets
This was sent as part of an email newsletter; you can subscribe here. Common topics are software history, formal methods, the theory of software engineering, and silly research dives. Updates are 6x a month. I also have a website where I put my polished writing (the newsletter is more for off-the-cuff stuff). That usually updates monthly, though I want to try getting it to bimonthly at some point.
Also you can check out the flip side, where I talk about ways of using ChatGPT to program better here
Also also April Cools is coming up woooo write something for that if you have a blog, it's hella fun
1. If you're a github employee, plz plz plz don't look into this, let me have my fun
2. This is also why a lot of people hate code review. It's good when you're acting as an editor, looking for foundational improvements to the code, but it's awful when you're acting as a proofreader. That probably deserves its own essay!