Structural Exploitability (Across the Sundering Seas 2020 #06)


                            February 8, 2020

                        Good Saturday to you, readers!
In case you somehow subscribed to this email in a semi-conscious daze after Groundhog Day festivities, were hate-subscribed to it by a mortal enemy, or otherwise just don’t know what you’re reading—
This is Across the Sundering Seas, a weekly newsletter by Chris Krycho (me!) about the things I’m reading and studying, in the hope that they’ll be interesting and illuminating to you. But if it ever stops being interesting, or for that matter if you were hate-subscribed by a mortal enemy, your way out is right here!
Thanks for reading!
This week: digging further into the questions I raised last week—with an emphasis on what it actually means for software to be exploitable. Before I’m done, we’ll have in hand what I hope is a useful distinction for talking about these things going forward: between incidental and structural exploitability.
1. Are Facebook’s problems less serious than Boeing’s?
A careful reader wrote back in response to last week’s email:

…the Rohingya genocide seems to have been strongly influenced by Facebook use.… Facebook doesn't even need a bug in its system to be used by a military to incite genocide. Planes crashing is an immediate problem: even when "operators are doing the right thing," as you noted, the plane can crash. But Facebook is a software that is bad because it is exploitable without bugs. It can be exploited to the sorts of things that do actively result in violence, but the actions that are leading to violence are not bugs but content. Both of these types of software problems lead to death, one immediately and one indirectly. Which is worse--the deadly bug that can be fixed or the deadly design problem that would require going against the business model to fix?

This is a really astute point, and one I actually thought about bringing up in the last issue. However, it was late in the day, and I was ready to just go read a book for a while. Sooooo I did the thing that a good writer should never do, and I let the thought slide. But I really shouldn’t have, because this reader’s point in many ways reinforces the point I was trying to make last week.
My point last week was not in the least “we don’t need to regulate Facebook, only Boeing and maybe Microsoft.” Rather, it was that the dynamics in play between Facebook and Boeing and Microsoft (and Apple and Amazon and Netflix and Google and…) are different—and therefore that conflating them is only going to make a worse mess. As this reader noted: Facebook’s software can be working perfectly according to Facebook’s metrics… and have catastrophic effects as a result. The same goes for YouTube and Twitter and every other platform driven by that Web 2.0 Holy Grail: engagement.
Lumping engagement-driven companies’ and technologies’ failure modes under the same heading as the Boeing bug, or even the Windows security vulnerability, ends up obscuring essential distinctions. Keeping a clear eye on those distinctions is a hard necessity for getting any kind of legal/regulatory response right. All of these require societal and legal responses, as my friend Stephen and I have long argued on our podcast. But the appropriate response for how we deal with security-critical bugs is not the same as the appropriate response for how we deal with software working as designed but with catastrophic geopolitical effects. The reasons why come down to how and why the software in question is exploitable.
2. Magical thinking
I want to dig into some of the ways Wechtenhiser really got the details of his argument wrong—because I think those details actually matter quite a bit if we’re going to respond in ways that help rather than hurt.
(Note that for the rest of this issue, I’m going to use Facebook as a stand-in for the others, and likewise I’ll refer to Windows rather than “Windows or macOS or Linux or FreeBSD or…” for operating systems, but everything I say here applies equally to all of them.)
Wechtenhiser more than suggested that folks in software aren’t taking security seriously. This is true in some cases—hello, Equifax—but really, really isn’t when it comes to Windows. Folks who know me know that I loathe Windows from an experiential point of view, but the people who maintain Windows and who are devoted to its security are deadly serious about vulnerabilities. For that matter, I’ve been working in software for a little over a decade now, and I can say that every single company I have worked for (including when it was just me working on various projects solo!) has taken security extremely seriously. Sometimes comically seriously, in ways that the software in question really didn’t warrant!
Don’t get me wrong: there are negligent software developers out there. There are whole cultures of negligence and incompetence. Clearly, in their own distinct ways, both Equifax and Boeing deserve that criticism. So, I do not doubt, do many others. And just as we do with other industries, we should make companies legally liable for the kinds of risks they take and the consequences their failures have for their customers. The class-action lawsuit outcome from the Equifax data loss was ridiculously difficult for people to benefit from, even though its harms were not small. That goes as much or more for failures like the Boeing crash. Change is long overdue here.
All of that granted, Wechtenhiser’s read on software quality is deeply wrong in two important ways. 
First, software is a human product, and humans—even humans at their very best, even humans regulated within an inch of their lives—make mistakes. This is not specific to software! Non-software engineers just miss things sometimes, too. The world is complicated and messy and no one can foresee all physical possibilities. We have layers of review and validation for different kinds of physical engineering as a result: building a house gets inspected rigorously; but building a space probe gets inspected much more rigorously. Depending on where you live, you might be able to put up another building on your property with no one caring at all; but no one gets to build a nuclear reactor without jumping through innumerable hoops. This is as it should be! The higher the cost and the higher the risk, the more scrutiny engineering efforts come under.
The same should go for software—because software engineers are no less fallible than any other kind of engineer. We have many practices in place as an industry already that help mitigate these things: design reviews, code reviews, automated testing, manual testing, even things like programming language design and formal verification tools. The more serious the endeavor, the more of those we should be employing. The reality is that all too often software engineers do underestimate just how much due diligence they owe to a given question. Nor have governments been appropriately regulating the spaces where both costs and risks are high. This is, perhaps unsurprisingly, not all bad: where government has intervened, the results have sometimes been even worse for software quality. Getting regulations right is very difficult.
But in any case, none of this need be attributed to carelessness or a “too big to fail” mentality per Wechtenhiser’s claim. Building software is no less complex a task than building buildings, though it is very different. Even the best engineers make mistakes, and even the best managers misjudge the risk level of certain calls. Software is not special in that regard.
Second, and more closely related to the broader themes both this issue and the last address: a common thread between Windows vulnerabilities and Facebook algorithms, but not Boeing’s bug, is that they are targets for malicious actors. Windows was not written with a bug because its engineers were just being sloppy or didn’t care about the thing they were building. Rather, two factors combine to produce this vulnerability:

the sheer complexity of a piece of software designed to work across myriads of devices spanning decades of hardware designs and to run applications of equal variation and longevity;
that this software, by dint of its ubiquity, is a target for bad actors

This kind of vulnerability is not so much akin to a badly built building as to the fact that buildings are just generally vulnerable to people walking in with guns, let alone targeted bombings. Even the most hardened secure facilities in the world are not invulnerable to concentrated attacks from, you know, nation-states. But those are precisely the analogies to reach for when we are talking about vulnerabilities in an operating system disclosed to the software vendor by the NSA. Harden all you like: concentrated efforts by bad actors will get past your defenses.
Granted that nation-states are usually pretty leery of invading buildings because of the consequences of doing so—a point the reader who emailed me correctly noted. Even so, real-world buildings are often compromised by much less sophisticated kinds of attacks than all that. Bank robberies happen… and it’s not just because banks don’t care, have “too big to fail” mentalities, or have created cultures where their guards aren’t responsible! Security, analog or digital, is just plain hard.
This problem is exacerbated by the fact of both hardware’s and software’s rapid advance in capabilities over the past decades. Security measures that were actually unbreakable at the time of their design—and indeed which then appeared to be unbreakable for the foreseeable future—are now trivially bypassed. Hardware gets faster, and software gets better. Mathematicians working on cryptography find previously unrealized problems in old approaches. Attackers get more clever, exploiting side effects even of improvements to performance in ways that if I explained them here would just sound like wizardry (I understand them well enough and they still seem like wizardry at times).
These two failure modes are not always (and probably not even most of the time) functions of either malice or incompetence. They are functions of reality. The idea that software can magically avoid failure modes which are properties of literally everything else humans make is just magical thinking. Software isn’t special.
3. Structural exploitability
Facebook’s vulnerability is in some ways similar to the Windows vulnerability, in that the vulnerability is a function of two similar factors:

its scale and ubiquity: it has penetrated to perhaps a third of the world’s population at this point
that it is therefore a target for bad actors

However, there’s an important difference in kind between Windows and Facebook. Operating systems like Windows are incidentally exploitable. Exploiting them requires a malicious actor to go looking for a vulnerability: some way around everyone’s best efforts to keep things safe. By contrast, Facebook is structurally exploitable. Even reasonably well-intended people can contribute to serious harms through their use of the site, and it serves by nature as a vector for wicked radicalization. No one has to go looking for arcane hacks to exploit.
(To be clear: Facebook surely enables many good things as well. But we cannot merely trade goods against ills and hope they happen to stack up well enough—not least because no number of “but my friends helped me when I experienced a bad thing!” moments can outweigh a genocide.)
The problem is of course not that Facebook engineers mean to cause genocides. They, like you and me, are horrified by those outcomes. It is, rather, that all the business incentives mean maximizing people’s likelihood to use the site and to keep using the site. Ad revenue depends on eyeballs. Facebook is therefore structurally incentivized to make choices that leave it open to exactly the kinds of mass hysterias and violent surges that have led to out-and-out genocides in some places, and the rise of vicious dictators with their own varieties of purges in others. There is no need to find a hack. The algorithm’s surfacing of the content that will keep you coming back is enough.
This is structural exploitability: when the only way to eliminate the exploitability of the system is to rebuild it from the ground up on new principles entirely.
Importantly for the distinction I’m trying to make here, the Boeing bug was not a case of exploitability in the first place. It was a vulnerability to failure, but no malicious actors were involved at any point in the 737 Max software problem. No one exploited anything at all. There is more than one way that software systems can fail. Some of them are matters of exploitation; others are matters of mistakes—sometimes catastrophic. Sometimes mistakes are catastrophic precisely because they create vulnerabilities not just to failure but to exploitation. This also makes clear that structural failings—of which I think the Boeing bug was a clear example—are not the same as structural exploitability.
4. How we respond
Returning to my reader’s email:

This is a real question: which is worse? Software's immediate direct results or its longterm indirect results?
It may not be worth it to pick; both are bad. So then, how do we deal with both of those, since we are already dealing with and will have to deal with both of those? If we say that only its direct results are liable, that leaves out knock-on effects. If we say that only indirect results are liable (which is actually what we are seeing…), then we've created a warranty-less system with legal ramifications for how people use it, which seems bad. 

I think the answer is: it depends. (That’s not a cop-out, though I recognize it probably sounds like one at first blush.) If the software’s immediate direct results are that a nuclear reactor melts down or that a nuclear submarine’s communications go haywire as if it were being jammed during a war, well, there’s a good chance that it’s worse than the long-term effects of Facebook—even if Facebook has been a significant player in world horrors like ethnic cleansing. But by the same token, the way that platforms like YouTube and Twitter and Facebook have become means for radicalization into outright terrorism—whether white nationalist, Islamic, radical leftist, or otherwise—means they are dangerous. Likewise, a bug in avionics software which could lead to the deaths of hundreds or thousands of people is serious in one way, while a vulnerability in an operating system that exposes people to would-be attackers is serious in another.
These dangers are different in kind and therefore warrant different responses—serious responses, but different responses. Taking them seriously requires that we deal with them as appropriate to the kind of danger they represent and the reasons behind their failures, though. Bundling our responses to late-discovered operating system vulnerabilities, perfectly functioning engagement-optimization algorithms, bugs in avionics software, and so on into a single law would be its own kind of catastrophe—slow-moving, perhaps, but certainly full of many terrible side effects, and very likely to make things worse rather than better.
5. Some proposals
In closing, my own gesture at what I think that might look like:
In the case of operating systems, we likely don’t need to make any laws: they already have the right incentives in place from the market. All major vendors pay people to disclose bugs to them, because they want to fix the bugs rather than have them sold to a malicious actor and exploited! They aren’t perfect—but they all have their heads in the right places.
In the case of Boeing, by contrast, those market forces clearly aren’t working. The remedy is unlikely to be tighter regulation on the development of software, though. The problem is instead in the agglomerative bent of the air and space industries for the last thirty years. Between its acquisitions and its incestuous relationships with all the other avionics companies, Boeing has little to fear from competition. Breaking up the massive corporations that now dominate that space would likely do far more to prevent another 737 Max-style incident than more regulation bent on the software itself.
Finally, in the case of the engagement-driven web platforms, I have to admit that I don’t have a good handle on what kinds of regulation will do the trick. I do think that break-ups within the industry and prevention of further anti-competitive acquisitions (a la Facebook’s acquisition of Instagram and WhatsApp over the 2010s) would help, but given that the problems are fundamentally matters of structural exploitability, even these kinds of shifts won’t solve the problems. They will at best mitigate the worst of the harms… but they also, in many ways, may incentivize more of the rush to the bottom when it comes to keeping eyeballs on ads. Nor can we just slap on bandaids like making infinite scroll websites illegal.
Where does that leave us? Scrambling, still, to make sense of the world we’ve been building faster than we can make good choices about it. But hopefully at least a little wiser about the distinctions we must keep in play if we are to make those good choices.

P.S. Infinite scroll websites should be illegal… as a crime against humanity in the form of horrible user experiences. Just paginate, okay? It’ll be fine.
