[LMG S5] Issue 53: The CPU is an instruction-obeying slave
Previously: PDF’s markup language is more concerned with how things appear on the page than with what the original data was. Once a PDF is generated, it is almost impossible to retrieve the original data from it. Scanned documents converted to PDF may have a text layer, generated by OCR, that lets detected text be copied from the document.
In Season 4, I laid out the basics of how data is represented: text, images, audio, video. I also explained how compression happens, and unpacked how these basic data types can be combined into more complex documents.
But data by itself isn’t of much value in a computer if we can’t do things with it, that is, perform operations on it. We are not talking about surgical or military operations here, but chiefly mathematical operations that manipulate information: changing a bit here, a bit there, or making a massive set of changes throughout.
How exactly does that happen in a Central Processing Unit (henceforth CPU)?
CPUs are instruction-obeying slaves
The design of a CPU is very much inspired by human experience. One essential aspect of that experience is that everything we do consists of operations.
Civilization advances by extending the number of important operations which we can perform without thinking of them. — Alfred North Whitehead
Want to make coffee? Measure out one scoop of coffee beans per cup, add them to the grinder, press the Start button on the grinder and wait for the noise to stop, empty the coffee grounds into the drip machine, add water, press Start, and wait for a beep. Six steps to make coffee. You can break those steps down differently depending on what kind of machine you are using and what kind of coffee you are making. Whatever outcome we want, if it can’t be broken down into simple steps like that, we would not be able to design, make, and sell household appliances; we would have to be craftsmen (and craftswomen) of that trade.
A CPU is an unconscious operation-executing machine. Every outcome we want must be translated into operations which a CPU can perform without understanding.
A common mental model of how our computers work is that a programmer writes code in a language that a CPU understands, and the CPU simply carries out those instructions. Let’s go deeper into that model. How do those instructions get translated into the 1s and 0s of binary code?
Much the same way that information got converted to binary in Season 4. The CPU can understand and execute only a limited set of instructions, and each instruction is labelled with a number. The CPUs in use today have standardised on the sets of instructions they can carry out; these are known as instruction sets.
What are these instructions like?
CPU instructions: moving data around
These instructions perform operations on one, two, or more pieces of data. This is how a line of code like b = 1 + 2 would be broken down:
1 LOAD 1 R1
2 ADD 2 R1, R2
3 MOV R2, MEM1011
I am using this arcane presentation format in a newsletter for layfolk because I think it helps to distinguish between human thinking and computer thinking. What the computer is doing here is:
- Load the value 1 into slot R1
- Add the value 2 to the value in slot R1, and store the result in slot R2
- Store the value in slot R2 into the memory location 1011 (where the variable b points, so that other programs/instructions can use the result)
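We can peek at a real version of this breakdown. Python compiles code into instructions for its own virtual machine rather than for the CPU directly, but the shape is the same: load, operate, store. Here is a minimal sketch using Python’s built-in dis module (using a variable a instead of the literal 1, because Python is clever enough to pre-compute 1 + 2 at compile time):

```python
import dis

# Compile a line of code without running it.
code = compile("b = a + 2", "<example>", "exec")

# Print each instruction: load, add, store - the same shape as the
# LOAD / ADD / MOV breakdown above, just with different names.
for instr in dis.Bytecode(code):
    print(instr.opname, instr.argrepr)
```

The exact instruction names vary between Python versions, but you will always see a load for a, a load for 2, some kind of add, and a store into b.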
Everything we ask a CPU to do essentially consists of loading data from somewhere, doing some kind of processing on it, and storing the result somewhere. The CPU processes lists of these instructions, at a rate of millions to billions of instructions per second.
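That load-process-store loop can be sketched as a toy program. This is not how real hardware works, of course; it is just the three-instruction example above, obeyed one instruction at a time by a loop that understands nothing:

```python
# A toy CPU: registers are named slots, memory is numbered slots.
registers = {"R1": 0, "R2": 0}
memory = {}

# The program from above, as (operation, arguments) tuples.
program = [
    ("LOAD", 1, "R1"),       # put the value 1 into slot R1
    ("ADD", 2, "R1", "R2"),  # add 2 to the value in R1, result into R2
    ("MOV", "R2", "1011"),   # copy the value in R2 into memory location 1011
]

# The execution loop: fetch the next instruction, obey it, repeat.
for op, *args in program:
    if op == "LOAD":
        value, reg = args
        registers[reg] = value
    elif op == "ADD":
        value, src, dst = args
        registers[dst] = registers[src] + value
    elif op == "MOV":
        src, location = args
        memory[location] = registers[src]

print(memory)  # the result, 3, now lives at location 1011
```

A real CPU does the same fetch-obey-repeat loop in silicon, billions of times per second.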
Let that sink in for a moment. Every YouTube video, meme, or tweet we send or see is the result of hundreds of thousands of operations taking place in CPUs around the world: CPUs converting text, audio, and images into raw data, encapsulating it into a data package along with some metadata, sending it out to another CPU that translates the destination address and forwards it to the next gateway, and so on, until it reaches its destination, gets decoded and processed, and signals get sent to the monitor and speakers to produce what we see and hear.
Why can’t I run an exe file from Windows on my smartphone, or an Android/iOS app on my Windows laptop?
There are many reasons for that, and I will explain one of them here: the x86-64 instruction set used by the Intel/AMD CPU in your Windows laptop is not compatible with the ARM instruction set used by your smartphone’s CPU. The MOV, ADD, and other instructions have different numerical codes in each instruction set.
The same programming code for the app must be compiled into CPU instructions separately for Intel/AMD processors, and for ARM-based processors.
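To make that concrete, here is a toy illustration. The numeric codes below are made up, not the real x86-64 or ARM encodings (which are far more involved); the point is only that the same operation gets a different number in each instruction set, so a program encoded for one is gibberish to the other:

```python
# Made-up opcode tables for two imaginary instruction sets.
intel_like = {"LOAD": 0x01, "ADD": 0x04, "MOV": 0x89}
arm_like   = {"LOAD": 0x20, "ADD": 0x0B, "MOV": 0x2A}

program = ["LOAD", "ADD", "MOV"]

# "Compile" the same program once per instruction set.
intel_binary = [intel_like[op] for op in program]
arm_binary = [arm_like[op] for op in program]

print(intel_binary)  # [1, 4, 137]
print(arm_binary)    # [32, 11, 42]
```

Feed the first list of numbers to a CPU that expects the second table, and it will obey entirely different instructions, or crash.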
Issue summary: CPUs are unconscious slaves that simply execute instruction after instruction, at a very fast rate.
What I’ll be covering next
Next issue: Compiling programming code into CPU instructions
I think this is a good place to stop today. Before we can dig into CPU exploits, we must first unpack what a CPU does. And we are starting slow, because the CPU is ultimately a strange place. Stepping into it is kind of like stepping into Willy Wonka’s Chocolate Factory, where all kinds of wonderful things are happening, and once you figure out how everything fits together you can figure out where you can sneak globs of chocolate without anyone finding out.
See you in the next issue of Season 5: the Chocolate Processing Unit!
Sometime in the future: What is:
- booting up? [Issue 15]
- a cookie? [Issue 8]
- XSS? [Issue 8]
- a CDN? [Issue 8]
- a good reason developers write code and give it away for free online? [Issue 21]
- compiling code into an application? [Issue 26]
- firmware? [Issue 34]
- OpenType? And what are fonts anyway? [Issue 42]
- What is involved in installing a piece of software? [Issue 48]
- How do apps know where a file starts and ends? [Issue 49]