Next: M2-Planet, Up: Full Source Bootstrap [Contents][Index]
June 2016 I learnt about Stage0. Jeremiah Orians created hex0 a ~500 byte self-hosting hex assembler. The source code is well documented and the binary is the exact mirror of the source code. I was inspired.
Here is an example of what the hex0 program looks like; the start of the hex function
00000060: 4883 f830 7c6f 4883 f83a 7c5a 4883 f841 H..0|oH..:|ZH..A … 000000d0: 48c7 c0ff ffff ffc3 0000 0000 0000 0000 H............... 000000e0: 4883 e830 c300 0000 0000 0000 0000 0000 H..0............
All computer programs look like this: an opaque list of computer codes. The initial programs that we take for granted—the bootstrap binary seed—are about 250MB of such numbers: think 250,000 pages full of numbers.
Most computers work pretty well so apparently there is not a pressing need to inspect and study all of these codes. At the same time it is tricky to fully trust9 a computer that was bootstrapped in this way.
Here is what the source code of the hex0 assembler looks like
## function: hex 48 83 f8 30 # cmp $0x30,%rax 7c 6f # jl 6000f3 <ascii_other> 48 83 f8 3a # cmp $0x3a,%rax 7c 5a # jl 6000e4 <ascii_num> 48 83 f8 41 # cmp $0x41,%rax … ## function: ascii_other 48 c7 c0 ff ff ff ff # mov $0xffffffffffffffff,%rax c3 # ret … ## function: ascii_num 48 83 e8 30 # sub $0x30,%rax c3 # ret
While it may be hard to understand what this piece of the program does, it should be possible for anyone to verify that the computer codes above correspond to the source code with comments.
One step beyond these annotated codes is Assembly language. To write a program in Assembly, you only need to provide the instructions; the codes are computed by the assembler program.
hex: # deal all ascii less than 0 cmp $48, %rax jl ascii_other # deal with 0-9 cmp $58, %rax jl ascii_num … ascii_other: mov $-1, %rax ret ascii_num: sub $48, %rax ret
More readable still, a similar program text in the C programming language.
int hex (int c) { if (c >= '0' && c <= '9') return c - 48; … }
What if we could bootstrap our entire system from only this one hex0 assembler binary seed? We would only ever need to inspect these 500 bytes of computer codes. Every10 later program is written in a more friendly programming language: Assembly, C, … Scheme.
Inspecting all these programs is a lot of work, but it can certainly be done. We might be able to create a fully inspectable path from almost nothing to all of the programs that our computer runs. Something that seemed to be an impossible dream is suddenly starting to look like “just a couple years of work”.
Ken Thompson’s 1984 Turing award acceptance speech Reflections on Trusting Tust.
Some program languages have become very hard or practically impossible to bootstrap. Instead of depending on a simple language such as C, they depend on a recent version of itself, or on other binary or ASCII seeds, on other recent programs written in that language, or even on manual intervention. Programs written in a language that cannot be bootstrapped can still run on our systems, but cannot enjoy any of the trust we intend to create.
Next: M2-Planet, Up: Full Source Bootstrap [Contents][Index]