A few days ago I’ve created a tool called cowboot – a very simple boot image generator for x86 that generates a boot image with a cow saying a dedicated message:
In this post I’m going to go through early boot concepts, and use cowboot as a (very) simple example for such concepts. In addition, I’ll go through cowboot’s code and introduce different tools I’ve used for compiling / debugging.
Early boot concepts
The BIOS scans bootable devices and searches for the byte sequence 0x55, 0xAA
(called “boot signature”) in the boot sector at offsets 510
and 511
.
When the BIOS find such a boot sector, it loads it to address 0x7c00
and execution is passed to it.
Because of legacy reasons, the computer boots in real mode – a 16-bit mode operation with 20 bit address space. 64-bit systems run in long mode and 32-bit systems run in protected mode. Special procedure is required to make the transition from real mode to protected mode and from protected mode to long mode, which are done at early boot stages.
In real mode, two registers are required in order to store an address: a segment register and an offset register. The address described by segment:offset
is calculated as follows:
Physical address = segment * 16 + offset
Meaning the address 0x7c00
can be described in multiple ways. For example:
0x0000:0x7c00 | 0x7c0:0x0000 | 0x420:0x3a00 |
0x7c00
The BIOS exposes a set of software interrupts that facilitates DOS programs (which operates under real mode) and bootloaders – for example INT10H is used for video services (e.g. writing a character / displaying pixels) and INT13H provides access to sector-based disks.
Example implementation of those interrupts can be found on SeaBIOS‘s code. SeaBIOS is an open-source BIOS implementation for x86 and is the default BIOS for QEMU and KVM.
The data structure that is used for storing the said interrupts handlers is called an Interrupt Vector Table, or IVT, and usually resides at address 0 in memory. In x86 an interrupt is triggered by the instruction int %d
where %d
is the interrupt number. Execution is then passed to the handler that resides in entry %d
in the IVT.
SeaBIOS defines the IVT as an array of pointers, which are defined as follows:
/* src/std/bda.h:13 */
struct rmode_IVT {
struct segoff_s ivec[256];
}
/* src/types.h:24 */
// Definition for common 16bit segment/offset pointers.
struct segoff_s {
union {
struct {
u16 offset;
u16 seg;
};
u32 segoff;
};
};
In SeaBIOS The macro SET_IVT is used to set an IVT entry. The function ivt_init is used by SeaBIOS for initialization of most IVT entries.
In addition, most IVT entries in SeaBIOS are named entry_%d
and their implementation is named handle_%d
. Here is the implementation for INT10H and INT13H.
Going through cowboot’s code
A linker script is used in order to create the boot image:
.text : {
*(.text)
/* write cow after code */
_cow_start = .;
*(.cowdata)
_cow_len = . - _cow_start;
/* put boot signature at the end of the boot section */
. = ORIGIN(ROM) + LENGTH(ROM) - 2;
BYTE(0x55)
BYTE(0xaa)
} > ROM
If you’re not familiar with LD scripts, you should read my previous post about them. Either way, here is a quick syntax run-down:
An LD script is used during the linkage of the program to create an output object file from several object files used as input.
One can invoke a custom linker script using gcc -T [ld_script]
or ld -T [script]
.
The output binary’s sections are called output sections (for example, the block .text: {...}
defines an output section named .text) which are composed from input sections – for example the statement *(.text)
joins all the .text
sections from the input object files.
The symbol .
is called the location counter and holds the next address memory will be mapped to.
The linker script declares a single output section named .text
composed from boot code present in cowboot.S
and from a .cowdata
section which contains the output from cowsay
.
The symbols _cow_start
and _cow_len
are referenced by cowboot.S
and are used to reference the message that will be displayed on screen.
In order to create an input file containing a .cowdata
section, I’ve created the target boot_message.o
in my Makefile
that utilize objcopy
in the following manner:
boot_message.o: boot_message.cow
$(OBJCOPY) -I binary -O elf32-i386 --rename-section .data=.cowdata $< $@
-I binary
tells objcopy that the input format of the input file boot_message.cow, which contains the string to be printed, is “binary”-O elf32-i386
tells objcopy that the output format is ELF. This is needed because ld only works with ELF files.--rename-section .data=.cowdata
renames the output section to.cowdata
instead of the default.data
As seen at the end of the linker script, the boot signature is placed at the last two bytes of the boot sector:
/* put boot signature at the end of the boot section */
. = ORIGIN(ROM) + LENGTH(ROM) - 2;
BYTE(0x55)
BYTE(0xaa)
The boot sector itself is declared to be 512 byte long:
MEMORY {
ROM : ORIGIN = 0, LENGTH = 512
}
There are multiple reasons for why I’ve chosen to write the boot signature using the linker instead of the assembler:
a. The size of my boot image is limited to 512 bytes: Linker scripts declares different memory regions using the MEMORY
command. Attributes of such section include their length. Linkage will fail if a memory section can not contain all the output sections that are mapped to it:
$ python -c 'print("A"*1337)' > boot_message.cow
$ make
nasm -f elf -o cowboot.o cowboot.S
objcopy -I binary -O elf32-i386 --rename-section .data=.cowdata boot_message.cow boot_message.o
ld -T pack_boot_section.ld cowboot.o boot_message.o -o cowboot.elf
ld:pack_boot_section.ld:20 cannot move location counter backwards (from 0000000000000561 to 00000000000001fe)
make: *** [Makefile:21: cowboot.elf] Error 1
b. I do not think that the boot signature has nothing to do with the boot image’s code: I think that the placing of the boot magic should happen during the build phase. cowboot.S
should have a single role – being responsible for the assembly code being run at startup.
As for cowboot.S
, it is compiled using nasm – an assembler with 16 bit support.
The first line declares that the file will be compiled to a 16-bit architecture
; computer boots at real mode - declare 16 bits operation
bits 16
After that, cowboot uses a series of calls to INT10H in order to draw the cow:
It must first clear the screen and sets video mode using sub-function 0:
_clear_screen:
mov ah, 0x00 ; set video mode subfunction
mov al, 0x03 ; text mode, 16 colors
int 0x10
This is a must because SeaBIOS prints a default message that we want to clear.
Let’s see it in action using gdb:
In order to debug our image, we will run qemu-system-x86_64
with the following arguments:
$ qemu-system-x86_64 -S -gdb tcp::9000 loop.img
-S
will instruct QEMU to not start the CPU-
-gdb tcp::9000
will launch a GDB server hosted atlocalhost:9000
Connecting the server using GDB:
$ gdb
GNU gdb (GDB) 9.2
(gdb) target remote localhost:9000
Remote debugging using localhost:9000
QEMU tells us that the screen had not been initialized yet (since SeaBIOS did not initialize the display yet):
We will place a breakpoint at address 0x7c00
– which is the entry point of the boot image:
(gdb) b *0x7c00
Breakpoint 1 at 0x7c00
(gdb) c
Continuing.
Breakpoint 1, 0x0000000000007c00 in ?? ()
SeaBIOS had just finished initialization so the IVT should be initialized and the following message is displayed on screen:
As a side note, the address of the IVT can change due to a call to LIDT. Following SeaBIOS’s code and launching qemu in debug mode I’ve confirmed that IVT base stayed at 0x0000
.
Let’s calculate the address of INT10H from the IVT:
(gdb) set $entry_10_offset=(unsigned short)*(0x10 * 4)
(gdb) set $entry_10_segment=(unsigned short)*(0x10 * 4 + sizeof(short))
(gdb) p/x ($entry_10_segment * 16 + $entry_10_offset) & 0xfffff
$1 = 0xc5635
Stepping after the first call, we can see that the screen is clear:
Later on, the background color is set to black using sub-function 0x0b
and finally a call to sub-function 0x13
prints the string pointed by the address _cow_start
:
_set_black_bg_color:
mov ah, 0x0b ; INT_10H 0bh - set color
mov bh, 0 ; set background color
mov bl, black_palette
int 0x10
_write_cow:
mov ax, 0x7c0 ; physical boot address
mov es, ax
mov bp, _cow_start ; string addr
mov cx, _cow_len ; string len
xor bh, bh ; page number 0
xor dx, dx ; row / col 0
mov al, 1 ; write mode - move cursor
mov bl, green_palette ; green character color for cyber effect
mov ah, 0x13 ; INT_10H 13h - write string
int 0x10
And as expected, the message is printed to the screen:
The program ends with an infinite loop:
_loop:
jmp $
The token $
evaluates to the address of the beginning of the line, making jmp $
an infinite loop.
Conclusion
Writing cowboot and this blog post made me understand the (very) basics of x86 booting, which was a lot of fun
Even though subjects like booting, real-mode programming, and even kernel-mode programming are not relevant to most developers, I do think that it is important to have a some understanding of those. There is great code in bootloaders, kernels, and even BIOSes that attempts to create the best interface to its users, which can be used as an inspiration for a higher-level design as well.
Notes concerning GDB
My GDB (and even a newly compiled on) failed to change the architecture of the remote target to i8086.
Turns out that there are open bugs on sourceware and launchpad describing the issue. The suggested workaround was to set the target’s description to a custom one and did not work for me.
I eventually gave up after debugging it after a while
(gdb) set architecture i8086
warning: Selected architecture i8086 is not compatible with reported target architecture i386:x86-64
warning: A handler for the OS ABI "GNU/Linux" is not built into this configuration
of GDB. Attempting to continue with the default i8086 settings.
Architecture `i8086' not recognized.
The target architecture is set automatically (currently i386:x86-64)
As a last attempt, I’ve tried using gdb-multiarch
as well with but no success.