Security of BIOS/UEFI System Firmware from Attacker and Defender Perspectives

oneplane · on May 27, 2017

And this is why I'm trying very hard to get coreboot to work on my systems, and I'm very eager to see Libreboot-style FSP/BSP-free images in the future so we can actually verify our boot chain.

Keeping all of the firmware code secret makes no sense, and seems to be artificially enforced by 'patents' and 'trade secrets', while most likely the vendors are trying to keep buggy code secret and hoping that obscurity alone will keep them safe.

minipci1321 · on May 27, 2017

In my opinion, vendors are oblivious to the notion of buggy code. From the releasing-a-product standpoint, there is only tested and untested code. While personally I wouldn't be surprised if someone knowingly released firmware containing known "stability issues", I doubt any vendor hides source code out of fear that someone else will discover some hypothetical bugs. Among other problems, finding bugs in unknown firmware is not so simple, as it requires intimate knowledge of the underlying hardware (and a fair share of the bugs are closely related to its quirks), which might not be as familiar as, say, the well-established (and relatively well-documented) Intel Architecture.

zkms · on May 27, 2017

Seeing slides in BIOS-UEFI-Security.6-Mitigations.pdf that imply that there's critical crypto being done in SMM mode makes me feel unfathomable hopelessness. The whole x86_64 platform security model (which includes all the privilege levels and the corresponding access control mechanisms) is one hell of an overgrown clusterfuck that could not be more hostile to formal verification.

There's lots of wantonly convoluted stuff going on -- a random example is the access control mechanism for the PCH's GPIOs (search "GPIO registers lockdown", in quotes). This isn't, of course, implemented with a register which contains bitfields determining which privilege level can write to which sets of registers. That would be too simple. The GPIO Lockdown-Enable bit can [sic] be changed by the same software that the lockdown mechanism is meant to deny access to! This seems like utter pointlessness -- what use is an access control mechanism if the agent being restricted can change the parameters of the mechanism willy-nilly -- but Intel has a solution! Changing the lockdown-enable bit triggers an SMI and shunts control to SMM, which is, naturally, expected to include code that figures out that this bit should not be disabled and flips it back on (and then returns from the interrupt, having trashed the caches a bit).
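
To make the dependency concrete, here is a toy model of the scheme as described above. It is only a sketch: the class, method and handler names are made up for illustration, and nothing here reflects the real PCH register layout or how an SMI is actually delivered. The point is just that the hardware itself never refuses the write; whether the lock survives depends entirely on what the firmware-supplied handler does.

```python
class ToyGpioController:
    """Hypothetical stand-in for the lockdown logic; not the real PCH interface."""

    def __init__(self, smi_handler):
        self.lock_enabled = True
        # The handler models whatever SMM code the BIOS vendor shipped.
        self.smi_handler = smi_handler

    def write_lock_bit(self, value):
        # The write always lands: the hardware does not reject it.
        self.lock_enabled = bool(value)
        # Clearing the bit raises an "SMI", i.e. we call into firmware code.
        if not self.lock_enabled and self.smi_handler is not None:
            self.smi_handler(self)


def restoring_handler(ctrl):
    """Firmware that notices the cleared bit and sets it again."""
    ctrl.lock_enabled = True


def forgetful_handler(ctrl):
    """Firmware that handles the SMI but never restores the bit."""
    pass


if __name__ == "__main__":
    for handler in (restoring_handler, forgetful_handler):
        ctrl = ToyGpioController(smi_handler=handler)
        ctrl.write_lock_bit(0)  # unprivileged software tries to unlock
        print(handler.__name__, "-> lock still enabled:", ctrl.lock_enabled)
```

Even in this single-threaded toy, the security property lives in `smi_handler` rather than in `write_lock_bit`, and there is a window between the write and the handler running in which the bit is clear.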

This is, of course, a pathologically needless level of complexity -- in the ARM world, we have registers with silly names like "NSACR" that higher-privileged execution modes can set to restrict access to certain resources. There's certainly no BIOS-OEM-provided code that needs to exist and be correct in order to implement such a basic task. In the end, this level of access control is equivalent to a bloody Boolean operation or two, for heaven's sake! All the CPU needs to do is to decode the instruction, realise it's a potentially-privileged instruction, decode what the instruction tries to modify, look up in a table which register holds the relevant permissions bitfield, do the relevant Boolean operation between that register's contents and an appropriate bitmask, and fault depending on the result. Since there's an extremely close match between the properties that need to hold (the truth table of all combinations of "privilege level x can modify resource y iff bit n in register z is set") and the mechanism that enforces them, it's easy to reason about this scheme and not difficult either to prove an implementation correct or to find counterexamples.
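
As a rough illustration of how small that check is, here is a sketch with two privilege levels and one permission register. The resource names and bit assignments are invented for the example and are not the actual NSACR layout; the only thing it is meant to show is that the whole policy is a mask-and-test, small enough to check exhaustively against its truth table.

```python
from enum import IntEnum


class Level(IntEnum):
    NON_SECURE = 0
    SECURE = 1


# Hypothetical resources: bit n of the permission register guards resource n.
RES_FPU = 0
RES_GPIO = 1
RES_DEBUG = 2
RESOURCES = (RES_FPU, RES_GPIO, RES_DEBUG)


def may_write(level, resource, perm_reg):
    """Lower-privileged code may modify `resource` iff its bit is set in perm_reg."""
    if level == Level.SECURE:
        return True  # the higher-privileged level is never restricted here
    return bool(perm_reg & (1 << resource))


def verify_all():
    """Check every (register value, level, resource) combination against the spec."""
    for perm_reg in range(1 << len(RESOURCES)):
        for resource in RESOURCES:
            expected = ((perm_reg >> resource) & 1) == 1
            if may_write(Level.NON_SECURE, resource, perm_reg) != expected:
                return False
            if not may_write(Level.SECURE, resource, perm_reg):
                return False
    return True


if __name__ == "__main__":
    print("implementation matches truth table:", verify_all())
```

With three resources that is 8 register values times 2 levels times 3 resources, i.e. 48 cases, and no firmware code appears anywhere in the decision.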

Meanwhile, the "wake up SMM and hope it'll countermand the illegal write" scheme depends on a lot more machinery. How does it work on a multi-core/multi-socket platform? How does this mechanism interact with the caches or the memory model? Is it possible to set up a race condition where the illegal write ends up going through uncountermanded because SMM can be made to not see the register in an illegal state? This is orders of magnitude of orders of magnitude more complex than analysing a lookup table and a bitmask -- we need to understand the semantics of memory reads/writes, of caching, of mode switches to SMM and the SMI itself, and how all of this clusterfuck is affected by the fact that there are multiple cores in our system. LANGSEC people would call this a "shotgun parser" -- when input data checking / recognition is interspersed with processing logic.

Even if all of this miraculously works and there's literally no way that all the cores working together can send an illegal write that SMM code won't countermand -- there's still the issue of making sure that the specific SMM blob that our BIOS OEM wrote cooperates properly with this. Indeed, making BIOS OEMs implement these sorts of convoluted and critical mechanisms and expecting them to get all of them perfectly right requires a level of optimism that doesn't yet exist. The situation has devolved to the point that there is literally a tool called "chipsec" that lets you test for the presence of a handful of well-known security-critical things that UEFI programmers are notorious for messing up (from time to time someone discovers a new one, of course; UEFI/ACPI and the x86_64 privilege model are too complex for people to be sure that we've found all the issues). That this tool needs to exist is shameful. Of course, the security of the x86_64 platform doesn't just depend on a bunch of magic access control registers being set right: there's Turing-complete code that needs to be implemented by the BIOS OEM (and runs in the most privileged execution mode that isn't Intel's "management processor") that is security-critical, and, well, it's hard to prove that arbitrary Turing-complete code behaves correctly.

The auxiliary CPU mode (SMM) initially meant to hide APM and emulate PS/2 mice in 90s-era computers is now critical to platform security, and does dangerous stuff like crypto and handling pointers from UEFI / the OS. Every few months someone I follow on Twitter finds some new way to trick some widely-deployed SMM code into writing to a memory region it shouldn't; it's quite depressing. Great. Another pointless defender/attacker arms race that the defenders could decisively win had Intel thrown away the spitefully complex intricacies of SMM and the x86 security model and replaced them with a clean, formal-verification-friendly set of privilege levels whose correct operation doesn't depend on platform firmware code. Even AArch64 is less broken when it comes to this.