CVE-2024-21115: An Oracle VirtualBox LPE Used to Win Pwn2Own
In this guest blog from Pwn2Own winner Cody Gallagher, he details CVE-2024-21115 – an Out-of-Bounds (OOB) Write that occurs in Oracle VirtualBox that can be leveraged for privilege escalation. This bug was recently patched by Oracle in April. Cody has graciously provided this detailed write-up of the vulnerability and how he exploited it at the contest.
The core bug used for this escape is a relative bit clear on the heap from the VGA device. The bug is in function vgaR3DrawBlank, which is called from vgaR3UpdateDisplay. The bug can be triggered with a single core and 32MB of VRAM, and possibly less. All testing was done using the default graphics controller for Linux (VMSVGA). It should work on others as well.
As for the exploit, I could not get it to work with those constraints. For my exploit, I require at least 65 MB of VRAM but am using 128 MB to be safe. It requires 4 cores because of the race condition I use.
The Vulnerability
Inside the VGAState struct there is a bitmap used for tracking dirty pages in the vram buffer so that it knows whether it needs to redraw that part of the frame buffer.
This bitmap is large enough to hold the total number of pages even when using the max vram allowable by vbox, which is 256MB. However, inside vgaR3DrawBlank, when it attempts to clear the dirty page bits it incorrectly multiplies start_addr by 4 before doing so:
We can see here that if we are able to set start_addr to a value greater than 64MB, it will clear bits outside the bounds of the bitmap. Alternatively, even if start_addr is below 64MB, so that it starts clearing within the bitmap, the bit clear operation can continue past the bitmap's end.
Examining how start_addr is set, we can see that it allows any value up to vram_size:
Later in the code, vbe_start_addr is stored into start_addrand and vbe_line_offset is stored into line_offset. This happens when vgaR3UpdateBasicParams calls vgaR3GetOffsets. This update occurs whenever a new graphic or text is being drawn.
As long as our vram_size is greater than 64MB we will able to clear bits in heap memory following the bitmap.
The following are the values I set up to trigger the bug. All of these are settable via ioport communication.
These values are chosen to zero out a specific bit, but if the VBE_DISPI_INDEX_VIRT_WIDTH is increased it will most likely overwrite enough data to cause a segfault. For the exact ioport comms used, please reference the exploit code.
The Exploit
I explored several paths to find something we can zero out that would be usable to gain reliable code execution. I ended up looking at CritSect inside of VGAState. This critical section is used so that only 1 thread at a time can process in and out instructions for each device, as well as any loads or stores to the mmio region. There are several things we are concerned with in the critical section. The relevant structures are as follows:
When a thread locks the critical section, it adds 1 to cLockers, updates NativeThreadOwner to the current thread, and adds 1 to cNestings.
If a different thread then attempts to lock this same section it will see that cLockers is set and will attempt to wait its turn to lock. There is first an optimized wait, in which it will attempt to spin for some microseconds to see if it can quickly acquire the lock.
If that fails it will the block on the EventSem semaphore.
This hEvent value is just an int. Each time a critical section is created, a new hEvent value will be allocated in sequential fashion. When we look at the critical section of VGAState we can see the value of hEvent is 0x23.
The first 4 bytes are u32Magic, and the hEvent value can be seen at offset 0x18. With this information in hand, I realized that if we can find another critical section with an hEvent, we can modify the hEvent of VGState to match that of the other critical section. Then we can use that confusion to produce a race condition in any VGA ioport or mmio read/write. After looking around I found that VMMDev was using the hEvent value of 0x21.
After some testing, I found that the hEvent values are consistent between runs because they are assigned sequentially on startup. The critical sections for VMMDev and VGA are created directly after the processor-related critical sections. So long as the processor chipset doesn't change, these should remain constant.
I will note here that there are other critical sections that could potentially be used, but I chose to write my exploit using the VMMDev critical section.
First, we use our bit clearing bug to turn 0x23 into 0x21. Subsequently, whenever there are two threads, one holding the critical section for VMMDev and one holding the critical section for VGA, when either thread releases its critical section it can wake up a thread waiting for either device. Our plan is to use this race condition to wake a thread waiting for VGA prematurely, which is to say, while some other thread is still using VGA.
This is not good enough yet, though. Even if we hit the race, VirtualBox throws a SigTrap shortly thereafter. This is because when the racing thread locks the critical section, it changes NativeThreadOwner. When the first thread tries to unlock the critical section, the NativeThreadOwner does not match, causing the error.
Upon discovering this we also see that there is a way to completely turn off an individual critical section. There is a bit in fFlags called RTCRITSECT_FLAGS_NOP. If this bit is set then it will ignore all locking and unlocking operations for that particular critical section.
This poses a challenge for us, though. The only bug we have is a bit clear, so we have no way to set this flag. Instead, we must find a way to set the flag from our racing VGA thread before the first VGA thread exits and crashes the process.
When looking for a way to accomplish this, I found an ioport for writing data to vbe_regs in VGAState:
uint16_t vbe_regs[VBE_DISPI_INDEX_NB];
This ioport allows us to specify vbe_index as an arbitrary short, and then it will write an arbitrary short to vbe_regs[vbe_index] in vbe_ioport_write_data. The write is protected by a bounds check on the index, but we can circumvent the check by using the race condition we manufactured.
To exploit, we start a VGA request on one thread (the “worker”) specifying a valid vbe_index, and a second VGA request on a second thread (the “racer”) specifying a bad vbe_index. Normally the racer request would need to wait for the worker to finish, but by racing two VMMDev requests (on two other threads) we can wake the racerVGA thread prematurely, modifying the vbe_index after the workerthread has finished validating it but before using it.
Note that, for this to succeed, the racer thread must be woken at a critical moment during execution of the worker. To make this race easier to win, we can take advantage of a memset in vbe_ioport_write_data where we control the length. For the worker request, we make this a large number so we have a longer window in which to win the race. In testing, I found we can easily get this to over 1 millisecond which is a massive amount of time during which we can win the race.
After winning the race, we can see the desired effect.
By means of the vgaR3DrawBlank bug we have changed hEvent from 0x23 to 0x21, and by means of the vbe_ioport_write race we have changed the fFlags member at offset 0x14 to 0xf, disabling the critical section. Now that the critical section is fully disabled, we can easily race VGA threads against each other. The next step is to find a read and a better write with our new and improved race condition.
Both the write and the read can be achieved by corrupting the same value. In VGAState there is a field of struct type VMSVGASTATE, and that struct contains a field named cScratchRegion.
cScratchRegion is used to track the size of the buffer au32ScratchRegion, which stores data during VMSVGA IO port communication. In functions vmsvgaIORead and vmsvgaIOWrite we can read and write this buffer based on the value of cScratchRegion.
Using the vbe_ioport_write race one more time, we can corrupt cScratchRegion. This gives us a fully controlled buffer overread and buffer overflow of a buffer within VGAState.
From here we need to find a way to get arbitrary execution. Conveniently, each device in VirtualBox has a PDMPCIDEV allocated directly after it in memory. Since it is part of the initial allocation for the device, we can be assured it will always be there.
At the beginning of the structure there is a pointer to the static string vga located in VBoxDD.dll. We can use our buffer overread to read this pointer and infer the base address of VBoxDD.dll. The structure also has a nested PDMPCIDEVINT structure, which contains several easily accessible function pointers:
The function pointers pfnConfigRead and pfnConfigWrite can be overwritten by our buffer overflow. Afterwards, we can trigger calls to these function pointers using PCI ioports.
To prepare for calling these function pointers, we first call pciIOPortAddressWrite to set uConfigRegto to specify the PCI device we want to read from or write to. In our case, that value can be found in the uDevFn value at the beginning of the PDMPCIDEV struct.
After we set uConfigReg, we can then call pciIOPortDataWrite, which will call pci_data_write. This function will call our function pointer with some controlled arguments.
When the function pointer is called, arg1 ends up being the value of pDevInsR3 which is fully user-controlled by means of our buffer overflow. arg2 points to the PDMPCIDEV struct after our VGAState, which means we can control data at that location. With a fully controlled arg1 and arg2, we can start to write our final execution chain.
These libraries use Windows Control Flow Guard so we are not able to make indirect calls to arbitrary code. Fortunately for us, CFG allows calls to arbitrary functions in other libraries, so it doesn’t prevent us from calling WinExec("calc").
First, we need to use our buffer read/write primitives to construct an arbitrary read so we can get the address of kernel32.dll. We currently have the base address for VBoxDD.dll only, so we will have to find something to use in that library. When looking through functions in VBoxDD.dll I found one that will work perfectly for what we want to do.
Our arg1 is fully controlled, so this read routine will allow us to take memory from arg1+0x2d8 and store it into the memory pointed to by arg2. arg2 points directly after VGAState in memory, so we can read it afterwards with our buffer overread. This effectively gives us an arbitrary read primitive. With this, we can leak pointers to functions in other libraries through VBoxDD.dlls IAT.
VBoxDD.dll imports several functions from kernel32.dll, so we can read any one of those import table entries to get a pointer into kernel32.dll. From there we can scan backward using our read until we encounter the PE magic at the beginning of kernel32.dll, which gives us the base.
Next, we scan for the export table of kernel32.dll. We start by reading out all the table addresses.
We then scan through the names table until we find the name WinExec. Having obtained the index, we can use the ordinal and address tables to get the function address. Finally we write calc into heap memory we control and call WinExec("calc").
Impact
This bug can be triggered on a large percentage of virtual machines because it is an easily accessible path in VGA. I believe this can probably be turned into at least a DOS on any VM with at least 32MB of VRAM.
The way I exploited it has significantly more constraints, which restricts the number of machines affected by the full escape. It still may be possible to turn this bug into a full escape under a wider range of conditions, but that was not part of my research.
Thanks again to Cody for providing this thorough write-up. This was his first Pwn2Own event, and we certainly hope to see more submissions from him in the future. Until then, follow the team on Twitter, Mastodon, LinkedIn, or Instagram for the latest in exploit techniques and security patches.