Crashes you can’t handle easily #3: STATUS_HEAP_CORRUPTION on Windows

TL;DR: Catching heap corruptions of native heaps is not trivial. Therefore, catching heap corruptions of the C/C++ runtime’s heap is also not trivial.

Heap corruptions are infamously nasty. A few reasons:

  • At times, they might not cause any visible errors at all, making them hard to detect
  • Even if they cause an error, it usually surfaces long after the actual corruption, making them hard to pinpoint
  • Memory that gets corrupted usually has nothing to do with the actual culprit
  • Heap allocations of programs with a GUI are usually not fully deterministic, making a heap corruption potentially hard to reproduce

On Windows, the system can detect a subset of heap corruptions. As with most errors on Windows, this is signaled with an SEH exception. Catching this exception, however, needs some workarounds. But before we get to that, we need to take a look at the architectural relationship between the C/C++ runtime’s heap and native heaps.

The layered architecture of heaps in Windows

On CPUs with virtual memory support, you can’t allocate (map) arbitrary amounts of memory (assuming a paged model, like on x64 CPUs). The unit of virtual memory is the memory page. You can’t allocate just 3 bytes or 4892 bytes. You either take 1 page, 2 pages, or n pages, but not 1.3 pages. This is dictated by the CPU, which implements address translation, a.k.a. mapping of virtual addresses to physical ones. If you really want to allocate just 3 bytes, you ask for a whole page (which is usually much larger than 3 bytes; on x86 family processors, it’s 4096 bytes), use 3 bytes, and ignore the rest of the page. If you happen to need 34 bytes later, you don’t even need a new page: just use the leftover space on the page you already own. Then, if you suddenly need 9826 bytes, you need to allocate additional pages.
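To put numbers on that, here is a minimal sketch of the rounding involved. The 4096-byte page size is hard-coded as an assumption for illustration (on Windows, the real value can be queried with GetSystemInfo, in the dwPageSize field), and PagesNeeded is a name made up for this example.

```cpp
#include <cstddef>

// Assumed page size for illustration: 4096 bytes, as on x86 family processors.
constexpr std::size_t kAssumedPageSize = 4096;

// Number of whole pages needed to hold `bytes` bytes (rounds up).
constexpr std::size_t PagesNeeded (std::size_t bytes,
                                   std::size_t pageSize = kAssumedPageSize)
{
  return (bytes + pageSize - 1) / pageSize;
}
```

With these numbers, 3 bytes and 34 bytes each fit in one page, 4892 bytes need two pages, and 9826 bytes need three.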

The previous paragraph subtly summarized what a heap does: it takes the burden of managing virtual memory pages and keeping track of allocations off your shoulders. So is this how malloc works on Windows? Almost.
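If you prefer code to prose, the bookkeeping part can be sketched like this. It is a toy under loud assumptions: ToyHeap and ReplayWalkthrough are hypothetical names, the page size is assumed to be 4096 bytes, nothing is ever freed, and no real memory is mapped (a real heap would get its pages from the operating system). It only shows when new pages become necessary.

```cpp
#include <cstddef>

constexpr std::size_t kToyPageSize = 4096;  // assumed page size

// Toy bookkeeping only: counts owned pages and handed-out bytes.
class ToyHeap {
public:
  // Returns the offset of the new block from the start of the heap's region.
  constexpr std::size_t Allocate (std::size_t bytes)
  {
    const std::size_t offset = used_;
    used_ += bytes;
    // Take new pages only if the leftover space on owned pages is too small.
    while (pages_ * kToyPageSize < used_)
      ++pages_;
    return offset;
  }

  constexpr std::size_t PagesOwned () const { return pages_; }

private:
  std::size_t used_ = 0;
  std::size_t pages_ = 0;
};

// Replays the allocations from the text and records the page count after each.
struct Walkthrough { std::size_t after3, after34, after9826; };

constexpr Walkthrough ReplayWalkthrough ()
{
  ToyHeap heap{};
  Walkthrough result = {};
  heap.Allocate (3);    result.after3 = heap.PagesOwned ();
  heap.Allocate (34);   result.after34 = heap.PagesOwned ();
  heap.Allocate (9826); result.after9826 = heap.PagesOwned ();
  return result;
}

static_assert (ReplayWalkthrough ().after3 == 1, "3 bytes take the first page");
static_assert (ReplayWalkthrough ().after34 == 1, "34 more bytes reuse that page");
static_assert (ReplayWalkthrough ().after9826 == 3, "9826 more bytes need two extra pages");
```

A real heap also has to support freeing and reusing blocks, which is where most of the complexity (and most of the corruption potential) lives.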

On most POSIX systems, the C library contains the actual implementation of malloc and friends. On Windows, the operating system has its own implementation of the heap (HeapAlloc, HeapFree, etc.) in ntdll.dll, and malloc, free, etc. are just thin wrappers around these functions (if you compile in debug configuration, that’s another story, but the main concept remains the same). It’s pretty easy to grasp, but here’s a fancy figure anyway:

At the end of the day, everything that handles memory relies on the Virtual Memory Manager, or VMM for short.
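The layering can also be sketched in code. This is not the actual CRT source: ToyMalloc, ToyFree and RoundTrip are hypothetical names, and I am assuming the wrapper targets the process default heap (GetProcessHeap). The non-Windows branch is only a stand-in so the sketch builds elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _WIN32
#include <windows.h>
#else
#include <cstdlib>   // stand-in so the sketch also builds off Windows
#endif

// Hypothetical thin wrapper: the native heap does the real work.
void* ToyMalloc (std::size_t size)
{
#ifdef _WIN32
  return HeapAlloc (GetProcessHeap (), 0, size);
#else
  return std::malloc (size);
#endif
}

void ToyFree (void* block)
{
  if (block == nullptr)
    return;  // like free, ignore null pointers
#ifdef _WIN32
  HeapFree (GetProcessHeap (), 0, block);
#else
  std::free (block);
#endif
}

// Usage: allocate, touch, release.
void RoundTrip ()
{
  char* bytes = static_cast<char*> (ToyMalloc (16));
  assert (bytes != nullptr);
  bytes[0] = 'x';
  ToyFree (bytes);
}
```

Since the wrapper adds nothing of its own, whatever the native heap decides to do about a bad pointer is what the caller of the wrapper gets.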

The point I’m trying to make here is that since, on Windows, language-specific heap functions are implemented in terms of the native heap, the native heap’s errors also bubble up through them. Or do they?

STATUS_HEAP_CORRUPTION

Below is a code snippet that induces a heap corruption. An invalid pointer is passed to global delete. A usual Unhandled Exception Filter callback is registered to catch any unhandled SEH exceptions.

// Compile with MSVC, *Release* configuration
#include <iostream>
#include <new>
#include <windows.h>

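// Unhandled Exception Filter: only called for exceptions that nobody handled
LONG WINAPI MyUEF (PEXCEPTION_POINTERS pExp)
{
  if (pExp->ExceptionRecord->ExceptionCode == STATUS_HEAP_CORRUPTION) {
    // For demonstration only: iostreams may allocate from the heap
    std::cout << "Heap corruption detected!" << std::endl;

    Sleep (5000);  // leave the message on screen for a while
    TerminateProcess (GetCurrentProcess (), pExp->ExceptionRecord->ExceptionCode);
  }

  return EXCEPTION_CONTINUE_SEARCH;
}

int main ()
{
  SetUnhandledExceptionFilter (MyUEF);

  // 3 is obviously not a pointer that the heap has ever handed out
  delete reinterpret_cast<void*> (3);
}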

Compile the above code in “Release” configuration, and run it without a debugger attached. Note that the program crashes into Windows Error Reporting (WER) instead of printing a message and terminating silently. In other words, our registered UEF is never called. What’s going on?

When delete is called, it just forwards the call to free, which also forwards it, this time to HeapFree. The heap detects that the (obviously erroneous) pointer of value 3 was never allocated from it, so there must be a corruption going on. To signal this, an SEH exception is thrown with code 0xC0000374, a.k.a. STATUS_HEAP_CORRUPTION.
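As a side note, the code 0xC0000374 is not a random number. NTSTATUS values follow a fixed bit layout: the top two bits are the severity, followed by a customer bit, a reserved bit, a 12-bit facility and a 16-bit code. The helper names below are made up for this sketch; only the layout and the value itself come from the platform.

```cpp
#include <cstdint>

constexpr std::uint32_t kStatusHeapCorruption = 0xC0000374u;

// Bits 31-30: severity (0 = success, 1 = informational, 2 = warning, 3 = error).
constexpr std::uint32_t Severity (std::uint32_t status) { return status >> 30; }

// Bit 29: set for customer-defined (non-Microsoft) codes.
constexpr bool IsCustomerDefined (std::uint32_t status)
{
  return ((status >> 29) & 1u) != 0;
}

// Bits 27-16: facility. Bits 15-0: the code within that facility.
constexpr std::uint32_t Facility (std::uint32_t status) { return (status >> 16) & 0x0FFFu; }
constexpr std::uint32_t Code (std::uint32_t status) { return status & 0xFFFFu; }
```

For STATUS_HEAP_CORRUPTION this gives severity 3 (error), a system-defined code, facility 0 and code 0x374.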

So far so good. But why is it not reaching our UEF? Because someone catches it of course, and thus it is not unhandled. The problem is, the handler which catches the exception crashes the process to Windows Error Reporting. Why does it do that? On purpose.

This behavior was introduced in Windows Vista (and was backported to XP SP3, not that it’s relevant nowadays…), and can be turned on with a call to HeapSetInformation with the parameter HeapEnableTerminationOnCorruption. It’s a one-shot setting, so if someone turns this feature on, you can’t turn it off (and it affects all heaps). Also, there is a good chance it’s enabled for you by default (for example, if your process is 64-bit).
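For reference, opting in looks roughly like the sketch below. EnableTerminationOnCorruption is a hypothetical wrapper name; the HeapSetInformation call itself is the documented API. The whole block is guarded so that it is simply skipped when not building for Windows.

```cpp
#ifdef _WIN32
#include <windows.h>

// Returns true if the opt-in succeeded. There is no matching "disable" call.
bool EnableTerminationOnCorruption ()
{
  // A null heap handle is allowed for this information class; the setting
  // is process-wide, and no extra data is passed (null buffer, zero length).
  return HeapSetInformation (nullptr, HeapEnableTerminationOnCorruption,
                             nullptr, 0) != FALSE;
}
#endif
```

A typical place for such a call would be the very beginning of main, before anything else had a chance to corrupt the heap.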

So, we can’t turn this feature off, but can we work around it? Yes. But before we get to that, now is the time to remind you that this behavior exists for security purposes. The heap is a popular attack vector, so ending a process with a corrupt heap is a very good idea. Crash handling code should avoid using any heap like the plague, so if you work around this mechanism to run your own carefully crafted crash handling code (which will terminate the process after it has done its work, of course), you should be fine.
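To give an idea of what “avoiding the heap” can look like, here is a small sketch that formats an exception code into a stack buffer and writes it out with a raw WriteFile call. HexText, HexDigit, FormatHex32 and ReportCode are hypothetical names. I am also assuming that WriteFile on the standard error handle does not need the process heap; I have not verified that for every Windows version.

```cpp
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// "0x" + 8 hex digits + terminating zero, all on the stack.
struct HexText { char chars[11]; };

constexpr char HexDigit (std::uint32_t nibble)
{
  return static_cast<char> (nibble < 10 ? '0' + nibble : 'A' + (nibble - 10));
}

// Formats a 32-bit code without touching any heap or stream.
constexpr HexText FormatHex32 (std::uint32_t value)
{
  HexText text = {};
  text.chars[0] = '0';
  text.chars[1] = 'x';
  for (int i = 0; i < 8; ++i)
    text.chars[2 + i] = HexDigit ((value >> (28 - 4 * i)) & 0xFu);
  return text;  // chars[10] stays '\0' from the zero-initialization
}

#ifdef _WIN32
// Writes the formatted code straight to the standard error handle.
void ReportCode (std::uint32_t code)
{
  const HexText text = FormatHex32 (code);
  DWORD written = 0;
  WriteFile (GetStdHandle (STD_ERROR_HANDLE), text.chars, 10, &written, nullptr);
}
#endif
```

The snippets in this post use std::cout for brevity; in a real crash handler, something along these lines would be the safer choice.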

We will use a Vectored Exception Handler (VEH), which is a direct callback for SEH exceptions. That is, it’s called before the search for frame-based handlers begins, so we will be notified of this exception regardless of whether somebody handles it or not.

// Compile with MSVC, *Release* configuration
#include <iostream>
#include <new>
#include <windows.h>

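// Vectored Exception Handler: called for every SEH exception, handled or not
LONG WINAPI MyVEH (PEXCEPTION_POINTERS pExp)
{
  if (pExp->ExceptionRecord->ExceptionCode == STATUS_HEAP_CORRUPTION) {
    std::cout << "Heap corruption detected!" << std::endl;

    Sleep (5000);
    TerminateProcess (GetCurrentProcess (), pExp->ExceptionRecord->ExceptionCode);
  }

  // Not a heap corruption: let the normal handler search go on
  return EXCEPTION_CONTINUE_SEARCH;
}

int main ()
{
  // A nonzero first argument puts our handler at the front of the VEH list
  AddVectoredExceptionHandler (1, MyVEH);

  delete reinterpret_cast<void*> (3);
}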

With this code we achieved our goal: we detected the heap corruption and got our custom handling code to run.

Closing thoughts

It might be interesting to note that we can’t generalize this VEH-based solution to other crashes. An exception is only fatal when it is unhandled (there are some cases where even an access violation gets handled by system DLLs, for internal reasons). STATUS_HEAP_CORRUPTION is special, because:

  1. It’s possible that someone would have handled this kind of exception, but since this is a state that is best considered fatal, we don’t do any harm by not giving others a chance to recover from it.
  2. The system already handles it by terminating the program, so using a VEH is our only chance of getting execution to our custom handler.
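To see why a VEH is the wrong tool for crashes in general, consider the sketch below. It raises an exception that is handled perfectly well by a frame-based handler, yet the VEH is notified anyway. A VEH that terminated the process on every exception it saw would kill this healthy program. CountingVEH and VehSeesHandledException are hypothetical names, 0xE0000001 is an arbitrary application-defined code picked for the demo, and the block is guarded because __try/__except is an MSVC extension.

```cpp
#ifdef _MSC_VER
#include <windows.h>

static volatile LONG g_vehCalls = 0;

// Counts every exception it sees, then lets normal dispatch continue.
LONG WINAPI CountingVEH (PEXCEPTION_POINTERS)
{
  InterlockedIncrement (&g_vehCalls);
  return EXCEPTION_CONTINUE_SEARCH;
}

// Raises an exception that IS handled, and reports whether the VEH saw it.
bool VehSeesHandledException ()
{
  PVOID handle = AddVectoredExceptionHandler (1, CountingVEH);
  if (handle == nullptr)
    return false;

  const LONG before = g_vehCalls;
  __try {
    RaiseException (0xE0000001, 0, 0, nullptr);
  } __except (EXCEPTION_EXECUTE_HANDLER) {
    // Handled right here: the process is in no danger.
  }

  RemoveVectoredExceptionHandler (handle);
  return g_vehCalls > before;
}
#endif
```

This is why the handlers in this post check for STATUS_HEAP_CORRUPTION specifically and return EXCEPTION_CONTINUE_SEARCH for everything else.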

Bonus question: the exception described in this post gets thrown and caught in the same function (ntdll.dll!RtlReportCriticalFailure). Why is it that way?

Author: Donpedro

C++ programmer with an interest in operating systems and everything low level.
