TL;DR: You can isolate printer drivers from your applications with a small change to your application manifests, resulting in increased stability.
No one likes uninvited guests, especially if they are rude or don’t play by the rules. They come over, tamper with things that shouldn’t be tampered with, sometimes burn down your house, or even prevent you from dying in peace. On Windows, there are many types of uninvited guests, such as shell extensions, programs using DLL injection, and printer drivers.
Applications have very little control over these components, so their quality affects the quality of your software products as well. Users don’t care if your application crashes regularly because of some sloppy shell extension; at the end of the day, it’s your product that crashed. In this blog post, I would like to draw your attention to a somewhat obscure feature of Windows: application-level printer driver isolation.
Continue reading “Application-level Printer Driver Isolation”
TL;DR: There is a bug in Microsoft OneDrive’s shell extension (which gets installed automatically with Office, by the way) that makes programs skip global variable destructors in DLLs if they ever open a file dialog.
The company I work for has a really decent custom crash detection system. It’s multiplatform and has features like out-of-process crash reporting, automated categorization, etc. If you’ve read some of my previous posts on this blog, then you are aware that implementing such a system is no easy feat. No matter how thorough and careful you are, there will always be crashes that escape detection (provided your program is complex enough and has plenty of users…).
This is one of the main reasons why the system has a feature called abnormal exit detection. Upon exiting, the program using said crash detection system must signal that it’s about to exit. If it fails to do that, the out-of-process crash reporting server (this is just a “watchdog process” that runs on the same machine) knows that something went wrong, and takes note of the abnormal exit and the exit code. So, let’s say we detected that our program vanished, and its exit code was 0xC0000005 (the exception code of an access violation). That’s a telltale sign of an uncaught crash, because when an unhandled SEH exception occurs in a program, the program gets terminated with the exception’s code. At this point, you can’t create a minidump or walk the stack of course, because the process is already gone, but at least having the exit code is still better than nothing.
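To make the idea a bit more concrete, here is a minimal sketch of the decision a watchdog could make once the monitored process is gone. This is not the actual system described above: the function name `classify_exit`, the `signaled_clean_exit` flag, and the small lookup table are assumptions made up for illustration. The only facts it relies on are a handful of well-known NTSTATUS values and the convention that error-severity status codes have their two top bits set.

```python
# Hypothetical watchdog-side helper. All names here are illustrative
# assumptions, not part of the crash detection system from the post.

# A few well-known NTSTATUS exception codes that can show up as exit codes.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000374: "STATUS_HEAP_CORRUPTION",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def classify_exit(exit_code: int, signaled_clean_exit: bool) -> str:
    """Describe how a monitored process ended, based on what the watchdog saw."""
    if signaled_clean_exit:
        # The process announced its exit beforehand, so nothing is suspicious.
        return "normal exit"

    # Exit codes are sometimes reported as signed 32-bit values; normalize them.
    code = exit_code & 0xFFFFFFFF

    name = KNOWN_NTSTATUS.get(code)
    if name is not None:
        return f"abnormal exit, likely unhandled exception: {name} (0x{code:08X})"

    # NTSTATUS severity lives in the two top bits; 0b11 means "error".
    if code >> 30 == 0b11:
        return f"abnormal exit, error-severity status 0x{code:08X}"

    return f"abnormal exit, exit code {code}"
```

With this sketch, an exit code of `0xC0000005` without a prior exit signal would be reported as a likely unhandled access violation, while a process that vanishes with exit code 0 would still be flagged as abnormal, just without a hint about the cause.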
Other than pinning down potentially unhandled crashes, sometimes this mechanism catches miscellaneous errors, and bugs in 3rd party libraries as well. This time it caught a bug in the OneDrive shell extension, GROOVEEX.dll. But let’s not skip ahead, it took quite a lot of time and effort to come to that conclusion. Let me tell you the whole story…
Continue reading “I hope you don’t mind if some of your globals’ destructors are skipped”
TL;DR: Catching heap corruptions of native heaps is not trivial. Therefore, catching heap corruptions of the C/C++ runtime’s heap is also not trivial.
Heap corruptions are infamously nasty. A few reasons:
- At times, they might not cause any visible errors at all, making them hard to detect
- Even if they cause an error, it usually surfaces long after the actual corruption, making them hard to pinpoint
- Memory that gets corrupted usually has nothing to do with the actual culprit
- Heap allocations of programs with a GUI are usually not fully deterministic, making a heap corruption potentially hard to reproduce
On Windows, the system can detect a subset of heap corruptions. As with most errors on Windows, this is signaled with an SEH exception. Catching this exception, however, needs some workarounds. But before we get to that, we need to take a look at the architectural relationship between the C/C++ runtime’s heap and native heaps.
Continue reading “Crashes you can’t handle easily #3: STATUS_HEAP_CORRUPTION on Windows”
TL;DR: Utilizing WER plugins is beneficial, but it turns out it’s not the silver bullet of crash detection.
Almost exactly a year ago I blogged about the so-called runtime exception helper modules of Windows Error Reporting (that’s a mouthful, so from now on I’ll refer to them by their internal name, WER plugins), and how you can enhance your crash detection capabilities with them on x64 Windows. I apologize for the lack of posts lately; I have been working on something in my free time that I’m not ready to share with the public yet. Since the post I referred to above, I’ve gained some production experience with WER plugins, and I thought it was worth sharing.
Continue reading “A follow-up on WER plugins”
TL;DR: You can induce worst-case stack overflows artificially. Use this to unit test your crash handling code.
Last time I wrote about the intricacies of handling stack overflows on Windows. In that post (among other things) I showed that not all stack overflows are the same: some are harder to handle than others, because the remaining stack space varies. There is a technical limit to how bad they can be, and I call those that hit this limit worst-case stack overflows. If this doesn’t ring a bell, I suggest you read the post I linked before proceeding.
Continue reading “Worst-case stack overflows on Windows”
TL;DR: You have to be really careful when handling stack overflows on Windows.
In the previous (and first) entry in this series I wrote about how implementation specifics of x64 SEH can cause you pain if you want to have custom crash reporting. This time I’m going to tell you about the pitfalls of handling stack overflows. “Do you mean some kind of special case of stack overflows?”, you might ask. No, I’m talking about stack overflows in general.
Continue reading “Crashes you can’t handle easily #2: Stack overflows on Windows”
TL;DR: Don’t expect structured exception handling mechanisms to always work correctly on x64 Windows.
If you ship software, you probably care about crashes. Your product fails and gets terminated, your users get frustrated, their workflow is disrupted, and – worst of all – they might even lose some data. When a crash happens, you want to make sure relevant information is collected and sent back to you, the developer, so the problem can be investigated and fixed.
However, if you don’t rely on your platform’s built-in crash handling facilities, even detecting some crashes is far from trivial. I started this series of blog posts to write about such cases.
Continue reading “Crashes you can’t handle easily #1: SEH failure on x64 Windows”