Be careful what you ask for (especially in WMI queries)

TL;DR: In exotic environments, WMI queries might take longer than you think.

Recently, we received a report from a user claiming that our product was taking ages to launch. According to the report, the program did start eventually, but a delay of a few minutes was inconvenient enough for the client to file a support case (unsurprisingly).

Continue reading “Be careful what you ask for (especially in WMI queries)”

List of peculiar exit codes on Windows

Process exit codes are fairly straightforward on Windows. They are essentially the return value of a process: a plain, 32-bit unsigned integer code. There are no strict rules governing this value, just convention:

  • 0 means success
  • Anything else means failure
  • If the system is involved in the termination of a program, the returned value might be a standard Win32 error code, HRESULT, or NTSTATUS. For example, if your process gets terminated due to an unhandled access violation, the exit code will be 0xC0000005
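The convention above can be sketched as a small classifier. This is only an illustration: describeExitCode is a hypothetical helper, and the bit tests rely on the documented layouts of NTSTATUS (two severity bits at the top, both set for errors) and HRESULT (top bit set for failures). Because the two layouts overlap, the result is a guess, not a definitive answer.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: a rough, convention-based reading of a process exit code.
std::string describeExitCode (std::uint32_t exitCode)
{
    if (exitCode == 0)
        return "success";
    if (exitCode == 0xC0000005u)
        return "NTSTATUS: access violation";
    // NTSTATUS values with "error" severity have both top bits set (0xC.......).
    if ((exitCode >> 30) == 0x3u)
        return "looks like an NTSTATUS error";
    // A set top bit alone usually means a failed HRESULT (0x8.......),
    // although NTSTATUS warnings share this range.
    if ((exitCode >> 31) == 0x1u)
        return "looks like a failed HRESULT";
    // Everything else: application-defined failure or a plain Win32 error code.
    return "failure (application-defined or Win32 error code)";
}
```

For instance, describeExitCode (0x80004005) lands in the HRESULT branch, while a plain exit code of 1 falls through to the last case.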

Due to the third point above, when your process is terminated, you can usually get a pretty good idea of the cause. However, sometimes that’s not the case. At work I’ve seen many exit codes that I found interesting. In some cases we figured out the reason; in others, we didn’t (comments welcome!). I thought I’d compile a list of codes that might be of interest to others. This is meant as a resource, so I plan to update/extend it in the future.

Continue reading “List of peculiar exit codes on Windows”

Interleaving small reads of multiple files – why World of Tanks 1.0 has abysmal loading times on HDDs

TL;DR: If you have WoT (1.0) installed on an HDD, you may experience loading times so long that you might even miss the start of your battles. Fortunately, we can mitigate this.

World of Tanks update 1.0 was released recently, introducing a new game engine with lots of eye candy and better physics (amongst other improvements). I have been playing and enjoying this game since 2011 (when it was still in beta), as it has really solid gameplay mechanics, and I love World War 2 machinery in general1.

I was really excited for this major update, but right after installing it and playing a few battles, I noticed something very peculiar: loading into a battle was really slow, and my HDD sounded like an A-10 firing (almost). In fact, it was so slow that I arrived in battles about 30-40 seconds after they had started (and a minute late for the first battle of the day). That’s unfortunate, because:

  • In the fraction of my free time that I spend on video games, I prefer to actually play them rather than wait for disk I/O
  • On open maps, spending about a minute at the spawn location is a disaster
  • With certain types of tanks, you have to contest key positions on the map, and you need to make a run for it right at the start of the battle

Moving the installation to my SSD would have “solved”2 this issue, but I don’t have enough space on it and I was curious anyway, so I decided to try and investigate.

Continue reading “Interleaving small reads of multiple files – why World of Tanks 1.0 has abysmal loading times on HDDs”

If you ship your library/application to Windows, please use UTF-16 interfaces

TL;DR: Do as the title says, or your library/application won’t function properly in certain situations.

Here’s a small program that takes a file path as a parameter, and queries its size:

#include <iostream>
#include <sys/stat.h>

int main (int /*argc*/, char* argv[])
{
    // Query the file's metadata through the narrow (char-based) stat interface.
    struct stat fileInfo;
    if (stat (argv[1], &fileInfo) == 0) {
        std::cout << "The file's size is: " << fileInfo.st_size <<
            " bytes" << std::endl;
    } else {
        std::cout << "Unable to stat file (maybe it doesn't exist?)" <<
            std::endl;
    }
    return 0;
}

Even though stat is a POSIX function, it happens to be available on Windows as well, so this nice program works on both POSIX platforms and Windows. Or does it?
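For comparison, here is a minimal sketch of what the title's advice could look like for this particular query. It assumes the wide-character counterpart _wstat64 from the Microsoft C runtime on Windows and plain stat elsewhere; NativeString and fileSizeInBytes are hypothetical names introduced only for this illustration. On Windows, the caller would also have to obtain the path as UTF-16 in the first place (for example through wmain), since a narrow argv has already been converted to the active code page.

```cpp
#include <cstdint>
#include <string>
#include <sys/stat.h>

#ifdef _WIN32
using NativeString = std::wstring;   // UTF-16 on Windows
#else
using NativeString = std::string;    // raw bytes (typically UTF-8) on POSIX
#endif

// Hypothetical helper: stores the size of the file at 'path' in 'size'.
// Returns false if the file cannot be queried.
bool fileSizeInBytes (const NativeString& path, std::int64_t& size)
{
#ifdef _WIN32
    struct __stat64 info;
    if (_wstat64 (path.c_str (), &info) != 0)   // wide-character interface
        return false;
#else
    struct stat info;
    if (stat (path.c_str (), &info) != 0)
        return false;
#endif
    size = static_cast<std::int64_t> (info.st_size);
    return true;
}
```

The point of the type alias is that the path never passes through a narrow, code-page-dependent string on Windows.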

Continue reading “If you ship your library/application to Windows, please use UTF-16 interfaces”

Detecting if the Visual Studio C++ Redistributable is installed

TL;DR: It’s worth not just detecting, but also trying out whether it actually works.

If you ship software on Windows written in C++ and compiled with the MSVC toolchain, then you have probably heard about the so-called “Visual Studio C++ Redistributable”. Chances are that you link your binaries against the dynamic runtime DLLs, which means that your application has a runtime dependency on (at least some of) the DLLs contained in this redistributable package.

When your product is installed, you (or the installer framework you use) have to make sure that the appropriate version of the redistributable is also installed, or your program will fail to start. In this post, I’ll reason about why you shouldn’t settle for merely detecting whether this dependency is installed, with two real-world examples.

Continue reading “Detecting if the Visual Studio C++ Redistributable is installed”

Application-level Printer Driver Isolation

TL;DR: You can isolate printer drivers from your applications with a small change to your application manifests, resulting in increased stability.

No one likes uninvited guests, especially if they are rude, or don’t play by the rules. They come over, tamper with things that shouldn’t be tampered with, they sometimes burn down your house, or even prevent you from dying in peace. On Windows, there are many types of uninvited guests, such as shell extensions, programs using DLL injection, and printer drivers.

Applications have very little control over these components, so their quality affects your software products’ quality as well. Users don’t care if your application crashes regularly because of some sloppy shell extension, as at the end of the day, it’s your product that crashed. In this blog post, I would like to bring your attention to a somewhat obscure feature of Windows: application-level printer driver isolation.

Continue reading “Application-level Printer Driver Isolation”

I hope you don’t mind if some of your globals’ destructors are skipped

TL;DR: There is a bug in Microsoft OneDrive’s shell extension (which gets installed automatically with Office, by the way) that makes programs skip global variable destructors in DLLs if they ever open a file dialog.

The company I work for has a really decent custom crash detection system. It’s multiplatform, and it has features like out-of-process crash reporting, automated categorization, etc. If you’ve read some of my previous posts on this blog, then you are aware that implementing such a system is no easy feat. No matter how thorough and careful you are, there’ll always be crashes that escape detection (given your program is complex enough and has plenty of users…).

This is one of the main reasons why it has a feature called abnormal exit detection. Upon exiting, the program using said crash detection system must signal that it’s about to exit. If it fails to do that, the out-of-process crash reporting server (this is just a “watchdog process” that runs on the same machine) knows that something went wrong, and takes a note of the abnormal exit and the exit code. So, let’s say we detected that our program vanished, and its exit code was 0xC0000005 (the exception code of an access violation). That’s a telltale sign of an uncaught crash, because when an unhandled SEH exception occurs in a program, it gets terminated with the exception’s code. At this point, you can’t create a minidump or walk the stack of course, because the process is already gone, but having the exit code at least is still better than nothing.
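The decision the watchdog makes can be sketched roughly as follows. This is not the actual system: MonitoredProcess, ExitKind, and classifyExit are hypothetical names, and treating any exit code with both top bits set (the NTSTATUS "error" severity, as in 0xC0000005) as a likely unhandled crash is an assumption made for the illustration.

```cpp
#include <cstdint>

// Hypothetical model of what the watchdog knows about one monitored process.
struct MonitoredProcess {
    bool signaledCleanExit = false;   // set when the process announces it is about to exit
};

enum class ExitKind { Normal, Abnormal, LikelyUnhandledCrash };

// Called by the watchdog once the monitored process is gone.
ExitKind classifyExit (const MonitoredProcess& process, std::uint32_t exitCode)
{
    if (process.signaledCleanExit)
        return ExitKind::Normal;
    // The process vanished without signaling. An exit code shaped like an
    // NTSTATUS error (e.g. 0xC0000005) hints at an unhandled SEH exception.
    if ((exitCode >> 30) == 0x3u)
        return ExitKind::LikelyUnhandledCrash;
    // Otherwise it is still an abnormal exit worth recording with its exit code.
    return ExitKind::Abnormal;
}
```

Note that even an exit code of 0 counts as abnormal here if the process never signaled its exit.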

Other than pinning down potentially unhandled crashes, sometimes this mechanism catches miscellaneous errors, and bugs in 3rd party libraries as well. This time it caught a bug in the OneDrive shell extension, GROOVEEX.dll. But let’s not skip ahead, it took quite a lot of time and effort to come to that conclusion. Let me tell you the whole story…

Continue reading “I hope you don’t mind if some of your globals’ destructors are skipped”

Compressing ETL (ETW output) files

TL;DR: General purpose compression algorithms are pretty good. Can we beat them with a low effort optimization in compressing ETL files? Even if we could, it doesn’t mean we should.

Event Tracing for Windows is outright amazing. If you are a developer on Windows (native, managed, or even web) and have never used or (God forbid) heard of it, I can tell you that you are missing out on a very useful technology. I’m neither an industry veteran nor an ETW specialist, but I have already solved many incredibly complex customer issues using it (and its GUI analyzer, called the “Windows Performance Analyzer”) throughout my career. If you are not familiar with ETW, I suggest you start by reading some of Bruce Dawson’s blog posts about investigations he made using it.

Anyways, this blog post will not be about ETW in general, but about its binary output files with the extension “.etl”. If you’ve ever taken a trace with the Windows Performance Recorder (or xperf), you know that these files can end up being pretty huge. Run system-wide tracing with sampled profiling and context switch recording for a few minutes, and we are already talking about gigabytes. Let’s see if we can beat general purpose compression algorithms using semantic knowledge about the content of ETL files.

Continue reading “Compressing ETL (ETW output) files”

Crashes you can’t handle easily #3: STATUS_HEAP_CORRUPTION on Windows

TL;DR: Catching heap corruptions of native heaps is not trivial. Therefore, catching heap corruptions of the C/C++ runtime’s heap is also not trivial.

Heap corruptions are infamously nasty. A few reasons:

  • At times, they might not cause any visible errors at all, making them hard to detect
  • Even if they cause an error, it usually surfaces long after the actual corruption, making them hard to pinpoint
  • Memory that gets corrupted usually has nothing to do with the actual culprit
  • Heap allocations of programs with a GUI are usually not fully deterministic, making a heap corruption potentially hard to reproduce

On Windows, the system can detect a subset of heap corruptions. As with most errors on Windows, this is signaled with an SEH exception. Catching this exception, however, needs some workarounds. But before we get to that, we need to take a look at the architectural relationship between the C/C++ runtime’s heap and native heaps.

Continue reading “Crashes you can’t handle easily #3: STATUS_HEAP_CORRUPTION on Windows”

A follow-up on WER plugins

TL;DR: Utilizing WER plugins is beneficial, but it turns out it’s not the silver bullet of crash detection.

Almost exactly a year ago I blogged about the so-called runtime exception helper modules of Windows Error Reporting (that’s a mouthful, so from now on I’ll refer to them by their internal name, WER plugins), and how you can enhance your crash detection capabilities with them on x64 Windows. I apologize for the lack of posts lately; I have been working on something in my free time that I’m not ready to share with the public yet. Since the post I referred to above, I’ve gained some production experience with WER plugins, and I thought it was worth sharing.

Continue reading “A follow-up on WER plugins”