<h1 id="abusing-type-checking-for-fun-and-profit">Abusing type checking for fun and profit</h1>
<p><em>2017-12-11</em></p>
<p>This is a post about error handling in the C programming language in general,
and in HelenOS in particular.</p>
<p>First, a bit of context. C traditionally doesn’t have a very strong type system,
especially when it comes to integer values. There is basically no support for
defining new integer types – there are <code class="highlighter-rouge">char</code>, <code class="highlighter-rouge">signed char</code>, <code class="highlighter-rouge">unsigned char</code>
(yes, those are three distinct types; while <code class="highlighter-rouge">char</code> is semantically identical
to one of the other two, it’s still separate and not just an alias), <code class="highlighter-rouge">short</code>,
<code class="highlighter-rouge">unsigned short</code>, <code class="highlighter-rouge">int</code>, <code class="highlighter-rouge">unsigned int</code>, <code class="highlighter-rouge">long</code>, <code class="highlighter-rouge">unsigned long</code>, <code class="highlighter-rouge">long long</code>,
<code class="highlighter-rouge">unsigned long long</code>, and <code class="highlighter-rouge">_Bool</code>. That’s the exhaustive listing for standard
types. On several platforms, compilers also define the non-standard <code class="highlighter-rouge">__int128</code>,
but it’s not universally supported.</p>
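<p>One way to see that the three character types really are distinct is C11’s <code class="highlighter-rouge">_Generic</code>, which selects on the exact type of an expression and rejects two associations with compatible types. The macro name <code class="highlighter-rouge">CHAR_KIND</code> is made up for this sketch, and all the checks happen at compile time.</p>

```c
/* CHAR_KIND is a hypothetical helper for this illustration.
   _Generic (C11) picks the association matching the exact type of its
   operand. Listing char, signed char and unsigned char side by side is only
   valid because they are three distinct types; if char were a mere alias,
   every expansion of this macro would be rejected as a duplicate. */
#define CHAR_KIND(x) _Generic((x),  \
    char: 0,                        \
    signed char: 1,                 \
    unsigned char: 2,               \
    default: -1)

/* Compile-time checks: each cast selects its own association. */
_Static_assert(CHAR_KIND((char) 'a') == 0, "plain char is its own type");
_Static_assert(CHAR_KIND((signed char) 'a') == 1, "signed char");
_Static_assert(CHAR_KIND((unsigned char) 'a') == 2, "unsigned char");

/* _Generic applies no integer promotions, but in C a character constant
   such as 'a' already has type int, so it matches none of the three. */
_Static_assert(CHAR_KIND('a') == -1, "'a' has type int in C");
```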
<p>The standard also permits implementations to provide additional, extended
integer types, but this is not done in practice. All the other types you get
from various header files (e.g. <code class="highlighter-rouge">int32_t</code>, <code class="highlighter-rouge">size_t</code>, <code class="highlighter-rouge">wchar_t</code>, etc.) are
just aliases for one of the above, as good as a measly <code class="highlighter-rouge">#define int32_t int</code>.
In practice they are declared with <code class="highlighter-rouge">typedef</code> rather than as macros, but semantically
it makes no difference: <code class="highlighter-rouge">typedef</code> creates an alias, not a new type. Enumerated types defined via
<code class="highlighter-rouge">enum</code> are hardly better, since enumeration constants are plain <code class="highlighter-rouge">int</code> and integers
convert to and from <code class="highlighter-rouge">enum</code> types implicitly. Although recent compilers come with scores of
diagnostics for <code class="highlighter-rouge">enum</code> types and their constants, it’s a far cry from strong
type checking. C++11 added <code class="highlighter-rouge">enum class</code>, but C, sadly, doesn’t
have anything like that.</p>
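<p>The following sketch shows how little a <code class="highlighter-rouge">typedef</code> or a plain <code class="highlighter-rouge">enum</code> buys in terms of checking. The names <code class="highlighter-rouge">fd_t</code>, <code class="highlighter-rouge">errcode_t</code>, <code class="highlighter-rouge">IS_INT</code> and <code class="highlighter-rouge">mix_freely</code> are hypothetical; the function body is typically accepted by GCC and Clang without any diagnostic under their default settings.</p>

```c
/* Hypothetical names: two typedefs that are meant to mean different things. */
typedef int fd_t;       /* a file descriptor */
typedef int errcode_t;  /* an error code */

/* Plain enum: its constants have type int in C. */
enum color { RED, GREEN, BLUE };

/* Yields 1 if the expression has type int, 0 otherwise (compile time). */
#define IS_INT(x) _Generic((x), int: 1, default: 0)

/* A typedef is only a new name: fd_t and errcode_t are both int. */
_Static_assert(IS_INT((fd_t) 0) && IS_INT((errcode_t) 0), "aliases of int");
_Static_assert(IS_INT(RED), "enumeration constants are int");

/* Nothing here is an error, and with default warning settings it is
   typically not even a warning. */
int mix_freely(void)
{
    fd_t fd = 3;
    errcode_t err = fd;  /* a descriptor silently becomes an "error code" */
    enum color c = 42;   /* not RED, GREEN or BLUE, yet accepted */
    return err + c;      /* enum and int mix freely in arithmetic */
}
```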
<p>It comes as no surprise, then, that most C code doesn’t really distinguish
between various numeric types, whether they are enumerations, bitflags, or
file descriptors. I jokingly call it the “all-int” situation. See a parameter
or a return value typed <code class="highlighter-rouge">int</code>? Well great, you learned next to nothing. It
could be anything. The designation has no semantic value.</p>
<p>Naturally, this extends to error handling. In the HelenOS code base, the go-to
error handling mechanism has been to return negative error codes on failure
and non-negative valid results on success. Correspondingly, its <code class="highlighter-rouge">&lt;errno.h&gt;</code>
header defined negative constants, contrary to the C language standard, which
requires them to be positive.
This led to problems. Interfacing HelenOS libraries with code written for a
standard environment (typically POSIX) has been more painful than necessary,
and where the standardized error codes just don’t cut it, domain-specific
error codes have been used with mixed results. On several occasions, different
kinds of error returns have been mixed improperly, resulting in hidden bugs
that only manifest under rare exceptional conditions.</p>
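<p>To make the problem concrete, here is a minimal sketch of the negative-error convention, using a made-up function <code class="highlighter-rouge">count_commas</code> and a made-up constant <code class="highlighter-rouge">NEG_EINVAL</code> rather than the actual HelenOS definitions. A single <code class="highlighter-rouge">int</code> carries both results and errors, so a caller that forgets the sign check compiles just as happily as one that remembers it.</p>

```c
#include <stddef.h>

/* Hypothetical negative error code in the old style. */
#define NEG_EINVAL (-22)

/* Returns the number of ',' characters in s, or a negative error code.
   The one int return value carries two unrelated kinds of data. */
int count_commas(const char *s)
{
    if (s == NULL)
        return NEG_EINVAL;
    int n = 0;
    for (; *s != '\0'; s++)
        if (*s == ',')
            n++;
    return n;
}

/* Correct caller: has to remember the sign check before using the value. */
int total_commas(const char *a, const char *b)
{
    int ra = count_commas(a);
    if (ra < 0)
        return ra;
    int rb = count_commas(b);
    if (rb < 0)
        return rb;
    return ra + rb;
}

/* Buggy caller: forgets the check, so an error code is silently used as a
   count. The compiler cannot notice, because both are just int. */
int total_commas_buggy(const char *a, const char *b)
{
    return count_commas(a) + count_commas(b);
}
```

<p>Here <code class="highlighter-rouge">total_commas_buggy("a,b,c,d", NULL)</code> yields <code class="highlighter-rouge">-19</code>, which is neither a count nor any defined error code; with a long enough first string, the sum would even turn positive and pass for a plausible count.</p>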
<h1 id="towards-the-solution">Towards the solution</h1>
<p>The issue with negative error codes is probably the single greatest blocker
for a standards-compliant libc in the heart of HelenOS. However, since the code
depends on them being negative, just changing the constants would break pretty
much everything. Annoyingly, just separating error returns from actual results
is not by itself sufficient, because some code would still (improperly) check
for negativity, and it wouldn’t help with existing error handling bugs, or
with bugs inadvertently introduced during the transition.</p>
<p>My first attempt was to simply rename the constants and keep them negative,
reintroducing standard error codes on a case-by-case basis. This turned out
to be a spectacularly useless idea. It would have created many problems and probably
caused more pain than it solved. I still thought the solution lay in
splitting the errors into independent, API-specific groups, but had little
idea how to put that into practice. At the very least, I decided it would
help to introduce the <code class="highlighter-rouge">errno_t</code> type from C11 (specifically, its optional Annex K) and see where that led.</p>
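<p>For reference, <code class="highlighter-rouge">errno_t</code> comes from the optional bounds-checking interfaces of C11 (Annex K), where it is simply a <code class="highlighter-rouge">typedef</code> for <code class="highlighter-rouge">int</code>. Many C libraries, glibc among them, do not implement Annex K, so a portable sketch has to supply the typedef itself. On its own, the type documents intent but adds no checking, as the made-up functions <code class="highlighter-rouge">check_positive</code> and <code class="highlighter-rouge">use_as_count</code> show.</p>

```c
/* Ask for the C11 Annex K declarations, which include errno_t. This has
   to come before any standard header is included. */
#define __STDC_WANT_LIB_EXT1__ 1
#include <errno.h>

/* Implementations that support Annex K predefine __STDC_LIB_EXT1__. Many
   (glibc, for example) do not, so fall back to Annex K's own definition. */
#ifndef __STDC_LIB_EXT1__
typedef int errno_t;
#endif

/* Hypothetical function: 0 on success, a positive errno value on failure. */
errno_t check_positive(int x)
{
    return x > 0 ? 0 : EDOM;
}

/* errno_t is still int, so this compiles without any diagnostic. */
int use_as_count(void)
{
    int count = check_positive(-1);  /* an error code posing as a count */
    return count;
}
```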
<p>Then, a week ago, Jiří Svoboda started his own effort to separate error
returns from valid results, which at the time duplicated and conflicted with my own
work. However, this pointed me back to the idea of adding output parameters
instead of working with negative returns by another name, something that I
had originally dismissed as disruptive.
After a short e-mail conversation, I asked Jiří to give me until the
end of the week to work on this my way, to which he agreed.</p>
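<p>A sketch of the output-parameter style might look as follows. The names are placeholders: <code class="highlighter-rouge">err_t</code> stands in for <code class="highlighter-rouge">errno_t</code> and the <code class="highlighter-rouge">ERR_*</code> constants for the real error codes, chosen so as not to clash with a system <code class="highlighter-rouge">&lt;errno.h&gt;</code>. The function returns only an error code and hands its actual result back through a pointer, so callers keep the two in separate variables and the error constants are free to become positive.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for errno_t and its constants: positive values,
   with 0 meaning success, as the C standard expects of errno codes. */
typedef int err_t;
#define ERR_OK    0
#define ERR_INVAL 22

/* The error is the return value; the count travels through an output
   parameter, so the two can no longer be confused with each other. */
err_t count_commas(const char *s, size_t *out_count)
{
    assert(out_count != NULL);
    if (s == NULL)
        return ERR_INVAL;
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == ',')
            n++;
    *out_count = n;
    return ERR_OK;
}

/* Callers keep the error and the values in separate variables. */
err_t total_commas(const char *a, const char *b, size_t *out_total)
{
    size_t na, nb;
    err_t rc = count_commas(a, &na);
    if (rc != ERR_OK)
        return rc;
    rc = count_commas(b, &nb);
    if (rc != ERR_OK)
        return rc;
    *out_total = na + nb;
    return ERR_OK;
}
```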
<p>Solving all the issues by the end of the week, in the entire code base?
Insane! Well, not quite. And I would have managed if I hadn’t made some silly
mistakes in the process, but I digress. I was already considering how to use
compiler diagnostics to detect problems, so when Jiří started separating the
error values, I got an idea of how to exploit it fully.</p>
<h1 id="sinterrno_t">s/int/errno_t</h1>
<p>The idea is simple. If we mark every error value with a specific type (such as
<code class="highlighter-rouge">errno_t</code>, because why not?), then we can make the compiler fail on every
instance of errors getting mixed with non-errors. “But wait,” you say, “didn’t
you just explain that C can’t do that?” Well, sort of. You see, the typing
doesn’t necessarily have to make sense or work at runtime; it just needs to
typecheck. If the typechecker guarantees that no mixing is happening, you
can change the type and constants after the fact and the guarantee still
applies (at least until you make new bugs). And C actually does have decent
diagnostics for various types, even if no single type gets all of them.</p>
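<p>The trick can be sketched as a header with two sets of definitions, selected by a hypothetical <code class="highlighter-rouge">ERRNO_TYPECHECK</code> macro. Compiling with <code class="highlighter-rouge">-DERRNO_TYPECHECK</code> turns the error type into a pointer so that the compiler polices its uses, while a normal build gets a plain <code class="highlighter-rouge">int</code> with positive, standard-style values. The names <code class="highlighter-rouge">err_t</code>, <code class="highlighter-rouge">ERR_*</code> and <code class="highlighter-rouge">find_key</code> are placeholders, and the actual HelenOS headers may well look different.</p>

```c
#include <stddef.h>
#include <stdint.h>

#ifdef ERRNO_TYPECHECK
/* Check mode (hypothetical): a pointer to an incomplete struct is a type
   of its own, so mixing it with integers draws diagnostics. This mode only
   has to compile cleanly; the values need not be meaningful. */
typedef struct err_check_tag *err_t;
#define ERR_OK    ((err_t) 0)
#define ERR_NOENT ((err_t) (uintptr_t) 2)
#else
/* Normal mode: an ordinary int with positive values, 0 meaning success. */
typedef int err_t;
#define ERR_OK    0
#define ERR_NOENT 2
#endif

/* Code written against err_t compiles unchanged in both modes. */
err_t find_key(const char *key)
{
    return (key != NULL && key[0] != '\0') ? ERR_OK : ERR_NOENT;
}
```

<p>Because the source is identical in both modes, a clean check build carries its guarantee over to the normal build, which is what allows the type and the constants to be changed after the fact.</p>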
<p>So I started by defining <code class="highlighter-rouge">errno_t</code> to be a unique pointer type, and all <code class="highlighter-rouge">Exxxx</code>
constants to be pointers of that type. This gives us some rather strong
guarantees: no assignments from or to other types without explicit casts,
no comparison with integers (except for equality with zero, which doesn’t hurt
us), and no printing with <code class="highlighter-rouge">printf</code> as an integer (not strictly a problem, but it’s always nice to
see a string representation instead of a random number).</p>
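<p>Below is a sketch of what a pointer-typed error code catches, again with placeholder names (<code class="highlighter-rouge">err_t</code>, <code class="highlighter-rouge">ERR_*</code>, <code class="highlighter-rouge">lookup</code>, <code class="highlighter-rouge">err_name</code>, <code class="highlighter-rouge">caller</code>) rather than the real HelenOS ones. Comparisons against the error constants keep working; the commented-out lines show the kind of mixing that gets diagnosed. With GCC, integer/pointer conversions are errors by default since version 14, the other cases are warnings (some only with <code class="highlighter-rouge">-Wall</code> or <code class="highlighter-rouge">-Wextra</code>), and <code class="highlighter-rouge">-Werror</code> turns them all into hard failures.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Check-mode definitions (placeholder names): a pointer to an incomplete
   struct is a type of its own that nothing converts to implicitly. */
typedef struct err_check_tag *err_t;
#define ERR_OK    ((err_t) 0)
#define ERR_NOENT ((err_t) (uintptr_t) 2)
#define ERR_INVAL ((err_t) (uintptr_t) 22)

/* Hypothetical lookup: the result goes through an output parameter. */
err_t lookup(const char *key, int *out_value)
{
    if (key == NULL || out_value == NULL)
        return ERR_INVAL;
    if (key[0] == '\0')
        return ERR_NOENT;
    *out_value = 42;
    return ERR_OK;
}

/* Hypothetical helper: report an error by name instead of as a number. */
const char *err_name(err_t rc)
{
    if (rc == ERR_OK)
        return "EOK";
    if (rc == ERR_NOENT)
        return "ENOENT";
    if (rc == ERR_INVAL)
        return "EINVAL";
    return "unknown";
}

int caller(const char *key)
{
    int value;
    err_t rc = lookup(key, &value);

    if (rc != ERR_OK) {  /* fine: comparing with an error constant */
        printf("lookup failed: %s\n", err_name(rc));
        return -1;
    }

    /* Each of these would be diagnosed if uncommented (GCC wording and
       defaults vary by version):
         int n = rc;          integer from pointer without a cast
         rc = value;          pointer from integer without a cast
         if (rc < 5) ...      comparison between pointer and integer
         if (rc < 0) ...      only with -Wextra or -pedantic
         printf("%d", rc);    -Wformat: %d expects an int
       `rc == 0` and `!rc` remain legal, since 0 is a null pointer constant;
       that is harmless because ERR_OK is the null pointer here. */
    return value;
}
```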
<p>That leaves the issue of actually changing the type of thousands of instances
of function parameter/return values and variables. As Jiří pointed out, in
HelenOS almost every function that returns <code class="highlighter-rouge">int</code> returns an error code. Which is
exactly what makes it easy. We can just mechanically rename all <code class="highlighter-rouge">int</code> return
types, along with a select few variable names (<code class="highlighter-rouge">rc</code>, <code class="highlighter-rouge">ret</code>, <code class="highlighter-rouge">retval</code>, a few
others that came up). There are far fewer exceptions than there are errors,
so doing the automatic replace and then fixing the problems is much easier than
going the other direction (remember, at this point <code class="highlighter-rouge">errno_t</code> is type-checked,
so there’s no way to miss an <code class="highlighter-rouge">errno_t</code> variable that has a non-error number
assigned). And the great thing about it is that applying a reverse rename from
<code class="highlighter-rouge">errno_t</code> to <code class="highlighter-rouge">int</code> doesn’t change semantics and gives a nice, manageable diff
of actual changes. Naturally, there are a lot of instances where new variables
had to be introduced to separate error codes from other numbers, but faced with
the certainties we get in exchange, it’s a rather small price to pay.</p>
<p>It was still a lot more demanding than I anticipated, mostly because I made
some mistakes early on that forced me to redo a lot of the work (automatic
renames can be tricky to use right), but I still consider it well worth the
effort. As of now, I have finished userspace, with major changes committed and
remaining minor changes (and the final gargantuan reverse-automatic-rename patch)
pending review. The kernel is still in the works (the uspace part exhausted me),
but should be ready in a few days.</p>