tl;dr: We had a codegolf challenge recently. My C-solution was 246 byte, the perl-winner was 191. I decided to give notes for C-golf beginners
Note: Most of this blog post is incredibly boring. A better way than to
read through it is to just skip through the git-repository and refer to the
explanations here everytime you don’t know why a change works or what it
does. To make this easier, I added ankers to every paragraph, named by the
commit of the change it explains. So if you want to know, how a specific change
works, you can add #commitid to the url of this post, with commitid being
the full id of that change. If you want to read the more interesting
stuff, from about here it starts
to get non-obvious I think.
At the rgb2rv10 we again had a codegolf event. C is generally not a preferred language for golf, but I use it, because I know it best and my experiences with it are not that bad. My C solutions are most of the times longer then the shortest solutions in perl or similar languages, but they are competitive. That’s why I decided to make a blogpost explaining this years C solution as an example to show some basic C-golfing techniques.
This years challenge was to implement an interpreter for esocalc, a two-dimensional language for arithmetic expressions. Follow the link for a more detailed specification. This challenge is primed to be solved in C. This is also reflected by the length of the different solutions: The shortest C solution is 246 bytes, the shortest Python solution is 227 and the shortest Perl solution is 191 bytes. For a codegolf-challenge this is an impressively small gap between C and scripting languages.
You can follow this post by checking out the code
on github. The oldest commit is the one we are starting with and we will refine
it until we reach the 246 byte solution in master. You can test it by
compiling it (gcc -o golf golf.c should suffice in most cases, the shortest
needs a longer commandline, which is put in the Makefile, so you should make
it). You can run it through the testsuite used in the contest by running
GOLF_BIN="./golf prove -l".
The first step is to implement an easily readable, working version. This is done in the first commit. Though you yourself might have come up with a different implementation, this is pretty straightforward I think. We just read the whole esocalc-sourcecode and walk through it, executing every instruction as we go. The stack is statically allocated and of a fixed size, but that’s no problem because we only have a limited testsuite anyway.
The next step is obvious: We remove comments and move to one-letter variable names, thus reducing readability, but also size considerably. We will leave most of the whitespace for now, because else it is hard to follow the changes.
An important lesson for C-golfers is the following: for is never longer then
while and most of the times shorter. An endless loop with while takes one
character more then a for-loop. We will later see more instances when for will
be considerably shorter. Also, we see the if/else-constructs in the
control-flow instructions. It is considerably shorter to use a ternary operator
in most cases, because in C, most statements are also expressions, so we can
write them as cases in ?: - or use the short-circuiting && if there is no
else-part. We will see more of that later. Lastly we collapse multiple
variable declarations into one to save int-keywords. These three changes are
what happend in the next version.
We continue in our path and notice, that we every char-literal takes three
bytes, while the number it represents often only takes two in decimal.
Let’s fix that.
We also have two temporary variables a and b, that we shouldn’t need.
We can get rid of them,
by thinking up a single statement for arithmetic operations.
The next step
uses a real detail of C: If you don’t give a type for a global variable, a
parameter or the return type of a function, int is assumed. If a function is
not defined, a prototype of int foo() is assumed, meaning we can pass
arbitrary arguments and get an int. The libc is linked in by default. All
these facts means, we can drop all includes and put our variables in the
global scope to remove all int-keywords. This is a very basic, but very
usefull technique. It has one important caveat, you should look out for: If you
need the return value of a libc-function and it is not int, you should
think about wether it can be safely converted. For example on amd64 an int
has 32 bits, but a pointer has 64 bits, therefore pointers as return values get
truncated (even if you assign them to a pointer).
We can save more
by using a parameter to main. This is also a very basic and often seen trick
in C-golfing. You get up to 3 local variables for free this way. In our case
there is an additional benefit: The first parameter to main is the number of
arguments, which is 1 for a normal call (the first argument is the name with
which the programm was called). This means, we get the initialization to 1 for
free.
A trivial optimization
is using gets instead of read. gets always adds a terminating zero-byte,
so we need to grow our buffer a little bit.
If we now look at our code, all the case-keywords might annoy us. If we see
a lot of repititions in our code, the obvious tool to use in C are defines. So
lets define
the structure of the cases and replace every case by a short 1-letter identifier.
The same goes for the arithmetic operations: Four times the same long code cries
for a define.
A define is not always a good solution. You have to weigh the additional
overhead of the keyword and the needed newline against the savings and number
of repititions.
Next
we eliminate the variable i. Skilled C-coders use pointer-arithmetic quite
often (no matter how bad the reputation is). In this case it would be a bad
idea, if we were not explicitely allowed to assume that all programs are
correct and stay in the bounds given (because bound-checks are a lot harder
without indexing).
Another example of savings by for-loops is
the next change.
Here we moved two statements into the for-loop, thus using the semicolons we
need there anyway and saving two bytes.
So the next big thing that catches our eyes are the switch, case and
break-keywords. Everytime you see long identifiers or keywords you should
think about wether a different program-structure or a different libc-builtin
may help you save it. switch-construct can almost always be replaced by an
if-else if construct (which is why we learned to use switch anyway). This
is often shorter, but as we learned, the ternary operator is even shorter. So in
the next step
we use a giant ternary expression instead of a switch-structure. This brings
one major problem: return is one of the few things that’s a statement, but
not an expression. So we can’t use it in ?:-expressions (because the branches
have to be expressions). We use exit() instead, which is an expression, but a
void-expression, so again we run into problems using it in ?:. We work
around that for now by using (exit(0),1) instead. If you connect expressions
by , they are evaluated in succession (contrary to using boolean operators
for example) and the value of the last one is becoming the value of the whole
expression - so our exit-expression evaluates to 1 in this case.
exit is still pretty long (especially with the added parens and
comma-expression), so we want to avoid it too. Here comes a notable quote of
the organisator of the competition into action: “The return value isn’t
important, as long as the output is correct. So it doesn’t matter if you
segfault or anything”. This is the key to
the next change:
Instead of exiting orderly we just create the conditions for a segfault by
assigning zero to p, which is dereferenced shortly thereafter, thus creating
a segfault when we want to exit. This is one of my favourite optimizations.
There still is some repitition in our code. We still assign to d more often
then not. But our big nested ternary operator doesn’t return anything yet. So our
next step
is to return the new value for d in all subexpressions (if need be by using a
comma). This does not save a lot, but still a few bytes.
Now the sources of bytes to save are getting scarcer. What still is a pain is
the explicit stack of a fixed size. Here another deep mistery of C (or more
specifically the way modern computers work) comes into play:
The call stack. We can actually
use this as our stack.
The way this works is, that we use a pointer to an address in the memory area,
the operating system reserved for our call stack and grow down (contrary to the
illustration on wikipedia, the stack grows downwards. But this is a minor
detail). By writing to this pointer and decrementing, we can push to the stack.
By incrementing it and reading we can pop something from the top of the stack.
To get a valid stack-address we could use the address of a local variable (for
example s itself). Local variables are at the bottom of the stackframe, so we
do not overwrite anything important if growing down. There is however a
problem: We call gets and printf which push a few stackframes to the
callstack. Our stack would get smashed by these calls. Therefore we just
subtract a sufficiently high number from it to reserve space for the
stackframes of the function calls. 760 is the minimum amount needed in my
setup, everything up to 99999 should save at least one byte.
This still is unsatisfactory, so we will hack a little more and use the fact,
that the testsuite only uses quite small programms and a quite small stack is
needed. So we just
use s unitialized,
which is absolutely crazy. I discovered (by accident), that you will always end
up with a pointer to your program-array, using around 200 bytes of the end
(most probably some earlier deeply nested call in the startup of the binary
will write an appropriate address here by accident). This of course is
borderline cheating, but it saves 6 bytes, so who cares. From now on it’s
absolutely forbidden to compile with optimizations, because this will destroy
this coincidence. Oh well.
So, if we are already doing unreliable horrific voodoo which will curl up the
fingernails of every honest C developer, we can also
save two bytes
by not setting p to zero, but instead just doubling it. You will then end up
with some address, that is hard to predict, but in all cases I tried leads to
crashing just as reliable. This means, we exit our program in just one byte. Neato!
There is not a lot we can save left now. What might still annoy us and is a
very good tip in general are all this numbers. Even if most characters have
only 2 bytes as a decimal, they still only have one byte as a character (not a
char-literal!). We can
fix this
by passing a verbatim character as the first argument to the c-makro. To
interpret it as a char, we stringify it (with #a) and dereference it (with
*#a), getting the first char. This opens a problem: A space is a
significant character in the interpreted source code, so we need to use it as
an argument. But a space is not significant at that point in the C source code,
so we simply can not pass it to our makro. The solution to this is to move the
whole ASCII-table. So instead of comparing *p we compare *p+n with n to
be choosen. Thus we don’t need to pass a space, but some other char, that is
n positions away and everyone is happy. Kind of. We also need to avoid single
quotes, double quotes (though we can avoid this by using emtpy string (think
about why this works), but too many bytes!!!), parenthesis and chars outside of
ASCII (because this will break our C-file). These constrictions make n=3
pretty much the only choice. This means, we have to include a DEL-character in
our source-code, but the compiler is quite happy about that (the wiki isn’t,
github isn’t, the editor isn’t, but who cares). This is my second most favourite hack.
Now there is not much left to do. We remove the last char-literal left and remove all non-essential whitespace.
This leaves us with 253 bytes. To get below 250, we use buildflags instead of defines. Usually such flags are counted by the difference they add to a minimal compiler call needed. In this case, we have a 186 byte C-file (after removing the trailing newline added by vim) and 60 bytes of compiler-flags, totalling 246 bytes.
I think there still is potential to remove some more characters. Other tools
not used here include
dispatch tables
(which are kind of hard in C, because it lacks an eval, but some variations of
the concept still apply) and magic formulas. If the testcases are very limited,
some people resort to hardcoding the wanted results and just golf a minimal way
to differentiate between what output is wanted. This might be surprising, but
in many cases (this included) this will end up being shorter (though I consider
it cheating and try to avoid it). We also didn’t do a lot of
bit banging. For example using ^
instead of == reverses the check but saves a byte. But I think it is a
usefull intro for people who are just learning C and want to dive deeper into
the language by golfing.