High Performance Unix Logging (Part 1)

Writing efficient logging for performance critical and real time Unix applications.
Most applications have some mechanism for logging, which is way of saving information about the progression of the program. Basic applications just write text to stdout, using printf()
or some other equivalent. Larger applications have more sophisticated mechanisms, which usually involve writing information to a file(s). For example, the Apache webserver has a log file of all the clients that connect and a log file for errors.
The problem with logging is that it isn’t free. High verbosity is helpful for debug, but it takes system resources to process log messages and store them, and it can really add up. First, the human readable text for the log message has to be formed in memory. Next, it is usually passed to the operating system, via a system call. This causes a context switch and the executing thread to block while the operating system kernel does something with it (such as writing the data to a file, or passing it to the shell process to be rendered as text). When there are multiple threads logging to the same place, they must use locking in order to take turns, to prevent from overwriting each other’s messages. This all takes system resources (processing power, memory, disk i/o) away from the application doing real work.
For most applications, particularly those that interact with a user, this is an acceptable tradeoff. Computers are fast, humans are slow, and if a program pauses for a couple of milliseconds here and there to handle some logging, the user won’t perceive it at all. However, there are some cases where this doesn’t cut it:
- When logging is extremely verbose
- When applications need all the resources they can get
- When applications are sensitive to timing, such as real-time applications
For the rest of this post, I’m going to walk through designing a simple non-traditional logging system in C to handle these kinds of cases.
Designing to minimize intrusiveness
The key for this logging system is procrastination. The processor should spend as little time as possible working on logging while in a critical section, and save the hard stuff at a more opportune time. A big source of savings is avoiding system calls (e.g. writing to a file or printing to the screen), since those require pausing the current thread while the Linux Kernel runs. I will also avoid any unnecessary string processing and locking.
Each log message will be stored in a small C struct. In order to use a familiar printf()
like interface, the struct will hold a format string and some arguments. In addition, it will also store the current processor cycle count, to act as a timestamp for each log message (many processor architectures have a way to get this without making a system call).
To store multiple log messages, they will use a circular buffer. Once the buffer fills up, it will overwrite the oldest message. The size just needs to be large enough to have sufficient history to be useful when/if something goes horribly wrong. I also recommend to keep the size as a power of two, to keep the math simple for the processor.
If there are multiple threads in the system, each thread will have its own circular buffer. This keeps us from having to do any locking to share resources. However, this requires each thread to keep track of which buffer belongs to it.
Next, comes a function to write new messages to the log buffer.
Finally, there needs to be a function to extract the logs. To read the ring buffer, it must start on the entry one past the next_index
, and works its way around the circular buffer, processing messages from oldest to newest. The entry that next_index
points to should be skipped, since that may be an invalid state. I have provided a simple function which will print all the messages to stdout.
Checking the assembly
At this point, I usually like to check what I wrote by looking at the generated assembly. It helps give me an idea how the code will run on a processor, and serves as a nice sanity check. I spent an hour or trying different variations of my logging function, to get it to behave the way I wanted it.
Here it is in aarch64 assembly (compiled with LLVM 4.0 and -O3):
I tend to find aarch64 easier to read than x86_64, but I’m probably biased. If you aren’t familiar with aarch64, 64-bit registers start with x
and 32-bit registers start with w
. Function arguments are passed in the lower number registers:
x0
register holds the pointer to thelogInfo_t
structx1
register holds the pointer to the format stringx2
andx3
registers hold theprintf
arguments (arg0 and arg1 respectively)
Putting it all together
I wrote a proof-of-concept program to demonstrate the code. It starts some number of POSIX threads, which each do a billion 128-bit integer multiplication calculations, logging the value at each step. Once all the threads finish, the main thread prints out the logs for each thread. Since the logging circular buffer only holds 1024 entries, and one of those entries is reserved, this means that only the 1023 most recent messages are printed.
Below is the code if you want to run it yourself
To compile, I used LLVM 4.0 with the following flags:
Results
Here is an excerpt from the output of the program, which I ran on Ubuntu 16.04 with an old AMD Athlon 64 X2 4400+ processor (took about 6.5 seconds):
If we compare the left most numbers from the log, which are the cycle counts (RDTSC instruction on x86) at the time the message was logged, we see that each pass through the loop requires 15 cycles. My processor runs at 2.3 GHz, which means that 15 cycles is about 6.5 ns.
I also measured the amount of time it took to dump all the log messages at the end, which was 0x10482c6 cycles (about 17 million cycles, or 7.4 ms at 2.3 GHz). In this time, it called printf()
4098 times (2 threads * (1023 * 2 + 3)), which means it takes an average of 4166 cycles per printf()
, or 1811 ns. I wouldn’t call this slow, and this is acceptable for many applications, but it is still ~280 times slower than our new logging system.
Next Steps
There are many ways this simple logging system can be improved, which I’ll mostly leave as an exercise to the reader. One area I want to delve in deeper is alternative ways to extracting the logs from memory, so that they are human readable. What if you could check the logs whenever you needed them, instead of at the end of the program?
In Part 2, we are going to look at extracting log messages using a debugger.