High Performance Unix Logging (Part 2)

High Performance Unix Logging (Part 2)

Extracting high performance logging data with a debugger

In part one of this series, I described how to build a custom logging framework in userspace with very little overhead. It worked by having each thread store logging information in a circular buffer while in critical sections, and later post processing the buffer. This helps avoid system calls and expensive string processing when performance counts.

One of the downsides to this approach is that logging data is only available at certain points in the application. If a user needs log messages in the middle of a critical section, for example if they are using a breakpoint in a debugger, the log data must be processed a different way. In this part, I’m going to write a GDB extension which can extract the log data and present fully processed messages to the user. The custom GDB command will be invoked like this:

(gdb) log dump [index] ...

Setting up a GDB extension

GDB is a common debugger for Unix that can be used to debug applications. It supports many features, including stepping through code, setting breakpoints, and examining memory. It also supports extensions, which can customize GDB behavior for the program being debugged. Specifically, I’m going to be using the Python API to write my extension, since that is a language that I am familiar with.

The source code for a Python GDB extension is written as a regular Python module, which imports the gdb module and has a .py file extension. The Python module is loaded into GDB using the GDB source command or on the gdb command line with the -x flag. There are also some options to have Python extensions automatically loaded on startup. The version of Python that GDB uses is determined by which Python library is linked with GDB and cannot be changed. In Ubuntu 16.04, the default GDB package uses Python 3, but I’ve seen Python 2 used in some older systems. Due to this uncertainly, I make an effort to write my extensions so it is compatible with both Python 2 and 3.

To implement a new GDB command in Python, I must create a new object which extends the gdb.Command class. Since the command I am writing is log dump, this requires two custom commands. The first is for the log prefix command and the second is for the dump sub-command. Here is what the basic implementation of logging_utils.py looks like:

#!/usr/bin/gdb -x

import gdb

class LogCommand(gdb.Command):
    "Command prefix for working with log buffers"

    def __init__(self):
        super (LogCommand, self).__init__ ("log", gdb.COMMAND_SUPPORT, gdb.COMPLETE_NONE, True)

LogCommand() # Register LogCommand class


class LogDumpCommand(gdb.Command):
    "Dump all log messages\nlog dump [index] ..."

    def __init__(self):
        super (LogDumpCommand, self).__init__ ("log dump", gdb.COMMAND_SUPPORT, gdb.COMPLETE_NONE)

    def invoke(self, argument, from_tty):
        gdb.write("TODO implement\n")
        
LogDumpCommand() # Register LogDumpCommand class

Notice that at the top of this file, I put the path to GDB with the -x flag in the shebang. What this does is when this Python file executed directly on the command line, it will launch GDB and load itself as an extension:

$ ./logging_utils.py
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) help log
Command prefix for working with log buffers

List of log subcommands:

log dump -- Dump all log messages

Type "help log" followed by log subcommand name for full documentation.
Type "apropos word" to search for commands related to "word".
Command name abbreviations are allowed if unambiguous.
(gdb) help log dump
Dump all log messages
log dump [index] ...
(gdb) log dump
TODO implement 

Parsing the log structure

For reference, here is the log structure from part one:

#include <stdint.h>

#define NUM_LOG_ENTRIES 1024
#define MAX_NUM_THREADS 2

typedef struct {
    const char * fmt_str; // Holds the printf format string
    unsigned long long cycle_count; // Holds CPU counter
    uint64_t args[2]; // Holds the arguments for printf()
} logEntry_t;

typedef struct {
    unsigned next_index;
    logEntry_t entries [NUM_LOG_ENTRIES];
} logInfo_t;

logInfo_t g_logs[MAX_NUM_THREADS];

At the top level, all logging information is stored in the g_logs array, which contains an entry for each thread. In GDB, the first step is accessing the logs is to look up the g_logs symbol using the symbol API:

# Lookup log buffer array symbol
g_log_sym = gdb.lookup_global_symbol('g_logs')
if not g_log_sym:
    raise gdb.GdbError("Could not find symbol 'g_logs' (make sure debug info is enabled)")
g_log = g_log_sym.value()

The Python object returned by value() contains the full type information, allowing me to walk structures and arrays. Since g_logs is an array, I can access individual array elements with square brackets and an integer. Here is how I would iterate over the g_logs for each thread the user passes as an argument to my new log dump command:

# Access the logs for each thread in the thread list
for thread in gdb.string_to_argv(argument):
    loginfo = g_log[int(thread)]

The logInfo_t struct contains a circular buffer to store each log message for a thread. To access elements of a struct, the square brackets are used with the name of the struct field as the key. The number of elements in the circular buffer is 1024, but to avoid hard coding that, it can be computed from the type information:

# Calculate the number of entries in the log
n_logs = loginfo["entries"].type.sizeof // loginfo["entries"][0].type.sizeof

The logInfo_t struct also contains an index, marking the next index that will be written. One past this index is the oldest message in the circular buffer. This should be the first message printed, and the code should iterate until over the buffer until the next index is reached again. This is the code:

# Loop over all the log entries
next_index = loginfo["next_index"]
for i in range(1, n_logs):
    index = (next_index + i) % n_logs
    entry = loginfo["entries"][index]

The logEntry_t struct contains the information for a single log message. This includes a printf() like format string, a timestamp, and up to two printf arguments. GDB provides a string() method to convert a character pointer into a Python string:

# Convert to a python string
fmt_str = entry["fmt_str"].string()

Python has builtin support for printf-like string formatting, but it isn’t quite the same as C. For instance, C has different length sub specifiers, such as l and ll, which are for long int and long long int respectively. Python doesn’t support these, since all Python integers are of arbitrary size. This means that I must manually remove the length sub-specifiers from the format string and must cast the 64-bit printf arguments to the correct integer size so that they are printed correctly:

# Process the format specifiers in the string to:
# 1. Count the number of arguments
# 2. Cast arguments to the correct type
# 3. Fixup the format string to be python compatible
args = []
def process_fmt(match):

    # Skip litteral '%'
    full_str = match.group(0)
    if full_str == "%%":
        return full_str

    # Determine the appropriate type to cast to
    typestr = match.group(1).lower()
    type_name = ""
    if not typestr.endswith("d"):
        type_name += "unsigned "
    if "ll" in typestr:
        type_name += "long long"
    elif "l" in typestr:
        type_name += "long"
    else:
        type_name += "int"

    # Cast the argument and convert to python integer
    value = int(entry["args"][len(args)].cast(gdb.lookup_type(type_name)))

    # Converting gdb Values to python integers seems to interpret large unsigned
    # values as negative.  If the format expects an unsigned int, then fix
    # that here
    if value < 0 and "unsigned" in type_name:
        # Undo 2's complement
        value += 2**64
    args.append(value)

    # Remove l's from format specifiers, since python doesn't recognize them
    return full_str.replace('l', '')
python_fmt_str = re.sub(r'%%|%[+\- #\.\d]*(l?l?[duoxX])', process_fmt, fmt_str)

For simplicity, I only handled the integer format specifiers, but it is fairly easy to add support for other types, such as stings. I also don’t handle using * for the width and precision sub-specifiers.

Finally, the log entries can be printed:

# Prefix each line of output with the cycle count
gdb.write("0x%016x: " % int(entry["cycle_count"]))

# Print the processed format string
try:
    gdb.write(python_fmt_str % tuple(args))
except (TypeError, ValueError):
    gdb.write("ERROR: Bad format string: %s\n" % fmt_str)

Putting it all together

Here is the completed script (with a few tweaks for better organization and argument handling):

#!/usr/bin/gdb -x

import gdb
import re

class LogCommand(gdb.Command):
    "Command prefix for working with log buffers"

    def __init__(self):
        super (LogCommand, self).__init__ ("log", gdb.COMMAND_SUPPORT, gdb.COMPLETE_NONE, True)

LogCommand() # Register LogCommand class


class LogDumpCommand(gdb.Command):
    "Dump all log messages\nlog dump [index] ..."

    def __init__(self):
        super (LogDumpCommand, self).__init__ ("log dump", gdb.COMMAND_SUPPORT, gdb.COMPLETE_NONE)

    def invoke(self, argument, from_tty):

        # Lookup log buffer array symbol
        g_log_sym = gdb.lookup_global_symbol('g_logs')
        if not g_log_sym:
            raise gdb.GdbError("Could not find symbol 'g_logs' (make sure debug info is enabled)")
        g_log = g_log_sym.value()

        # Parse the arguments to figure out which threads to dump
        argv = gdb.string_to_argv(argument)
        if argv:
            # Print the threads specified as arguments
            thread_list = [int(arg) for arg in argv]
        else:
            # Print all the threads
            n_threads = g_log.type.sizeof // g_log[0].type.sizeof
            thread_list = range(n_threads)

        # Print the logs for each thread in the thread list
        for thread in thread_list:
            gdb.write("\nLogs for thread %d:\n" % thread)
            self.print_loginfo(g_log[thread])


    def print_loginfo(self, loginfo):

        # Calculate the number of entries in the log
        n_logs = loginfo["entries"].type.sizeof // loginfo["entries"][0].type.sizeof

        # Loop over all the log entries
        next_index = loginfo["next_index"]
        for i in range(1, n_logs):
            index = (next_index + i) % n_logs
            self.print_entry(loginfo["entries"][index])


    def print_entry(self, entry):

        # Skip entries which are null
        if not entry["fmt_str"]:
            return

        # Convert to a python string
        fmt_str = entry["fmt_str"].string()

        # Process the format specifiers in the string to:
        # 1. Count the number of arguments
        # 2. Cast arguments to the correct type
        # 3. Fixup the format string to be python compatible
        args = []
        def process_fmt(match):

            # Skip litteral '%'
            full_str = match.group(0)
            if full_str == "%%":
                return full_str

            # Determine the appropriate type to cast to
            typestr = match.group(1).lower()
            type_name = ""
            if not typestr.endswith("d"):
                type_name += "unsigned "
            if "ll" in typestr:
                type_name += "long long"
            elif "l" in typestr:
                type_name += "long"
            else:
                type_name += "int"

            # Cast the argument and convert to python integer
            value = int(entry["args"][len(args)].cast(gdb.lookup_type(type_name)))

            # Converting gdb Values to python integers seems to interpret large unsigned
            # values as negative.  If the format expects an unsigned int, then fix
            # that here
            if value < 0 and "unsigned" in type_name:
                # Undo 2's complement
                value += 2**64
            args.append(value)

            # Remove l's from format specifiers, since python doesn't recognize them
            return full_str.replace('l', '')
        python_fmt_str = re.sub(r'%%|%[+\- #\.\d]*(l?l?[duoxX])', process_fmt, fmt_str)

        # Prefix each line of output with the cycle count
        gdb.write("0x%016x: " % int(entry["cycle_count"]))

        # Print the processed format string
        try:
            gdb.write(python_fmt_str % tuple(args))
        except (TypeError, ValueError):
            gdb.write("ERROR: Bad format string: %s\n" % fmt_str)

LogDumpCommand() # Register LogDumpCommand class

To test this, I rebuilt the logging_test from part one, with a small ring buffer to limit the output:

$ clang-6.0 -g -O3 -DNUM_LOG_ENTRIES=16 logging_test.c -lpthread -o logging_test

Make sure to build with -g to enable debug info, which is required for GDB to access all the symbol information. Here is the extension in action:

$ ./logging_utils.py logging_test
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from logging_test...done.
(gdb) r
Starting program: /home/mbrase/linux_logging/logging_test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff77ef700 (LWP 3700)]
Started thread 0
[New Thread 0x7ffff6fee700 (LWP 3701)]
Started thread 1
^C
Thread 1 "logging_test" received signal SIGINT, Interrupt.
0x00007ffff7bc298d in pthread_join (threadid=140737345681152, thread_return=0x0) at pthread_join.c:90
90      pthread_join.c: No such file or directory.
(gdb) log dump

Logs for thread 0:
0x0000e779c1457cf7: Result = 0x39623e880dcbdd70b3b9d3314105d570
0x0000e779c1457d0b: Result = 0xf4d8360cb1625d0362301e9629596670
0x0000e779c1457d1f: Result = 0x62990aac5f47a15cfba42bde6c36d770
0x0000e779c1457d33: Result = 0x1738a6c626800430ec5fc3a484c42870
0x0000e779c1457d47: Result = 0xa44eaa91dfbaee52a5491325f6a75970
0x0000e779c1457d5b: Result = 0x74eed6ed0105870c18759acf06066a70
0x0000e779c1457d6f: Result = 0xbfc892330e3c1208d968c9156f875b70
0x0000e779c1457d83: Result = 0x38b83b16827e4af50eca525220502c70
0x0000e779c1457d97: Result = 0x5558872b338093c38372ae49ee06dd70
0x0000e779c1457dab: Result = 0x1d54cb9f1e60260c4d71fc154ed16e70
0x0000e779c1457dbf: Result = 0x4389877f173f46d1845ee5171155df70
0x0000e779c1457dd3: Result = 0x0169fb9b489aec44daf7efb214ba3070
0x0000e779c1457de7: Result = 0x9098cfaa90f240218e220a6e00a46170
0x0000e779c1457dfb: Result = 0x6716f597f226ae1514c2f03afd3a7270
0x0000e779c1457e0f: Result = 0x0cd4a750dcf54b51560d5e846b226370

Logs for thread 1:
0x0000e779c043c85c: Result = 0xe0455a74ccfd7a868190896818dc18f8
0x0000e779c043c870: Result = 0xc63452cafd4e5bdc508844d1858b8938
0x0000e779c043c884: Result = 0xbf2c9e062b65188e906e5ae401e88578
0x0000e779c043c898: Result = 0x59948916416c001327981ce120fe6db8
0x0000e779c043c8ac: Result = 0xbae75bb4111702262d95d29d0924a1f8
0x0000e779c043c8c0: Result = 0xea72f109725880b03942f56e6d5e8238
0x0000e779c043c8d4: Result = 0xb9c7adf061f495615f53ab5600bb6e78
0x0000e779c043c8e8: Result = 0x29ed967b045193b6f4a6bd3093b6c6b8
0x0000e779c043c8fc: Result = 0xaea46b1ef540308306785ce70b97eaf8
0x0000e779c043c910: Result = 0x0450f6b405b4889e0c8b10cf63d23b38
0x0000e779c043c924: Result = 0x2c70e294c68dd21477aa1bd1e9651778
0x0000e779c043c938: Result = 0xaa2fb19aafd6464371d6ea65e03bdfb8
0x0000e779c043c94c: Result = 0x3f57e2410e0c26a6121aa718c28df3f8
0x0000e779c043c960: Result = 0x55cd3a535b46f7a4c3416a134a3eb438
0x0000e779c043c974: Result = 0x6a927f5a46dd24b99fc6d9ef743d8078