August 28, 2025 Tagged: security gsoc
During 2025, I took part in Google’s Summer of Code. My project for the year was more or less a direct continuation of the project I had led the previous year, also as part of Google Summer of Code, working on the Pwndbg project.
As far as objectives go, where 2024 saw me port Pwndbg to LLDB (the LLVM Debugger), 2025 would see me both port the Pwndbg test suite to LLDB and implement functionality that would let Pwndbg better deal with the Darwin platform, such as support for handling the Objective-C and Mach-O ABIs. Both of these had been left as potential future improvements by the original project.
The details for this year’s project can be found here.
Before I got deep into the weeds working towards the main goals of the project, I figured it would be a good idea to warm up a bit. Before the project officially began, disconnect3d and I had a call, during which they assigned me a few bugs that had piled up in pwndbg-lldb since I had first implemented it in the summer of 2024, and that I hadn’t had the time to fix until then. For the same reason, it had also been a few months since I’d last pushed anything upstream to Pwndbg, and so I didn’t know if anything major had changed since then in either the code base itself or the contribution process more broadly.
Upon closer inspection, I also noted that most [1] of these bugs I’d been assigned could be fixed rather quickly through small, self-contained PRs. I decided, then, to kick the summer off by using my first week to fix these bugs, rather than going for either of the main goals right away. The fixes would hopefully help with Pwndbg’s usability a reasonable amount, and would help me get myself back up to speed along the way.
In addition, as I made progress towards porting the test suite, it was only natural to expect that I would end up coming across bugs in pwndbg-lldb, as the newly-ported tests exercised bits of it that up until that point hadn’t been properly exercised. Indeed, a handful of bugs in this class were found and subsequently fixed, and they are listed here along with the other fixes, for the sake of brevity.
Here’s the list of pull requests made during that period, in the order in which they were merged:
… help set to LLDB Pwndbg
… run -s automatically for entry
After that was done and I was back up to speed with the project and the contribution flow, it was time to move to the main part of the project: setting up the testing framework for pwndbg-lldb, and porting existing tests to it. And, before describing how the actual porting work went down, it’s important to give some background on how Pwndbg tests work.
Say - hypothetically - that you’ve added a new feature to Pwndbg, and want to run the test suite before submitting a PR for review. Here’s how that would work:
1. You run tests.sh from the root of the Pwndbg repository. That’s the entry point script to most of the testing functions in Pwndbg, but otherwise it just invokes Python on the next stage.
2. tests/tests.py enumerates the tests.
Of particular interest to me was the way enumeration and test launching worked. While the Pwndbg test suite makes extensive use of PyTest, it’s not invoking it directly, as Pwndbg needs to run under an environment that has been specially set up for it. In the original GDB-only version of the test suite, setting the environment up for Pwndbg involved spawning a GDB process, initializing Pwndbg from inside it, then having it run a script that invoked PyTest from inside GDB.
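The GDB-only launch flow described above can be sketched roughly as follows. This is a minimal illustration and not Pwndbg's actual launcher: the file names gdbinit.py and pytest_launcher.py, the PWNDBG_TEST_ID variable, and both helper functions are hypothetical. Only the GDB flags themselves (--batch, -nx, -x) are real.

```python
from typing import Dict, List


def build_gdb_command(gdbinit: str = "gdbinit.py",
                      launcher: str = "pytest_launcher.py") -> List[str]:
    """Build the argv for one GDB process that hosts one PyTest run.

    Both file names are hypothetical placeholders for this sketch.
    """
    return [
        "gdb",
        "--batch",       # exit once every script has finished
        "-nx",           # do not read the user's own .gdbinit
        "-x", gdbinit,   # stage 1: initialize Pwndbg inside GDB
        "-x", launcher,  # stage 2: a script that calls pytest.main() in-process
    ]


def build_environment(test_id: str) -> Dict[str, str]:
    """Tell the in-GDB launcher which test to run (hypothetical variable name)."""
    return {"PWNDBG_TEST_ID": test_id}
```

An outer script would then pass both values to something like subprocess.run, once per test, and read the exit status to decide whether the test passed.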
With that in mind, I had to come up with a way to add support for LLDB into the mix. Given how convoluted and clunky the original setup felt to me at first, my first instinct was to give some thought to whether I could make it more straightforward, either by allowing us to invoke PyTest directly, or by getting rid of the script that manages spawning GDB in favor of spawning it directly, and having the entire suite run from within a single GDB instance. Unfortunately, though, after experimenting with that idea for a few days, I came to the conclusion that both ideas were fairly impractical, in two main ways.
Ultimately, I decided to mostly keep the same architecture in place, and just extend it to support launching tests using both pwndbg-lldb and GDB. And so I refactored the structure of the testing code around this idea of test drivers - GDB, LLDB - and test groups - GDB Tests, Debugger-agnostic tests, LLDB Tests - and you can mix and match them more-or-less as you please. More details about the way the refactoring went down can be found in this PR.
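The driver and group split can be pictured with a small sketch. The class names, the compatibility table, and the plan function below are all hypothetical, chosen only to illustrate the idea of mixing and matching. They are not the names used in the Pwndbg code base.

```python
from enum import Enum


class Driver(Enum):
    GDB = "gdb"
    LLDB = "lldb"


class Group(Enum):
    GDB_TESTS = "gdb-tests"
    LLDB_TESTS = "lldb-tests"
    DEBUGGER_AGNOSTIC = "debugger-agnostic"


# Which drivers can run which groups (an assumption made for this sketch).
SUPPORTED = {
    Group.GDB_TESTS: {Driver.GDB},
    Group.LLDB_TESTS: {Driver.LLDB},
    Group.DEBUGGER_AGNOSTIC: {Driver.GDB, Driver.LLDB},
}


def plan(drivers, groups):
    """Return every (driver, group) pair that it makes sense to launch."""
    return [(d, g) for g in groups for d in drivers if d in SUPPORTED[g]]
```

Asking for the debugger-agnostic group with both drivers yields two runs, while asking for the GDB-only group with the LLDB driver yields none.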
After that was done, I was ready to move on to setting up the testing code to handle pwndbg-lldb and getting some initial tests to run. There, however, I hit a small but interesting snag. While controlling debugger state from inside GDB can be done using entirely synchronous code - and so all tests up to that point had been written as regular synchronous functions - pwndbg-lldb needs the user to call into async functions, as we need to suspend the calling function and run the process event loop ourselves until it makes sense to continue.
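To make the "suspend the caller and run the event loop ourselves" idea concrete, here is a minimal sketch of driving a coroutine by hand, without asyncio. Everything in it (WaitForStop, example_test, drive, and the list of pending events) is hypothetical and only mimics the general technique. It is not how pwndbg-lldb is actually implemented.

```python
class WaitForStop:
    """Awaitable that hands control back to whoever is driving the coroutine."""

    def __await__(self):
        # Yielding suspends the test; the value sent back in becomes the result.
        event = yield self
        return event


async def example_test(log):
    log.append("continue process")
    event = await WaitForStop()  # suspended here until the driver resumes us
    log.append(f"stopped: {event}")
    return event


def drive(coro, pending_events):
    """Run coro to completion, feeding it one process event per suspension."""
    event = None
    try:
        while True:
            coro.send(event)               # run the test until it suspends
            event = pending_events.pop(0)  # stand-in for pumping the event loop
    except StopIteration as done:
        return done.value
```

The test body reads like straight-line code, while the driver decides when the process has made enough progress for it to continue.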
At this point in the project, that meant I’d have to find some way to go from the synchronous test execution environment that PyTest gives us at the start of a test, to an asynchronous environment from which pwndbg-lldb can be controlled. What I ended up going for was setting up a function that would prepare that execution environment for a given test from the outer test process management script in tests/tests.py, and then having the individual tests themselves call it. And, while it worked, it would be quite terrible to have to invoke that manually for every function, and so I made a small decorator by the name of @pwndbg_test to allow me to write the tests as async functions directly.
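A decorator in the spirit of @pwndbg_test can be sketched as below. This is not the real implementation: asyncio.run is used purely as a stand-in for the environment-setup function mentioned above, and the sample test is made up.

```python
import asyncio
import functools


def pwndbg_test(test):
    """Expose an async test to PyTest as a plain synchronous function."""

    @functools.wraps(test)
    def wrapper(*args, **kwargs):
        # Assumption: asyncio.run() stands in for the function that prepares
        # the debugger-specific asynchronous environment for this test.
        return asyncio.run(test(*args, **kwargs))

    return wrapper


@pwndbg_test
async def test_example():
    await asyncio.sleep(0)  # placeholder for awaiting a debugger operation
    return 42
```

PyTest only ever sees the synchronous wrapper, so the test author writes async def and never calls the setup code by hand.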
With all of those in place, I wrote a few initial tests that would help me make sure the new and expanded test code worked for both pwndbg-lldb and GDB, and that it would stay that way. These were really simple, and only tested whether the supporting code was able to launch async tests. They came in expected-success and XFAIL pairs, in order to ensure that entirely failing to start the test case would not count as a PASS. These changes were then merged as part of this PR.
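Why a success and XFAIL pair catches a harness that never starts the test can be shown with a toy runner. The run function below is a simplified, hypothetical model of how a PyTest-like runner reports outcomes. It does not use PyTest itself.

```python
def run(test, expect_failure, harness_works=True):
    """Report an outcome roughly the way a PyTest-like runner would."""
    failed = False
    if harness_works:
        try:
            test()
        except AssertionError:
            failed = True
    # A broken harness never runs the body, so nothing ever "fails".
    if expect_failure:
        return "XFAIL" if failed else "XPASS"
    return "FAIL" if failed else "PASS"


def passes():
    assert True


def fails():
    assert False
```

With a working harness the pair reports PASS and XFAIL. With a harness that silently skips the body, the first test still reports PASS, but the second one flips to XPASS, which a strict configuration treats as an error.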
After that, the work consisted entirely of porting the existing GDB tests to the new Debugger-agnostic group, so they could run under both GDB and pwndbg-lldb, which was done in two passes. In the first pass, I sorted the tests into those that were suitable for being turned into debugger-agnostic tests - such as ones that tested Pwndbg functionality as it relates to debugging processes - and those that were not - such as those that just tested the integration between Pwndbg and GDB, or those that tested things LLDB flat out doesn’t support. At the end of that, I had the following table, which I would use as a guide going forward:
LEGEND
X - N/A
O - PORTED
F - FLAKY
. - IN PROGRESS
( - PARTIAL
? - UNSURE
O heap
O test_architectures.py
X test_attachp.py
O test_cache.py
O test_callstack.py
O test_command_branch.py
O test_command_canary.py
O test_command_config.py
O test_command_cyclic.py
O test_command_distance.py
O test_command_dt.py
O test_command_errno.py
O test_command_flags.py
X test_command_ignore.py
X test_command_killthreads.py
O test_command_libcinfo.py
? test_command_onegadget.py - Heavy use of GDB, probably has to be rewritten
O test_command_plist.py
O test_command_procinfo.py
O test_command_search.py
O test_command_stepsyscall.py
O test_command_stepuntilasm.py
O test_command_telescope.py
O test_command_tls.py
? test_command_vmmap.py - Does lots of coredump-related stuff.
O test_command_xor.py
F test_commands.py
O test_commands_dumpargs.py
( test_commands_elf.py - Some tests needed a GDB-only command
O test_commands_next.py
X test_consistent_help.py
O test_context_commands.py
X test_cymbol.py
O test_emulate.py
X test_function_base.py
X test_gdblib_parameter.py
? test_glibc.py
( test_go.py - Some tests test for GDB startup.
X test_help.py
O test_hexdump.py
X test_loads.py
O test_memory.py
O test_misc.py
O test_mmap.py
O test_mprotect.py
O test_nearpc.py
X test_prompt_recolor.py
X test_readline.py - We _do_ import readline in pwndbg-lldb!
O test_symbol.py
( test_triggers.py - GDB version tests GDB-specific quirks.
O test_windbg.py
-- KNOWN ISSUES
test_tls_address_and_command[i386] - LLDB does not have gs_base in i386.
test_go_dumping_xXX - LLDB does not seem to fully support go[1-3].
[1]: error: Could not find type system for language go: TypeSystem for language go doesn't exist
[2]: https://discourse.llvm.org/t/golang-support/72384/10
[3]: https://discourse.llvm.org/t/anybody-using-the-go-java-debugger-plugins/47418
Porting the tests mostly consisted of replacing calls to GDB with calls to the Debugger-agnostic process control functions, changing regular expressions around to accommodate the slight differences between the outputs of pwndbg and pwndbg-lldb - with the occasional change to pwndbg to bring the outputs closer together whenever that made more sense than changing the regular expressions - and sprinkling plenty of async keywords around.
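As a small illustration of the regular expression changes, the sketch below loosens a pattern so that it accepts two slightly different renderings of the same line. Both sample outputs are invented for this example and are not real Pwndbg output.

```python
import re

# Hypothetical outputs of the same command under each debugger.
GDB_OUTPUT = "canary : 0x1234abcd5678ef00"
LLDB_OUTPUT = "canary:  0x1234abcd5678ef00"

# Tolerate differences in spacing around the separator instead of
# matching one debugger's exact layout.
CANARY_RE = re.compile(r"canary\s*:\s*(0x[0-9a-f]+)")


def extract_canary(output):
    match = CANARY_RE.search(output)
    return match.group(1) if match else None
```

When the two outputs differ in substance rather than in layout, aligning the output on the Pwndbg side is usually the cleaner fix than widening the pattern further.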
Porting the tests was really the bulk of the work in this project, and that work got upstreamed in two batches, followed by an integration with the Pwndbg CI/CD pipelines, which also came with quite a few bug fixes mixed in with it:
As previously noted, while Pwndbg had had support for running on Darwin and for debugging both macOS and iOS binaries as of soon after the end of my 2024 GSoC project, that support was somewhat limited. Pwndbg has a lot of code that deals specifically with parsing ELF binaries and poking into Linux’s data structures, and it uses it extensively to support its more interesting features. While those features didn’t outright break the Darwin version, they were still missing, and it would be nice to make at least a few of them available on it.
So, with that in mind, I set out to add support for parsing both the DYLD Shared Cache and some of the Objective-C ABI. The rationale I had for going for these two specifically was that being able to read the Shared Cache would let us display more meaningful information on our virtual memory map - whose lack of useful information on Darwin had been documented in this issue - while being able to interact with the Objective-C ABI would let us resolve information about live objects, classes, methods, selectors, and the like, which would otherwise just look like a bunch of calls to _objc_msgSend and functions like it.
As I set out to implement support for both of those things, I first looked for some kind of official documentation detailing the exact ABI structures and requirements that we could peek into from Pwndbg, only to be met by the cold, sad reality that, apparently, Apple neither documents nor gives any guarantees on either of these ABIs.
From our perspective, that lack of documentation and stability is kind of a big deal. Being tied to a debugger, unless the user explicitly wants to change something about the program, we should ideally only ever read from program memory and registers, and try never to modify anything. Apple’s stable APIs for reflecting on Objective-C objects and for talking to DYLD involve calling public functions from the context of the inferior and observing their return values, which is about as far as you can get from our ideal scenario.
This gives rise to a bit of tension right at the heart of this bit of functionality. Implementing anything to do with these ABIs will fundamentally involve a balancing act between our wish to avoid having to change inferior state by calling functions from it, and the need to target more stable interfaces, so that we don’t have to change large volumes of code every time Apple decides to change something internally.
Ultimately, the way the code was implemented, there is a mix of both strategies, where each one is used when I judged it to be the most sensible in light of these two conflicting requirements.
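The read-only side of that trade-off can be sketched as parsing a structure straight out of bytes read from the inferior. The field layouts below are an assumption based on the dyld_cache_header and dyld_cache_mapping_info structures in dyld's open-source headers. Apple does not guarantee them, and this is not the code that was merged into Pwndbg.

```python
import struct

# Assumed start of dyld_cache_header:
#   char magic[16]; uint32_t mappingOffset; uint32_t mappingCount;
HEADER = struct.Struct("<16sII")

# Assumed dyld_cache_mapping_info:
#   uint64_t address, size, fileOffset; uint32_t maxProt, initProt;
MAPPING = struct.Struct("<QQQII")


def parse_mappings(blob):
    """Return (address, size, file_offset) for each mapping in a cache header.

    blob is a bytes object, e.g. memory read from the inferior. Nothing is
    written back and no function is called in the debugged process.
    """
    magic, offset, count = HEADER.unpack_from(blob, 0)
    if not magic.startswith(b"dyld_v1"):
        raise ValueError("not a dyld shared cache header")
    return [
        MAPPING.unpack_from(blob, offset + i * MAPPING.size)[:3]
        for i in range(count)
    ]
```

The price of this approach is exactly the one described above: if the layout changes in a future release, the format strings have to change with it.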
The support for these ABIs got merged as this PR. And as of the time of writing, it’s being used to enhance the debugging experience in Darwin in these ways:
While not as high-stakes or flashy as last year’s project, I feel this year’s project was vital to the long-term viability of pwndbg-lldb, and that it has achieved its primary goals, as stated in the original project proposal, with one caveat.
The caveat is iOS support. While iOS support has benefited from all of the enhancements made to Darwin support in Pwndbg over the course of the project, it’s still somewhat untested and hard to set up. While I have put some time into trying to get prototypes off the ground that are nicer to use than the current arrangement, that’s still all they are - prototypes. And so that remains as potential future work.
As always, if you want to read more into the work that went down during the summer you can always look into the PRs listed in this article, or read the conversations and talk to us in the Pwndbg Discord server.
[1] With the major exception being bugs that involved Pwndbg not having enough Darwin-related functionality available at the time.