lobisomem.gay

Pwndbg LLDB II: Electric Boogaloo

August 28, 2025
Tagged: security gsoc

In 2025, I took part in Google Summer of Code (GSoC). My project for the year was more or less a direct continuation of the project I had led the previous year, also as part of Google Summer of Code, working on the Pwndbg project.

As far as objectives go, where 2024 saw me port Pwndbg to LLDB (the LLVM Debugger), 2025 would see me both port the Pwndbg test suite to LLDB and implement functionality that would let Pwndbg deal better with the Darwin platform, such as support for handling the Objective-C and Mach-O ABIs. Both of these had been left as potential future improvements by the original project.

The details for this year’s project can be found here.

Part I: Warmup, Bug Fixes & Assorted Improvements

Before getting deep into the weeds of the project’s main goals, I figured it would be a good idea to warm up a bit. Ahead of the official start, disconnect3d and I had a call, during which they assigned me a few bugs in pwndbg-lldb that had piled up since I first implemented it in the summer of 2024, and that I hadn’t had the time to fix until then. For the same reason, it had also been a few months since I’d last pushed anything upstream to Pwndbg, so I didn’t know whether anything major had changed in the meantime, either in the code base itself or in the contribution process more broadly.

Upon closer inspection, I also noticed that most¹ of the bugs I’d been assigned could be fixed fairly quickly through small, self-contained PRs. So I decided to kick the summer off by spending my first week fixing them, rather than going for either of the main goals right away. The fixes would hopefully improve Pwndbg’s usability a fair amount, and would help me get back up to speed along the way.

In addition, as I made progress porting the test suite, I naturally came across bugs in pwndbg-lldb, since the newly ported tests exercised parts of it that hadn’t been properly exercised until then. A handful of bugs in this class were found and fixed, and for the sake of brevity they are listed here along with the other fixes.

Here’s the list of pull requests made during that period, in the order in which they were merged:

Part II: The Testing Framework

With that done, and with me back up to speed on the project and its contribution flow, it was time to move on to the main part of the project: setting up the testing framework for pwndbg-lldb and porting the existing tests to it. Before describing how the porting itself went, it’s worth giving some background on how Pwndbg tests work.

Say - hypothetically - that you’ve added a new feature to Pwndbg, and want to run the test suite before submitting a PR for review. Here’s how that would work:

Run tests.sh from the root of the Pwndbg repository. It’s the entry point for most of Pwndbg’s testing functionality, but beyond that it just invokes Python on the next stage.
The main tests script at tests/tests.py enumerates the tests.
The main tests script spawns a new process for each test case matching the filter expression, and collects and displays its status.
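The last two steps can be sketched roughly as follows. This is not the actual tests/tests.py: the shell-style filter (via fnmatch), the test ID format and the fake test process are assumptions chosen so the example runs on its own. In Pwndbg, each child process would be a debugger rather than a bare Python interpreter.

```python
import subprocess
import sys
from fnmatch import fnmatch
from typing import Dict, List


def select_tests(all_tests: List[str], pattern: str) -> List[str]:
    """Keep only the test IDs matching a shell-style filter expression."""
    return [test_id for test_id in all_tests if fnmatch(test_id, pattern)]


def run_one_test(test_id: str, command: List[str]) -> str:
    """Run one test in a fresh child process and map its exit code to a status."""
    proc = subprocess.run([*command, test_id], capture_output=True, text=True)
    return "PASSED" if proc.returncode == 0 else "FAILED"


def run_suite(all_tests: List[str], pattern: str, command: List[str]) -> Dict[str, str]:
    """Launch every matching test in its own process and collect the statuses."""
    return {t: run_one_test(t, command) for t in select_tests(all_tests, pattern)}


if __name__ == "__main__":
    # Stand-in "test process": exits with status 1 if the test ID contains "fail".
    fake_test_process = [sys.executable, "-c", "import sys; sys.exit('fail' in sys.argv[1])"]
    tests = ["test_memory::test_read", "test_memory::test_fail", "test_heap::test_bins"]
    for test_id, status in run_suite(tests, "test_memory::*", fake_test_process).items():
        print(f"{status:6} {test_id}")
```

Running each test in its own process is what keeps one test’s side effects from leaking into the next, at the cost of paying the start-up price once per test.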

Of particular interest to me was the way enumeration and test launching worked. While the Pwndbg test suite makes extensive use of PyTest, it doesn’t invoke it directly, as Pwndbg needs to run in an environment that has been specially set up for it. In the original GDB-only version of the test suite, setting up that environment involved spawning a GDB process, initializing Pwndbg inside it, and then having it run a script that invoked PyTest from within GDB.
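For illustration only, the GDB side of that setup can be pictured as a command line like the one this sketch builds. The gdbinit.py path, the launcher script and the PWNDBG_TEST_ID variable are assumptions, not Pwndbg’s actual files or names; the flags themselves (--batch, --nx, -ex, -x, --args) are standard GDB options, and GDB runs sourced files ending in .py as Python.

```python
import shlex
from pathlib import Path
from typing import List, Optional

# Hypothetical launcher that GDB runs once Pwndbg is loaded.
# The environment variable name is made up for this sketch.
LAUNCHER_SOURCE = """\
import os
import sys

import pytest

rc = pytest.main(["-q", os.environ["PWNDBG_TEST_ID"]])
sys.stdout.flush()
os._exit(int(rc))  # leave GDB with PyTest's exit code
"""


def gdb_command(pwndbg_init: Path, launcher: Path, binary: Optional[Path] = None) -> List[str]:
    """Build a GDB command line that initializes Pwndbg, then runs PyTest inside GDB."""
    cmd = [
        "gdb",
        "--batch",                       # don't wait for interactive input
        "--nx",                          # skip the user's init files
        "-ex", f"source {pwndbg_init}",  # load Pwndbg first...
        "-x", str(launcher),             # ...then hand control to PyTest
    ]
    if binary is not None:
        cmd += ["--args", str(binary)]   # program under test, if any
    return cmd


if __name__ == "__main__":
    print(shlex.join(gdb_command(Path("gdbinit.py"), Path("launcher.py"))))
```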

With that in mind, I had to come up with a way to bring LLDB into the mix. Since the original setup felt convoluted and clunky to me at first, my first instinct was to consider whether I could make it more straightforward, either by letting us invoke PyTest directly, or by getting rid of the script that manages spawning GDB in favor of spawning it directly and running the entire suite from within a single GDB instance. Unfortunately, after experimenting with these ideas for a few days, I concluded that both were impractical, for two main reasons.

Refactoring the code so that PyTest could be invoked directly wouldn’t change much in the end. The code that manages spawning GDB processes would still have to live somewhere, most likely in a PyTest fixture, since there’s no getting around the need to run the tests under GDB: the vast majority of them need a real debugger and a real program underneath. I’d essentially be spending quite a bit of time shuffling that complexity around instead of getting rid of it. That could still be worthwhile if it made the code easier to understand, but I judged that this hypothetical new architecture wouldn’t be any clearer than the existing one.
Running both PyTest and all of the tests under a single GDB instance would be quite nice, but as things stand, it isn’t possible. Most tests have side effects on the running debugger that would carry over into the following tests and most likely cause them to fail. While it might technically be possible to undo every side effect at the end of every test, that would take quite a bit of work, so I decided it was well out of scope for this project.

Ultimately, I decided to keep mostly the same architecture in place and just extend it to support launching tests under both pwndbg-lldb and GDB. To that end, I restructured the testing code around two ideas: test drivers (GDB, LLDB) and test groups (GDB tests, debugger-agnostic tests, LLDB tests), which can be mixed and matched more or less as you please. More details about the refactoring can be found in this PR.
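A minimal way to model that mix-and-match is a compatibility table between groups and drivers. The enum names and values below are hypothetical; they only mirror the drivers and groups named above and are not Pwndbg’s actual classes.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class Driver(Enum):
    GDB = "gdb"
    LLDB = "lldb"


class Group(Enum):
    GDB = "gdb-tests"
    DBG = "debugger-agnostic"
    LLDB = "lldb-tests"


# Which drivers each group is allowed to run under.
COMPATIBLE = {
    Group.GDB: {Driver.GDB},
    Group.DBG: {Driver.GDB, Driver.LLDB},
    Group.LLDB: {Driver.LLDB},
}


def plan(groups: Iterable[Group], drivers: Iterable[Driver]) -> List[Tuple[Group, Driver]]:
    """Return every (group, driver) combination worth launching."""
    drivers = list(drivers)  # iterated once per group
    return [(g, d) for g in groups for d in drivers if d in COMPATIBLE[g]]
```

Asking for every group under every driver yields four runs, with the debugger-agnostic group being the only one that runs twice.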

After that was done, I was ready to move on to setting up the testing code to handle pwndbg-lldb and getting some initial tests to run. There, however, I hit a small but interesting snag. Debugger state can be controlled from inside GDB using entirely synchronous code, so all tests up to that point had been written as regular synchronous functions. pwndbg-lldb, on the other hand, needs the caller to use async functions, as we need to suspend the calling function and run the process event loop ourselves until it makes sense to continue.
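To see why async is needed, here is a stripped-down model of the idea. It is not pwndbg-lldb’s code: WaitForStop, FakeProcess and drive are hypothetical names, and plain strings stand in for LLDB process events. The test suspends itself at an await, the driver runs its own event loop until the process stops, and then it resumes the test with the stop event.

```python
from collections import deque
from typing import Any, Coroutine, Iterable


class WaitForStop:
    """Awaitable a test uses to say: suspend me until the process stops."""

    def __await__(self):
        # Yield to whoever drives the coroutine; it sends the stop event back in.
        event = yield self
        return event


class FakeProcess:
    """Stand-in for the debuggee: a queue of process events."""

    def __init__(self, events: Iterable[str]):
        self.events = deque(events)

    def next_event(self) -> str:
        return self.events.popleft()


def drive(test: Coroutine[Any, Any, Any], process: FakeProcess) -> Any:
    """Run a test coroutine, pumping process events whenever it is suspended."""
    try:
        request = test.send(None)  # run the test until its first await
        while True:
            if not isinstance(request, WaitForStop):
                raise TypeError(f"unexpected request: {request!r}")
            # Our own event loop: consume events until the process stops.
            event = process.next_event()
            while not event.startswith("stopped"):
                event = process.next_event()
            request = test.send(event)  # resume the test with the stop event
    except StopIteration as finished:
        return finished.value


async def test_break_at_main() -> str:
    # A real test would set a breakpoint and continue the process here.
    return await WaitForStop()
```

The catch is that every test written this way has to be a coroutine, while PyTest, without plugins, expects tests to be plain synchronous functions.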

At this point in the project, that meant I’d have to find some way to get from the synchronous execution environment that PyTest gives us at the start of a test to an asynchronous one from which pwndbg-lldb can be controlled. What I ended up doing was setting up, from the outer test process management script in tests/tests.py, a function that prepares that execution environment for a given test, and then having the individual tests call it themselves. It worked, but having to invoke it manually in every test would have been quite terrible, so I wrote a small decorator by the name of @pwndbg_test that lets the tests be written directly as async functions.
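The general shape of such a decorator can be sketched as below. It is not the real @pwndbg_test: here asyncio.run is a stand-in for the per-test environment setup done by the actual harness, which drives pwndbg-lldb’s own event loop instead.

```python
import asyncio
import functools
import inspect


def pwndbg_test(test_fn):
    """Wrap an `async def` test so a synchronous runner such as PyTest can call it."""
    if not inspect.iscoroutinefunction(test_fn):
        raise TypeError(f"{test_fn.__name__} must be declared with `async def`")

    @functools.wraps(test_fn)  # keep the test's name for reporting
    def wrapper(*args, **kwargs):
        # Assumption: asyncio.run stands in for preparing the per-test
        # environment and driving the debugger's event loop.
        return asyncio.run(test_fn(*args, **kwargs))

    return wrapper


@pwndbg_test
async def test_example():
    await asyncio.sleep(0)
    assert 1 + 1 == 2
```

Because functools.wraps preserves the test’s name, a runner still reports it under its original name, while the body is free to use await.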

With all of that in place, I wrote a few initial tests to make sure the new and expanded test code worked for both pwndbg-lldb and GDB, and that it would stay that way. These were really simple, and only tested whether the supporting code was able to launch async tests. They came in pairs of an expected success and an expected failure (XFAIL), so that a test case failing to start at all could not be mistaken for a PASS. These changes were then merged as part of this PR.
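One way to see why such pairs are useful is the toy model below, which is not Pwndbg’s reporting code. It assumes strict XFAIL semantics, where an unexpected pass is an error (as with PyTest’s xfail(strict=True)). Each half of the pair catches a different way the harness can break.

```python
from typing import Callable, List, Optional

Test = Callable[[], None]
# A harness returns "passed", "failed", or None if the test never ran.
Harness = Callable[[Test], Optional[str]]


def report(outcome: Optional[str], expect_failure: bool) -> str:
    """Turn a raw outcome into the status a strict runner would report."""
    failed = outcome != "passed"  # a test that never started also counts as failed
    if expect_failure:
        return "XFAIL" if failed else "XPASS (error)"
    return "FAILED" if failed else "PASSED"


def should_pass() -> None:
    pass


def should_fail() -> None:
    raise AssertionError("deliberate failure")


def run_pair(harness: Harness) -> List[str]:
    """Run the expected-success / expected-failure pair through a harness."""
    return [
        report(harness(should_pass), expect_failure=False),
        report(harness(should_fail), expect_failure=True),
    ]


def working(test: Test) -> Optional[str]:
    try:
        test()
    except AssertionError:
        return "failed"
    return "passed"


def never_starts(test: Test) -> Optional[str]:
    return None  # e.g. the debugger process could not be launched


def swallows_failures(test: Test) -> Optional[str]:
    return "passed"  # e.g. the coroutine is created but never awaited
```

With a harness that never starts tests, the XFAIL half still looks fine, but the success half fails; with one that swallows failures, the success half looks fine, but the XFAIL half becomes an unexpected pass.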

After that, the remaining work consisted entirely of porting the existing GDB tests to the new debugger-agnostic group, so they could run under both GDB and pwndbg-lldb. This was done in two passes. In the first pass, I sorted the tests into those suitable for becoming debugger-agnostic tests, such as ones that tested Pwndbg functionality related to debugging processes, and those that were not, such as ones that only tested the integration between Pwndbg and GDB, or that tested things LLDB flat out doesn’t support. At the end of that pass, I had the following table, which I used as a guide from then on:

LEGEND
X - N/A
O - PORTED
F - FLAKY
. - IN PROGRESS
( - PARTIAL
? - UNSURE

O heap
O test_architectures.py
X test_attachp.py
O test_cache.py
O test_callstack.py
O test_command_branch.py
O test_command_canary.py
O test_command_config.py
O test_command_cyclic.py
O test_command_distance.py
O test_command_dt.py
O test_command_errno.py
O test_command_flags.py
X test_command_ignore.py
X test_command_killthreads.py
O test_command_libcinfo.py
? test_command_onegadget.py	- Heavy use of GDB, probably has to be rewritten
O test_command_plist.py
O test_command_procinfo.py
O test_command_search.py
O test_command_stepsyscall.py
O test_command_stepuntilasm.py
O test_command_telescope.py
O test_command_tls.py
? test_command_vmmap.py		- Does lots of coredump-related stuff.
O test_command_xor.py
F test_commands.py
O test_commands_dumpargs.py
( test_commands_elf.py		- Some tests needed a GDB-only command
O test_commands_next.py
X test_consistent_help.py
O test_context_commands.py
X test_cymbol.py
O test_emulate.py
X test_function_base.py
X test_gdblib_parameter.py
? test_glibc.py
( test_go.py			- Some tests test for GDB startup.
X test_help.py
O test_hexdump.py
X test_loads.py
O test_memory.py
O test_misc.py
O test_mmap.py
O test_mprotect.py
O test_nearpc.py
X test_prompt_recolor.py
X test_readline.py		- We _do_ import readline in pwndbg-lldb!
O test_symbol.py
( test_triggers.py		- GDB version tests GDB-specific quirks.
O test_windbg.py

-- KNOWN ISSUES

test_tls_address_and_command[i386] - LLDB does not have gs_base in i386.
test_go_dumping_xXX - LLDB does not seem to fully support go[1-3].

[1]: error: Could not find type system for language go: TypeSystem for language go doesn't exist
[2]: https://discourse.llvm.org/t/golang-support/72384/10
[3]: https://discourse.llvm.org/t/anybody-using-the-go-java-debugger-plugins/47418

Porting the tests themselves mostly consisted of replacing calls to GDB with calls to the debugger-agnostic process control functions, adjusting regular expressions to accommodate the slight differences between the outputs of Pwndbg and pwndbg-lldb, and sprinkling plenty of async keywords around. Occasionally, whenever it made more sense than changing a regular expression, I changed Pwndbg itself to bring the two outputs closer together.

Porting the tests was really the bulk of the work in this project. That work was upstreamed in two batches, followed by an integration with the Pwndbg CI/CD pipelines, which also brought quite a few bug fixes along with it:

Part III: Darwin Support

As previously noted, Pwndbg had supported running on Darwin, and debugging both macOS and iOS binaries, since shortly after the end of my 2024 GSoC project, but that support was somewhat limited. Pwndbg has a lot of code that deals specifically with parsing ELF binaries and poking into Linux’s data structures, and it relies on that code extensively for its more interesting features. While the absence of those features didn’t break the Darwin version outright, they were still missing, and it would be nice to make at least a few of them available there.

So, with that in mind, I set out to add support for parsing both the DYLD Shared Cache and parts of the Objective-C ABI. My rationale for choosing these two specifically was that being able to read the Shared Cache would let us display more meaningful information in our virtual memory map, whose lack of useful information on Darwin had been documented in this issue, while being able to interact with the Objective-C ABI would let us resolve information about live objects, classes, methods, selectors and the like, which would otherwise just look like a bunch of calls to _objc_msgSend and similar functions.
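To give an idea of what parsing the Shared Cache involves, here is a sketch that reads the cache’s mapping table from raw bytes, as a debugger could read them out of inferior memory without side effects. The field layout follows the leading fields of dyld_cache_header and dyld_cache_mapping_info in dyld’s open-source dyld_cache_format.h. Apple does not document or guarantee this layout, and newer caches add many more header fields after these. Mapping addresses in the header are unslid, so a live process also needs the ASLR slide applied. This is not Pwndbg’s parser.

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple

HEADER_FMT = "<16sII"   # magic[16], mappingOffset, mappingCount
MAPPING_FMT = "<QQQII"  # address, size, fileOffset, maxProt, initProt
MAPPING_SIZE = struct.calcsize(MAPPING_FMT)  # 32 bytes


@dataclass
class CacheMapping:
    address: int  # unslid virtual address
    size: int
    file_offset: int
    max_prot: int
    init_prot: int

    def contains(self, addr: int, slide: int = 0) -> bool:
        start = self.address + slide
        return start <= addr < start + self.size


def parse_cache_mappings(data: bytes) -> Tuple[str, List[CacheMapping]]:
    """Return the cache's magic string and its table of mappings."""
    magic, mapping_offset, mapping_count = struct.unpack_from(HEADER_FMT, data, 0)
    magic_str = magic.rstrip(b"\0").decode("ascii", errors="replace")
    if not magic_str.startswith("dyld_v1"):
        raise ValueError(f"not a dyld shared cache header: {magic_str!r}")
    mappings = []
    for i in range(mapping_count):
        fields = struct.unpack_from(MAPPING_FMT, data, mapping_offset + i * MAPPING_SIZE)
        mappings.append(CacheMapping(*fields))
    return magic_str, mappings
```

A vmmap-style display could use contains() to tag pages that belong to the shared cache; attributing them to individual dylibs would additionally require the cache’s image table, which this sketch skips.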

When I set out to implement support for both of those things, I first looked for some kind of official documentation detailing the exact ABI structures and requirements that we could peek into from Pwndbg, only to be met by the cold, sad reality that, apparently, Apple neither documents nor gives any guarantees about either of these ABIs.

From our perspective, that lack of documentation and stability is kind of a big deal. Since Pwndbg runs as part of a debugger, then unless the user explicitly wants to change something about the program, we should ideally only ever read from program memory and registers, and never modify anything. Apple’s stable APIs for reflecting on Objective-C objects and for talking to DYLD, however, involve calling public functions in the context of the inferior and observing their return values, which is about as far as you can get from our ideal scenario.

This creates a tension right at the heart of this functionality. Implementing anything related to these ABIs fundamentally involves a balancing act between our wish to avoid changing inferior state by calling functions in it, and the need to target more stable interfaces, so that we don’t have to change large volumes of code every time Apple decides to change something internally.

Ultimately, the implementation uses a mix of both strategies, each applied where I judged it most sensible in light of these two conflicting requirements.
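As an illustration of mixing the two strategies, here is a sketch of how a call to objc_msgSend might be annotated. At the call, the receiver and selector are the first two arguments (x0/x1 on arm64, rdi/rsi on x86_64). The read-only path relies on the fact that in current Objective-C runtimes a SEL in practice points at the selector’s NUL-terminated name, an undocumented implementation detail. The fallback calls the public runtime function sel_getName, which is stable but runs code in the inferior. The read_memory and call_inferior callables, and the policy of only calling into the inferior when explicitly allowed, are assumptions rather than Pwndbg’s actual design.

```python
from typing import Callable, Dict, Optional

# At a call to objc_msgSend, the receiver and selector are the first two arguments.
ARG_REGS = {"arm64": ("x0", "x1"), "x86_64": ("rdi", "rsi")}

ReadMemory = Callable[[int, int], Optional[bytes]]  # None if the address is unreadable
CallInferior = Callable[[str, int], Optional[str]]  # hypothetical: call fn(arg) in the debuggee


def selector_by_reading(read_memory: ReadMemory, sel: int, limit: int = 256) -> Optional[str]:
    """Side-effect free, but assumes a SEL points at its name (undocumented)."""
    data = read_memory(sel, limit)
    if not data or b"\0" not in data:
        return None
    return data.split(b"\0", 1)[0].decode("utf-8", errors="replace")


def describe_msgsend(
    arch: str,
    regs: Dict[str, int],
    read_memory: ReadMemory,
    call_inferior: Optional[CallInferior] = None,
    allow_calls: bool = False,
) -> str:
    recv_reg, sel_reg = ARG_REGS[arch]
    receiver, sel = regs[recv_reg], regs[sel_reg]
    name = selector_by_reading(read_memory, sel)
    if name is None and allow_calls and call_inferior is not None:
        # Stable public API, at the cost of executing code in the debuggee.
        name = call_inferior("sel_getName", sel)
    if name is None:
        return f"objc_msgSend(0x{receiver:x}, SEL 0x{sel:x})"
    return f"objc_msgSend(0x{receiver:x}, @selector({name}))"
```

Putting the read-only path first means the common case never touches inferior state, and the fragile layout assumption only costs an unresolved annotation when it breaks.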

Support for these ABIs was merged in this PR, and, as of the time of writing, it is used to enhance the debugging experience on Darwin in these ways:

Conclusion

While this year’s project was not as high-stakes or flashy as last year’s, I feel it was vital to the long-term viability of pwndbg-lldb, and that it achieved its primary goals, as stated in the original project proposal, with one caveat.

That caveat is iOS support. While iOS support has benefited from all of the improvements made to Darwin support in Pwndbg over the course of the project, it is still somewhat untested and hard to set up. I have put some time into getting prototypes of a nicer-to-use setup off the ground, but that’s all they are: prototypes. So that remains potential future work.

As always, if you want to learn more about the work done over the summer, you can look into the PRs listed in this article, or read the conversations and talk to us on the Pwndbg Discord server.

¹ With the major exception of bugs that involved Pwndbg not yet having enough Darwin-related functionality available at the time.