Pwndbg, in _my_ LLDB? More likely than you think. Click here

August 24, 2024
Tagged: security gsoc

Over the summer of 2024, I worked as a contributor in Google Summer of Code with the wonderful people over at the Python Software Foundation and the Pwndbg project to bring about some pretty big changes to Pwndbg and to the way it works.

The Problem

If one were to visit the Pwndbg website at the start of the project, they would be greeted with the following, right near the top of the first paragraph:

[…] It improves debugging experience with strength of GDB for low-level software developers, hardware hackers, reverse engineers, and exploit developers. […]

pwndbg.re on the 27th of May 2024 (emphasis added)

Pwndbg was, decidedly, for and about GDB. And, in light of that, it should come as no surprise that Pwndbg was heavily tied to GDB at the source code level. While there had been a strong effort to abstract over GDB in the form of a module named gdblib, that effort was motivated by providing an easier API, as documented in the Development Basics. So, even if Pwndbg did have quite a number of abstractions over GDB, those abstractions themselves were implemented with no regard to how other debuggers might behave.

Here’s where the problem enters the scene: Focusing exclusively on GDB means, naturally, that Pwndbg could only ever be used in situations where GDB itself could be used. While that had been fine for a long time, with Apple platforms and, more recently, Android moving away from GDB and towards LLDB as their only supported debugger, that restriction was beginning to hinder Pwndbg.

So, it was proposed for GSoC 2024 that Pwndbg be reworked to make it more debugger-agnostic, and that it be ported to LLDB, so that it could be used on these platforms, in addition to the ones that GDB supports. And, ideally, this would be done in a way in which both GDB and LLDB would share most of the code in Pwndbg, with as much functionality as possible also being shared between the two.

The original proposal can be found here.

I took up that task, and work began in earnest.

Upstreaming Status

UPDATE: All of these changes have already been written and merged.

The most up to date branch of the work can be found here, and it’s important to note that not all of it has been upstreamed yet. Currently the PR that contains the API port is still open, and is on track to be merged soon.

Additionally, there are two changes that are still to be made into PRs, but that are blocked on the API port being merged:

The Work

Good news, and spoilers: over the course of the summer, I managed to largely achieve the goals that were laid out in the project proposal, with a few exceptions that I will get into later on. While the LLDB version of Pwndbg is still missing some features that are present in the GDB version, and the debugger-agnostic APIs are still missing one important piece of functionality, Pwndbg now works on LLDB! And, in addition, so much of the code in Pwndbg is shared between the two debuggers that all the debugger-specific bits only sum up to about 3000 lines of code, as counted by cloc.

$ nix run nixpkgs#cloc -- pwndbg/dbg/gdb.py pwndbg/dbg/lldb/
       9 text files.
       9 unique files.
       2 files ignored.

github.com/AlDanial/cloc v 2.00  T=0.13 s (71.4 files/s, 37806.7 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                           9            861            996           2908
-------------------------------------------------------------------------------
SUM:                             9            861            996           2908
-------------------------------------------------------------------------------

Additionally, around 80 commands have been ported to LLDB, as of the time of writing.

The work that was done can be thought of as split into three parts, though in reality they all happened in parallel to a large degree. These three parts cover distinct sets of pull requests and different parts of the code, and splitting them like this should - I can only hope - make it easier to understand.

Part I: Foundations

This part of the work consisted of only touching as much functionality as was needed to get Pwndbg to start in LLDB, without any of the commands, or the infrastructure - i.e. modules like gdblib, lib, color, etc. It also saw the introduction of the Debugger-agnostic API, under the module named dbg. This module is intended to provide an abstraction over debugger concepts and data structures - things like frames, processes, threads, register sets - for both LLDB and GDB. All the debugger-agnostic code developed for Pwndbg over the course of the project is built on top of it.
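To make the idea of such an abstraction a little more concrete, here is a minimal sketch of what a debugger-agnostic layer could look like. Every name in it (Frame, Debugger, FakeDebugger, describe_pc) is made up for illustration and is not the actual API of the dbg module; the fake backend stands in for real ones so that the example runs outside of a debugger.

```python
from abc import ABC, abstractmethod


class Frame(ABC):
    """Hypothetical debugger-agnostic stack frame."""

    @abstractmethod
    def pc(self) -> int:
        """Return the program counter of this frame."""


class Debugger(ABC):
    """Hypothetical debugger-agnostic entry point."""

    @abstractmethod
    def name(self) -> str:
        """Return a short name for the underlying debugger."""

    @abstractmethod
    def selected_frame(self) -> Frame:
        """Return the currently selected frame."""


class FakeFrame(Frame):
    # Stand-in for a real backend. A GDB backend might wrap gdb.Frame.pc(),
    # and an LLDB one lldb.SBFrame.GetPC(); neither is imported here.
    def __init__(self, pc: int) -> None:
        self._pc = pc

    def pc(self) -> int:
        return self._pc


class FakeDebugger(Debugger):
    def __init__(self, name: str, pc: int) -> None:
        self._name = name
        self._frame = FakeFrame(pc)

    def name(self) -> str:
        return self._name

    def selected_frame(self) -> Frame:
        return self._frame


def describe_pc(dbg: Debugger) -> str:
    """Shared code: it only ever talks to the abstract interface."""
    return f"{dbg.name()}: pc={dbg.selected_frame().pc():#x}"


if __name__ == "__main__":
    for backend in (FakeDebugger("gdb", 0x401000), FakeDebugger("lldb", 0x401000)):
        print(describe_pc(backend))
```

The point of the sketch is that describe_pc never needs to know which debugger it is running under, which is the property the rest of the project relies on.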

The PRs that were submitted in this part were:

By the time all of these were merged, Pwndbg was capable of starting up in LLDB, and pwndbg.command.ArgparsedCommand was fully available, including things like debugger-aware argument parsing for commands, and expression evaluation. By the end of this part, too, much of the groundwork for the abstractions that got used going forward in the debugger-agnostic API had been laid, including things like processes and frames. These get expanded considerably in the next two parts.

Additionally, in this part the Debugger-agnostic API also gained functioning support for debugger-defined values - à la gdb.Value and SBValue - and types - à la gdb.Type and SBType. The debugger-agnostic versions of these types follow their GDB counterparts in both method definitions and semantics, at least for the parts that Pwndbg actually uses. The reason for this is that, as those are fairly easy to reproduce in LLDB, we can have our own debugger-agnostic types be drop-in replacements for the GDB types in Pwndbg basically for free.
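As a rough illustration of what following the GDB counterparts can mean, the sketch below mimics a tiny slice of the gdb.Value and gdb.Type surface (a type attribute, cast, and conversion with int(), plus name and sizeof on the type). The classes are hypothetical and are backed by plain Python integers rather than by a debugger; they only handle unsigned integers, which is an assumption made to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Type:
    """Hypothetical stand-in exposing `name` and `sizeof`, like gdb.Type."""

    name: str
    sizeof: int


class Value:
    """Hypothetical unsigned integer value with a gdb.Value-like surface."""

    def __init__(self, raw: int, type: Type) -> None:
        self.type = type
        # Keep only as many bits as the type can hold.
        self._raw = raw & ((1 << (8 * type.sizeof)) - 1)

    def cast(self, type: Type) -> "Value":
        # Casting to a narrower type truncates, as it would in C.
        return Value(self._raw, type)

    def __int__(self) -> int:
        return self._raw


if __name__ == "__main__":
    u8 = Type("unsigned char", 1)
    u16 = Type("unsigned short", 2)
    value = Value(0x1FF, u16)
    print(hex(int(value)), hex(int(value.cast(u8))))
```

Code written against this small surface does not care whether the object underneath comes from GDB or LLDB, as long as both backends provide the same methods with the same meaning.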

Part II: Creation of the Debugger-agnostic Library (aglib)

While coming up with a strategy for porting Pwndbg to the new Debugger-agnostic API, I was aiming for something that would let me get as many commands as possible ported over in the time that I had during Google Summer of Code. Modern versions of Pwndbg, as of the time of writing, have around 160 commands, so porting them individually was, for the most part, out. Ideally, then, I would be doing this in batches of commands, as the functionality they needed became available.

I think it’s important to detail just what that functionality is. If this functionality were directly tied to GDB, it would be very likely that these command batches would be fairly small, as the GDB API tends to favor fewer methods that do a greater number of operations - think gdb.execute, for instance, or gdb.parse_and_eval - making it harder to group these calls by the type of operation they perform. Luckily for me, however, almost none of the functionality was directly tied to GDB.

Recall how Pwndbg already had some abstractions on top of GDB, in the form of gdblib. While these abstractions were specific to GDB, and the main purpose behind them was, as I’ve already mentioned, to make the GDB API easier to use and more reliable, their use also had the side effect of turning what would be calls to fairly abstract GDB functionality like gdb.execute("set $rax = {val}") into things like pwndbg.gdblib.regs.rax = val. This meant that, for the most part, I could group commands together by the gdblib modules they depended on, and not directly by the GDB functionality they needed.
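The register example can be sketched in a few lines. This is not how pwndbg.gdblib.regs is actually implemented; it is a hypothetical Regs class that turns attribute assignment into a command string and hands it to whatever execute callable it was given. Inside GDB that callable could be gdb.execute; here it is a plain list's append method, so that the example runs on its own.

```python
from typing import Callable


class Regs:
    """Hypothetical register proxy: `regs.rax = 1` becomes a debugger command."""

    def __init__(self, execute: Callable[[str], object]) -> None:
        # Bypass our own __setattr__ so the callable is stored normally.
        object.__setattr__(self, "_execute", execute)

    def __setattr__(self, name: str, value: int) -> None:
        # Every attribute assignment is treated as a register write.
        self._execute(f"set ${name} = {value:#x}")


if __name__ == "__main__":
    issued = []
    regs = Regs(issued.append)
    regs.rax = 0x1337
    regs.rip = 0x401000
    print(issued)
```

Callers only see the attribute assignment, so commands written this way can be grouped by the module they use rather than by the raw debugger calls hidden underneath.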

In light of that, I decided that the best course of action was to create a debugger-agnostic module that had the same API surface as gdblib, and port all the commands to it. At first, the decision to do this might seem like it came completely out of left field. Was I just trading a hard problem for another hard problem? The commands themselves might not need to directly interact with GDB, but, surely, gdblib would be so strongly tied to GDB that it would have to be entirely rewritten to be debugger-agnostic, and, at that point, why even bother keeping the extra layer in? Wouldn’t it be better to just push everything onto the Debugger-agnostic API and call it a day? Well, not quite.

You see, it turned out that gdblib, upon closer scrutiny, had a lot less to do with GDB than one would initially think, given both its name and its goals. In reality gdblib is more about the things that are done to the values that come out of GDB than about talking to GDB. And while some of the things being done to the values are workarounds specific to GDB, many more of them aren’t. Things like special handling for QEMU, managing files acquired from remote debugging hosts, parsing ELF files, talking to the system, and more, are all part of gdblib, but have little to do with GDB itself. In fact, most modules in gdblib only talk to GDB fairly sporadically, and in a small number of ways. This means that, by taking out the bits of gdblib that interface with GDB and the workarounds that came with them, and abstracting them behind the Debugger-agnostic API, everything that gets left behind can then be shared between GDB and LLDB.
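A small sketch of that split, under the assumption that the only debugger-specific piece is a function that reads memory: the ELF check below is shared logic, and the reader it receives is the part a GDB or LLDB backend would supply. The names looks_like_elf and make_fake_reader are hypothetical, and the fake reader serves bytes from an in-memory buffer instead of a real process.

```python
from typing import Callable

# A memory reader takes (address, size) and returns the bytes found there.
MemoryReader = Callable[[int, int], bytes]

ELF_MAGIC = b"\x7fELF"


def looks_like_elf(read_memory: MemoryReader, address: int) -> bool:
    """Shared logic: knows about ELF files, not about any debugger."""
    return read_memory(address, len(ELF_MAGIC)) == ELF_MAGIC


def make_fake_reader(image: bytes, base: int) -> MemoryReader:
    """Stand-in for the debugger-specific part (hypothetical)."""

    def read(address: int, size: int) -> bytes:
        offset = address - base
        return image[offset:offset + size]

    return read


if __name__ == "__main__":
    reader = make_fake_reader(b"\x7fELF\x02\x01\x01", base=0x400000)
    print(looks_like_elf(reader, 0x400000))
    print(looks_like_elf(reader, 0x400001))
```

Swapping the reader is all it takes to run the same check under a different debugger, which is roughly the shape of the separation described above.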

And, as it turned out, this strategy worked out very well! For more details, see the PR that introduced aglib, and how extensively it is used in the port PR.

Part III: Porting the Internal Functionality and Commands in Pwndbg

This part of the work consisted mostly of the bulk of the porting, as outlined in the previous parts. The details of everything that went down would be too much to cover here, but here are the PRs that belong to this part of the work. All of them come with their own explanations.

Miscellaneous

There were also a few miscellaneous PRs that I don’t feel fit into any of the aforementioned parts, so I’ll just list them here:

What’s missing?

While I have done my best to have this project be as complete as possible, and while I believe I am through with and have been successful in porting the hardest, most tangled bits of Pwndbg to LLDB, there are still a couple of loose ends left that need to be tied up:

Further Reading

If you are curious and want to read more, you can read my posts on the PSF’s Mastodon instance, which give weekly insights into the project, as it was happening. Additionally, a lot of discussion happened in the Pwndbg Discord, specifically in the #dev channel and the GSoC LLDB Project thread.