Skip to content

Conversation

@b-pass
Copy link
Collaborator

@b-pass b-pass commented Jan 13, 2026

Description

pybind11 leaks three PyTypes (metaclass, function_record, static_property) when an interpreter or sub-interpreter is finalized. This is about 1KB each. These should be released during finalization so that simply creating interpreters does not leak memory.

This fixes #5794 (Python itself leaks more memory than this, but we can fix what we're doing at least...)

If a pybind11 module is loaded into a non-pybind11-owned sub-interpreter, then the pybind11 internals are leaked when the sub-interpreter shuts down. This PR also fixes that, by having the internals destruct and free during the interpreter finalization.

Suggested changelog entry:

  • Deallocate pybind11 internals during (sub-)interpreter shutdown to avoid memory leaks.

shutdown can safely DECREF Python objects owned by the internals.
@b-pass b-pass requested a review from oremanj January 13, 2026 01:45
@oremanj
Copy link
Collaborator

oremanj commented Jan 13, 2026

I think for shared internals you should go ahead and deallocate the internals struct, but not the pointer to it, from the capsule destructor. That is, call get_pp()->reset() followed by unref() on the pp_manager, but don't call destroy(). (For local internals, go ahead and call destroy().) This reduces the leak for the main interpreter to 8 bytes (the std::unique_ptr<internals> itself, but not its contents), and means you can straightforwardly put internals cleanup code in the destructor of struct internals.

The internals state dict is cleared very late in interpreter shutdown - AFAICT it's the last place that calls user-controlled code where the Python C API is available. C-level Py_AtExit handlers are later but those aren't allowed to call into Python. Step 24 in this wonderful list, which I wish had a wider audience than a random GitHub comment: hudson-trading/pymetabind#2 (comment) (that list is for main interpreter shutdown, but I believe subinterpreters do a subset in basically the same order).

If anyone is trying to access internals after the state dict is cleared, it's overwhelmingly likely that whatever they're going to do with the result is invalid anyway.

If somehow someone does manage to ask for internals after they've been destroyed, there are two possibilities for what will occur, and I think both are acceptable.

  • If they are accessing the internals through a pp_manager that has had unref() called -- that is, if they are accessing local internals, or shared internals from the extension module that originally created them -- they will create and initialize a new internals struct and add a new entry for it to the interpreter state dict. If this occurred before the state dict was destroyed (the only possibility for that I can think of is if it's done from the destructor of a different entry in the state dict), then the new internals will promptly be destroyed again as PyDict_Clear continues its work. If it occurred after the state dict was destroyed, it will fail because the state dict doesn't exist - that's fine since you're really not supposed to run anything that touches the Python C API after that point.
  • If they are accessing through a different pp_manager (different extension module's reference to the same shared internals), they will see that the std::unique_ptr<internals> (which is still allocated) holds null, and fix that by creating and initializing a new internals struct but not registering it in the interpreter state dict. This has the outside potential to create a split-brain scenario where the extension modules in this category share one internals and the extension module in the previous category has a different internals. But everything is shutting down anyway, so it will be difficult for anyone to observe the mildly confusing results.

If we were OK with extension modules in the second category above crashing (since they hold a pointer to a deallocated std::unique_ptr<internals> in their pp_manager) we could go ahead and destroy() the pp_manager in the capsule destructor. But I think eight bytes' leak is a fair price to pay to allow them to not crash.

The perfect solution would be to store inside internals an intrusive linked list of all the pp_managers that refer to it. Then upon internals destruction we could automatically unref() all of them, so that it would be safe to deallocate the internals pointer too. This eliminates the leak and the split-brain possibility, but adding the list head would require an internals version bump. Maybe leave a TODO comment with a link to this thread.

@b-pass b-pass changed the title Free PyTypes owned by internals during interpreter finalization Destruct internals during interpreter finalization Jan 14, 2026
@b-pass
Copy link
Collaborator Author

b-pass commented Jan 14, 2026

Ok, I have updated the title and description and code to delete the internals itself in the capsule destructor. I didn't have it call unref currently, so @oremanj's second case applies. If something manages to use pybind11 during finalization after the internals capsule is destroyed, it will re-create (and subsequently might leak) its own new internals. This is an exceedingly rare case, and even in the event it happens the behavior won't result in a crash.

If something triggers internals to be created during finalization, it might end up being destroyed after finalization and we don't want to do the DECREF at that point, we need the leaky behavior.
// In odd finalization scenarios it might end up running after the interpreter has
// completely shut down, In that case, we should not decref these objects because pymalloc
// is gone.
if (Py_IsInitialized() != 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check defeats the point. Py_IsInitialized begins returning false quite early in Py_Finalize, well before the state dict is cleared, so adding this check will bring your leak back (at least for the main interpreter).

internals can only be freed by a DECREF on the capsule (in which case its destructor runs at a time when it's safe to DECREF things) or by an explicit call to internals_pp_manager::destroy(). Do any of the latter happen at times when it's not safe to DECREF things?

// completely shut down, In that case, we should not decref these objects because pymalloc
// is gone.
if (Py_IsInitialized() != 0) {
Py_XDECREF(static_property_type);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sequence of XDECREF + set to null is mostly equivalent to Py_CLEAR. The latter sets to null before the decref, so can be safer in some scenarios where the destructor of the thing you're destroying might reentrantly access the field that points to it. I don't think that's possible here but Py_CLEAR might express your intention more directly regardless.

// destructed when going out of scope here, so the destructor will be called
// immediately, which will also free the storage.
/*destructor=*/[](void *ptr) -> void { delete static_cast<Payload *>(ptr); });
/*destructor=*/dtor);
Copy link
Collaborator

@oremanj oremanj Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a semantics-preserving change.

There are two possibilities for when the capsule gets destroyed:

  • It could be destroyed almost immediately, if another thread inserted the same key before we did. See the comment on line 639. In that case, no one else has seen our newly-allocated payload, so we should delete it.
  • It could be destroyed at interpreter finalization. That's when custom destruction logic might come into play.

If two threads concurrently try to create internals, your use of the final dtor from the start means that one of them will leak its unique_ptr. Not a terrible outcome, but easy enough to avoid.

Suggest you change the dtor parameter to dtor_if_inserted. Create the capsule initially with a destructor that deallocates the payload. If insertion succeeds and dtor_if_inserted is not null, then swap the capsule destructor to be dtor_if_inserted instead. Effectively, the old clear_destructor parameter is a special case of the dtor_if_inserted semantics: dtor_if_inserted is a no-op lambda if clear_destructor is true, or is null (i.e. continue with the original dtor that deletes the payload) if clear_destructor is false.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG]: Simply creating subinterpreter leaks memory

3 participants