@gerd-rausch
Contributor

The PROCMAP_QUERY ioctl, available since Linux 6.11, can obtain the size of an underlying mapping without the need to parse /proc/self/smaps.

Because kernels that support the PROCMAP_QUERY ioctl also support copy_on_fork, the environment variable RDMAV_USE_MADVISE must be set to bypass the copy_on_fork feature and allow PROCMAP_QUERY to be used instead.

If PROCMAP_QUERY isn't available on the running kernel, the "smaps" file will be parsed after all.
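
For readers unfamiliar with the ioctl, here is a minimal sketch of how the bounds of the mapping covering an address might be looked up with PROCMAP_QUERY. This is not the code from the patch: the helper name `query_mapping_range` is hypothetical, and the sketch assumes kernel headers that may or may not define `PROCMAP_QUERY` (hence the `#ifdef`). On a kernel without the ioctl the call is expected to fail (typically with ENOTTY), which is where a caller would fall back to parsing "smaps".

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* struct procmap_query and PROCMAP_QUERY on 6.11+ headers */

/*
 * Hypothetical helper (not part of libibverbs): look up the [start, end)
 * range of the mapping that covers addr.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int query_mapping_range(const void *addr, uint64_t *start, uint64_t *end)
{
#ifdef PROCMAP_QUERY
    struct procmap_query q;
    int fd, rc, saved_errno;

    /* The ioctl is issued on an open /proc/<pid>/maps file descriptor. */
    fd = open("/proc/self/maps", O_RDONLY);
    if (fd < 0)
        return -1;

    memset(&q, 0, sizeof(q));
    q.size = sizeof(q);                        /* lets the kernel version the struct */
    q.query_addr = (uint64_t)(uintptr_t)addr;
    q.query_flags = 0;                         /* only a VMA that covers addr, else ENOENT */

    rc = ioctl(fd, PROCMAP_QUERY, &q);
    saved_errno = errno;
    close(fd);
    errno = saved_errno;
    if (rc < 0)
        return -1;                             /* older kernels: caller falls back to smaps */

    *start = q.vma_start;
    *end = q.vma_end;
    return 0;
#else
    /* Headers predate PROCMAP_QUERY; report "not supported". */
    (void)addr;
    (void)start;
    (void)end;
    errno = ENOSYS;
    return -1;
#endif
}
```

A caller would treat any failure as "not available" and take the slower parsing path; on success, `end - start` gives the size of the mapping.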

Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
@jgunthorpe
Member

What is the point of this? copy_on_fork is the superior solution, if your kernel has it you should use it.

@gerd-rausch
Contributor Author

copy_on_fork is slow with very large mappings.

@jgunthorpe
Member

So don't create mappings that need copy on fork? This patch causes the library to call madvise() with MADV_DONTFORK; if that is what the application wants, then it should just call it when it creates the mapping in the first place. Or use a MAP_SHARED mapping and avoid all of this.
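
The two alternatives suggested here could look roughly like the sketch below, from the application's side. Both helper names are hypothetical and the buffers are assumed to be anonymous mappings; a real application may back its buffers differently (hugetlbfs, files, and so on).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* for MAP_ANONYMOUS and MADV_DONTFORK on strict compilers */
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: a private buffer that is marked MADV_DONTFORK right
 * after creation, so a later fork() neither copies it nor maps it into the
 * child at all.
 */
static void *alloc_dontfork_buffer(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    if (madvise(p, len, MADV_DONTFORK) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}

/*
 * Hypothetical helper: a shared anonymous buffer. Parent and child share the
 * same pages after fork(), so there is no copy-on-write for the kernel to
 * resolve at fork time.
 */
static void *alloc_shared_buffer(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

The trade-off is visible in the child: with MADV_DONTFORK the range is simply absent there, while with MAP_SHARED the child can still read and write the parent's buffer until it calls execve().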

@gerd-rausch
Contributor Author

gerd-rausch commented Jan 21, 2026

libibverbs is a library. Applications with very large mappings may be old.

If they ran on a kernel that didn't support copy_on_fork yet, or rdma-core pre-this-commit,
they'd be able to fork() much faster than on kernels with copy_on_fork support.

Thus the desire to not copy all those pages (especially if followed by execve()), but to stick to the madvise() approach.

@gerd-rausch
Contributor Author

gerd-rausch commented Jan 21, 2026

Also: Prior to the introduction of copy_on_fork support to the kernel, there was no need for any application
to use MADV_DONTFORK on their mappings, as libibverbs would take care of that for them (e.g. via ibv_reg_mr()).

If the very same applications now run on a copy_on_fork supporting kernel, they'd suddenly be expected
to do the MADV_DONTFORK attribution on their own, unless they're okay with significant performance regressions?

@jgunthorpe
Member

Prior to this the whole thing was insane and half broken, applications could get random memory corruption during fork if they were not careful. Now it works as expected. If applications were fine with discarding the mappings they should have been setting MADV_DONTFORK. If they are only using fork to execv then they should be using MAP_SHARED which will run even faster.

copy on fork is old at this point, I don't think it is a good suggestion to start adding environment variables that users have to randomly guess based on what applications and libraries they happen to use.

And it isn't like this was fast before; all the proc parsing was really slow stuff, so IDK about this performance argument. If the app enabled fork support then it instantly became a lot slower when registering buffers.
