diff options
Diffstat (limited to 'Documentation/vm/userfaultfd.txt')
-rw-r--r-- | Documentation/vm/userfaultfd.txt | 91 |
1 files changed, 90 insertions, 1 deletions
diff --git a/Documentation/vm/userfaultfd.txt b/Documentation/vm/userfaultfd.txt index 70a3c94d1941..0e5543a920e5 100644 --- a/Documentation/vm/userfaultfd.txt +++ b/Documentation/vm/userfaultfd.txt @@ -54,6 +54,26 @@ uffdio_api.features and uffdio_api.ioctls two 64bit bitmasks of respectively all the available features of the read(2) protocol and the generic ioctl available. +The uffdio_api.features bitmask returned by the UFFDIO_API ioctl +defines what memory types are supported by the userfaultfd and what +events, except page fault notifications, may be generated. + +If the kernel supports registering userfaultfd ranges on hugetlbfs +virtual memory areas, UFFD_FEATURE_MISSING_HUGETLBFS will be set in +uffdio_api.features. Similarly, UFFD_FEATURE_MISSING_SHMEM will be +set if the kernel supports registering userfaultfd ranges on shared +memory (covering all shmem APIs, i.e. tmpfs, IPCSHM, /dev/zero +MAP_SHARED, memfd_create, etc). + +The userland application that wants to use userfaultfd with hugetlbfs +or shared memory need to set the corresponding flag in +uffdio_api.features to enable those features. + +If the userland desires to receive notifications for events other than +page faults, it has to verify that uffdio_api.features has appropriate +UFFD_FEATURE_EVENT_* bits set. These events are described in more +detail below in "Non-cooperative userfaultfd" section. + Once the userfaultfd has been enabled the UFFDIO_REGISTER ioctl should be invoked (if present in the returned uffdio_api.ioctls bitmask) to register a memory range in the userfaultfd by setting the @@ -129,7 +149,7 @@ migration thread in the QEMU running in the destination node will receive the page that triggered the userfault and it'll map it as usual with the UFFDIO_COPY|ZEROPAGE (without actually knowing if it was spontaneously sent by the source or if it was an urgent page -requested through an userfault). +requested through a userfault). By the time the userfaults start, the QEMU in the destination node doesn't need to keep any per-page state bitmap relative to the live @@ -142,3 +162,72 @@ course the bitmap is updated accordingly. It's also useful to avoid sending the same page twice (in case the userfault is read by the postcopy thread just before UFFDIO_COPY|ZEROPAGE runs in the migration thread). + +== Non-cooperative userfaultfd == + +When the userfaultfd is monitored by an external manager, the manager +must be able to track changes in the process virtual memory +layout. Userfaultfd can notify the manager about such changes using +the same read(2) protocol as for the page fault notifications. The +manager has to explicitly enable these events by setting appropriate +bits in uffdio_api.features passed to UFFDIO_API ioctl: + +UFFD_FEATURE_EVENT_EXIT - enable notification about exit() of the +non-cooperative process. When the monitored process exits, the uffd +manager will get UFFD_EVENT_EXIT. + +UFFD_FEATURE_EVENT_FORK - enable userfaultfd hooks for fork(). When +this feature is enabled, the userfaultfd context of the parent process +is duplicated into the newly created process. The manager receives +UFFD_EVENT_FORK with file descriptor of the new userfaultfd context in +the uffd_msg.fork. + +UFFD_FEATURE_EVENT_REMAP - enable notifications about mremap() +calls. When the non-cooperative process moves a virtual memory area to +a different location, the manager will receive UFFD_EVENT_REMAP. The +uffd_msg.remap will contain the old and new addresses of the area and +its original length. + +UFFD_FEATURE_EVENT_REMOVE - enable notifications about +madvise(MADV_REMOVE) and madvise(MADV_DONTNEED) calls. The event +UFFD_EVENT_REMOVE will be generated upon these calls to madvise. The +uffd_msg.remove will contain start and end addresses of the removed +area. + +UFFD_FEATURE_EVENT_UNMAP - enable notifications about memory +unmapping. The manager will get UFFD_EVENT_UNMAP with uffd_msg.remove +containing start and end addresses of the unmapped area. + +Although the UFFD_FEATURE_EVENT_REMOVE and UFFD_FEATURE_EVENT_UNMAP +are pretty similar, they quite differ in the action expected from the +userfaultfd manager. In the former case, the virtual memory is +removed, but the area is not, the area remains monitored by the +userfaultfd, and if a page fault occurs in that area it will be +delivered to the manager. The proper resolution for such page fault is +to zeromap the faulting address. However, in the latter case, when an +area is unmapped, either explicitly (with munmap() system call), or +implicitly (e.g. during mremap()), the area is removed and in turn the +userfaultfd context for such area disappears too and the manager will +not get further userland page faults from the removed area. Still, the +notification is required in order to prevent manager from using +UFFDIO_COPY on the unmapped area. + +Unlike userland page faults which have to be synchronous and require +explicit or implicit wakeup, all the events are delivered +asynchronously and the non-cooperative process resumes execution as +soon as manager executes read(). The userfaultfd manager should +carefully synchronize calls to UFFDIO_COPY with the events +processing. To aid the synchronization, the UFFDIO_COPY ioctl will +return -ENOSPC when the monitored process exits at the time of +UFFDIO_COPY, and -ENOENT, when the non-cooperative process has changed +its virtual memory layout simultaneously with outstanding UFFDIO_COPY +operation. + +The current asynchronous model of the event delivery is optimal for +single threaded non-cooperative userfaultfd manager implementations. A +synchronous event delivery model can be added later as a new +userfaultfd feature to facilitate multithreading enhancements of the +non cooperative manager, for example to allow UFFDIO_COPY ioctls to +run in parallel to the event reception. Single threaded +implementations should continue to use the current async event +delivery model instead. |