<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/userfaultfd.c, branch v5.10.257</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v5.10.257</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v5.10.257'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2022-10-26T11:25:17+00:00</updated>
<entry>
<title>userfaultfd: open userfaultfds with O_RDONLY</title>
<updated>2022-10-26T11:25:17+00:00</updated>
<author>
<name>Ondrej Mosnacek</name>
<email>omosnace@redhat.com</email>
</author>
<published>2022-07-08T09:34:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6e4be747f15fa32a6382ca2d9392fb9ba2ce051f'/>
<id>urn:sha1:6e4be747f15fa32a6382ca2d9392fb9ba2ce051f</id>
<content type='text'>
[ Upstream commit abec3d015fdfb7c63105c7e1c956188bf381aa55 ]

Since userfaultfd doesn't implement a write operation, it is more
appropriate to open it read-only.

When userfaultfds are opened read-write like it is now, and such fd is
passed from one process to another, SELinux will check both read and
write permissions for the target process, even though it can't actually
do any write operation on the fd later.

Inspired by the following bug report, which has hit the SELinux scenario
described above:
https://bugzilla.redhat.com/show_bug.cgi?id=1974559

Reported-by: Robert O'Callahan &lt;roc@ocallahan.org&gt;
Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Ondrej Mosnacek &lt;omosnace@redhat.com&gt;
Acked-by: Peter Xu &lt;peterx@redhat.com&gt;
Acked-by: Christian Brauner (Microsoft) &lt;brauner@kernel.org&gt;
Signed-off-by: Paul Moore &lt;paul@paul-moore.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>userfaultfd: fix a race between writeprotect and exit_mmap()</title>
<updated>2021-10-27T07:56:51+00:00</updated>
<author>
<name>Nadav Amit</name>
<email>namit@vmware.com</email>
</author>
<published>2021-10-18T22:15:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3cda4bfffd4f755645577aaa9e96a606657b4525'/>
<id>urn:sha1:3cda4bfffd4f755645577aaa9e96a606657b4525</id>
<content type='text'>
commit cb185d5f1ebf900f4ae3bf84cee212e6dd035aca upstream.

A race is possible when a process exits, its VMAs are removed by
exit_mmap() and at the same time userfaultfd_writeprotect() is called.

The race was detected by KASAN on a development kernel, but it appears
to be possible on vanilla kernels as well.

Use mmget_not_zero() to prevent the race as done in other userfaultfd
operations.

Link: https://lkml.kernel.org/r/20210921200247.25749-1-namit@vmware.com
Fixes: 63b2d4174c4ad ("userfaultfd: wp: add the writeprotect API to userfaultfd ioctl")
Signed-off-by: Nadav Amit &lt;namit@vmware.com&gt;
Tested-by: Li  Wang &lt;liwang@redhat.com&gt;
Reviewed-by: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>userfaultfd: prevent concurrent API initialization</title>
<updated>2021-09-18T11:40:17+00:00</updated>
<author>
<name>Nadav Amit</name>
<email>namit@vmware.com</email>
</author>
<published>2021-09-02T21:58:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6afd1e053d9bd1f39cf03f2b0702ad0b06b647bd'/>
<id>urn:sha1:6afd1e053d9bd1f39cf03f2b0702ad0b06b647bd</id>
<content type='text'>
[ Upstream commit 22e5fe2a2a279d9a6fcbdfb4dffe73821bef1c90 ]

userfaultfd assumes that the enabled features are set once and never
changed after UFFDIO_API ioctl succeeded.

However, currently, UFFDIO_API can be called concurrently from two
different threads, succeed on both threads and leave userfaultfd's
features in non-deterministic state.  Theoretically, other uffd operations
(ioctl's and page-faults) can be dispatched while adversely affected by
such changes of features.

Moreover, the writes to ctx-&gt;state and ctx-&gt;features are not ordered,
which can - theoretically, again - let userfaultfd_ioctl() think that
userfaultfd API completed, while the features are still not initialized.

To avoid races, it is arguably best to get rid of ctx-&gt;state.  Since there
are only 2 states, record the API initialization in ctx-&gt;features as the
uppermost bit and remove ctx-&gt;state.

Link: https://lkml.kernel.org/r/20210808020724.1022515-3-namit@vmware.com
Fixes: 9cd75c3cd4c3d ("userfaultfd: non-cooperative: add ability to report non-PF events from uffd descriptor")
Signed-off-by: Nadav Amit &lt;namit@vmware.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Axel Rasmussen &lt;axelrasmussen@google.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Mike Rapoport &lt;rppt@linux.vnet.ibm.com&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>userfaultfd: do not untag user pointers</title>
<updated>2021-07-28T12:35:46+00:00</updated>
<author>
<name>Peter Collingbourne</name>
<email>pcc@google.com</email>
</author>
<published>2021-07-23T22:50:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0b591c020d280cc0cd92d84c24a5be89b4c57033'/>
<id>urn:sha1:0b591c020d280cc0cd92d84c24a5be89b4c57033</id>
<content type='text'>
commit e71e2ace5721a8b921dca18b045069e7bb411277 upstream.

Patch series "userfaultfd: do not untag user pointers", v5.

If a user program uses userfaultfd on ranges of heap memory, it may end
up passing a tagged pointer to the kernel in the range.start field of
the UFFDIO_REGISTER ioctl.  This can happen when using an MTE-capable
allocator, or on Android if using the Tagged Pointers feature for MTE
readiness [1].

When a fault subsequently occurs, the tag is stripped from the fault
address returned to the application in the fault.address field of struct
uffd_msg.  However, from the application's perspective, the tagged
address *is* the memory address, so if the application is unaware of
memory tags, it may get confused by receiving an address that is, from
its point of view, outside of the bounds of the allocation.  We observed
this behavior in the kselftest for userfaultfd [2] but other
applications could have the same problem.

Address this by not untagging pointers passed to the userfaultfd ioctls.
Instead, let the system call fail.  Also change the kselftest to use
mmap so that it doesn't encounter this problem.

[1] https://source.android.com/devices/tech/debug/tagged-pointers
[2] tools/testing/selftests/vm/userfaultfd.c

This patch (of 2):

Do not untag pointers passed to the userfaultfd ioctls.  Instead, let
the system call fail.  This will provide an early indication of problems
with tag-unaware userspace code instead of letting the code get confused
later, and is consistent with how we decided to handle brk/mmap/mremap
in commit dcde237319e6 ("mm: Avoid creating virtual address aliases in
brk()/mmap()/mremap()"), as well as being consistent with the existing
tagged address ABI documentation relating to how ioctl arguments are
handled.

The code change is a revert of commit 7d0325749a6c ("userfaultfd: untag
user pointers") plus some fixups to some additional calls to
validate_range that have appeared since then.

[1] https://source.android.com/devices/tech/debug/tagged-pointers
[2] tools/testing/selftests/vm/userfaultfd.c

Link: https://lkml.kernel.org/r/20210714195437.118982-1-pcc@google.com
Link: https://lkml.kernel.org/r/20210714195437.118982-2-pcc@google.com
Link: https://linux-review.googlesource.com/id/I761aa9f0344454c482b83fcfcce547db0a25501b
Fixes: 63f0c6037965 ("arm64: Introduce prctl() options to control the tagged user addresses ABI")
Signed-off-by: Peter Collingbourne &lt;pcc@google.com&gt;
Reviewed-by: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Reviewed-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Alistair Delva &lt;adelva@google.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Dave Martin &lt;Dave.Martin@arm.com&gt;
Cc: Evgenii Stepanov &lt;eugenis@google.com&gt;
Cc: Lokesh Gidra &lt;lokeshgidra@google.com&gt;
Cc: Mitch Phillips &lt;mitchp@google.com&gt;
Cc: Vincenzo Frascino &lt;vincenzo.frascino@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: William McVicker &lt;willmcvicker@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[5.4]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: remove the now-unnecessary mmget_still_valid() hack</title>
<updated>2020-10-16T18:11:22+00:00</updated>
<author>
<name>Jann Horn</name>
<email>jannh@google.com</email>
</author>
<published>2020-10-16T03:13:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4d45e75a9955ade5c2f49bd96fc4173b2cec9a72'/>
<id>urn:sha1:4d45e75a9955ade5c2f49bd96fc4173b2cec9a72</id>
<content type='text'>
The preceding patches have ensured that core dumping properly takes the
mmap_lock.  Thanks to that, we can now remove mmget_still_valid() and all
its users.

Signed-off-by: Jann Horn &lt;jannh@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: "Eric W . Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Link: http://lkml.kernel.org/r/20200827114932.3572699-8-jannh@google.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2020-08-11T02:07:44+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-08-11T02:07:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=97d052ea3fa853b9aabcc4baca1a605cb1188611'/>
<id>urn:sha1:97d052ea3fa853b9aabcc4baca1a605cb1188611</id>
<content type='text'>
Pull locking updates from Thomas Gleixner:
 "A set of locking fixes and updates:

   - Untangle the header spaghetti which causes build failures in
     various situations caused by the lockdep additions to seqcount to
     validate that the write side critical sections are non-preemptible.

   - The seqcount associated lock debug addons which were blocked by the
     above fallout.

     seqcount writers contrary to seqlock writers must be externally
     serialized, which usually happens via locking - except for strict
     per CPU seqcounts. As the lock is not part of the seqcount, lockdep
     cannot validate that the lock is held.

     This new debug mechanism adds the concept of associated locks.
     sequence count has now lock type variants and corresponding
     initializers which take a pointer to the associated lock used for
     writer serialization. If lockdep is enabled the pointer is stored
     and write_seqcount_begin() has a lockdep assertion to validate that
     the lock is held.

     Aside of the type and the initializer no other code changes are
     required at the seqcount usage sites. The rest of the seqcount API
     is unchanged and determines the type at compile time with the help
     of _Generic which is possible now that the minimal GCC version has
     been moved up.

     Adding this lockdep coverage unearthed a handful of seqcount bugs
     which have been addressed already independent of this.

     While generally useful this comes with a Trojan Horse twist: On RT
     kernels the write side critical section can become preemtible if
     the writers are serialized by an associated lock, which leads to
     the well known reader preempts writer livelock. RT prevents this by
     storing the associated lock pointer independent of lockdep in the
     seqcount and changing the reader side to block on the lock when a
     reader detects that a writer is in the write side critical section.

   - Conversion of seqcount usage sites to associated types and
     initializers"

* tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  locking/seqlock, headers: Untangle the spaghetti monster
  locking, arch/ia64: Reduce &lt;asm/smp.h&gt; header dependencies by moving XTP bits into the new &lt;asm/xtp.h&gt; header
  x86/headers: Remove APIC headers from &lt;asm/smp.h&gt;
  seqcount: More consistent seqprop names
  seqcount: Compress SEQCNT_LOCKNAME_ZERO()
  seqlock: Fold seqcount_LOCKNAME_init() definition
  seqlock: Fold seqcount_LOCKNAME_t definition
  seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
  hrtimer: Use sequence counter with associated raw spinlock
  kvm/eventfd: Use sequence counter with associated spinlock
  userfaultfd: Use sequence counter with associated spinlock
  NFSv4: Use sequence counter with associated spinlock
  iocost: Use sequence counter with associated spinlock
  raid5: Use sequence counter with associated spinlock
  vfs: Use sequence counter with associated spinlock
  timekeeping: Use sequence counter with associated raw spinlock
  xfrm: policy: Use sequence counters with associated lock
  netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
  netfilter: conntrack: Use sequence counter with associated spinlock
  sched: tasks: Use sequence counter with associated spinlock
  ...
</content>
</entry>
<entry>
<title>userfaultfd: simplify fault handling</title>
<updated>2020-08-03T18:25:16+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-08-02T17:42:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f9bf352224d7d4612b55b8d0cd0eaa981a3246cf'/>
<id>urn:sha1:f9bf352224d7d4612b55b8d0cd0eaa981a3246cf</id>
<content type='text'>
Instead of waiting in a loop for the userfaultfd condition to become
true, just wait once and return VM_FAULT_RETRY.

We've already dropped the mmap lock, we know we can't really
successfully handle the fault at this point and the caller will have to
retry anyway.  So there's no point in making the wait any more
complicated than it needs to be - just schedule away.

And once you don't have that complexity with explicit looping, you can
also just lose all the 'userfaultfd_signal_pending()' complexity,
because once we've set the correct process sleeping state, and don't
loop, the act of scheduling itself will be checking if there are any
pending signals before going to sleep.

We can also drop the VM_FAULT_MAJOR games, since we'll be treating all
retried faults as major soon anyway (series to regularize and share more
of fault handling across architectures in a separate series by Peter Xu,
and in the meantime we won't worry about the possible minor - I'll be
here all week, try the veal - accounting difference).

Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>userfaultfd: Use sequence counter with associated spinlock</title>
<updated>2020-07-29T14:14:28+00:00</updated>
<author>
<name>Ahmed S. Darwish</name>
<email>a.darwish@linutronix.de</email>
</author>
<published>2020-07-20T15:55:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2ca97ac8bdcc31fdc868f40c73c017f0a648dc07'/>
<id>urn:sha1:2ca97ac8bdcc31fdc868f40c73c017f0a648dc07</id>
<content type='text'>
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.

Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish &lt;a.darwish@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20200720155530.1173732-23-a.darwish@linutronix.de
</content>
</entry>
<entry>
<title>mmap locking API: convert mmap_sem comments</title>
<updated>2020-06-09T16:39:14+00:00</updated>
<author>
<name>Michel Lespinasse</name>
<email>walken@google.com</email>
</author>
<published>2020-06-09T04:33:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c1e8d7c6a7a682e1405e3e242d32fc377fd196ff'/>
<id>urn:sha1:c1e8d7c6a7a682e1405e3e242d32fc377fd196ff</id>
<content type='text'>
Convert comments that reference mmap_sem to reference mmap_lock instead.

[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Reviewed-by: Daniel Jordan &lt;daniel.m.jordan@oracle.com&gt;
Cc: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Jerome Glisse &lt;jglisse@redhat.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Cc: Liam Howlett &lt;Liam.Howlett@oracle.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mmap locking API: convert mmap_sem API comments</title>
<updated>2020-06-09T16:39:14+00:00</updated>
<author>
<name>Michel Lespinasse</name>
<email>walken@google.com</email>
</author>
<published>2020-06-09T04:33:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3e4e28c5a8f01ee4174d639e36ed155ade489a6f'/>
<id>urn:sha1:3e4e28c5a8f01ee4174d639e36ed155ade489a6f</id>
<content type='text'>
Convert comments that reference old mmap_sem APIs to reference
corresponding new mmap locking APIs instead.

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Reviewed-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Reviewed-by: Daniel Jordan &lt;daniel.m.jordan@oracle.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Jerome Glisse &lt;jglisse@redhat.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Cc: Liam Howlett &lt;Liam.Howlett@oracle.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Link: http://lkml.kernel.org/r/20200520052908.204642-12-walken@google.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
