summaryrefslogtreecommitdiff
path: root/Documentation/admin-guide/clearing-warn-once.rst
diff options
context:
space:
mode:
authorNicholas Piggin <npiggin@gmail.com>2020-09-14 07:52:18 +0300
committerMichael Ellerman <mpe@ellerman.id.au>2020-09-16 05:24:37 +0300
commitbafb056ce27940c9994ea905336aa8f27b4f7275 (patch)
tree63f421548c15de591847df415644261ac26ab642 /Documentation/admin-guide/clearing-warn-once.rst
parent66acd46080bd9e5ad2be4b0eb1d498d5145d058e (diff)
downloadlinux-bafb056ce27940c9994ea905336aa8f27b4f7275.tar.xz
sparc64: remove mm_cpumask clearing to fix kthread_use_mm race
The de facto (and apparently uncommented) standard for using an mm had, thanks to this code in sparc if nothing else, been that you must have a reference on mm_users *and that reference must have been obtained with mmget()*, i.e., from a thread with a reference to mm_users that had used the mm. The introduction of mmget_not_zero() in commit d2005e3f41d4 ("userfaultfd: don't pin the user memory in userfaultfd_file_create()") allowed mm_count holders to aoperate on user mappings asynchronously from the actual threads using the mm, but they were not to load those mappings into their TLB (i.e., walking vmas and page tables is okay, kthread_use_mm() is not). io_uring 2b188cc1bb857 ("Add io_uring IO interface") added code which does a kthread_use_mm() from a mmget_not_zero() refcount. The problem with this is code which previously assumed mm == current->mm and mm->mm_users == 1 implies the mm will remain single-threaded at least until this thread creates another mm_users reference, has now broken. arch/sparc/kernel/smp_64.c: if (atomic_read(&mm->mm_users) == 1) { cpumask_copy(mm_cpumask(mm), cpumask_of(cpu)); goto local_flush_and_out; } vs fs/io_uring.c if (unlikely(!(ctx->flags & IORING_SETUP_SQPOLL) || !mmget_not_zero(ctx->sqo_mm))) return -EFAULT; kthread_use_mm(ctx->sqo_mm); mmget_not_zero() could come in right after the mm_users == 1 test, then kthread_use_mm() which sets its CPU in the mm_cpumask. That update could be lost if cpumask_copy() occurs afterward. I propose we fix this by allowing mmget_not_zero() to be a first-class reference, and not have this obscure undocumented and unchecked restriction. The basic fix for sparc64 is to remove its mm_cpumask clearing code. The optimisation could be effectively restored by sending IPIs to mm_cpumask members and having them remove themselves from mm_cpumask. This is more tricky so I leave it as an exercise for someone with a sparc64 SMP. powerpc has a (currently similarly broken) example. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200914045219.3736466-4-npiggin@gmail.com
Diffstat (limited to 'Documentation/admin-guide/clearing-warn-once.rst')
0 files changed, 0 insertions, 0 deletions