summaryrefslogtreecommitdiff
path: root/mm/huge_memory.c
diff options
context:
space:
mode:
authorKirill A. Shutemov <kirill.shutemov@linux.intel.com>2017-04-14 00:56:20 +0300
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>2017-12-16 12:33:50 +0300
commit47c53ccca4a0984d45b4157447b8eeca47d9a51f (patch)
treea442e154492f1310edbab38d15054af4a8b2db1a /mm/huge_memory.c
parentd4443b53732e5e98220ec8777d11e3c9eb87f5fd (diff)
downloadlinux-47c53ccca4a0984d45b4157447b8eeca47d9a51f.tar.xz
thp: fix MADV_DONTNEED vs. numa balancing race
commit ced108037c2aa542b3ed8b7afd1576064ad1362a upstream. In case prot_numa, we are under down_read(mmap_sem). It's critical to not clear pmd intermittently to avoid race with MADV_DONTNEED which is also under down_read(mmap_sem): CPU0: CPU1: change_huge_pmd(prot_numa=1) pmdp_huge_get_and_clear_notify() madvise_dontneed() zap_pmd_range() pmd_trans_huge(*pmd) == 0 (without ptl) // skip the pmd set_pmd_at(); // pmd is re-established The race makes MADV_DONTNEED miss the huge pmd and don't clear it which may break userspace. Found by code analysis, never saw triggered. Link: http://lkml.kernel.org/r/20170302151034.27829-3-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [jwang: adjust context for 4.4] Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Diffstat (limited to 'mm/huge_memory.c')
-rw-r--r--mm/huge_memory.c34
1 files changed, 33 insertions, 1 deletions
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ea013cb2c3f2..0127b788272f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1588,7 +1588,39 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
if (prot_numa && pmd_protnone(*pmd))
goto unlock;
- entry = pmdp_huge_get_and_clear_notify(mm, addr, pmd);
+ /*
+ * In case prot_numa, we are under down_read(mmap_sem). It's critical
+ * to not clear pmd intermittently to avoid race with MADV_DONTNEED
+ * which is also under down_read(mmap_sem):
+ *
+ * CPU0: CPU1:
+ * change_huge_pmd(prot_numa=1)
+ * pmdp_huge_get_and_clear_notify()
+ * madvise_dontneed()
+ * zap_pmd_range()
+ * pmd_trans_huge(*pmd) == 0 (without ptl)
+ * // skip the pmd
+ * set_pmd_at();
+ * // pmd is re-established
+ *
+ * The race makes MADV_DONTNEED miss the huge pmd and don't clear it
+ * which may break userspace.
+ *
+ * pmdp_invalidate() is required to make sure we don't miss
+ * dirty/young flags set by hardware.
+ */
+ entry = *pmd;
+ pmdp_invalidate(vma, addr, pmd);
+
+ /*
+ * Recover dirty/young flags. It relies on pmdp_invalidate to not
+ * corrupt them.
+ */
+ if (pmd_dirty(*pmd))
+ entry = pmd_mkdirty(entry);
+ if (pmd_young(*pmd))
+ entry = pmd_mkyoung(entry);
+
entry = pmd_modify(entry, newprot);
if (preserve_write)
entry = pmd_mkwrite(entry);