<feed xmlns='http://www.w3.org/2005/Atom'>
<title>BMC/Intel-BMC/linux.git/include/linux/mm.h, branch dev-4.7</title>
<subtitle>Intel OpenBMC Linux kernel source tree (mirror)</subtitle>
<id>https://git.radix-linux.su/BMC/Intel-BMC/linux.git/atom?h=dev-4.7</id>
<link rel='self' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/atom?h=dev-4.7'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/'/>
<updated>2016-10-20T08:02:21+00:00</updated>
<entry>
<title>mm: remove gup_flags FOLL_WRITE games from __get_user_pages()</title>
<updated>2016-10-20T08:02:21+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-10-13T20:07:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=30712aff8011b67be3e7e31ea079cfc9d3effef4'/>
<id>urn:sha1:30712aff8011b67be3e7e31ea079cfc9d3effef4</id>
<content type='text'>
commit 19be0eaffa3ac7d8eb6784ad9bdbc7d67ed8e619 upstream.

This is an ancient bug that was actually attempted to be fixed once
(badly) by me eleven years ago in commit 4ceb5db9757a ("Fix
get_user_pages() race for write access") but that was then undone due to
problems on s390 by commit f33ea7f404e5 ("fix get_user_pages bug").

In the meantime, the s390 situation has long been fixed, and we can now
fix it by checking the pte_dirty() bit properly (and do it better).  The
s390 dirty bit was implemented in abf09bed3cce ("s390/mm: implement
software dirty bits") which made it into v3.9.  Earlier kernels will
have to look at the page state itself.

Also, the VM has become more scalable, and what used a purely
theoretical race back then has become easier to trigger.

To fix it, we introduce a new internal FOLL_COW flag to mark the "yes,
we already did a COW" rather than play racy games with FOLL_WRITE that
is very fundamental, and then use the pte dirty flag to validate that
the FOLL_COW flag is still valid.

Reported-and-tested-by: Phil "not Paul" Oester &lt;kernel@linuxace.com&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Reviewed-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Willy Tarreau &lt;w@1wt.eu&gt;
Cc: Nick Piggin &lt;npiggin@gmail.com&gt;
Cc: Greg Thelen &lt;gthelen@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: introduce get_task_exe_file</title>
<updated>2016-09-24T08:09:28+00:00</updated>
<author>
<name>Mateusz Guzik</name>
<email>mguzik@redhat.com</email>
</author>
<published>2016-08-23T14:20:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=2daa86f541b8fa873c56b3cc2c8dc32e03b864fc'/>
<id>urn:sha1:2daa86f541b8fa873c56b3cc2c8dc32e03b864fc</id>
<content type='text'>
commit cd81a9170e69e018bbaba547c1fd85a585f5697a upstream.

For more convenient access if one has a pointer to the task.

As a minor nit take advantage of the fact that only task lock + rcu are
needed to safely grab -&gt;exe_file. This saves mm refcount dance.

Use the helper in proc_exe_link.

Signed-off-by: Mateusz Guzik &lt;mguzik@redhat.com&gt;
Acked-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Acked-by: Richard Guy Briggs &lt;rgb@redhat.com&gt;
Signed-off-by: Paul Moore &lt;paul@paul-moore.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Revert "mm: make faultaround produce old ptes"</title>
<updated>2016-06-25T00:23:52+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2016-06-24T21:49:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=315d09bf30c2b436a1fdac86d31c24380cd56c4f'/>
<id>urn:sha1:315d09bf30c2b436a1fdac86d31c24380cd56c4f</id>
<content type='text'>
This reverts commit 5c0a85fad949212b3e059692deecdeed74ae7ec7.

The commit causes ~6% regression in unixbench.

Let's revert it for now and consider other solution for reclaim problem
later.

Link: http://lkml.kernel.org/r/1465893750-44080-2-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Reported-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Vinayak Menon &lt;vinmenon@codeaurora.org&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: remove more IS_ERR_VALUE abuses</title>
<updated>2016-05-27T22:57:31+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-05-27T22:57:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=5d22fc25d4fc8096d2d7df27ea1893d4e055e764'/>
<id>urn:sha1:5d22fc25d4fc8096d2d7df27ea1893d4e055e764</id>
<content type='text'>
The do_brk() and vm_brk() return value was "unsigned long" and returned
the starting address on success, and an error value on failure.  The
reasons are entirely historical, and go back to it basically behaving
like the mmap() interface does.

However, nobody actually wanted that interface, and it causes totally
pointless IS_ERR_VALUE() confusion.

What every single caller actually wants is just the simpler integer
return of zero for success and negative error number on failure.

So just convert to that much clearer and more common calling convention,
and get rid of all the IS_ERR_VALUE() uses wrt vm_brk().

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'dax-locking-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm</title>
<updated>2016-05-27T03:00:28+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-05-27T03:00:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=478a1469a7d27fe6b2f85fc801ecdeb8afc836e6'/>
<id>urn:sha1:478a1469a7d27fe6b2f85fc801ecdeb8afc836e6</id>
<content type='text'>
Pull DAX locking updates from Ross Zwisler:
 "Filesystem DAX locking for 4.7

   - We use a bit in an exceptional radix tree entry as a lock bit and
     use it similarly to how page lock is used for normal faults.  This
     fixes races between hole instantiation and read faults of the same
     index.

   - Filesystem DAX PMD faults are disabled, and will be re-enabled when
     PMD locking is implemented"

* tag 'dax-locking-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: Remove i_mmap_lock protection
  dax: Use radix tree entry lock to protect cow faults
  dax: New fault locking
  dax: Allow DAX code to replace exceptional entries
  dax: Define DAX lock bit for radix tree exceptional entry
  dax: Make huge page handling depend of CONFIG_BROKEN
  dax: Fix condition for filling of PMD holes
</content>
</entry>
<entry>
<title>mm: make vm_brk killable</title>
<updated>2016-05-24T00:04:14+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2016-05-23T23:25:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=2d6c928241add2848e4eebfce407e95164229976'/>
<id>urn:sha1:2d6c928241add2848e4eebfce407e95164229976</id>
<content type='text'>
Now that all the callers handle vm_brk failure we can change it wait for
mmap_sem killable to help oom_reaper to not get blocked just because
vm_brk gets blocked behind mmap_sem readers.

Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: make vm_mmap killable</title>
<updated>2016-05-24T00:04:14+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2016-05-23T23:25:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=9fbeb5ab59a2b2a09cca2eb68283e7a090d4b98d'/>
<id>urn:sha1:9fbeb5ab59a2b2a09cca2eb68283e7a090d4b98d</id>
<content type='text'>
All the callers of vm_mmap seem to check for the failure already and
bail out in one way or another on the error which means that we can
change it to use killable version of vm_mmap_pgoff and return -EINTR if
the current task gets killed while waiting for mmap_sem.  This also
means that vm_mmap_pgoff can be killable by default and drop the
additional parameter.

This will help in the OOM conditions when the oom victim might be stuck
waiting for the mmap_sem for write which in turn can block oom_reaper
which relies on the mmap_sem for read to make a forward progress and
reclaim the address space of the victim.

Please note that load_elf_binary is ignoring vm_mmap error for
current-&gt;personality &amp; MMAP_PAGE_ZERO case but that shouldn't be a
problem because the address is not used anywhere and we never return to
the userspace if we got killed.

Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: page_is_guard(): return false when page_ext arrays are not allocated yet</title>
<updated>2016-05-21T00:58:30+00:00</updated>
<author>
<name>Yang Shi</name>
<email>yang.shi@linaro.org</email>
</author>
<published>2016-05-20T23:58:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=0bb2fd13b69abfd88880f356903b5c7ca36d5eea'/>
<id>urn:sha1:0bb2fd13b69abfd88880f356903b5c7ca36d5eea</id>
<content type='text'>
When enabling the below kernel configs:

CONFIG_DEFERRED_STRUCT_PAGE_INIT
CONFIG_DEBUG_PAGEALLOC
CONFIG_PAGE_EXTENSION
CONFIG_DEBUG_VM

kernel bootup may fail due to the following oops:

  BUG: unable to handle kernel NULL pointer dereference at           (null)
  IP: [&lt;ffffffff8118d982&gt;] free_pcppages_bulk+0x2d2/0x8d0
  PGD 0
  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in:
  CPU: 11 PID: 106 Comm: pgdatinit1 Not tainted 4.6.0-rc5-next-20160427 #26
  Hardware name: Intel Corporation S5520HC/S5520HC, BIOS S5500.86B.01.10.0025.030220091519 03/02/2009
  task: ffff88017c080040 ti: ffff88017c084000 task.ti: ffff88017c084000
  RIP: 0010:[&lt;ffffffff8118d982&gt;]  [&lt;ffffffff8118d982&gt;] free_pcppages_bulk+0x2d2/0x8d0
  RSP: 0000:ffff88017c087c48  EFLAGS: 00010046
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
  RDX: 0000000000000980 RSI: 0000000000000080 RDI: 0000000000660401
  RBP: ffff88017c087cd0 R08: 0000000000000401 R09: 0000000000000009
  R10: ffff88017c080040 R11: 000000000000000a R12: 0000000000000400
  R13: ffffea0019810000 R14: ffffea0019810040 R15: ffff88066cfe6080
  FS:  0000000000000000(0000) GS:ffff88066cd40000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000000002406000 CR4: 00000000000006e0
  Call Trace:
    free_hot_cold_page+0x192/0x1d0
    __free_pages+0x5c/0x90
    __free_pages_boot_core+0x11a/0x14e
    deferred_free_range+0x50/0x62
    deferred_init_memmap+0x220/0x3c3
    kthread+0xf8/0x110
    ret_from_fork+0x22/0x40
  Code: 49 89 d4 48 c1 e0 06 49 01 c5 e9 de fe ff ff 4c 89 f7 44 89 4d b8 4c 89 45 c0 44 89 5d c8 48 89 4d d0 e8 62 c7 07 00 48 8b 4d d0 &lt;48&gt; 8b 00 44 8b 5d c8 4c 8b 45 c0 44 8b 4d b8 a8 02 0f 84 05 ff
  RIP  [&lt;ffffffff8118d982&gt;] free_pcppages_bulk+0x2d2/0x8d0
   RSP &lt;ffff88017c087c48&gt;
  CR2: 0000000000000000

The problem is lookup_page_ext() returns NULL then page_is_guard() tried
to access it in page freeing.

page_is_guard() depends on PAGE_EXT_DEBUG_GUARD bit of page extension
flag, but freeing page might reach here before the page_ext arrays are
allocated when feeding a range of pages to the allocator for the first
time during bootup or memory hotplug.

When it returns NULL, page_is_guard() should just return false instead
of checking PAGE_EXT_DEBUG_GUARD unconditionally.

Link: http://lkml.kernel.org/r/1463610225-29060-1-git-send-email-yang.shi@linaro.org
Signed-off-by: Yang Shi &lt;yang.shi@linaro.org&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: make faultaround produce old ptes</title>
<updated>2016-05-21T00:58:30+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2016-05-20T23:58:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=5c0a85fad949212b3e059692deecdeed74ae7ec7'/>
<id>urn:sha1:5c0a85fad949212b3e059692deecdeed74ae7ec7</id>
<content type='text'>
Currently, faultaround code produces young pte.  This can screw up
vmscan behaviour[1], as it makes vmscan think that these pages are hot
and not push them out on first round.

During sparse file access faultaround gets more pages mapped and all of
them are young.  Under memory pressure, this makes vmscan swap out anon
pages instead, or to drop other page cache pages which otherwise stay
resident.

Modify faultaround to produce old ptes, so they can easily be reclaimed
under memory pressure.

This can to some extend defeat the purpose of faultaround on machines
without hardware accessed bit as it will not help us with reducing the
number of minor page faults.

We may want to disable faultaround on such machines altogether, but
that's subject for separate patchset.

Minchan:
 "I tested 512M mmap sequential word read test on non-HW access bit
  system (i.e., ARM) and confirmed it doesn't increase minor fault any
  more.

  old: 4096 fault_around
  minor fault: 131291
  elapsed time: 6747645 usec

  new: 65536 fault_around
  minor fault: 131291
  elapsed time: 6709263 usec

  0.56% benefit"

[1] https://lkml.kernel.org/r/1460992636-711-1-git-send-email-vinmenon@codeaurora.org

Link: http://lkml.kernel.org/r/1463488366-47723-1-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Minchan Kim &lt;minchan@kernel.org&gt;
Tested-by: Minchan Kim &lt;minchan@kernel.org&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Vinayak Menon &lt;vinmenon@codeaurora.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: use phys_addr_t for reserve_bootmem_region() arguments</title>
<updated>2016-05-21T00:58:30+00:00</updated>
<author>
<name>Stefan Bader</name>
<email>stefan.bader@canonical.com</email>
</author>
<published>2016-05-20T23:58:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=4b50bcc7eda4d3cc9e3f2a0aa60e590fedf728c5'/>
<id>urn:sha1:4b50bcc7eda4d3cc9e3f2a0aa60e590fedf728c5</id>
<content type='text'>
Since commit 92923ca3aace ("mm: meminit: only set page reserved in the
memblock region") the reserved bit is set on reserved memblock regions.
However start and end address are passed as unsigned long.  This is only
32bit on i386, so it can end up marking the wrong pages reserved for
ranges at 4GB and above.

This was observed on a 32bit Xen dom0 which was booted with initial
memory set to a value below 4G but allowing to balloon in memory
(dom0_mem=1024M for example).  This would define a reserved bootmem
region for the additional memory (for example on a 8GB system there was
a reverved region covering the 4GB-8GB range).  But since the addresses
were passed on as unsigned long, this was actually marking all pages
from 0 to 4GB as reserved.

Fixes: 92923ca3aacef63 ("mm: meminit: only set page reserved in the memblock region")
Link: http://lkml.kernel.org/r/1463491221-10573-1-git-send-email-stefan.bader@canonical.com
Signed-off-by: Stefan Bader &lt;stefan.bader@canonical.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[4.2+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
