<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/mm, branch v4.0.8</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v4.0.8</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v4.0.8'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2015-06-23T00:03:35+00:00</updated>
<entry>
<title>mm/memory_hotplug.c: set zone-&gt;wait_table to null after freeing it</title>
<updated>2015-06-23T00:03:35+00:00</updated>
<author>
<name>Gu Zheng</name>
<email>guz.fnst@cn.fujitsu.com</email>
</author>
<published>2015-06-10T18:14:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5dbedee441b11de887d945c655ace5f6215f5d4c'/>
<id>urn:sha1:5dbedee441b11de887d945c655ace5f6215f5d4c</id>
<content type='text'>
commit 85bd839983778fcd0c1c043327b14a046e979b39 upstream.

Izumi found the following oops when hot re-adding a node:

    BUG: unable to handle kernel paging request at ffffc90008963690
    IP: __wake_up_bit+0x20/0x70
    Oops: 0000 [#1] SMP
    CPU: 68 PID: 1237 Comm: rs:main Q:Reg Not tainted 4.1.0-rc5 #80
    Hardware name: FUJITSU PRIMEQUEST2800E/SB, BIOS PRIMEQUEST 2000 Series BIOS Version 1.87 04/28/2015
    task: ffff880838df8000 ti: ffff880017b94000 task.ti: ffff880017b94000
    RIP: 0010:[&lt;ffffffff810dff80&gt;]  [&lt;ffffffff810dff80&gt;] __wake_up_bit+0x20/0x70
    RSP: 0018:ffff880017b97be8  EFLAGS: 00010246
    RAX: ffffc90008963690 RBX: 00000000003c0000 RCX: 000000000000a4c9
    RDX: 0000000000000000 RSI: ffffea101bffd500 RDI: ffffc90008963648
    RBP: ffff880017b97c08 R08: 0000000002000020 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a0797c73800
    R13: ffffea101bffd500 R14: 0000000000000001 R15: 00000000003c0000
    FS:  00007fcc7ffff700(0000) GS:ffff880874800000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffc90008963690 CR3: 0000000836761000 CR4: 00000000001407e0
    Call Trace:
      unlock_page+0x6d/0x70
      generic_write_end+0x53/0xb0
      xfs_vm_write_end+0x29/0x80 [xfs]
      generic_perform_write+0x10a/0x1e0
      xfs_file_buffered_aio_write+0x14d/0x3e0 [xfs]
      xfs_file_write_iter+0x79/0x120 [xfs]
      __vfs_write+0xd4/0x110
      vfs_write+0xac/0x1c0
      SyS_write+0x58/0xd0
      system_call_fastpath+0x12/0x76
    Code: 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 45 f8 31 c0 48 8d 47 48 &lt;48&gt; 39 47 48 48 c7 45 e8 00 00 00 00 48 c7 45 f0 00 00 00 00 48
    RIP  [&lt;ffffffff810dff80&gt;] __wake_up_bit+0x20/0x70
     RSP &lt;ffff880017b97be8&gt;
    CR2: ffffc90008963690

Reproduction method (re-add a node):
  Hot-add nodeA --&gt; remove nodeA --&gt; hot-add nodeA (panic)

This looks like a use-after-free problem, and the root cause is that
zone-&gt;wait_table is not set to *NULL* after being freed in
try_offline_node().

When hot re-adding a node, we reuse its pgdat, and with it the zone
structs.  When adding pages to the target zone, the hotplug code first
initializes the zone (including the wait_table) if the zone is not yet
initialized.  The check for whether a zone is initialized is based on
zone-&gt;wait_table:

	static inline bool zone_is_initialized(struct zone *zone)
	{
		return !!zone-&gt;wait_table;
	}

So if we do not set zone-&gt;wait_table to *NULL* after freeing it, the
memory hotplug routine skips the initialization of the new zone when
the node is hot re-added, and wait_table still points to the freed
memory.  We then access that invalid address when trying to wake up
waiters after an I/O operation on a page completes, as in the oops
above.
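
A minimal sketch of the shape of the fix in try_offline_node() (the
surrounding code is assumed rather than quoted from the patch):

	if (is_vmalloc_addr(zone-&gt;wait_table)) {
		vfree(zone-&gt;wait_table);
		/* zone_is_initialized() now sees the zone as uninitialized */
		zone-&gt;wait_table = NULL;
	}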

Signed-off-by: Gu Zheng &lt;guz.fnst@cn.fujitsu.com&gt;
Reported-by: Taku Izumi &lt;izumi.taku@jp.fujitsu.com&gt;
Reviewed-by: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Tang Chen &lt;tangchen@cn.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>block: discard bdi_unregister() in favour of bdi_destroy()</title>
<updated>2015-06-23T00:03:29+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2015-05-19T05:58:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9e58b828f8480f951328fdb339876c22d07c8861'/>
<id>urn:sha1:9e58b828f8480f951328fdb339876c22d07c8861</id>
<content type='text'>
commit aad653a0bc09dd4ebcb5579f9f835bbae9ef2ba3 upstream.

bdi_unregister() now contains very little functionality.

It contains a "WARN_ON" if bdi-&gt;dev is NULL.  This warning is of no
real consequence, as bdi-&gt;dev isn't needed by anything else in the
function, and it triggers whenever
   blk_cleanup_queue() -&gt; bdi_destroy()
is called before bdi_unregister(), which has been the case since
  commit 6cd18e711dd8 ("block: destroy bdi before blockdev is unregistered.")

So this warning isn't wanted.

It also calls bdi_set_min_ratio().  This needs to be called after
writes through the bdi have all been flushed, and before the bdi is destroyed.
Calling it early is better than calling it late as it frees up a global
resource.

Calling it immediately after bdi_wb_shutdown() in bdi_destroy()
perfectly fits these requirements.
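
In sketch form, the surviving call lands at the top of bdi_destroy();
the rest of the function is elided and the exact placement is assumed:

	void bdi_destroy(struct backing_dev_info *bdi)
	{
		bdi_wb_shutdown(bdi);
		/* release this bdi's share of the global min_ratio pool */
		bdi_set_min_ratio(bdi, 0);
		/* ... existing teardown continues here ... */
	}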

So bdi_unregister() can be discarded, with the important content moved
to bdi_destroy(), as can the
  writeback_bdi_unregister
event, which is already unused.

Reported-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Fixes: c4db59d31e39 ("fs: don't reassign dirty inodes to default_backing_dev_info")
Fixes: 6cd18e711dd8 ("block: destroy bdi before blockdev is unregistered.")
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Tested-by: Nicholas Moulin &lt;nicholas.w.moulin@linux.intel.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm, numa: really disable NUMA balancing by default on single node machines</title>
<updated>2015-06-06T15:21:05+00:00</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2015-05-14T22:17:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f138997c09631f1cf6e58a85fa2dda088e3c0c49'/>
<id>urn:sha1:f138997c09631f1cf6e58a85fa2dda088e3c0c49</id>
<content type='text'>
commit b0dc2b9bb4ab782115b964310518ee0b17784277 upstream.

NUMA balancing is meant to be disabled by default on UMA machines, but
the check uses nr_node_ids (highest node) instead of
num_online_nodes() (online nodes).
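
In sketch form, the fix swaps the highest-node-ID check for a count of
online nodes (the call site, assumed to be check_numabalancing_enable(),
may differ):

	/* before: any node ID above 0 enabled balancing, even on UMA */
	if (nr_node_ids &gt; 1)
		set_numabalancing_state(true);

	/* after: enable only when more than one node is actually online */
	if (num_online_nodes() &gt; 1)
		set_numabalancing_state(true);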

The consequence is that a UMA machine with a node ID of 1 or higher
will enable NUMA balancing.  This incurs useless overhead due to minor
faults, with the impact depending on the workload.  This is the impact
on the stats when running a kernel build on a single-node machine whose
node ID happened to be 1:

  			       vanilla     patched
  NUMA base PTE updates          5113158           0
  NUMA huge PMD updates              643           0
  NUMA page range updates        5442374           0
  NUMA hint faults               2109622           0
  NUMA hint local faults         2109622           0
  NUMA hint local percent            100         100
  NUMA pages migrated                  0           0

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>gfp: add __GFP_NOACCOUNT</title>
<updated>2015-06-06T15:21:05+00:00</updated>
<author>
<name>Vladimir Davydov</name>
<email>vdavydov@parallels.com</email>
</author>
<published>2015-05-14T22:16:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4d039741b4155da5b25bc484b32439794672645f'/>
<id>urn:sha1:4d039741b4155da5b25bc484b32439794672645f</id>
<content type='text'>
commit 8f4fc071b1926d0b20336e2b3f8ab85c94c734c5 upstream.

Not all kmem allocations should be accounted to memcg.  The following
patch gives an example where accounting a certain type of allocation to
memcg can effectively result in a memory leak.  This patch adds the
__GFP_NOACCOUNT flag, which, if passed to kmalloc and friends, forces
the allocation to go through the root cgroup.  It will be used by the
next patch.

Note that, since with kmemleak enabled each kmalloc implies yet another
allocation from the kmemleak_object cache, we add __GFP_NOACCOUNT to
gfp_kmemleak_mask.

Alternatively, we could introduce a per-kmem-cache flag disabling
accounting for all allocations of a particular kind, but (a) we would
then not be able to bypass accounting for kmalloc, and (b) a kmem cache
with this flag set could not be merged with a kmem cache without it,
which would increase the number of global caches and therefore
fragmentation, even if the memory cgroup controller is not used.

Despite its generic name, __GFP_NOACCOUNT currently disables accounting
only for kmem allocations, while user page allocations are always
charged.  To catch abuse of this flag, a warning is issued on an
attempt to pass it to mem_cgroup_try_charge().
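
A minimal usage sketch (the caller and size are illustrative only):

	/* kmem allocation that must never be charged to the caller's memcg */
	buf = kmalloc(len, GFP_KERNEL | __GFP_NOACCOUNT);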

Signed-off-by: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Greg Thelen &lt;gthelen@google.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: soft-offline: fix num_poisoned_pages counting on concurrent events</title>
<updated>2015-05-17T16:55:07+00:00</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2015-05-05T23:23:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b0108b0d7bd5f1e0c0551094de3bf01cb62ad0d2'/>
<id>urn:sha1:b0108b0d7bd5f1e0c0551094de3bf01cb62ad0d2</id>
<content type='text'>
commit 602498f9aa43d4951eece3fd6ad95a6d0a78d537 upstream.

If multiple soft offline events hit one free page/hugepage concurrently,
soft_offline_page() can handle the free page/hugepage multiple times,
which causes the num_poisoned_pages counter to be incremented more than
once.  This patch fixes the wrong counting by checking
TestSetPageHWPoison for normal pages and by checking the return value
of dequeue_hwpoisoned_huge_page() for hugepages.
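
In sketch form (the surrounding soft_offline_page() logic is assumed):

	if (PageHuge(page)) {
		/* only the caller that actually dequeues the hugepage counts it */
		if (!dequeue_hwpoisoned_huge_page(hpage))
			atomic_long_add(1 &lt;&lt; compound_order(hpage),
					&amp;num_poisoned_pages);
	} else {
		/* TestSet returns the old bit: only the first setter counts */
		if (!TestSetPageHWPoison(page))
			atomic_long_inc(&amp;num_poisoned_pages);
	}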

Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Acked-by: Dean Nelson &lt;dnelson@redhat.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>writeback: use |1 instead of +1 to protect against div by zero</title>
<updated>2015-05-17T16:55:07+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-04-21T20:49:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ae2aafcf1a7132e268c8568627a6721fec6f3d98'/>
<id>urn:sha1:ae2aafcf1a7132e268c8568627a6721fec6f3d98</id>
<content type='text'>
commit 464d1387acb94dc43ba772b35242345e3d2ead1b upstream.

mm/page-writeback.c has several places where 1 is added to the divisor
to prevent division by zero exceptions; however, if the original
divisor is equivalent to -1, adding 1 leads to division by zero.

There are three places where +1 is used for this purpose - one in
pos_ratio_polynom() and two in bdi_position_ratio().  The second one
in bdi_position_ratio() actually triggered a div-by-zero oops on a
machine running a 3.10 kernel.  The divisor is

  x_intercept - bdi_setpoint + 1 == span + 1

span is confirmed to be (u32)-1.  It isn't clear how it ended up that
way, but it could be from the write bandwidth calculation underflow
fixed by c72efb658f7c ("writeback: fix possible underflow in write
bandwidth calculation").

At any rate, +1 isn't a proper protection against div-by-zero.  This
patch converts all +1 protections to |1.  Note that
bdi_update_dirty_ratelimit() was already using |1 before this patch.
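
Why |1 works where +1 does not when the divisor is (u32)-1 (the values
below are illustrative):

	u32 span = (u32)-1;	/* 0xffffffff */
	u32 bad  = span + 1;	/* wraps to 0: division by zero */
	u32 good = span | 1;	/* 0xffffffff: low bit set, never zero */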

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm/memory-failure: call shake_page() when error hits thp tail page</title>
<updated>2015-05-17T16:55:06+00:00</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2015-05-05T23:23:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f0091a53b507d2d27c2348efd44d2dfdf9dc1321'/>
<id>urn:sha1:f0091a53b507d2d27c2348efd44d2dfdf9dc1321</id>
<content type='text'>
commit 09789e5de18e4e442870b2d700831f5cb802eb05 upstream.

Currently memory_failure() calls shake_page() to sweep pages out of
pcplists only when the victim page is a 4kB LRU page or a thp head
page.  But we should do this for a thp tail page too.

Consider a memory error that hits a thp tail page whose head page is on
a pcplist when memory_failure() runs.  The current kernel skips the
shake_page() step, so hwpoison_user_mappings() returns without calling
split_huge_page() or try_to_unmap(), because PageLRU of the thp head is
still cleared due to the skipped shake_page().

As a result, me_huge_page() runs for the thp, which is broken behavior.

One effect is a leak of the thp.  Another is a failure to isolate the
memory error, so a later access to the error address causes another
MCE, which kills the processes that used the thp.

This patch fixes the problem by calling shake_page() for the thp tail
case as well.
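
A minimal sketch of the intended behavior (the predicates are assumed,
not the literal diff):

	/* sweep pcplists for 4kB LRU pages, thp heads, and now thp tails */
	if (!PageHuge(p) &amp;&amp; !PageLRU(compound_head(p)))
		shake_page(compound_head(p), 0);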

Fixes: 385de35722c9 ("thp: allow a hwpoisoned head page to be put back to LRU")
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Reviewed-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Acked-by: Dean Nelson &lt;dnelson@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Hidetoshi Seto &lt;seto.hidetoshi@jp.fujitsu.com&gt;
Cc: Jin Dongming &lt;jin.dongming@np.css.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm, thp: really limit transparent hugepage allocation to local node</title>
<updated>2015-05-06T20:04:07+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2015-04-14T22:46:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0b97a15f6fedf422d276245866319990c2c771c5'/>
<id>urn:sha1:0b97a15f6fedf422d276245866319990c2c771c5</id>
<content type='text'>
commit 5265047ac30191ea24b16503165000c225f54feb upstream.

Commit 077fcf116c8c ("mm/thp: allocate transparent hugepages on local
node") restructured alloc_hugepage_vma() with the intent of only
allocating transparent hugepages locally when there was not an effective
interleave mempolicy.

However, alloc_pages_exact_node() does not limit the allocation to the
single node; it merely prefers it.  This is because __GFP_THISNODE,
which would cause the node-local nodemask to be passed, is not set.
Without it, only a nodemask that prefers the local node is passed.

Fix this by passing __GFP_THISNODE and falling back to small pages when
the allocation fails.
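
In sketch form (the call site in alloc_hugepage_vma() is assumed):

	/* pin the nodemask to the local node; do not silently spill over */
	page = alloc_pages_exact_node(node, gfp | __GFP_THISNODE,
				      HPAGE_PMD_ORDER);
	if (!page)
		return NULL;	/* caller falls back to small pages */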

Commit 9f1b868a13ac ("mm: thp: khugepaged: add policy for finding target
node") suffers from a similar problem for khugepaged, which is also fixed.

Fixes: 077fcf116c8c ("mm/thp: allocate transparent hugepages on local node")
Fixes: 9f1b868a13ac ("mm: thp: khugepaged: add policy for finding target node")
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Pravin Shelar &lt;pshelar@nicira.com&gt;
Cc: Jarno Rajahalme &lt;jrajahalme@nicira.com&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Greg Thelen &lt;gthelen@google.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm/hugetlb: use pmd_page() in follow_huge_pmd()</title>
<updated>2015-05-06T20:03:38+00:00</updated>
<author>
<name>Gerald Schaefer</name>
<email>gerald.schaefer@de.ibm.com</email>
</author>
<published>2015-04-14T22:42:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5683056e4853891106ae0a99938c96dfdc8fa881'/>
<id>urn:sha1:5683056e4853891106ae0a99938c96dfdc8fa881</id>
<content type='text'>
commit 97534127012f0e396eddea4691f4c9b170aed74b upstream.

Commit 61f77eda9bbf ("mm/hugetlb: reduce arch dependent code around
follow_huge_*") broke follow_huge_pmd() on s390, where the pmd and pte
layouts differ, so using pte_page() on a huge pmd returns wrong
results.  Using pmd_page() instead fixes this.
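
The change, in sketch form (the follow_huge_pmd() context is assumed):

	/* pte_page() on a huge pmd misreads the s390 layout; go via the pmd */
	page = pmd_page(*pmd) + ((address &amp; ~PMD_MASK) &gt;&gt; PAGE_SHIFT);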

All architectures that were touched by that commit have pmd_page()
defined, so this should not break anything on other architectures.

Fixes: 61f77eda9bbf ("mm/hugetlb: reduce arch dependent code around follow_huge_*")
Signed-off-by: Gerald Schaefer &lt;gerald.schaefer@de.ibm.com&gt;
Acked-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>fix mremap() vs. ioctx_kill() race</title>
<updated>2015-04-06T21:50:59+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2015-04-06T21:48:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b2edffdd912b4205899a8efa0974dfbbc3216109'/>
<id>urn:sha1:b2edffdd912b4205899a8efa0974dfbbc3216109</id>
<content type='text'>
teach the -&gt;mremap() method to return an error and have it fail for
aio mappings in the process of being killed

Note that in case of -&gt;mremap() failure we need to undo the
move_page_tables() we'd already done; we could call -&gt;mremap() first,
but then a failure of move_page_tables() would require undoing whatever
a _successful_ -&gt;mremap() had done, which would be a lot more of a
headache in general.
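
A minimal sketch of the resulting move_vma() ordering (names and error
paths are abbreviated; the exact hook signature is assumed):

	moved_len = move_page_tables(vma, old_addr, new_vma, new_addr,
				     old_len, need_rmap_locks);
	if (file &amp;&amp; file-&gt;f_op-&gt;mremap) {
		err = file-&gt;f_op-&gt;mremap(file, new_vma);
		if (err)
			/* undo: move the page tables back where they were */
			move_page_tables(new_vma, new_addr, vma, old_addr,
					 moved_len, true);
	}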

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
</feed>
