<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/mm, branch v4.10.2</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v4.10.2</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v4.10.2'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2017-03-12T05:44:10+00:00</updated>
<entry>
<title>mm, vmscan: consider eligible zones in get_scan_count</title>
<updated>2017-03-12T05:44:10+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2017-02-22T23:46:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=97ddabf533f7369d10f28ebc54822dc841344622'/>
<id>urn:sha1:97ddabf533f7369d10f28ebc54822dc841344622</id>
<content type='text'>
commit 71ab6cfe88dcf9f6e6a65eb85cf2bda20a257682 upstream.

get_scan_count() considers the whole node LRU size when

 - doing SCAN_FILE due to many page cache inactive pages
 - calculating the number of pages to scan

In both cases this might lead to unexpected behavior especially on 32b
systems where we can expect lowmem memory pressure very often.

A large highmem zone can easily distort SCAN_FILE heuristic because
there might be only few file pages from the eligible zones on the node
lru and we would still enforce file lru scanning which can lead to
trashing while we could still scan anonymous pages.

The later use of lruvec_lru_size can be problematic as well.  Especially
when there are not many pages from the eligible zones.  We would have to
skip over many pages to find anything to reclaim but shrink_node_memcg
would only reduce the remaining number to scan by SWAP_CLUSTER_MAX at
maximum.  Therefore we can end up going over a large LRU many times
without actually having chance to reclaim much if anything at all.  The
closer we are out of memory on lowmem zone the worse the problem will
be.

Fix this by filtering out all the ineligible zones when calculating the
lru size for both paths and consider only sc-&gt;reclaim_idx zones.

The patch would need to be tweaked a bit to apply to 4.10 and older but
I will do that as soon as it hits the Linus tree in the next merge
window.

Link: http://lkml.kernel.org/r/20170117103702.28542-3-mhocko@kernel.org
Fixes: b2e18757f2c9 ("mm, vmscan: begin reclaiming pages on a per-node basis")
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Tested-by: Trevor Cordes &lt;trevor@tecnopolis.ca&gt;
Acked-by: Minchan Kim &lt;minchan@kernel.org&gt;
Acked-by: Hillf Danton &lt;hillf.zj@alibaba-inc.com&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;


</content>
</entry>
<entry>
<title>mm, vmscan: cleanup lru size claculations</title>
<updated>2017-03-12T05:44:10+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2017-02-22T23:45:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e2338022cdedb4d6b85385a3d1459f3086ff6fef'/>
<id>urn:sha1:e2338022cdedb4d6b85385a3d1459f3086ff6fef</id>
<content type='text'>
commit fd538803731e50367b7c59ce4ad3454426a3d671 upstream.

lruvec_lru_size returns the full size of the LRU list while we sometimes
need a value reduced only to eligible zones (e.g.  for lowmem requests).
inactive_list_is_low is one such user.  Later patches will add more of
them.  Add a new parameter to lruvec_lru_size and allow it filter out
zones which are not eligible for the given context.

Link: http://lkml.kernel.org/r/20170117103702.28542-2-mhocko@kernel.org
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Hillf Danton &lt;hillf.zj@alibaba-inc.com&gt;
Acked-by: Minchan Kim &lt;minchan@kernel.org&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;


</content>
</entry>
<entry>
<title>mm: do not access page-&gt;mapping directly on page_endio</title>
<updated>2017-03-12T05:44:10+00:00</updated>
<author>
<name>Minchan Kim</name>
<email>minchan@kernel.org</email>
</author>
<published>2017-02-24T22:59:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e560c8b23c3be63ad1c761eaa26bd9624a5b093e'/>
<id>urn:sha1:e560c8b23c3be63ad1c761eaa26bd9624a5b093e</id>
<content type='text'>
commit dd8416c47715cf324c9a16f13273f9fda87acfed upstream.

With rw_page, page_endio is used for completing IO on a page and it
propagates write error to the address space if the IO fails.  The
problem is it accesses page-&gt;mapping directly which might be okay for
file-backed pages but it shouldn't for anonymous page.  Otherwise, it
can corrupt one of field from anon_vma under us and system goes panic
randomly.

swap_writepage
  bdev_writepage
    ops-&gt;rw_page

I encountered the BUG during developing new zram feature and it was
really hard to figure it out because it made random crash, somtime
mmap_sem lockdep, sometime other places where places never related to
zram/zsmalloc, and not reproducible with some configuration.

When I consider how that bug is subtle and people do fast-swap test with
brd, it's worth to add stable mark, I think.

Fixes: dd6bd0d9c7db ("swap: use bdev_read_page() / bdev_write_page()")
Signed-off-by: Minchan Kim &lt;minchan@kernel.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: vmpressure: fix sending wrong events on underflow</title>
<updated>2017-03-12T05:44:10+00:00</updated>
<author>
<name>Vinayak Menon</name>
<email>vinmenon@codeaurora.org</email>
</author>
<published>2017-02-24T22:59:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=67b5c7997148bd48d693b1d6add8287a358a6e37'/>
<id>urn:sha1:67b5c7997148bd48d693b1d6add8287a358a6e37</id>
<content type='text'>
commit e1587a4945408faa58d0485002c110eb2454740c upstream.

At the end of a window period, if the reclaimed pages is greater than
scanned, an unsigned underflow can result in a huge pressure value and
thus a critical event.  Reclaimed pages is found to go higher than
scanned because of the addition of reclaimed slab pages to reclaimed in
shrink_node without a corresponding increment to scanned pages.

Minchan Kim mentioned that this can also happen in the case of a THP
page where the scanned is 1 and reclaimed could be 512.

Link: http://lkml.kernel.org/r/1486641577-11685-1-git-send-email-vinmenon@codeaurora.org
Signed-off-by: Vinayak Menon &lt;vinmenon@codeaurora.org&gt;
Acked-by: Minchan Kim &lt;minchan@kernel.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Anton Vorontsov &lt;anton.vorontsov@linaro.org&gt;
Cc: Shiraz Hashim &lt;shashim@codeaurora.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm/page_alloc: fix nodes for reclaim in fast path</title>
<updated>2017-03-12T05:44:10+00:00</updated>
<author>
<name>Gavin Shan</name>
<email>gwshan@linux.vnet.ibm.com</email>
</author>
<published>2017-02-24T22:59:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6fd7a425d92537c7e4b659586f22a4901ce11d71'/>
<id>urn:sha1:6fd7a425d92537c7e4b659586f22a4901ce11d71</id>
<content type='text'>
commit e02dc017c3032dcdce1b993af0db135462e1b4b7 upstream.

When @node_reclaim_node isn't 0, the page allocator tries to reclaim
pages if the amount of free memory in the zones are below the low
watermark.  On Power platform, none of NUMA nodes are scanned for page
reclaim because no nodes match the condition in zone_allows_reclaim().
On Power platform, RECLAIM_DISTANCE is set to 10 which is the distance
of Node-A to Node-A.  So the preferred node even won't be scanned for
page reclaim.

   __alloc_pages_nodemask()
   get_page_from_freelist()
      zone_allows_reclaim()

Anton proposed the test code as below:

   # cat alloc.c
      :
   int main(int argc, char *argv[])
   {
	void *p;
	unsigned long size;
	unsigned long start, end;

	start = time(NULL);
	size = strtoul(argv[1], NULL, 0);
	printf("To allocate %ldGB memory\n", size);

	size &lt;&lt;= 30;
	p = malloc(size);
	assert(p);
	memset(p, 0, size);

	end = time(NULL);
	printf("Used time: %ld seconds\n", end - start);
	sleep(3600);
	return 0;
   }

The system I use for testing has two NUMA nodes.  Both have 128GB
memory.  In below scnario, the page caches on node#0 should be reclaimed
when it encounters pressure to accommodate request of allocation.

   # echo 2 &gt; /proc/sys/vm/zone_reclaim_mode; \
     sync; \
     echo 3 &gt; /proc/sys/vm/drop_caches; \
   # taskset -c 0 cat file.32G &gt; /dev/null; \
     grep FilePages /sys/devices/system/node/node0/meminfo
     Node 0 FilePages:       33619712 kB
   # taskset -c 0 ./alloc 128
   # grep FilePages /sys/devices/system/node/node0/meminfo
     Node 0 FilePages:       33619840 kB
   # grep MemFree /sys/devices/system/node/node0/meminfo
     Node 0 MemFree:          186816 kB

With the patch applied, the pagecache on node-0 is reclaimed when its
free memory is running out.  It's the expected behaviour.

   # echo 2 &gt; /proc/sys/vm/zone_reclaim_mode; \
     sync; \
     echo 3 &gt; /proc/sys/vm/drop_caches
   # taskset -c 0 cat file.32G &gt; /dev/null; \
     grep FilePages /sys/devices/system/node/node0/meminfo
     Node 0 FilePages:       33605568 kB
   # taskset -c 0 ./alloc 128
   # grep FilePages /sys/devices/system/node/node0/meminfo
     Node 0 FilePages:        1379520 kB
   # grep MemFree /sys/devices/system/node/node0/meminfo
     Node 0 MemFree:           317120 kB

Fixes: 5f7a75acdb24 ("mm: page_alloc: do not cache reclaim distances")
Link: http://lkml.kernel.org/r/1486532455-29613-1-git-send-email-gwshan@linux.vnet.ibm.com
Signed-off-by: Gavin Shan &lt;gwshan@linux.vnet.ibm.com&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Anton Blanchard &lt;anton@samba.org&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>block: fix double-free in the failure path of cgwb_bdi_init()</title>
<updated>2017-02-26T10:09:19+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2017-02-08T20:19:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dea972f381a12fa0f4026a6eb6070a4fe6fb0465'/>
<id>urn:sha1:dea972f381a12fa0f4026a6eb6070a4fe6fb0465</id>
<content type='text'>
commit 5f478e4ea5c5560b4e40eb136991a09f9389f331 upstream.

When !CONFIG_CGROUP_WRITEBACK, bdi has single bdi_writeback_congested
at bdi-&gt;wb_congested.  cgwb_bdi_init() allocates it with kzalloc() and
doesn't do further initialization.  This usually works fine as the
reference count gets bumped to 1 by wb_init() and the put from
wb_exit() releases it.

However, when wb_init() fails, it puts the wb base ref automatically
freeing the wb and the explicit kfree() in cgwb_bdi_init() error path
ends up trying to free the same pointer the second time causing a
double-free.

Fix it by explicitly initilizing the refcnt to 1 and putting the base
ref from cgwb_bdi_destroy().

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Fixes: a13f35e87140 ("writeback: don't embed root bdi_writeback_congested in bdi_writeback")
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm/slub.c: fix random_seq offset destruction</title>
<updated>2017-02-08T23:41:43+00:00</updated>
<author>
<name>Sean Rees</name>
<email>sean@erifax.org</email>
</author>
<published>2017-02-08T22:30:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a810007afe239d59c1115fcaa06eb5b480f876e9'/>
<id>urn:sha1:a810007afe239d59c1115fcaa06eb5b480f876e9</id>
<content type='text'>
Commit 210e7a43fa90 ("mm: SLUB freelist randomization") broke USB hub
initialisation as described in

  https://bugzilla.kernel.org/show_bug.cgi?id=177551.

Bail out early from init_cache_random_seq if s-&gt;random_seq is already
initialised.  This prevents destroying the previously computed
random_seq offsets later in the function.

If the offsets are destroyed, then shuffle_freelist will truncate
page-&gt;freelist to just the first object (orphaning the rest).

Fixes: 210e7a43fa90 ("mm: SLUB freelist randomization")
Link: http://lkml.kernel.org/r/20170207140707.20824-1-sean@erifax.org
Signed-off-by: Sean Rees &lt;sean@erifax.org&gt;
Reported-by: &lt;userwithuid@gmail.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Thomas Garnier &lt;thgarnie@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, fs: check for fatal signals in do_generic_file_read()</title>
<updated>2017-02-03T22:13:19+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2017-02-03T21:13:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5abf186a30a89d5b9c18a6bf93a2c192c9fd52f6'/>
<id>urn:sha1:5abf186a30a89d5b9c18a6bf93a2c192c9fd52f6</id>
<content type='text'>
do_generic_file_read() can be told to perform a large request from
userspace.  If the system is under OOM and the reading task is the OOM
victim then it has an access to memory reserves and finishing the full
request can lead to the full memory depletion which is dangerous.  Make
sure we rather go with a short read and allow the killed task to
terminate.

Link: http://lkml.kernel.org/r/20170201092706.9966-3-mhocko@kernel.org
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>base/memory, hotplug: fix a kernel oops in show_valid_zones()</title>
<updated>2017-02-03T22:13:19+00:00</updated>
<author>
<name>Toshi Kani</name>
<email>toshi.kani@hpe.com</email>
</author>
<published>2017-02-03T21:13:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a96dfddbcc04336bbed50dc2b24823e45e09e80c'/>
<id>urn:sha1:a96dfddbcc04336bbed50dc2b24823e45e09e80c</id>
<content type='text'>
Reading a sysfs "memoryN/valid_zones" file leads to the following oops
when the first page of a range is not backed by struct page.
show_valid_zones() assumes that 'start_pfn' is always valid for
page_zone().

 BUG: unable to handle kernel paging request at ffffea017a000000
 IP: show_valid_zones+0x6f/0x160

This issue may happen on x86-64 systems with 64GiB or more memory since
their memory block size is bumped up to 2GiB.  [1] An example of such
systems is desribed below.  0x3240000000 is only aligned by 1GiB and
this memory block starts from 0x3200000000, which is not backed by
struct page.

 BIOS-e820: [mem 0x0000003240000000-0x000000603fffffff] usable

Since test_pages_in_a_zone() already checks holes, fix this issue by
extending this function to return 'valid_start' and 'valid_end' for a
given range.  show_valid_zones() then proceeds with the valid range.

[1] 'Commit bdee237c0343 ("x86: mm: Use 2GB memory block size on
    large-memory x86-64 systems")'

Link: http://lkml.kernel.org/r/20170127222149.30893-3-toshi.kani@hpe.com
Signed-off-by: Toshi Kani &lt;toshi.kani@hpe.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Zhang Zhen &lt;zhenzhang.zhang@huawei.com&gt;
Cc: Reza Arbab &lt;arbab@linux.vnet.ibm.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[4.4+]

Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory_hotplug.c: check start_pfn in test_pages_in_a_zone()</title>
<updated>2017-02-03T22:13:19+00:00</updated>
<author>
<name>Toshi Kani</name>
<email>toshi.kani@hpe.com</email>
</author>
<published>2017-02-03T21:13:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=deb88a2a19e85842d79ba96b05031739ec327ff4'/>
<id>urn:sha1:deb88a2a19e85842d79ba96b05031739ec327ff4</id>
<content type='text'>
Patch series "fix a kernel oops when reading sysfs valid_zones", v2.

A sysfs memory file is created for each 2GiB memory block on x86-64 when
the system has 64GiB or more memory.  [1] When the start address of a
memory block is not backed by struct page, i.e.  a memory range is not
aligned by 2GiB, reading its 'valid_zones' attribute file leads to a
kernel oops.  This issue was observed on multiple x86-64 systems with
more than 64GiB of memory.  This patch-set fixes this issue.

Patch 1 first fixes an issue in test_pages_in_a_zone(), which does not
test the start section.

Patch 2 then fixes the kernel oops by extending test_pages_in_a_zone()
to return valid [start, end).

Note for stable kernels: The memory block size change was made by commit
bdee237c0343 ("x86: mm: Use 2GB memory block size on large-memory x86-64
systems"), which was accepted to 3.9.  However, this patch-set depends
on (and fixes) the change to test_pages_in_a_zone() made by commit
5f0f2887f4de ("mm/memory_hotplug.c: check for missing sections in
test_pages_in_a_zone()"), which was accepted to 4.4.

So, I recommend that we backport it up to 4.4.

[1] 'Commit bdee237c0343 ("x86: mm: Use 2GB memory block size on
    large-memory x86-64 systems")'

This patch (of 2):

test_pages_in_a_zone() does not check 'start_pfn' when it is aligned by
section since 'sec_end_pfn' is set equal to 'pfn'.  Since this function
is called for testing the range of a sysfs memory file, 'start_pfn' is
always aligned by section.

Fix it by properly setting 'sec_end_pfn' to the next section pfn.

Also make sure that this function returns 1 only when the range belongs
to a zone.

Link: http://lkml.kernel.org/r/20170127222149.30893-2-toshi.kani@hpe.com
Signed-off-by: Toshi Kani &lt;toshi.kani@hpe.com&gt;
Cc: Andrew Banman &lt;abanman@sgi.com&gt;
Cc: Reza Arbab &lt;arbab@linux.vnet.ibm.com&gt;
Cc: Greg KH &lt;greg@kroah.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[4.4+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
