<feed xmlns='http://www.w3.org/2005/Atom'>
<title>BMC/Intel-BMC/linux.git/mm/filemap.c, branch dev-4.7</title>
<subtitle>Intel OpenBMC Linux kernel source tree (mirror)</subtitle>
<id>https://git.radix-linux.su/BMC/Intel-BMC/linux.git/atom?h=dev-4.7</id>
<link rel='self' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/atom?h=dev-4.7'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/'/>
<updated>2016-10-22T10:06:48+00:00</updated>
<entry>
<title>vfs,mm: fix a dead loop in truncate_inode_pages_range()</title>
<updated>2016-10-22T10:06:48+00:00</updated>
<author>
<name>Wei Fang</name>
<email>fangwei1@huawei.com</email>
</author>
<published>2016-10-08T00:01:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=fbc7b803831e5c8a42c1f3427a17e55a814d6b3c'/>
<id>urn:sha1:fbc7b803831e5c8a42c1f3427a17e55a814d6b3c</id>
<content type='text'>
commit c2a9737f45e27d8263ff9643f994bda9bac0b944 upstream.

We triggered a deadloop in truncate_inode_pages_range() on 32 bits
architecture with the test case bellow:

	...
	fd = open();
	write(fd, buf, 4096);
	preadv64(fd, &amp;iovec, 1, 0xffffffff000);
	ftruncate(fd, 0);
	...

Then ftruncate() will not return forever.

The filesystem used in this case is ubifs, but it can be triggered on
many other filesystems.

When preadv64() is called with offset=0xffffffff000, a page with
index=0xffffffff will be added to the radix tree of -&gt;mapping.  Then
this page can be found in -&gt;mapping with pagevec_lookup().  After that,
truncate_inode_pages_range(), which is called in ftruncate(), will fall
into an infinite loop:

 - find a page with index=0xffffffff, since index&gt;=end, this page won't
   be truncated

 - index++, and index become 0

 - the page with index=0xffffffff will be found again

The data type of index is unsigned long, so index won't overflow to 0 on
64 bits architecture in this case, and the dead loop won't happen.

Since truncate_inode_pages_range() is executed with holding lock of
inode-&gt;i_rwsem, any operation related with this lock will be blocked,
and a hung task will happen, e.g.:

  INFO: task truncate_test:3364 blocked for more than 120 seconds.
  ...
     call_rwsem_down_write_failed+0x17/0x30
     generic_file_write_iter+0x32/0x1c0
     ubifs_write_iter+0xcc/0x170
     __vfs_write+0xc4/0x120
     vfs_write+0xb2/0x1b0
     SyS_write+0x46/0xa0

The page with index=0xffffffff added to -&gt;mapping is useless.  Fix this
by checking the read position before allocating pages.

Link: http://lkml.kernel.org/r/1475151010-40166-1-git-send-email-fangwei1@huawei.com
Signed-off-by: Wei Fang &lt;fangwei1@huawei.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Revert "mm: make faultaround produce old ptes"</title>
<updated>2016-06-25T00:23:52+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2016-06-24T21:49:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=315d09bf30c2b436a1fdac86d31c24380cd56c4f'/>
<id>urn:sha1:315d09bf30c2b436a1fdac86d31c24380cd56c4f</id>
<content type='text'>
This reverts commit 5c0a85fad949212b3e059692deecdeed74ae7ec7.

The commit causes ~6% regression in unixbench.

Let's revert it for now and consider other solution for reclaim problem
later.

Link: http://lkml.kernel.org/r/1465893750-44080-2-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Reported-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Vinayak Menon &lt;vinmenon@codeaurora.org&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'dax-locking-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm</title>
<updated>2016-05-27T03:00:28+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-05-27T03:00:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=478a1469a7d27fe6b2f85fc801ecdeb8afc836e6'/>
<id>urn:sha1:478a1469a7d27fe6b2f85fc801ecdeb8afc836e6</id>
<content type='text'>
Pull DAX locking updates from Ross Zwisler:
 "Filesystem DAX locking for 4.7

   - We use a bit in an exceptional radix tree entry as a lock bit and
     use it similarly to how page lock is used for normal faults.  This
     fixes races between hole instantiation and read faults of the same
     index.

   - Filesystem DAX PMD faults are disabled, and will be re-enabled when
     PMD locking is implemented"

* tag 'dax-locking-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: Remove i_mmap_lock protection
  dax: Use radix tree entry lock to protect cow faults
  dax: New fault locking
  dax: Allow DAX code to replace exceptional entries
  dax: Define DAX lock bit for radix tree exceptional entry
  dax: Make huge page handling depend of CONFIG_BROKEN
  dax: Fix condition for filling of PMD holes
</content>
</entry>
<entry>
<title>radix-tree: introduce radix_tree_replace_clear_tags()</title>
<updated>2016-05-21T00:58:30+00:00</updated>
<author>
<name>Matthew Wilcox</name>
<email>willy@linux.intel.com</email>
</author>
<published>2016-05-21T00:03:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=d604c324524bf61c68182bb27db64656a78fe911'/>
<id>urn:sha1:d604c324524bf61c68182bb27db64656a78fe911</id>
<content type='text'>
In addition to replacing the entry, we also clear all associated tags.
This is really a one-off special for page_cache_tree_delete() which had
far too much detailed knowledge about how the radix tree works.

For efficiency, factor node_tag_clear() out of radix_tree_tag_clear() It
can be used by radix_tree_delete_item() as well as
radix_tree_replace_clear_tags().

Signed-off-by: Matthew Wilcox &lt;willy@linux.intel.com&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Kirill Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Jan Kara &lt;jack@suse.com&gt;
Cc: Neil Brown &lt;neilb@suse.de&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: make faultaround produce old ptes</title>
<updated>2016-05-21T00:58:30+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2016-05-20T23:58:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=5c0a85fad949212b3e059692deecdeed74ae7ec7'/>
<id>urn:sha1:5c0a85fad949212b3e059692deecdeed74ae7ec7</id>
<content type='text'>
Currently, faultaround code produces young pte.  This can screw up
vmscan behaviour[1], as it makes vmscan think that these pages are hot
and not push them out on first round.

During sparse file access faultaround gets more pages mapped and all of
them are young.  Under memory pressure, this makes vmscan swap out anon
pages instead, or to drop other page cache pages which otherwise stay
resident.

Modify faultaround to produce old ptes, so they can easily be reclaimed
under memory pressure.

This can to some extend defeat the purpose of faultaround on machines
without hardware accessed bit as it will not help us with reducing the
number of minor page faults.

We may want to disable faultaround on such machines altogether, but
that's subject for separate patchset.

Minchan:
 "I tested 512M mmap sequential word read test on non-HW access bit
  system (i.e., ARM) and confirmed it doesn't increase minor fault any
  more.

  old: 4096 fault_around
  minor fault: 131291
  elapsed time: 6747645 usec

  new: 65536 fault_around
  minor fault: 131291
  elapsed time: 6709263 usec

  0.56% benefit"

[1] https://lkml.kernel.org/r/1460992636-711-1-git-send-email-vinmenon@codeaurora.org

Link: http://lkml.kernel.org/r/1463488366-47723-1-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Minchan Kim &lt;minchan@kernel.org&gt;
Tested-by: Minchan Kim &lt;minchan@kernel.org&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Vinayak Menon &lt;vinmenon@codeaurora.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: filemap: only do access activations on reads</title>
<updated>2016-05-21T00:58:30+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2016-05-20T23:56:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=bbddabe2e436aa7869b3ac5248df5c14ddde0cbf'/>
<id>urn:sha1:bbddabe2e436aa7869b3ac5248df5c14ddde0cbf</id>
<content type='text'>
Andres observed that his database workload is struggling with the
transaction journal creating pressure on frequently read pages.

Access patterns like transaction journals frequently write the same
pages over and over, but in the majority of cases those pages are never
read back.  There are no caching benefits to be had for those pages, so
activating them and having them put pressure on pages that do benefit
from caching is a bad choice.

Leave page activations to read accesses and don't promote pages based on
writes alone.

It could be said that partially written pages do contain cache-worthy
data, because even if *userspace* does not access the unwritten part,
the kernel still has to read it from the filesystem for correctness.
However, a counter argument is that these pages enjoy at least *some*
protection over other inactive file pages through the writeback cache,
in the sense that dirty pages are written back with a delay and cache
reclaim leaves them alone until they have been written back to disk.
Should that turn out to be insufficient and we see increased read IO
from partial writes under memory pressure, we can always go back and
update grab_cache_page_write_begin() to take (pos, len) so that it can
tell partial writes from pages that don't need partial reads.  But for
now, keep it simple.

Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reported-by: Andres Freund &lt;andres@anarazel.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: workingset: only do workingset activations on reads</title>
<updated>2016-05-21T00:58:30+00:00</updated>
<author>
<name>Rik van Riel</name>
<email>riel@redhat.com</email>
</author>
<published>2016-05-20T23:56:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=f0281a00fe80f0e689dd51e68c3aed5f6ef1bf58'/>
<id>urn:sha1:f0281a00fe80f0e689dd51e68c3aed5f6ef1bf58</id>
<content type='text'>
This is a follow-up to

  http://www.spinics.net/lists/linux-mm/msg101739.html

where Andres reported his database workingset being pushed out by the
minimum size enforcement of the inactive file list - currently 50% of
cache - as well as repeatedly written file pages that are never actually
read.

Two changes fell out of the discussions.  The first change observes that
pages that are only ever written don't benefit from caching beyond what
the writeback cache does for partial page writes, and so we shouldn't
promote them to the active file list where they compete with pages whose
cached data is actually accessed repeatedly.  This change comes in two
patches - one for in-cache write accesses and one for refaults triggered
by writes, neither of which should promote a cache page.

Second, with the refault detection we don't need to set 50% of the cache
aside for used-once cache anymore since we can detect frequently used
pages even when they are evicted between accesses.  We can allow the
active list to be bigger and thus protect a bigger workingset that isn't
challenged by streamers.  Depending on the access patterns, this can
increase major faults during workingset transitions for better
performance during stable phases.

This patch (of 3):

When rewriting a page, the data in that page is replaced with new data.
This means that evicting something else from the active file list, in
order to cache data that will be replaced by something else, is likely
to be a waste of memory.

It is better to save the active list for frequently read pages, because
reads actually use the data that is in the page.

This patch ignores partial writes, because it is unclear whether the
complexity of identifying those is worth any potential performance gain
obtained from better caching pages that see repeated partial writes at
large enough intervals to not get caught by the use-twice promotion code
used for the inactive file list.

Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reported-by: Andres Freund &lt;andres@anarazel.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/page_ref: use page_ref helper instead of direct modification of _count</title>
<updated>2016-05-20T02:12:14+00:00</updated>
<author>
<name>Joonsoo Kim</name>
<email>iamjoonsoo.kim@lge.com</email>
</author>
<published>2016-05-20T00:10:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=6d061f9f6136d477932088c24ce155d7dc785746'/>
<id>urn:sha1:6d061f9f6136d477932088c24ce155d7dc785746</id>
<content type='text'>
page_reference manipulation functions are introduced to track down
reference count change of the page.  Use it instead of direct
modification of _count.

Signed-off-by: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Johannes Berg &lt;johannes@sipsolutions.net&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Sunil Goutham &lt;sgoutham@cavium.com&gt;
Cc: Chris Metcalf &lt;cmetcalf@mellanox.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>dax: New fault locking</title>
<updated>2016-05-19T21:20:54+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-05-12T16:29:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=ac401cc782429cc8560ce4840b1405d603740917'/>
<id>urn:sha1:ac401cc782429cc8560ce4840b1405d603740917</id>
<content type='text'>
Currently DAX page fault locking is racy.

CPU0 (write fault)		CPU1 (read fault)

__dax_fault()			__dax_fault()
  get_block(inode, block, &amp;bh, 0) -&gt; not mapped
				  get_block(inode, block, &amp;bh, 0)
				    -&gt; not mapped
  if (!buffer_mapped(&amp;bh))
    if (vmf-&gt;flags &amp; FAULT_FLAG_WRITE)
      get_block(inode, block, &amp;bh, 1) -&gt; allocates blocks
  if (page) -&gt; no
				  if (!buffer_mapped(&amp;bh))
				    if (vmf-&gt;flags &amp; FAULT_FLAG_WRITE) {
				    } else {
				      dax_load_hole();
				    }
  dax_insert_mapping()

And we are in a situation where we fail in dax_radix_entry() with -EIO.

Another problem with the current DAX page fault locking is that there is
no race-free way to clear dirty tag in the radix tree. We can always
end up with clean radix tree and dirty data in CPU cache.

We fix the first problem by introducing locking of exceptional radix
tree entries in DAX mappings acting very similarly to page lock and thus
synchronizing properly faults against the same mapping index. The same
lock can later be used to avoid races when clearing radix tree dirty
tag.

Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Reviewed-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
</content>
</entry>
<entry>
<title>dax: Allow DAX code to replace exceptional entries</title>
<updated>2016-05-19T21:18:30+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-05-12T16:29:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=4f622938a5e2b7f1374ffb1e5fc212744898f513'/>
<id>urn:sha1:4f622938a5e2b7f1374ffb1e5fc212744898f513</id>
<content type='text'>
Currently we forbid page_cache_tree_insert() to replace exceptional radix
tree entries for DAX inodes. However to make DAX faults race free we will
lock radix tree entries and when hole is created, we need to replace
such locked radix tree entry with a hole page. So modify
page_cache_tree_insert() to allow that.

Reviewed-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
</content>
</entry>
</feed>
