<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/lib/radix-tree.c, branch v4.19.165</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v4.19.165</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v4.19.165'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2019-12-05T08:19:40+00:00</updated>
<entry>
<title>idr: Fix idr_alloc_u32 on 32-bit systems</title>
<updated>2019-12-05T08:19:40+00:00</updated>
<author>
<name>Matthew Wilcox (Oracle)</name>
<email>willy@infradead.org</email>
</author>
<published>2019-11-02T04:25:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8ef58b82d1e48378a53d2390629f23fba3aaf8c2'/>
<id>urn:sha1:8ef58b82d1e48378a53d2390629f23fba3aaf8c2</id>
<content type='text'>
[ Upstream commit b7e9728f3d7fc5c5c8508d99f1675212af5cfd49 ]

Attempting to allocate an entry at 0xffffffff when one is already
present would succeed in allocating one at 2^32, which would confuse
everything.  Return -ENOSPC in this case, as expected.

Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>ida: Remove old API</title>
<updated>2018-08-22T03:54:21+00:00</updated>
<author>
<name>Matthew Wilcox</name>
<email>willy@infradead.org</email>
</author>
<published>2018-06-18T23:02:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b03f8e43c9261878bf29d8cc1c3ba458cc98287e'/>
<id>urn:sha1:b03f8e43c9261878bf29d8cc1c3ba458cc98287e</id>
<content type='text'>
Delete ida_pre_get(), ida_get_new(), ida_get_new_above() and ida_remove()
from the public API.  Some of these functions still exist as internal
helpers, but they should not be called by consumers.

Signed-off-by: Matthew Wilcox &lt;willy@infradead.org&gt;
</content>
</entry>
<entry>
<title>radix-tree: Fix UBSAN warning</title>
<updated>2018-08-22T03:49:31+00:00</updated>
<author>
<name>Matthew Wilcox</name>
<email>willy@infradead.org</email>
</author>
<published>2018-08-18T11:05:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=76f070b4135563165c523ab560056b7a9353e2f2'/>
<id>urn:sha1:76f070b4135563165c523ab560056b7a9353e2f2</id>
<content type='text'>
get_slot_offset() can be called with a NULL 'parent' argument.
In this case, the calculated value will not be used, but calculating
it is undefined.  Rather than fixing the caller (__radix_tree_delete)
to not call get_slot_offset(), make get_slot_offset() robust against
being called with a NULL parent.

Signed-off-by: Matthew Wilcox &lt;willy@infradead.org&gt;
</content>
</entry>
<entry>
<title>idr: fix invalid ptr dereference on item delete</title>
<updated>2018-05-26T01:12:10+00:00</updated>
<author>
<name>Matthew Wilcox</name>
<email>mawilcox@microsoft.com</email>
</author>
<published>2018-05-25T21:47:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7a4deea1aa8bddfed4ef1b35fc2b6732563d8ad5'/>
<id>urn:sha1:7a4deea1aa8bddfed4ef1b35fc2b6732563d8ad5</id>
<content type='text'>
If the radix tree underlying the IDR happens to be full and we attempt
to remove an id which is larger than any id in the IDR, we will call
__radix_tree_delete() with an uninitialised 'slot' pointer, at which
point anything could happen.  This was easiest to hit with a single
entry at id 0 and attempting to remove a non-0 id, but it could have
happened with 64 entries and attempting to remove an id &gt;= 64.

Roman said:

  The syzcaller test boils down to opening /dev/kvm, creating an
  eventfd, and calling a couple of KVM ioctls. None of this requires
  superuser. And the result is dereferencing an uninitialized pointer
  which is likely a crash. The specific path caught by syzbot is via
  KVM_HYPERV_EVENTD ioctl which is new in 4.17. But I guess there are
  other user-triggerable paths, so cc:stable is probably justified.

Matthew added:

  We have around 250 calls to idr_remove() in the kernel today. Many of
  them pass an ID which is embedded in the object they're removing, so
  they're safe. Picking a few likely candidates:

  drivers/firewire/core-cdev.c looks unsafe; the ID comes from an ioctl.
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c is similar
  drivers/atm/nicstar.c could be taken down by a handcrafted packet

Link: http://lkml.kernel.org/r/20180518175025.GD6361@bombadil.infradead.org
Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree")
Reported-by: &lt;syzbot+35666cba7f0a337e2e79@syzkaller.appspotmail.com&gt;
Debugged-by: Roman Kagan &lt;rkagan@virtuozzo.com&gt;
Signed-off-by: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>radix tree: fix multi-order iteration race</title>
<updated>2018-05-19T00:17:12+00:00</updated>
<author>
<name>Ross Zwisler</name>
<email>ross.zwisler@linux.intel.com</email>
</author>
<published>2018-05-18T23:09:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9f418224e8114156d995b98fa4e0f4fd21f685fe'/>
<id>urn:sha1:9f418224e8114156d995b98fa4e0f4fd21f685fe</id>
<content type='text'>
Fix a race in the multi-order iteration code which causes the kernel to
hit a GP fault.  This was first seen with a production v4.15 based
kernel (4.15.6-300.fc27.x86_64) utilizing a DAX workload which used
order 9 PMD DAX entries.

The race has to do with how we tear down multi-order sibling entries
when we are removing an item from the tree.  Remember for example that
an order 2 entry looks like this:

  struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]

where 'entry' is in some slot in the struct radix_tree_node, and the
three slots following 'entry' contain sibling pointers which point back
to 'entry.'

When we delete 'entry' from the tree, we call :

  radix_tree_delete()
    radix_tree_delete_item()
      __radix_tree_delete()
        replace_slot()

replace_slot() first removes the siblings in order from the first to the
last, then at then replaces 'entry' with NULL.  This means that for a
brief period of time we end up with one or more of the siblings removed,
so:

  struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]

This causes an issue if you have a reader iterating over the slots in
the tree via radix_tree_for_each_slot() while only under
rcu_read_lock()/rcu_read_unlock() protection.  This is a common case in
mm/filemap.c.

The issue is that when __radix_tree_next_slot() =&gt; skip_siblings() tries
to skip over the sibling entries in the slots, it currently does so with
an exact match on the slot directly preceding our current slot.
Normally this works:

                                      V preceding slot
  struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
                                              ^ current slot

This lets you find the first sibling, and you skip them all in order.

But in the case where one of the siblings is NULL, that slot is skipped
and then our sibling detection is interrupted:

                                             V preceding slot
  struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
                                                    ^ current slot

This means that the sibling pointers aren't recognized since they point
all the way back to 'entry', so we think that they are normal internal
radix tree pointers.  This causes us to think we need to walk down to a
struct radix_tree_node starting at the address of 'entry'.

In a real running kernel this will crash the thread with a GP fault when
you try and dereference the slots in your broken node starting at
'entry'.

We fix this race by fixing the way that skip_siblings() detects sibling
nodes.  Instead of testing against the preceding slot we instead look
for siblings via is_sibling_entry() which compares against the position
of the struct radix_tree_node.slots[] array.  This ensures that sibling
entries are properly identified, even if they are no longer contiguous
with the 'entry' they point to.

Link: http://lkml.kernel.org/r/20180503192430.7582-6-ross.zwisler@linux.intel.com
Fixes: 148deab223b2 ("radix-tree: improve multiorder iterators")
Signed-off-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Reported-by: CR, Sapthagirish &lt;sapthagirish.cr@intel.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>radix tree: use GFP_ZONEMASK bits of gfp_t for flags</title>
<updated>2018-04-11T17:28:39+00:00</updated>
<author>
<name>Matthew Wilcox</name>
<email>mawilcox@microsoft.com</email>
</author>
<published>2018-04-10T23:36:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fa290cda102c096f5ca394277d65d3dbd689930b'/>
<id>urn:sha1:fa290cda102c096f5ca394277d65d3dbd689930b</id>
<content type='text'>
Patch series "XArray", v9.  (First part thereof).

This patchset is, I believe, appropriate for merging for 4.17.  It
contains the XArray implementation, to eventually replace the radix
tree, and converts the page cache to use it.

This conversion keeps the radix tree and XArray data structures in sync
at all times.  That allows us to convert the page cache one function at
a time and should allow for easier bisection.  Other than renaming some
elements of the structures, the data structures are fundamentally
unchanged; a radix tree walk and an XArray walk will touch the same
number of cachelines.  I have changes planned to the XArray data
structure, but those will happen in future patches.

Improvements the XArray has over the radix tree:

 - The radix tree provides operations like other trees do; 'insert' and
   'delete'. But what most users really want is an automatically
   resizing array, and so it makes more sense to give users an API that
   is like an array -- 'load' and 'store'. We still have an 'insert'
   operation for users that really want that semantic.

 - The XArray considers locking as part of its API. This simplifies a
   lot of users who formerly had to manage their own locking just for
   the radix tree. It also improves code generation as we can now tell
   RCU that we're holding a lock and it doesn't need to generate as much
   fencing code. The other advantage is that tree nodes can be moved
   (not yet implemented).

 - GFP flags are now parameters to calls which may need to allocate
   memory. The radix tree forced users to decide what the allocation
   flags would be at creation time. It's much clearer to specify them at
   allocation time.

 - Memory is not preloaded; we don't tie up dozens of pages on the off
   chance that the slab allocator fails. Instead, we drop the lock,
   allocate a new node and retry the operation. We have to convert all
   the radix tree, IDA and IDR preload users before we can realise this
   benefit, but I have not yet found a user which cannot be converted.

 - The XArray provides a cmpxchg operation. The radix tree forces users
   to roll their own (and at least four have).

 - Iterators take a 'max' parameter. That simplifies many users and will
   reduce the amount of iteration done.

 - Iteration can proceed backwards. We only have one user for this, but
   since it's called as part of the pagefault readahead algorithm, that
   seemed worth mentioning.

 - RCU-protected pointers are not exposed as part of the API. There are
   some fun bugs where the page cache forgets to use rcu_dereference()
   in the current codebase.

 - Value entries gain an extra bit compared to radix tree exceptional
   entries. That gives us the extra bit we need to put huge page swap
   entries in the page cache.

 - Some iterators now take a 'filter' argument instead of having
   separate iterators for tagged/untagged iterations.

The page cache is improved by this:

 - Shorter, easier to read code

 - More efficient iterations

 - Reduction in size of struct address_space

 - Fewer walks from the top of the data structure; the XArray API
   encourages staying at the leaf node and conducting operations there.

This patch (of 8):

None of these bits may be used for slab allocations, so we can use them
as radix tree flags as long as we mask them off before passing them to
the slab allocator. Move the IDR flag from the high bits to the
GFP_ZONEMASK bits.

Link: http://lkml.kernel.org/r/20180313132639.17387-3-willy@infradead.org
Signed-off-by: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Acked-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Cc: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ida: do zeroing in ida_pre_get()</title>
<updated>2018-02-21T23:35:43+00:00</updated>
<author>
<name>Rasmus Villemoes</name>
<email>linux@rasmusvillemoes.dk</email>
</author>
<published>2018-02-21T22:45:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b1a8a7a70043400d1e685899548c92b92f640d71'/>
<id>urn:sha1:b1a8a7a70043400d1e685899548c92b92f640d71</id>
<content type='text'>
As far as I can tell, the only place the per-cpu ida_bitmap is populated
is in ida_pre_get.  The pre-allocated element is stolen in two places in
ida_get_new_above, in both cases immediately followed by a memset(0).

Since ida_get_new_above is called with locks held, do the zeroing in
ida_pre_get, or rather let kmalloc() do it.  Also, apparently gcc
generates ~44 bytes of code to do a memset(, 0, 128):

  $ scripts/bloat-o-meter vmlinux.{0,1}
  add/remove: 0/0 grow/shrink: 2/1 up/down: 5/-88 (-83)
  Function                                     old     new   delta
  ida_pre_get                                  115     119      +4
  vermagic                                      27      28      +1
  ida_get_new_above                            715     627     -88

Link: http://lkml.kernel.org/r/20180108225634.15340-1-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Acked-by: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Cc: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>idr: Remove idr_alloc_ext</title>
<updated>2018-02-06T21:41:28+00:00</updated>
<author>
<name>Matthew Wilcox</name>
<email>mawilcox@microsoft.com</email>
</author>
<published>2017-11-28T20:16:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=460488c58ca8b9167463ac22ec9a2e33db351962'/>
<id>urn:sha1:460488c58ca8b9167463ac22ec9a2e33db351962</id>
<content type='text'>
It has no more users, so remove it.  Move idr_alloc() back into idr.c,
move the guts of idr_alloc_cmn() into idr_alloc_u32(), remove the
wrappers around idr_get_free_cmn() and rename it to idr_get_free().
While there is now no interface to allocate IDs larger than a u32,
the IDR internals remain ready to handle a larger ID should a need arise.

These changes make it possible to provide the guarantee that, if the
nextid pointer points into the object, the object's ID will be initialised
before a concurrent lookup can find the object.

Signed-off-by: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
</content>
</entry>
<entry>
<title>mm, truncate: do not check mapping for every page being truncated</title>
<updated>2017-11-16T02:21:06+00:00</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@techsingularity.net</email>
</author>
<published>2017-11-16T01:37:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c7df8ad2910e965a6241b6d8f52fd122e26b0315'/>
<id>urn:sha1:c7df8ad2910e965a6241b6d8f52fd122e26b0315</id>
<content type='text'>
During truncation, the mapping has already been checked for shmem and
dax so it's known that workingset_update_node is required.

This patch avoids the checks on mapping for each page being truncated.
In all other cases, a lookup helper is used to determine if
workingset_update_node() needs to be called.  The one danger is that the
API is slightly harder to use as calling workingset_update_node directly
without checking for dax or shmem mappings could lead to surprises.
However, the API rarely needs to be used and hopefully the comment is
enough to give people the hint.

sparsetruncate (tiny)
                              4.14.0-rc4             4.14.0-rc4
                             oneirq-v1r1        pickhelper-v1r1
Min          Time      141.00 (   0.00%)      140.00 (   0.71%)
1st-qrtle    Time      142.00 (   0.00%)      141.00 (   0.70%)
2nd-qrtle    Time      142.00 (   0.00%)      142.00 (   0.00%)
3rd-qrtle    Time      143.00 (   0.00%)      143.00 (   0.00%)
Max-90%      Time      144.00 (   0.00%)      144.00 (   0.00%)
Max-95%      Time      147.00 (   0.00%)      145.00 (   1.36%)
Max-99%      Time      195.00 (   0.00%)      191.00 (   2.05%)
Max          Time      230.00 (   0.00%)      205.00 (  10.87%)
Amean        Time      144.37 (   0.00%)      143.82 (   0.38%)
Stddev       Time       10.44 (   0.00%)        9.00 (  13.74%)
Coeff        Time        7.23 (   0.00%)        6.26 (  13.41%)
Best99%Amean Time      143.72 (   0.00%)      143.34 (   0.26%)
Best95%Amean Time      142.37 (   0.00%)      142.00 (   0.26%)
Best90%Amean Time      142.19 (   0.00%)      141.85 (   0.24%)
Best75%Amean Time      141.92 (   0.00%)      141.58 (   0.24%)
Best50%Amean Time      141.69 (   0.00%)      141.31 (   0.27%)
Best25%Amean Time      141.38 (   0.00%)      140.97 (   0.29%)

As you'd expect, the gain is marginal but it can be detected.  The
differences in bonnie are all within the noise which is not surprising
given the impact on the microbenchmark.

radix_tree_update_node_t is a callback for some radix operations that
optionally passes in a private field.  The only user of the callback is
workingset_update_node and as it no longer requires a mapping, the
private field is removed.

Link: http://lkml.kernel.org/r/20171018075952.10627-3-mgorman@techsingularity.net
Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>radix-tree: must check __radix_tree_preload() return value</title>
<updated>2017-09-09T01:26:49+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2017-09-08T23:15:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bc9ae2247ac92fd4d962507bafa3afffff6660ff'/>
<id>urn:sha1:bc9ae2247ac92fd4d962507bafa3afffff6660ff</id>
<content type='text'>
__radix_tree_preload() only disables preemption if no error is returned.

So we really need to make sure callers always check the return value.

idr_preload() contract is to always disable preemption, so we need
to add a missing preempt_disable() if an error happened.

Similarly, ida_pre_get() only needs to call preempt_enable() in the
case no error happened.

Link: http://lkml.kernel.org/r/1504637190.15310.62.camel@edumazet-glaptop3.roam.corp.google.com
Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree")
Fixes: 7ad3d4d85c7a ("ida: Move ida_bitmap to a percpu variable")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;    [4.11+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
