<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/ceph/snap.c, branch v5.15.209</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v5.15.209</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v5.15.209'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2023-06-14T09:13:06+00:00</updated>
<entry>
<title>ceph: fix use-after-free bug for inodes when flushing capsnaps</title>
<updated>2023-06-14T09:13:06+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2023-06-01T00:59:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fd03b5575c8a55bcb4170cfcdd5e5183f0ec5fd7'/>
<id>urn:sha1:fd03b5575c8a55bcb4170cfcdd5e5183f0ec5fd7</id>
<content type='text'>
commit 409e873ea3c1fd3079909718bbeb06ac1ec7f38b upstream.

There is a race between capsnaps flush and removing the inode from
'mdsc-&gt;snap_flush_list' list:

   == Thread A ==                     == Thread B ==
ceph_queue_cap_snap()
 -&gt; allocate 'capsnapA'
 -&gt;ihold('&amp;ci-&gt;vfs_inode')
 -&gt;add 'capsnapA' to 'ci-&gt;i_cap_snaps'
 -&gt;add 'ci' to 'mdsc-&gt;snap_flush_list'
    ...
   == Thread C ==
ceph_flush_snaps()
 -&gt;__ceph_flush_snaps()
  -&gt;__send_flush_snap()
                                handle_cap_flushsnap_ack()
                                 -&gt;iput('&amp;ci-&gt;vfs_inode')
                                   this also will release 'ci'
                                    ...
				      == Thread D ==
                                ceph_handle_snap()
                                 -&gt;flush_snaps()
                                  -&gt;iterate 'mdsc-&gt;snap_flush_list'
                                   -&gt;get the stale 'ci'
 -&gt;remove 'ci' from                -&gt;ihold(&amp;ci-&gt;vfs_inode) this
   'mdsc-&gt;snap_flush_list'           will WARNING

To fix this we will increase the inode's i_count ref when adding 'ci'
to the 'mdsc-&gt;snap_flush_list' list.

[ idryomov: need_put int -&gt; bool ]

Cc: stable@vger.kernel.org
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2209299
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Reviewed-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ceph: force updating the msg pointer in non-split case</title>
<updated>2023-05-24T16:36:55+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2023-05-18T01:47:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=317ee8c54faa0e6152d1af2826b05f407db458a2'/>
<id>urn:sha1:317ee8c54faa0e6152d1af2826b05f407db458a2</id>
<content type='text'>
commit 4cafd0400bcb6187c0d4ab4d4b0229a89ac4f8c2 upstream.

When the MClientSnap reqeust's op is not CEPH_SNAP_OP_SPLIT the
request may still contain a list of 'split_realms', and we need
to skip it anyway. Or it will be parsed as a corrupt snaptrace.

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/61200
Reported-by: Frank Schilder &lt;frans@dtu.dk&gt;
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ceph: avoid putting the realm twice when decoding snaps fails</title>
<updated>2022-12-02T16:41:00+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2022-11-09T03:00:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2f6e2de3a5289004650118b61f138fe7c28e1905'/>
<id>urn:sha1:2f6e2de3a5289004650118b61f138fe7c28e1905</id>
<content type='text'>
[ Upstream commit 51884d153f7ec85e18d607b2467820a90e0f4359 ]

When decoding the snaps fails it maybe leaving the 'first_realm'
and 'realm' pointing to the same snaprealm memory. And then it'll
put it twice and could cause random use-after-free, BUG_ON, etc
issues.

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/57686
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>ceph: do not update snapshot context when there is no new snapshot</title>
<updated>2022-12-02T16:41:00+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2022-02-19T06:28:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8bef55d7934dca945376af8fef988d94c55cfe1d'/>
<id>urn:sha1:8bef55d7934dca945376af8fef988d94c55cfe1d</id>
<content type='text'>
[ Upstream commit 2e586641c950e7f3e7e008404bd783a466b9b590 ]

We will only track the uppest parent snapshot realm from which we
need to rebuild the snapshot contexts _downward_ in hierarchy. For
all the others having no new snapshot we will do nothing.

This fix will avoid calling ceph_queue_cap_snap() on some inodes
inappropriately. For example, with the code in mainline, suppose there
are 2 directory hierarchies (with 6 directories total), like this:

/dir_X1/dir_X2/dir_X3/
/dir_Y1/dir_Y2/dir_Y3/

Firstly, make a snapshot under /dir_X1/dir_X2/.snap/snap_X2, then make a
root snapshot under /.snap/root_snap. Every time we make snapshots under
/dir_Y1/..., the kclient will always try to rebuild the snap context for
snap_X2 realm and finally will always try to queue cap snaps for dir_Y2
and dir_Y3, which makes no sense.

That's because the snap_X2's seq is 2 and root_snap's seq is 3. So when
creating a new snapshot under /dir_Y1/... the new seq will be 4, and
the mds will send the kclient a snapshot backtrace in _downward_
order: seqs 4, 3.

When ceph_update_snap_trace() is called, it will always rebuild the from
the last realm, that's the root_snap. So later when rebuilding the snap
context, the current logic will always cause it to rebuild the snap_X2
realm and then try to queue cap snaps for all the inodes related in that
realm, even though it's not necessary.

This is accompanied by a lot of these sorts of dout messages:

    "ceph:  queue_cap_snap 00000000a42b796b nothing dirty|writing"

Fix the logic to avoid this situation.

Also, the 'invalidate' word is not precise here. In actuality, it will
cause a rebuild of the existing snapshot contexts or just build
non-existent ones. Rename it to 'rebuild_snapcs'.

URL: https://tracker.ceph.com/issues/44100
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Stable-dep-of: 51884d153f7e ("ceph: avoid putting the realm twice when decoding snaps fails")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>ceph: add ceph_change_snap_realm() helper</title>
<updated>2021-09-02T20:49:17+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2021-08-02T15:01:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0ba92e1c5f7ce7e9c1c828a84e873d7be51c1b9f'/>
<id>urn:sha1:0ba92e1c5f7ce7e9c1c828a84e873d7be51c1b9f</id>
<content type='text'>
Consolidate some fiddly code for changing an inode's snap_realm
into a new helper function, and change the callers to use it.

While we're in here, nothing uses the i_snap_realm_counter field, so
remove that from the inode.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Luis Henriques &lt;lhenriques@suse.de&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>ceph: correctly handle releasing an embedded cap flush</title>
<updated>2021-08-25T14:34:11+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2021-08-18T13:38:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b2f9fa1f3bd8846f50b355fc2168236975c4d264'/>
<id>urn:sha1:b2f9fa1f3bd8846f50b355fc2168236975c4d264</id>
<content type='text'>
The ceph_cap_flush structures are usually dynamically allocated, but
the ceph_cap_snap has an embedded one.

When force umounting, the client will try to remove all the session
caps. During this, it will free them, but that should not be done
with the ones embedded in a capsnap.

Fix this by adding a new boolean that indicates that the cap flush is
embedded in a capsnap, and skip freeing it if that's set.

At the same time, switch to using list_del_init() when detaching the
i_list and g_list heads.  It's possible for a forced umount to remove
these objects but then handle_cap_flushsnap_ack() races in and does the
list_del_init() again, corrupting memory.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/52283
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>ceph: take snap_empty_lock atomically with snaprealm refcount change</title>
<updated>2021-08-04T17:20:29+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2021-08-03T16:47:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8434ffe71c874b9c4e184b88d25de98c2bf5fe3f'/>
<id>urn:sha1:8434ffe71c874b9c4e184b88d25de98c2bf5fe3f</id>
<content type='text'>
There is a race in ceph_put_snap_realm. The change to the nref and the
spinlock acquisition are not done atomically, so you could decrement
nref, and before you take the spinlock, the nref is incremented again.
At that point, you end up putting it on the empty list when it
shouldn't be there. Eventually __cleanup_empty_realms runs and frees
it when it's still in-use.

Fix this by protecting the 1-&gt;0 transition with atomic_dec_and_lock,
and just drop the spinlock if we can get the rwsem.

Because these objects can also undergo a 0-&gt;1 refcount transition, we
must protect that change as well with the spinlock. Increment locklessly
unless the value is at 0, in which case we take the spinlock, increment
and then take it off the empty list if it did the 0-&gt;1 transition.

With these changes, I'm removing the dout() messages from these
functions, as well as in __put_snap_realm. They've always been racy, and
it's better to not print values that may be misleading.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/46419
Reported-by: Mark Nelson &lt;mnelson@redhat.com&gt;
Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Luis Henriques &lt;lhenriques@suse.de&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>ceph: eliminate ceph_async_iput()</title>
<updated>2021-06-28T22:15:52+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2021-06-04T16:03:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=23c2c76ead541b3b7c9336bd4f3737494736b2ee'/>
<id>urn:sha1:23c2c76ead541b3b7c9336bd4f3737494736b2ee</id>
<content type='text'>
Now that we don't need to hold session-&gt;s_mutex or the snap_rwsem when
calling ceph_check_caps, we can eliminate ceph_async_iput and just use
normal iput calls.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>ceph: don't take s_mutex in ceph_flush_snaps</title>
<updated>2021-06-28T22:15:52+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2021-06-14T19:38:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7732fe168edaea825ed65954712c825f4625f2ba'/>
<id>urn:sha1:7732fe168edaea825ed65954712c825f4625f2ba</id>
<content type='text'>
Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Luis Henriques &lt;lhenriques@suse.de&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>ceph: clean up locking annotation for ceph_get_snap_realm and __lookup_snap_realm</title>
<updated>2021-06-28T22:15:51+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2021-06-01T13:24:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=df2c0cb7f8e8c83e495260ad86df8c5da947f2a7'/>
<id>urn:sha1:df2c0cb7f8e8c83e495260ad86df8c5da947f2a7</id>
<content type='text'>
They both say that the snap_rwsem must be held for write, but I don't
see any real reason for it, and it's not currently always called that
way.

The lookup is just walking the rbtree, so holding it for read should be
fine there. The "get" is bumping the refcount and (possibly) removing
it from the empty list. I see no need to hold the snap_rwsem for write
for that.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
</feed>
