<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/net/ceph/messenger.c, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-09-19T14:35:47+00:00</updated>
<entry>
<title>libceph: fix invalid accesses to ceph_connection_v1_info</title>
<updated>2025-09-19T14:35:47+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2025-07-03T10:10:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=35dbbc3dbf8bccb2d77c68444f42c1e6d2d27983'/>
<id>urn:sha1:35dbbc3dbf8bccb2d77c68444f42c1e6d2d27983</id>
<content type='text'>
commit cdbc9836c7afadad68f374791738f118263c5371 upstream.

There is a place where generic code in messenger.c is reading and
another place where it is writing to con-&gt;v1 union member without
checking that the union member is active (i.e. msgr1 is in use).

On 64-bit systems, con-&gt;v1.auth_retry overlaps with con-&gt;v2.out_iter,
so such a read is almost guaranteed to return a bogus value instead of
0 when msgr2 is in use.  This ends up being fairly benign because the
side effect is just the invalidation of the authorizer and successive
fetching of new tickets.

con-&gt;v1.connect_seq overlaps with con-&gt;v2.conn_bufs and the fact that
it's being written to can cause more serious consequences, but luckily
it's not something that happens often.

Cc: stable@vger.kernel.org
Fixes: cd1a677cad99 ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)")
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Viacheslav Dubeyko &lt;Slava.Dubeyko@ibm.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>libceph: use min() to simplify code in ceph_dns_resolve_name()</title>
<updated>2024-08-27T07:30:16+00:00</updated>
<author>
<name>Li Zetao</name>
<email>lizetao1@huawei.com</email>
</author>
<published>2024-08-22T13:39:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ede0b1d30b82829d6bc7924be18c7ae09cb1eb33'/>
<id>urn:sha1:ede0b1d30b82829d6bc7924be18c7ae09cb1eb33</id>
<content type='text'>
When resolving name in ceph_dns_resolve_name(), the end address of name
is determined by the minimum value of delim_p and colon_p. So using min()
here is more in line with the context.

Signed-off-by: Li Zetao &lt;lizetao1@huawei.com&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>libceph: use kernel_connect()</title>
<updated>2023-10-09T11:35:24+00:00</updated>
<author>
<name>Jordan Rife</name>
<email>jrife@google.com</email>
</author>
<published>2023-10-04T23:38:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7563cf17dce0a875ba3d872acdc63a78ea344019'/>
<id>urn:sha1:7563cf17dce0a875ba3d872acdc63a78ea344019</id>
<content type='text'>
Direct calls to ops-&gt;connect() can overwrite the address parameter when
used in conjunction with BPF SOCK_ADDR hooks. Recent changes to
kernel_connect() ensure that callers are insulated from such side
effects. This patch wraps the direct call to ops-&gt;connect() with
kernel_connect() to prevent unexpected changes to the address passed to
ceph_tcp_connect().

This change was originally part of a larger patch targeting the net tree
addressing all instances of unprotected calls to ops-&gt;connect()
throughout the kernel, but this change was split up into several patches
targeting various trees.

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/netdev/20230821100007.559638-1-jrife@google.com/
Link: https://lore.kernel.org/netdev/9944248dba1bce861375fcce9de663934d933ba9.camel@redhat.com/
Fixes: d74bad4e74ee ("bpf: Hooks for sys_connect")
Signed-off-by: Jordan Rife &lt;jrife@google.com&gt;
Reviewed-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>libceph: add new iov_iter-based ceph_msg_data_type and ceph_osd_data_type</title>
<updated>2023-08-22T07:01:48+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2022-07-01T10:30:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dee0c5f834605ce9b384ee8b9c7032ffd8db4eca'/>
<id>urn:sha1:dee0c5f834605ce9b384ee8b9c7032ffd8db4eca</id>
<content type='text'>
Add an iov_iter to the unions in ceph_msg_data and ceph_msg_data_cursor.
Instead of requiring a list of pages or bvecs, we can just use an
iov_iter directly, and avoid extra allocations.

We assume that the pages represented by the iter are pinned such that
they shouldn't incur page faults, which is the case for the iov_iters
created by netfs.

While working on this, Al Viro informed me that he was going to change
iov_iter_get_pages to auto-advance the iterator as that pattern is more
or less required for ITER_PIPE anyway. We emulate that here for now by
advancing in the _next op and tracking that amount in the "lastlen"
field.

In the event that _next is called twice without an intervening
_advance, we revert the iov_iter by the remaining lastlen before
calling iov_iter_get_pages.

Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-and-tested-by: Luís Henriques &lt;lhenriques@suse.de&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>libceph: new sparse_read op, support sparse reads on msgr2 crc codepath</title>
<updated>2023-08-22T07:01:47+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2022-01-25T13:26:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ec3bc567eac12c557a2b99bd0b34b5dff12cab23'/>
<id>urn:sha1:ec3bc567eac12c557a2b99bd0b34b5dff12cab23</id>
<content type='text'>
Add support for a new sparse_read ceph_connection operation. The idea is
that the client driver can define this operation use it to do special
handling for incoming reads.

The alloc_msg routine will look at the request and determine whether the
reply is expected to be sparse. If it is, then we'll dispatch to a
different set of state machine states that will repeatedly call the
driver's sparse_read op to get length and placement info for reading the
extent map, and the extents themselves.

This necessitates adding some new field to some other structs:

- The msg gets a new bool to track whether it's a sparse_read request.

- A new field is added to the cursor to track the amount remaining in the
current extent. This is used to cap the read from the socket into the
msg_data

- Handing a revoke with all of this is particularly difficult, so I've
added a new data_len_remain field to the v2 connection info, and then
use that to skip that much on a revoke. We may want to expand the use of
that to the normal read path as well, just for consistency's sake.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-and-tested-by: Luís Henriques &lt;lhenriques@suse.de&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>rbd: harden get_lock_owner_info() a bit</title>
<updated>2023-07-26T13:08:09+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2023-07-08T14:16:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8ff2c64c9765446c3cef804fb99da04916603e27'/>
<id>urn:sha1:8ff2c64c9765446c3cef804fb99da04916603e27</id>
<content type='text'>
- we want the exclusive lock type, so test for it directly
- use sscanf() to actually parse the lock cookie and avoid admitting
  invalid handles
- bail if locker has a blank address

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
</content>
</entry>
<entry>
<title>net/sock: Introduce trace_sk_data_ready()</title>
<updated>2023-01-23T11:26:50+00:00</updated>
<author>
<name>Peilin Ye</name>
<email>peilin.ye@bytedance.com</email>
</author>
<published>2023-01-20T00:45:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=40e0b09081420853542571c38875b48b60404ebb'/>
<id>urn:sha1:40e0b09081420853542571c38875b48b60404ebb</id>
<content type='text'>
As suggested by Cong, introduce a tracepoint for all -&gt;sk_data_ready()
callback implementations.  For example:

&lt;...&gt;
  iperf-609  [002] .....  70.660425: sk_data_ready: family=2 protocol=6 func=sock_def_readable
  iperf-609  [002] .....  70.660436: sk_data_ready: family=2 protocol=6 func=sock_def_readable
&lt;...&gt;

Suggested-by: Cong Wang &lt;cong.wang@bytedance.com&gt;
Signed-off-by: Peilin Ye &lt;peilin.ye@bytedance.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Treewide: Stop corrupting socket's task_frag</title>
<updated>2022-12-20T01:28:49+00:00</updated>
<author>
<name>Benjamin Coddington</name>
<email>bcodding@redhat.com</email>
</author>
<published>2022-12-16T12:45:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=98123866fcf3fe95a0c1b198ef122dfdbd351916'/>
<id>urn:sha1:98123866fcf3fe95a0c1b198ef122dfdbd351916</id>
<content type='text'>
Since moving to memalloc_nofs_save/restore, SUNRPC has stopped setting the
GFP_NOIO flag on sk_allocation which the networking system uses to decide
when it is safe to use current-&gt;task_frag.  The results of this are
unexpected corruption in task_frag when SUNRPC is involved in memory
reclaim.

The corruption can be seen in crashes, but the root cause is often
difficult to ascertain as a crashing machine's stack trace will have no
evidence of being near NFS or SUNRPC code.  I believe this problem to
be much more pervasive than reports to the community may indicate.

Fix this by having kernel users of sockets that may corrupt task_frag due
to reclaim set sk_use_task_frag = false.  Preemptively correcting this
situation for users that still set sk_allocation allows them to convert to
memalloc_nofs_save/restore without the same unexpected corruptions that are
sure to follow, unlikely to show up in testing, and difficult to bisect.

CC: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
CC: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
CC: "Christoph Böhmwalder" &lt;christoph.boehmwalder@linbit.com&gt;
CC: Jens Axboe &lt;axboe@kernel.dk&gt;
CC: Josef Bacik &lt;josef@toxicpanda.com&gt;
CC: Keith Busch &lt;kbusch@kernel.org&gt;
CC: Christoph Hellwig &lt;hch@lst.de&gt;
CC: Sagi Grimberg &lt;sagi@grimberg.me&gt;
CC: Lee Duncan &lt;lduncan@suse.com&gt;
CC: Chris Leech &lt;cleech@redhat.com&gt;
CC: Mike Christie &lt;michael.christie@oracle.com&gt;
CC: "James E.J. Bottomley" &lt;jejb@linux.ibm.com&gt;
CC: "Martin K. Petersen" &lt;martin.petersen@oracle.com&gt;
CC: Valentina Manea &lt;valentina.manea.m@gmail.com&gt;
CC: Shuah Khan &lt;shuah@kernel.org&gt;
CC: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
CC: David Howells &lt;dhowells@redhat.com&gt;
CC: Marc Dionne &lt;marc.dionne@auristor.com&gt;
CC: Steve French &lt;sfrench@samba.org&gt;
CC: Christine Caulfield &lt;ccaulfie@redhat.com&gt;
CC: David Teigland &lt;teigland@redhat.com&gt;
CC: Mark Fasheh &lt;mark@fasheh.com&gt;
CC: Joel Becker &lt;jlbec@evilplan.org&gt;
CC: Joseph Qi &lt;joseph.qi@linux.alibaba.com&gt;
CC: Eric Van Hensbergen &lt;ericvh@gmail.com&gt;
CC: Latchesar Ionkov &lt;lucho@ionkov.net&gt;
CC: Dominique Martinet &lt;asmadeus@codewreck.org&gt;
CC: Ilya Dryomov &lt;idryomov@gmail.com&gt;
CC: Xiubo Li &lt;xiubli@redhat.com&gt;
CC: Chuck Lever &lt;chuck.lever@oracle.com&gt;
CC: Jeff Layton &lt;jlayton@kernel.org&gt;
CC: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
CC: Anna Schumaker &lt;anna@kernel.org&gt;
CC: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
CC: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;

Suggested-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Signed-off-by: Benjamin Coddington &lt;bcodding@redhat.com&gt;
Reviewed-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>libceph: drop last_piece flag from ceph_msg_data_cursor</title>
<updated>2022-10-04T17:18:08+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2022-05-25T10:11:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=da4ab869e37cf81f93333ba74b16e0ea6d322e15'/>
<id>urn:sha1:da4ab869e37cf81f93333ba74b16e0ea6d322e15</id>
<content type='text'>
ceph_msg_data_next is always passed a NULL pointer for this field. Some
of the "next" operations look at it in order to determine the length,
but we can just take the min of the data on the page or cursor-&gt;resid.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>libceph: optionally use bounce buffer on recv path in crc mode</title>
<updated>2022-02-02T17:50:36+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2021-12-30T14:13:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=038b8d1d1ab1cce11a158d30bf080ff41a2cfd15'/>
<id>urn:sha1:038b8d1d1ab1cce11a158d30bf080ff41a2cfd15</id>
<content type='text'>
Both msgr1 and msgr2 in crc mode are zero copy in the sense that
message data is read from the socket directly into the destination
buffer.  We assume that the destination buffer is stable (i.e. remains
unchanged while it is being read to) though.  Otherwise, CRC errors
ensue:

  libceph: read_partial_message 0000000048edf8ad data crc 1063286393 != exp. 228122706
  libceph: osd1 (1)192.168.122.1:6843 bad crc/signature

  libceph: bad data crc, calculated 57958023, expected 1805382778
  libceph: osd2 (2)192.168.122.1:6876 integrity error, bad crc

Introduce rxbounce option to enable use of a bounce buffer when
receiving message data.  In particular this is needed if a mapped
image is a Windows VM disk, passed to QEMU.  Windows has a system-wide
"dummy" page that may be mapped into the destination buffer (potentially
more than once into the same buffer) by the Windows Memory Manager in
an effort to generate a single large I/O [1][2].  QEMU makes a point of
preserving overlap relationships when cloning I/O vectors, so krbd gets
exposed to this behaviour.

[1] "What Is Really in That MDL?"
    https://docs.microsoft.com/en-us/previous-versions/windows/hardware/design/dn614012(v=vs.85)
[2] https://blogs.msmvps.com/kernelmustard/2005/05/04/dummy-pages/

URL: https://bugzilla.redhat.com/show_bug.cgi?id=1973317
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
</content>
</entry>
</feed>
