<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/net/ceph/osdmap.c, branch v5.10.257</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v5.10.257</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v5.10.257'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-01-19T12:12:08+00:00</updated>
<entry>
<title>libceph: make free_choose_arg_map() resilient to partial allocation</title>
<updated>2026-01-19T12:12:08+00:00</updated>
<author>
<name>Tuo Li</name>
<email>islituo@gmail.com</email>
</author>
<published>2025-12-20T18:11:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9b3730dabcf3764bfe3ff07caf55e641a0b45234'/>
<id>urn:sha1:9b3730dabcf3764bfe3ff07caf55e641a0b45234</id>
<content type='text'>
commit e3fe30e57649c551757a02e1cad073c47e1e075e upstream.

free_choose_arg_map() may dereference a NULL pointer if its caller fails
after a partial allocation.

For example, in decode_choose_args(), if allocation of arg_map-&gt;args
fails, execution jumps to the fail label and free_choose_arg_map() is
called. Since arg_map-&gt;size is updated to a non-zero value before memory
allocation, free_choose_arg_map() will iterate over arg_map-&gt;args and
dereference a NULL pointer.

To prevent this potential NULL pointer dereference and make
free_choose_arg_map() more resilient, add checks for pointers before
iterating.

Cc: stable@vger.kernel.org
Co-authored-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Tuo Li &lt;islituo@gmail.com&gt;
Reviewed-by: Viacheslav Dubeyko &lt;Slava.Dubeyko@ibm.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>libceph: replace overzealous BUG_ON in osdmap_apply_incremental()</title>
<updated>2026-01-19T12:12:08+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2025-12-15T10:53:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9aa0b0c14cefece078286d78b97d4c09685e372d'/>
<id>urn:sha1:9aa0b0c14cefece078286d78b97d4c09685e372d</id>
<content type='text'>
commit e00c3f71b5cf75681dbd74ee3f982a99cb690c2b upstream.

If the osdmap is (maliciously) corrupted such that the incremental
osdmap epoch is different from what is expected, there is no need to
BUG.  Instead, just declare the incremental osdmap to be invalid.

Cc: stable@vger.kernel.org
Reported-by: ziming zhang &lt;ezrakiez@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>libceph: make decode_pool() more resilient against corrupted osdmaps</title>
<updated>2026-01-19T12:11:50+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2025-12-02T09:32:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d061be4c8040ffb1110d537654a038b8b6ad39d2'/>
<id>urn:sha1:d061be4c8040ffb1110d537654a038b8b6ad39d2</id>
<content type='text'>
commit 8c738512714e8c0aa18f8a10c072d5b01c83db39 upstream.

If the osdmap is (maliciously) corrupted such that the encoded length
of ceph_pg_pool envelope is less than what is expected for a particular
encoding version, out-of-bounds reads may ensue because the only bounds
check that is there is based on that length value.

This patch adds explicit bounds checks for each field that is decoded
or skipped.

Cc: stable@vger.kernel.org
Reported-by: ziming zhang &lt;ezrakiez@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Tested-by: ziming zhang &lt;ezrakiez@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>libceph: multiple workspaces for CRUSH computations</title>
<updated>2020-10-12T13:29:26+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2020-08-17T11:45:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3986f9a42e993075af01c17dc8968cfb96a4fe53'/>
<id>urn:sha1:3986f9a42e993075af01c17dc8968cfb96a4fe53</id>
<content type='text'>
Replace a global map-&gt;crush_workspace (protected by a global mutex)
with a list of workspaces, up to the number of CPUs + 1.

This is based on a patch from Robin Geuze &lt;robing@nl.team.blue&gt;.
Robin and his team have observed a 10-20% increase in IOPS on all
queue depths and lower CPU usage as well on a high-end all-NVMe
100GbE cluster.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>libceph: support for balanced and localized reads</title>
<updated>2020-06-01T11:22:53+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2020-05-23T09:45:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=117d96a04f007ce8fc2e292369056c3bd09f6f63'/>
<id>urn:sha1:117d96a04f007ce8fc2e292369056c3bd09f6f63</id>
<content type='text'>
OSD-side issues with reads from replica have been resolved in
Octopus.  Reading from replica should be safe wrt. unstable or
uncommitted state now, so add support for balanced and localized
reads.

There are two cases when a read from replica can't be served:

- OSD may silently drop the request, expecting the client to
  notice that the acting set has changed and resend via the usual
  means (handled with t-&gt;used_replica)

- OSD may return EAGAIN, expecting the client to resend to the
  primary, ignoring replica read flags (see handle_reply())

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
</content>
</entry>
<entry>
<title>libceph: crush_location infrastructure</title>
<updated>2020-06-01T11:22:53+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2020-05-22T13:24:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=45e6aa9f5592cd127367074f4822039cd8a825c3'/>
<id>urn:sha1:45e6aa9f5592cd127367074f4822039cd8a825c3</id>
<content type='text'>
Allow expressing client's location in terms of CRUSH hierarchy as
a set of (bucket type name, bucket name) pairs.  The userspace syntax
"crush_location = key1=value1 key2=value2" is incompatible with mount
options and needed adaptation.  Key-value pairs are separated by '|'
and we use ':' instead of '=' to separate keys from values.  So for:

  crush_location = host=foo rack=bar

one would write:

  crush_location=host:foo|rack:bar

As in userspace, "multipath" locations are supported, so indicating
locality for parallel hierarchies is possible:

  crush_location=rack:foo1|rack:foo2|datacenter:bar

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
</content>
</entry>
<entry>
<title>libceph: decode CRUSH device/bucket types and names</title>
<updated>2020-06-01T11:22:53+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2020-05-19T15:09:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=86403a92c3c5c6c395983fdbfc5e2f29dc39279b'/>
<id>urn:sha1:86403a92c3c5c6c395983fdbfc5e2f29dc39279b</id>
<content type='text'>
These would be matched with the provided client location to calculate
the locality value.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
</content>
</entry>
<entry>
<title>libceph: add non-asserting rbtree insertion helper</title>
<updated>2020-06-01T11:22:53+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2020-05-19T14:46:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8a4b863c876d9f135fa00cfe65774c3740970303'/>
<id>urn:sha1:8a4b863c876d9f135fa00cfe65774c3740970303</id>
<content type='text'>
Needed for the next commit and useful for ceph_pg_pool_info tree as
well.  I'm leaving the asserting helper in for now, but we should look
at getting rid of it in the future.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
</content>
</entry>
<entry>
<title>ceph: check POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULL</title>
<updated>2020-03-23T12:07:08+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2020-03-09T11:03:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7614209736fbc4927584d4387faade4f31444fce'/>
<id>urn:sha1:7614209736fbc4927584d4387faade4f31444fce</id>
<content type='text'>
CEPH_OSDMAP_FULL/NEARFULL aren't set since mimic, so we need to consult
per-pool flags as well.  Unfortunately the backwards compatibility here
is lacking:

- the change that deprecated OSDMAP_FULL/NEARFULL went into mimic, but
  was guarded by require_osd_release &gt;= RELEASE_LUMINOUS
- it was subsequently backported to luminous in v12.2.2, but that makes
  no difference to clients that only check OSDMAP_FULL/NEARFULL because
  require_osd_release is not client-facing -- it is for OSDs

Since all kernels are affected, the best we can do here is just start
checking both map flags and pool flags and send that to stable.

These checks are best effort, so take osdc-&gt;lock and look up pool flags
just once.  Remove the FIXME, since filesystem quotas are checked above
and RADOS quotas are reflected in POOL_FLAG_FULL: when the pool reaches
its quota, both POOL_FLAG_FULL and POOL_FLAG_FULL_QUOTA are set.

Cc: stable@vger.kernel.org
Reported-by: Yanhu Cao &lt;gmayyyha@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Acked-by: Sage Weil &lt;sage@redhat.com&gt;
</content>
</entry>
<entry>
<title>libceph: use ceph_kvmalloc() for osdmap arrays</title>
<updated>2019-09-16T10:06:25+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2019-09-04T13:40:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cf73d882cc51c1f245a890cccb79952a260302d3'/>
<id>urn:sha1:cf73d882cc51c1f245a890cccb79952a260302d3</id>
<content type='text'>
osdmap has a bunch of arrays that grow linearly with the number of
OSDs.  osd_state, osd_weight and osd_primary_affinity take 4 bytes per
OSD.  osd_addr takes 136 bytes per OSD because of sockaddr_storage.
The CRUSH workspace area also grows linearly with the number of OSDs.

Normally these arrays are allocated at client startup.  The osdmap is
usually updated in small incrementals, but once in a while a full map
may need to be processed.  For a cluster with 10000 OSDs, this means
a bunch of 40K allocations followed by a 1.3M allocation, all of which
are currently required to be physically contiguous.  This results in
sporadic ENOMEM errors, hanging the client.

Go back to manually (re)allocating arrays and use ceph_kvmalloc() to
fall back to non-contiguous allocation when necessary.

Link: https://tracker.ceph.com/issues/40481
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
</content>
</entry>
</feed>
