<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/md/dm-thin.c, branch v6.1.168</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.168</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.168'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-01-17T12:34:41+00:00</updated>
<entry>
<title>dm thin: make get_first_thin use rcu-safe list first function</title>
<updated>2025-01-17T12:34:41+00:00</updated>
<author>
<name>Krister Johansen</name>
<email>kjlx@templeofstupid.com</email>
</author>
<published>2025-01-07T23:24:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=12771050b6d059eea096993bf2001da9da9fddff'/>
<id>urn:sha1:12771050b6d059eea096993bf2001da9da9fddff</id>
<content type='text'>
commit 80f130bfad1dab93b95683fc39b87235682b8f72 upstream.

The documentation in rculist.h explains the absence of list_empty_rcu()
and cautions programmers against relying on a list_empty() -&gt;
list_first() sequence in RCU safe code.  This is because each of these
functions performs its own READ_ONCE() of the list head.  This can lead
to a situation where the list_empty() sees a valid list entry, but the
subsequent list_first() sees a different view of list head state after a
modification.

In the case of dm-thin, this author had a production box crash from a GP
fault in the process_deferred_bios path.  This function saw a valid list
head in get_first_thin() but when it subsequently dereferenced that and
turned it into a thin_c, it got the inside of the struct pool, since the
list was now empty and referring to itself.  The kernel on which this
occurred printed both a warning about a refcount_t being saturated, and
a UBSAN error for an out-of-bounds cpuid access in the queued spinlock,
prior to the fault itself.  When the resulting kdump was examined, it
was possible to see another thread patiently waiting in thin_dtr's
synchronize_rcu.

The thin_dtr call managed to pull the thin_c out of the active thins
list (and have it be the last entry in the active_thins list) at just
the wrong moment which lead to this crash.

Fortunately, the fix here is straight forward.  Switch get_first_thin()
function to use list_first_or_null_rcu() which performs just a single
READ_ONCE() and returns NULL if the list is already empty.

This was run against the devicemapper test suite's thin-provisioning
suites for delete and suspend and no regressions were observed.

Signed-off-by: Krister Johansen &lt;kjlx@templeofstupid.com&gt;
Fixes: b10ebd34ccca ("dm thin: fix rcu_read_lock being held in code that can sleep")
Cc: stable@vger.kernel.org
Acked-by: Ming-Hung Tsai &lt;mtsai@redhat.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>dm thin: Add missing destroy_work_on_stack()</title>
<updated>2024-12-14T18:54:16+00:00</updated>
<author>
<name>Yuan Can</name>
<email>yuancan@huawei.com</email>
</author>
<published>2024-11-06T01:03:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6404f3ae4ffc065d4028900e8bd0dde9eb94f88c'/>
<id>urn:sha1:6404f3ae4ffc065d4028900e8bd0dde9eb94f88c</id>
<content type='text'>
commit e74fa2447bf9ed03d085b6d91f0256cc1b53f1a8 upstream.

This commit add missed destroy_work_on_stack() operations for pw-&gt;worker in
pool_work_wait().

Fixes: e7a3e871d895 ("dm thin: cleanup noflush_work to use a proper completion")
Cc: stable@vger.kernel.org
Signed-off-by: Yuan Can &lt;yuancan@huawei.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>dm: address indent/space issues</title>
<updated>2024-03-26T22:21:02+00:00</updated>
<author>
<name>Heinz Mauelshagen</name>
<email>heinzm@redhat.com</email>
</author>
<published>2023-01-25T22:31:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d0980ed818d73788e4817a438aed8ddb56d98219'/>
<id>urn:sha1:d0980ed818d73788e4817a438aed8ddb56d98219</id>
<content type='text'>
[ Upstream commit 255e2646496fcbf836a3dfe1b535692f09f11b45 ]

Signed-off-by: Heinz Mauelshagen &lt;heinzm@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Stable-dep-of: b4d78cfeb304 ("dm-integrity: align the outgoing bio in integrity_recheck")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>dm thin: fix issue_discard to pass GFP_NOIO to __blkdev_issue_discard</title>
<updated>2023-06-21T14:00:55+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@kernel.org</email>
</author>
<published>2023-06-14T00:05:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bfaf388d359922d81e7e7f0551ebe9297d8da98e'/>
<id>urn:sha1:bfaf388d359922d81e7e7f0551ebe9297d8da98e</id>
<content type='text'>
commit 722d90822321497e2837cfc9000202e256e6b32f upstream.

issue_discard() passes GFP_NOWAIT to __blkdev_issue_discard() despite
its code assuming bio_alloc() always succeeds.

Commit 3dba53a958a75 ("dm thin: use __blkdev_issue_discard for async
discard support") clearly shows where things went bad:

Before commit 3dba53a958a75, dm-thin.c's open-coded
__blkdev_issue_discard_async() properly handled using GFP_NOWAIT.
Unfortunately __blkdev_issue_discard() doesn't and it was missed
during review.

Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>dm: change "unsigned" to "unsigned int"</title>
<updated>2023-04-13T14:55:17+00:00</updated>
<author>
<name>Heinz Mauelshagen</name>
<email>heinzm@redhat.com</email>
</author>
<published>2023-01-25T20:14:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=83ee6b2729429f7feadd42e081751c00225d2347'/>
<id>urn:sha1:83ee6b2729429f7feadd42e081751c00225d2347</id>
<content type='text'>
[ Upstream commit 86a3238c7b9b759cb864f4f768ab2e24687dc0e6 ]

Signed-off-by: Heinz Mauelshagen &lt;heinzm@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Stable-dep-of: f7b58a69fad9 ("dm: fix improper splitting for abnormal bios")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>dm thin: fix deadlock when swapping to thin device</title>
<updated>2023-03-30T10:49:23+00:00</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2023-02-27T15:23:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0131f901835b56741e946809714b1612dc478973'/>
<id>urn:sha1:0131f901835b56741e946809714b1612dc478973</id>
<content type='text'>
commit 9bbf5feecc7eab2c370496c1c161bbfe62084028 upstream.

This is an already known issue that dm-thin volume cannot be used as
swap, otherwise a deadlock may happen when dm-thin internal memory
demand triggers swap I/O on the dm-thin volume itself.

But thanks to commit a666e5c05e7c ("dm: fix deadlock when swapping to
encrypted device"), the limit_swap_bios target flag can also be used
for dm-thin to avoid the recursive I/O when it is used as swap.

Fix is to simply set ti-&gt;limit_swap_bios to true in both pool_ctr()
and thin_ctr().

In my test, I create a dm-thin volume /dev/vg/swap and use it as swap
device. Then I run fio on another dm-thin volume /dev/vg/main and use
large --blocksize to trigger swap I/O onto /dev/vg/swap.

The following fio command line is used in my test,
  fio --name recursive-swap-io --lockmem 1 --iodepth 128 \
     --ioengine libaio --filename /dev/vg/main --rw randrw \
    --blocksize 1M --numjobs 32 --time_based --runtime=12h

Without this fix, the whole system can be locked up within 15 seconds.

With this fix, there is no any deadlock or hung task observed after
2 hours of running fio.

Furthermore, if blocksize is changed from 1M to 128M, after around 30
seconds fio has no visible I/O, and the out-of-memory killer message
shows up in kernel message. After around 20 minutes all fio processes
are killed and the whole system is back to being alive.

This is exactly what is expected when recursive I/O happens on dm-thin
volume when it is used as swap.

Depends-on: a666e5c05e7c ("dm: fix deadlock when swapping to encrypted device")
Cc: stable@vger.kernel.org
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Acked-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>dm thin: add cond_resched() to various workqueue loops</title>
<updated>2023-03-10T08:34:01+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@kernel.org</email>
</author>
<published>2023-02-16T20:29:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e2c4490d40319a31786e04ec8b048565a259e20b'/>
<id>urn:sha1:e2c4490d40319a31786e04ec8b048565a259e20b</id>
<content type='text'>
[ Upstream commit e4f80303c2353952e6e980b23914e4214487f2a6 ]

Otherwise on resource constrained systems these workqueues may be too
greedy.

Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>dm thin: Fix UAF in run_timer_softirq()</title>
<updated>2023-01-07T10:11:45+00:00</updated>
<author>
<name>Luo Meng</name>
<email>luomeng12@huawei.com</email>
</author>
<published>2022-11-29T02:48:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d9971fa4d8bde63d49c743c1b32d12fbbd3a30bd'/>
<id>urn:sha1:d9971fa4d8bde63d49c743c1b32d12fbbd3a30bd</id>
<content type='text'>
commit 88430ebcbc0ec637b710b947738839848c20feff upstream.

When dm_resume() and dm_destroy() are concurrent, it will
lead to UAF, as follows:

 BUG: KASAN: use-after-free in __run_timers+0x173/0x710
 Write of size 8 at addr ffff88816d9490f0 by task swapper/0/0
&lt;snip&gt;
 Call Trace:
  &lt;IRQ&gt;
  dump_stack_lvl+0x73/0x9f
  print_report.cold+0x132/0xaa2
  _raw_spin_lock_irqsave+0xcd/0x160
  __run_timers+0x173/0x710
  kasan_report+0xad/0x110
  __run_timers+0x173/0x710
  __asan_store8+0x9c/0x140
  __run_timers+0x173/0x710
  call_timer_fn+0x310/0x310
  pvclock_clocksource_read+0xfa/0x250
  kvm_clock_read+0x2c/0x70
  kvm_clock_get_cycles+0xd/0x20
  ktime_get+0x5c/0x110
  lapic_next_event+0x38/0x50
  clockevents_program_event+0xf1/0x1e0
  run_timer_softirq+0x49/0x90
  __do_softirq+0x16e/0x62c
  __irq_exit_rcu+0x1fa/0x270
  irq_exit_rcu+0x12/0x20
  sysvec_apic_timer_interrupt+0x8e/0xc0

One of the concurrency UAF can be shown as below:

        use                                  free
do_resume                           |
  __find_device_hash_cell           |
    dm_get                          |
      atomic_inc(&amp;md-&gt;holders)      |
                                    | dm_destroy
                                    |   __dm_destroy
                                    |     if (!dm_suspended_md(md))
                                    |     atomic_read(&amp;md-&gt;holders)
                                    |     msleep(1)
  dm_resume                         |
    __dm_resume                     |
      dm_table_resume_targets       |
        pool_resume                 |
          do_waker  #add delay work |
  dm_put                            |
    atomic_dec(&amp;md-&gt;holders)        |
                                    |     dm_table_destroy
                                    |       pool_dtr
                                    |         __pool_dec
                                    |           __pool_destroy
                                    |             destroy_workqueue
                                    |             kfree(pool) # free pool
        time out
__do_softirq
  run_timer_softirq # pool has already been freed

This can be easily reproduced using:
  1. create thin-pool
  2. dmsetup suspend pool
  3. dmsetup resume pool
  4. dmsetup remove_all # Concurrent with 3

The root cause of this UAF bug is that dm_resume() adds timer after
dm_destroy() skips cancelling the timer because of suspend status.
After timeout, it will call run_timer_softirq(), however pool has
already been freed. The concurrency UAF bug will happen.

Therefore, cancelling timer again in __pool_destroy().

Cc: stable@vger.kernel.org
Fixes: 991d9fa02da0d ("dm: add thin provisioning target")
Signed-off-by: Luo Meng &lt;luomeng12@huawei.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>dm thin: resume even if in FAIL mode</title>
<updated>2023-01-07T10:11:45+00:00</updated>
<author>
<name>Luo Meng</name>
<email>luomeng12@huawei.com</email>
</author>
<published>2022-11-30T02:09:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=82976ba6deb1636a0d63f69da96f804c3cf54d46'/>
<id>urn:sha1:82976ba6deb1636a0d63f69da96f804c3cf54d46</id>
<content type='text'>
commit 19eb1650afeb1aa86151f61900e9e5f1de5d8d02 upstream.

If a thinpool set fail_io while suspending, resume will fail with:
 device-mapper: resume ioctl on vg-thinpool  failed: Invalid argument

The thin-pool also can't be removed if an in-flight bio is in the
deferred list.

This can be easily reproduced using:

  echo "offline" &gt; /sys/block/sda/device/state
  dd if=/dev/zero of=/dev/mapper/thin bs=4K count=1
  dmsetup suspend /dev/mapper/pool
  mkfs.ext4 /dev/mapper/thin
  dmsetup resume /dev/mapper/pool

The root cause is maybe_resize_data_dev() will check fail_io and return
error before called dm_resume.

Fix this by adding FAIL mode check at the end of pool_preresume().

Cc: stable@vger.kernel.org
Fixes: da105ed5fd7e ("dm thin metadata: introduce dm_pool_abort_metadata")
Signed-off-by: Luo Meng &lt;luomeng12@huawei.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>dm thin: fix use-after-free crash in dm_sm_register_threshold_callback</title>
<updated>2022-07-15T22:09:14+00:00</updated>
<author>
<name>Luo Meng</name>
<email>luomeng12@huawei.com</email>
</author>
<published>2022-07-14T11:28:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3534e5a5ed2997ca1b00f44a0378a075bd05e8a3'/>
<id>urn:sha1:3534e5a5ed2997ca1b00f44a0378a075bd05e8a3</id>
<content type='text'>
Fault inject on pool metadata device reports:
  BUG: KASAN: use-after-free in dm_pool_register_metadata_threshold+0x40/0x80
  Read of size 8 at addr ffff8881b9d50068 by task dmsetup/950

  CPU: 7 PID: 950 Comm: dmsetup Tainted: G        W         5.19.0-rc6 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
  Call Trace:
   &lt;TASK&gt;
   dump_stack_lvl+0x34/0x44
   print_address_description.constprop.0.cold+0xeb/0x3f4
   kasan_report.cold+0xe6/0x147
   dm_pool_register_metadata_threshold+0x40/0x80
   pool_ctr+0xa0a/0x1150
   dm_table_add_target+0x2c8/0x640
   table_load+0x1fd/0x430
   ctl_ioctl+0x2c4/0x5a0
   dm_ctl_ioctl+0xa/0x10
   __x64_sys_ioctl+0xb3/0xd0
   do_syscall_64+0x35/0x80
   entry_SYSCALL_64_after_hwframe+0x46/0xb0

This can be easily reproduced using:
  echo offline &gt; /sys/block/sda/device/state
  dd if=/dev/zero of=/dev/mapper/thin bs=4k count=10
  dmsetup load pool --table "0 20971520 thin-pool /dev/sda /dev/sdb 128 0 0"

If a metadata commit fails, the transaction will be aborted and the
metadata space maps will be destroyed. If a DM table reload then
happens for this failed thin-pool, a use-after-free will occur in
dm_sm_register_threshold_callback (called from
dm_pool_register_metadata_threshold).

Fix this by in dm_pool_register_metadata_threshold() by returning the
-EINVAL error if the thin-pool is in fail mode. Also fail pool_ctr()
with a new error message: "Error registering metadata threshold".

Fixes: ac8c3f3df65e4 ("dm thin: generate event when metadata threshold passed")
Cc: stable@vger.kernel.org
Reported-by: Hulk Robot &lt;hulkci@huawei.com&gt;
Signed-off-by: Luo Meng &lt;luomeng12@huawei.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
</content>
</entry>
</feed>
