<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/md/bcache/writeback.c, branch v6.18.21</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-09-09T13:31:59+00:00</updated>
<entry>
<title>block: remove the bi_inline_vecs variable sized array from struct bio</title>
<updated>2025-09-09T13:31:59+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2025-09-08T10:56:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d86eaa0f3c56da286853b698b45c8ce404291082'/>
<id>urn:sha1:d86eaa0f3c56da286853b698b45c8ce404291082</id>
<content type='text'>
Bios are embedded into other structures, and at least sparse is unhappy
about embedding structures with variable sized arrays.  There's no
real need for the array anyway: we can replace it with a helper pointing
to the memory just behind the bio, and with the previous cleanups there
are very few sites doing anything special with it.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: John Garry &lt;john.g.garry@oracle.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: add a bio_init_inline helper</title>
<updated>2025-09-09T13:31:59+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2025-09-08T10:56:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=70a6f71b1a77decfc5b1db426ccbe914b58adb38'/>
<id>urn:sha1:70a6f71b1a77decfc5b1db426ccbe914b58adb38</id>
<content type='text'>
Just a simple wrapper around bio_init for callers that want to
initialize a bio with inline bvecs.
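
A sketch of what such a wrapper can look like, assuming the current
bio_init() signature; bio_inline_vecs() here is the illustrative helper
sketched in the newer entry above, not necessarily the merged name:

	static inline void bio_init_inline(struct bio *bio,
			struct block_device *bdev,
			unsigned short max_vecs, blk_opf_t opf)
	{
		/* hand bio_init() the bvec array stored behind the bio */
		bio_init(bio, bdev, bio_inline_vecs(bio), max_vecs, opf);
	}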

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: John Garry &lt;john.g.garry@oracle.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Revert "bcache: remove heap-related macros and switch to generic min_heap"</title>
<updated>2025-06-20T03:48:03+00:00</updated>
<author>
<name>Kuan-Wei Chiu</name>
<email>visitorckw@gmail.com</email>
</author>
<published>2025-06-14T20:23:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=48fd7ebe00c1cdc782b42576548b25185902f64c'/>
<id>urn:sha1:48fd7ebe00c1cdc782b42576548b25185902f64c</id>
<content type='text'>
This reverts commit 866898efbb25bb44fd42848318e46db9e785973a.

The generic bottom-up min_heap implementation causes performance
regression in invalidate_buckets_lru(), a hot path in bcache.  Before the
cache is fully populated, new_bucket_prio() often returns zero, leading to
many equal comparisons.  In such cases, bottom-up sift_down performs up to
2 * log2(n) comparisons, while the original top-down approach completes
with just O(1) comparisons, resulting in a measurable performance gap.
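
A hand-written sketch (not the kernel implementation) of why equal keys
favor the top-down variant: top-down stops at the first failed
comparison, while bottom-up unconditionally descends to a leaf first,
spending about log2(n) comparisons before any early exit is possible.

	/* top-down sift_down: with all-equal priorities, !less() holds on
	 * the first pass and the loop exits after one or two comparisons */
	static void sift_down(int *heap, int n, int i,
			      bool (*less)(int a, int b))
	{
		while (2 * i + 1 &lt; n) {
			int child = 2 * i + 1;

			if (child + 1 &lt; n &amp;&amp; less(heap[child + 1], heap[child]))
				child++;
			if (!less(heap[child], heap[i]))
				break;
			swap(heap[i], heap[child]);
			i = child;
		}
	}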

The performance degradation is further worsened by the non-inlined
min_heap API functions introduced in commit 92a8b224b833 ("lib/min_heap:
introduce non-inline versions of min heap API functions"), adding function
call overhead to this critical path.

As reported by Robert, bcache now suffers from latency spikes, with P100
(max) latency increasing from 600 ms to 2.4 seconds every 5 minutes. 
These regressions degrade bcache's effectiveness as a low-latency cache
layer and lead to frequent timeouts and application stalls in production
environments.

This revert aims to restore bcache's original low-latency behavior.

Link: https://lore.kernel.org/lkml/CAJhEC05+0S69z+3+FB2Cd0hD+pCRyWTKLEOsc8BOmH73p1m+KQ@mail.gmail.com
Link: https://lkml.kernel.org/r/20250614202353.1632957-3-visitorckw@gmail.com
Fixes: 866898efbb25 ("bcache: remove heap-related macros and switch to generic min_heap")
Fixes: 92a8b224b833 ("lib/min_heap: introduce non-inline versions of min heap API functions")
Signed-off-by: Kuan-Wei Chiu &lt;visitorckw@gmail.com&gt;
Reported-by: Robert Pang &lt;robertpang@google.com&gt;
Closes: https://lore.kernel.org/linux-bcache/CAJhEC06F_AtrPgw2-7CvCqZgeStgCtitbD-ryuPpXQA-JG5XXw@mail.gmail.com
Acked-by: Coly Li &lt;colyli@kernel.org&gt;
Cc: Ching-Chun (Jim) Huang &lt;jserv@ccns.ncku.edu.tw&gt;
Cc: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>block: Delete bio_set_prio()</title>
<updated>2024-12-23T15:17:23+00:00</updated>
<author>
<name>John Garry</name>
<email>john.g.garry@oracle.com</email>
</author>
<published>2024-12-02T11:19:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=19206d3f5ef7f051056d2fb49203a347e4844e6e'/>
<id>urn:sha1:19206d3f5ef7f051056d2fb49203a347e4844e6e</id>
<content type='text'>
Since commit 43b62ce3ff0a ("block: move bio io prio to a new field"), macro
bio_set_prio() does nothing but set bio-&gt;bi_ioprio. All other places just
set bio-&gt;bi_ioprio directly, so convert the remaining bio_set_prio()
callsites to set bio-&gt;bi_ioprio directly and delete that macro.
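
The conversion at each remaining callsite is mechanical, along these
lines (illustrative diff):

	-	bio_set_prio(bio, prio);
	+	bio-&gt;bi_ioprio = prio;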

Signed-off-by: John Garry &lt;john.g.garry@oracle.com&gt;
Acked-by: Jack Wang &lt;jinpu.wang@ionos.com&gt;
Reviewed-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Link: https://lore.kernel.org/r/20241202111957.2311683-3-john.g.garry@oracle.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>bcache: remove heap-related macros and switch to generic min_heap</title>
<updated>2024-06-25T05:25:00+00:00</updated>
<author>
<name>Kuan-Wei Chiu</name>
<email>visitorckw@gmail.com</email>
</author>
<published>2024-05-24T15:29:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=866898efbb25bb44fd42848318e46db9e785973a'/>
<id>urn:sha1:866898efbb25bb44fd42848318e46db9e785973a</id>
<content type='text'>
Drop the heap-related macros from bcache and replace them with the
generic min_heap implementation from include/linux.  By doing so, code
readability is improved by using functions instead of macros.  Moreover,
the min_heap implementation in include/linux adopts a bottom-up variation
compared to the textbook version currently used in bcache.  This bottom-up
variation allows for approximately 50% reduction in the number of
comparison operations during heap siftdown, without changing the number of
swaps, thus making it more efficient.
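
Schematically, the generic API replaces per-type macro instantiation
with a small callbacks struct (a sketch following current
include/linux/min_heap.h; the exact macro and callback spellings have
shifted across releases, so treat the names as indicative):

	DEFINE_MIN_HEAP(int, min_heap_int);	/* declares struct min_heap_int */
	static struct min_heap_int heap;

	static bool elem_less(const void *lhs, const void *rhs, void *args)
	{
		return *(const int *)lhs &lt; *(const int *)rhs;
	}

	static void elem_swp(void *lhs, void *rhs, void *args)
	{
		swap(*(int *)lhs, *(int *)rhs);
	}

	static const struct min_heap_callbacks cb = {
		.less = elem_less,
		.swp  = elem_swp,
	};

	/* rebuild heap order in place, one bottom-up sift per element */
	min_heapify_all(&amp;heap, &amp;cb, NULL);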

Link: https://lkml.kernel.org/ioyfizrzq7w7mjrqcadtzsfgpuntowtjdw5pgn4qhvsdp4mqqg@nrlek5vmisbu
Link: https://lkml.kernel.org/r/20240524152958.919343-16-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu &lt;visitorckw@gmail.com&gt;
Reviewed-by: Ian Rogers &lt;irogers@google.com&gt;
Acked-by: Coly Li &lt;colyli@suse.de&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
Cc: Bagas Sanjaya &lt;bagasdotme@gmail.com&gt;
Cc: Brian Foster &lt;bfoster@redhat.com&gt;
Cc: Ching-Chun (Jim) Huang &lt;jserv@ccns.ncku.edu.tw&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Matthew Sakai &lt;msakai@redhat.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>bcache: fix variable length array abuse in btree_iter</title>
<updated>2024-05-09T01:15:01+00:00</updated>
<author>
<name>Matthew Mirvish</name>
<email>matthew@mm12.xyz</email>
</author>
<published>2024-05-09T01:11:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3a861560ccb35f2a4f0a4b8207fa7c2a35fc7f31'/>
<id>urn:sha1:3a861560ccb35f2a4f0a4b8207fa7c2a35fc7f31</id>
<content type='text'>
btree_iter is used in two ways: either allocated on the stack with a
fixed size MAX_BSETS, or from a mempool with a dynamic size based on the
specific cache set. Previously, the struct had a fixed-length array of
size MAX_BSETS which was indexed out of bounds for the dynamically
sized iterators, causing UBSAN to complain.

This patch uses the same approach as in bcachefs's sort_iter and splits
the iterator into a btree_iter with a flexible array member and a
btree_iter_stack which embeds a btree_iter as well as a fixed-length
data array.
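
The resulting layout, roughly (condensed; names may differ slightly
from the patch as merged):

	struct btree_iter {
		size_t size, used;
		struct btree_iter_set {
			struct bkey *k, *end;
		} data[];		/* flexible array member */
	};

	/* stack users regain the old fixed-size behavior by embedding
	 * the iter together with MAX_BSETS worth of backing storage */
	struct btree_iter_stack {
		struct btree_iter iter;
		struct btree_iter_set stack_data[MAX_BSETS];
	};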

Cc: stable@vger.kernel.org
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2039368
Signed-off-by: Matthew Mirvish &lt;matthew@mm12.xyz&gt;
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Link: https://lore.kernel.org/r/20240509011117.2697-3-colyli@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Merge tag 'bcachefs-2023-11-29' of https://evilpiepirate.org/git/bcachefs</title>
<updated>2023-12-01T21:02:16+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-12-01T21:02:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e6861be452a53a5de3e1a048eabd811a05a44915'/>
<id>urn:sha1:e6861be452a53a5de3e1a048eabd811a05a44915</id>
<content type='text'>
Pull more bcachefs bugfixes from Kent Overstreet:

 - bcache &amp; bcachefs were broken with CFI enabled; patch for closures to
   fix type punning

 - mark erasure coding as extra-experimental; there are incompatible
   disk space accounting changes coming for erasure coding, and I'm
   still seeing checksum errors in some tests

 - several fixes for durability-related issues (durability is a device
   specific setting where we can tell bcachefs that data on a given
   device should be counted as replicated x times)

 - a fix for a rare livelock when a btree node merge then updates a
   parent node that is almost full

 - fix a race in the device removal path, where dropping a pointer in a
   btree node to a device would be clobbered by an in flight btree write
   updating the btree node key on completion

 - fix one SRCU lock hold time warning in the btree gc code - there's
   still a bunch more of these to fix

 - fix a rare race where we'd start copygc before initializing the "are
   we rw" percpu refcount; copygc would think we were already ro and die
   immediately

* tag 'bcachefs-2023-11-29' of https://evilpiepirate.org/git/bcachefs: (23 commits)
  bcachefs: Extra kthread_should_stop() calls for copygc
  bcachefs: Convert gc_alloc_start() to for_each_btree_key2()
  bcachefs: Fix race between btree writes and metadata drop
  bcachefs: move journal seq assertion
  bcachefs: -EROFS doesn't count as move_extent_start_fail
  bcachefs: trace_move_extent_start_fail() now includes errcode
  bcachefs: Fix split_race livelock
  bcachefs: Fix bucket data type for stripe buckets
  bcachefs: Add missing validation for jset_entry_data_usage
  bcachefs: Fix zstd compress workspace size
  bcachefs: bpos is misaligned on big endian
  bcachefs: Fix ec + durability calculation
  bcachefs: Data update path won't accidentaly grow replicas
  bcachefs: deallocate_extra_replicas()
  bcachefs: Proper refcounting for journal_keys
  bcachefs: preserve device path as device name
  bcachefs: Fix an endianness conversion
  bcachefs: Start gc, copygc, rebalance threads after initing writes ref
  bcachefs: Don't stop copygc thread on device resize
  bcachefs: Make sure bch2_move_ratelimit() also waits for move_ops
  ...
</content>
</entry>
<entry>
<title>closures: CLOSURE_CALLBACK() to fix type punning</title>
<updated>2023-11-24T05:29:58+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@linux.dev</email>
</author>
<published>2023-11-18T00:13:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d4e3b928ab487a8aecd1f6a140b40ac365116cfb'/>
<id>urn:sha1:d4e3b928ab487a8aecd1f6a140b40ac365116cfb</id>
<content type='text'>
Control flow integrity is now checking that type signatures match on
indirect function calls. That breaks closures, which embed a work_struct
in a closure in such a way that a closure_fn may also be used as a
workqueue fn by the underlying closure code.

So we have to change closure fns to take a work_struct as their
argument - but that results in a loss of clarity, as closure fns have
different semantics from normal workqueue functions (they run owning a
ref on the closure, which must be released with continue_at() or
closure_return()).

Thus, this patch introduces CLOSURE_CALLBACK() and closure_type() macros
as suggested by Kees, to smooth things over a bit.
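
The macros boil down to roughly this: every closure callback gets the
uniform work_struct-taking signature that CFI expects, and
closure_type() recovers the typed closure pointer inside the body (the
struct my_op user below is hypothetical):

	#define CLOSURE_CALLBACK(name)	void name(struct work_struct *ws)

	#define closure_type(name, type, member)			\
		type *name = container_of(ws, type, member.work)

	struct my_op {
		struct closure cl;
	};

	CLOSURE_CALLBACK(my_op_done)
	{
		closure_type(op, struct my_op, cl);

		/* the body runs owning a ref on op-&gt;cl, dropped here */
		closure_return(&amp;op-&gt;cl);
	}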

Suggested-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</content>
</entry>
<entry>
<title>bcache: fixup multi-threaded bch_sectors_dirty_init() wake-up race</title>
<updated>2023-11-20T16:17:51+00:00</updated>
<author>
<name>Mingzhe Zou</name>
<email>mingzhe.zou@easystack.cn</email>
</author>
<published>2023-11-20T05:25:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2faac25d7958c4761bb8cec54adb79f806783ad6'/>
<id>urn:sha1:2faac25d7958c4761bb8cec54adb79f806783ad6</id>
<content type='text'>
We got a kernel crash reporting "unable to handle kernel paging request":

```dmesg
[368033.032005] BUG: unable to handle kernel paging request at ffffffffad9ae4b5
[368033.032007] PGD fc3a0d067 P4D fc3a0d067 PUD fc3a0e063 PMD 8000000fc38000e1
[368033.032012] Oops: 0003 [#1] SMP PTI
[368033.032015] CPU: 23 PID: 55090 Comm: bch_dirtcnt[0] Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-147.5.1.es8_24.x86_64 #1
[368033.032017] Hardware name: Tsinghua Tongfang THTF Chaoqiang Server/072T6D, BIOS 2.4.3 01/17/2017
[368033.032027] RIP: 0010:native_queued_spin_lock_slowpath+0x183/0x1d0
[368033.032029] Code: 8b 02 48 85 c0 74 f6 48 89 c1 eb d0 c1 e9 12 83 e0
03 83 e9 01 48 c1 e0 05 48 63 c9 48 05 c0 3d 02 00 48 03 04 cd 60 68 93
ad &lt;48&gt; 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 02
[368033.032031] RSP: 0018:ffffbb48852abe00 EFLAGS: 00010082
[368033.032032] RAX: ffffffffad9ae4b5 RBX: 0000000000000246 RCX: 0000000000003bf3
[368033.032033] RDX: ffff97b0ff8e3dc0 RSI: 0000000000600000 RDI: ffffbb4884743c68
[368033.032034] RBP: 0000000000000001 R08: 0000000000000000 R09: 000007ffffffffff
[368033.032035] R10: ffffbb486bb01000 R11: 0000000000000001 R12: ffffffffc068da70
[368033.032036] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
[368033.032038] FS:  0000000000000000(0000) GS:ffff97b0ff8c0000(0000) knlGS:0000000000000000
[368033.032039] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[368033.032040] CR2: ffffffffad9ae4b5 CR3: 0000000fc3a0a002 CR4: 00000000003626e0
[368033.032042] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[368033.032043] bcache: bch_cached_dev_attach() Caching rbd479 as bcache462 on set 8cff3c36-4a76-4242-afaa-7630206bc70b
[368033.032045] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[368033.032046] Call Trace:
[368033.032054]  _raw_spin_lock_irqsave+0x32/0x40
[368033.032061]  __wake_up_common_lock+0x63/0xc0
[368033.032073]  ? bch_ptr_invalid+0x10/0x10 [bcache]
[368033.033502]  bch_dirty_init_thread+0x14c/0x160 [bcache]
[368033.033511]  ? read_dirty_submit+0x60/0x60 [bcache]
[368033.033516]  kthread+0x112/0x130
[368033.033520]  ? kthread_flush_work_fn+0x10/0x10
[368033.034505]  ret_from_fork+0x35/0x40
```

The crash occurred when wake_up(&amp;state-&gt;wait) was called and the value
in state was then accessed. However, bch_sectors_dirty_init() is not
found in the stack of any task. Since state is allocated on the stack,
we guess that bch_sectors_dirty_init() has already exited, leaving
bch_dirty_init_thread() touching freed stack memory and triggering the
"unable to handle kernel paging request" fault.

In order to verify this idea, we added some debug printing around
wake_up(&amp;state-&gt;wait). We find that "wake up" is printed twice, although
we expect only the last exiting thread to issue a single wake-up.

```dmesg
[  994.641004] alcache: bch_dirty_init_thread() wake up
[  994.641018] alcache: bch_dirty_init_thread() wake up
[  994.641523] alcache: bch_sectors_dirty_init() init exit
```

There is a race: if bch_sectors_dirty_init() exits after the first wake
up, the second wake up will trigger this bug ("unable to handle kernel
paging request").

Proceed as follows:

bch_sectors_dirty_init
    kthread_run ==============&gt; bch_dirty_init_thread(bch_dirtcnt[0])
            ...                         ...
    atomic_inc(&amp;state.started)          ...
            ...                         ...
    atomic_read(&amp;state.enough)          ...
            ...                 atomic_set(&amp;state-&gt;enough, 1)
    kthread_run ======================================================&gt; bch_dirty_init_thread(bch_dirtcnt[1])
            ...                 atomic_dec_and_test(&amp;state-&gt;started)            ...
    atomic_inc(&amp;state.started)          ...                                     ...
            ...                 wake_up(&amp;state-&gt;wait)                           ...
    atomic_read(&amp;state.enough)                                          atomic_dec_and_test(&amp;state-&gt;started)
            ...                                                                 ...
    wait_event(state.wait, atomic_read(&amp;state.started) == 0)                    ...
    return                                                                      ...
                                                                        wake_up(&amp;state-&gt;wait)

We believe waking up twice is very common when there is no dirty data,
but the crash is an extremely low-probability event, and it is hard for
us to reproduce. We attached and detached continuously for a week, with
a total of more than one million attaches, and saw only one crash.

Putting atomic_inc(&amp;state.started) before kthread_run() can avoid waking
up twice.
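
The fix, condensed (the infos[] field name is an assumption; the thread
name format matches the "bch_dirtcnt[0]" comm in the oops above):

	/* before: the thread could run, do atomic_dec_and_test() and
	 * wake_up() ahead of the parent's atomic_inc(), letting the
	 * waiter return while state was still about to be touched */
	state.infos[i].thread = kthread_run(bch_dirty_init_thread,
			&amp;state.infos[i], "bch_dirtcnt[%d]", i);
	atomic_inc(&amp;state.started);

	/* after: account for the thread before it can possibly run */
	atomic_inc(&amp;state.started);
	state.infos[i].thread = kthread_run(bch_dirty_init_thread,
			&amp;state.infos[i], "bch_dirtcnt[%d]", i);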

Fixes: b144e45fc576 ("bcache: make bch_sectors_dirty_init() to be multithreaded")
Signed-off-by: Mingzhe Zou &lt;mingzhe.zou@easystack.cn&gt;
Cc:  &lt;stable@vger.kernel.org&gt;
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Link: https://lore.kernel.org/r/20231120052503.6122-8-colyli@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>bcache: fixup lock c-&gt;root error</title>
<updated>2023-11-20T16:17:51+00:00</updated>
<author>
<name>Mingzhe Zou</name>
<email>mingzhe.zou@easystack.cn</email>
</author>
<published>2023-11-20T05:24:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e34820f984512b433ee1fc291417e60c47d56727'/>
<id>urn:sha1:e34820f984512b433ee1fc291417e60c47d56727</id>
<content type='text'>
We had a problem with I/O hanging because it was waiting for the lock
on c-&gt;root to be released.

crash&gt; cache_set.root -l cache_set.list ffffa03fde4c0050
  root = 0xffff802ef454c800
crash&gt; btree -o 0xffff802ef454c800 | grep rw_semaphore
  [ffff802ef454c858] struct rw_semaphore lock;
crash&gt; struct rw_semaphore ffff802ef454c858
struct rw_semaphore {
  count = {
    counter = -4294967297
  },
  wait_list = {
    next = 0xffff00006786fc28,
    prev = 0xffff00005d0efac8
  },
  wait_lock = {
    raw_lock = {
      {
        val = {
          counter = 0
        },
        {
          locked = 0 '\000',
          pending = 0 '\000'
        },
        {
          locked_pending = 0,
          tail = 0
        }
      }
    }
  },
  osq = {
    tail = {
      counter = 0
    }
  },
  owner = 0xffffa03fdc586603
}

The "counter = -4294967297" means that lock count is -1 and a write lock
is being attempted. Then, we found that there is a btree with a counter
of 1 in btree_cache_freeable.

crash&gt; cache_set -l cache_set.list ffffa03fde4c0050 -o|grep btree_cache
  [ffffa03fde4c1140] struct list_head btree_cache;
  [ffffa03fde4c1150] struct list_head btree_cache_freeable;
  [ffffa03fde4c1160] struct list_head btree_cache_freed;
  [ffffa03fde4c1170] unsigned int btree_cache_used;
  [ffffa03fde4c1178] wait_queue_head_t btree_cache_wait;
  [ffffa03fde4c1190] struct task_struct *btree_cache_alloc_lock;
crash&gt; list -H ffffa03fde4c1140|wc -l
973
crash&gt; list -H ffffa03fde4c1150|wc -l
1123
crash&gt; cache_set.btree_cache_used -l cache_set.list ffffa03fde4c0050
  btree_cache_used = 2097
crash&gt; list -s btree -l btree.list -H ffffa03fde4c1140|grep -E -A2 "^  lock = {" &gt; btree_cache.txt
crash&gt; list -s btree -l btree.list -H ffffa03fde4c1150|grep -E -A2 "^  lock = {" &gt; btree_cache_freeable.txt
[root@node-3 127.0.0.1-2023-08-04-16:40:28]# pwd
/var/crash/127.0.0.1-2023-08-04-16:40:28
[root@node-3 127.0.0.1-2023-08-04-16:40:28]# cat btree_cache.txt|grep counter|grep -v "counter = 0"
[root@node-3 127.0.0.1-2023-08-04-16:40:28]# cat btree_cache_freeable.txt|grep counter|grep -v "counter = 0"
      counter = 1

We found that this is a bug in bch_sectors_dirty_init() when locking c-&gt;root:
    (1). Thread X has locked c-&gt;root(A) write.
    (2). Thread Y failed to lock c-&gt;root(A), waiting for the lock(c-&gt;root A).
    (3). Thread X bch_btree_set_root() changes c-&gt;root from A to B.
    (4). Thread X releases the lock(c-&gt;root A).
    (5). Thread Y successfully locks c-&gt;root(A).
    (6). Thread Y releases the lock(c-&gt;root B).

        down_write locked ---(1)----------------------┐
                |                                     |
                |   down_read waiting ---(2)----┐     |
                |           |               ┌-------------┐ ┌-------------┐
        bch_btree_set_root ===(3)========&gt;&gt; | c-&gt;root   A | | c-&gt;root   B |
                |           |               └-------------┘ └-------------┘
            up_write ---(4)---------------------┘     |            |
                            |                         |            |
                    down_read locked ---(5)-----------┘            |
                            |                                      |
                        up_read ---(6)-----------------------------┘

Since c-&gt;root may change, the correct way to lock c-&gt;root is the same
as in bch_root_usage(): compare after locking.

static unsigned int bch_root_usage(struct cache_set *c)
{
        unsigned int bytes = 0;
        struct bkey *k;
        struct btree *b;
        struct btree_iter iter;

        goto lock_root;

        do {
                rw_unlock(false, b);
lock_root:
                b = c-&gt;root;
                rw_lock(false, b, b-&gt;level);
        } while (b != c-&gt;root);

        for_each_key_filter(&amp;b-&gt;keys, k, &amp;iter, bch_ptr_bad)
                bytes += bkey_bytes(k);

        rw_unlock(false, b);

        return (bytes * 100) / btree_bytes(c);
}

Fixes: b144e45fc576 ("bcache: make bch_sectors_dirty_init() to be multithreaded")
Signed-off-by: Mingzhe Zou &lt;mingzhe.zou@easystack.cn&gt;
Cc:  &lt;stable@vger.kernel.org&gt;
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Link: https://lore.kernel.org/r/20231120052503.6122-7-colyli@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
