<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/md/bcache, branch v5.4.113</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v5.4.113</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v5.4.113'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2021-03-04T09:26:43+00:00</updated>
<entry>
<title>bcache: Move journal work to new flush wq</title>
<updated>2021-03-04T09:26:43+00:00</updated>
<author>
<name>Kai Krakow</name>
<email>kai@kaishome.de</email>
</author>
<published>2021-02-10T05:07:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5431cb67306d6d488bc33f5aa7cb91191691b353'/>
<id>urn:sha1:5431cb67306d6d488bc33f5aa7cb91191691b353</id>
<content type='text'>
commit afe78ab46f638ecdf80a35b122ffc92c20d9ae5d upstream.

This is potentially long running and not latency sensitive, let's get
it out of the way of other latency sensitive events.

As observed in the previous commit, the `system_wq` comes easily
congested by bcache, and this fixes a few more stalls I was observing
every once in a while.

Let's not make this `WQ_MEM_RECLAIM` as it showed to reduce performance
of boot and file system operations in my tests. Also, without
`WQ_MEM_RECLAIM`, I no longer see desktop stalls. This matches the
previous behavior as `system_wq` also does no memory reclaim:

&gt; // workqueue.c:
&gt; system_wq = alloc_workqueue("events", 0, 0);

Cc: Coly Li &lt;colyli@suse.de&gt;
Cc: stable@vger.kernel.org # 5.4+
Signed-off-by: Kai Krakow &lt;kai@kaishome.de&gt;
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bcache: Give btree_io_wq correct semantics again</title>
<updated>2021-03-04T09:26:42+00:00</updated>
<author>
<name>Kai Krakow</name>
<email>kai@kaishome.de</email>
</author>
<published>2021-02-10T05:07:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a339f0998eb1b06606a794ff37957b1d1d13311d'/>
<id>urn:sha1:a339f0998eb1b06606a794ff37957b1d1d13311d</id>
<content type='text'>
commit d797bd9897e3559eb48d68368550d637d32e468c upstream.

Before killing `btree_io_wq`, the queue was allocated using
`create_singlethread_workqueue()` which has `WQ_MEM_RECLAIM`. After
killing it, it no longer had this property but `system_wq` is not
single threaded.

Let's combine both worlds and make it multi threaded but able to
reclaim memory.

Cc: Coly Li &lt;colyli@suse.de&gt;
Cc: stable@vger.kernel.org # 5.4+
Signed-off-by: Kai Krakow &lt;kai@kaishome.de&gt;
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "bcache: Kill btree_io_wq"</title>
<updated>2021-03-04T09:26:42+00:00</updated>
<author>
<name>Kai Krakow</name>
<email>kai@kaishome.de</email>
</author>
<published>2021-02-10T05:07:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=de5510b9825cd74b07412591a15bba1bd422c6aa'/>
<id>urn:sha1:de5510b9825cd74b07412591a15bba1bd422c6aa</id>
<content type='text'>
commit 9f233ffe02e5cef611100cd8c5bcf4de26ca7bef upstream.

This reverts commit 56b30770b27d54d68ad51eccc6d888282b568cee.

With the btree using the `system_wq`, I seem to see a lot more desktop
latency than I should.

After some more investigation, it looks like the original assumption
of 56b3077 no longer is true, and bcache has a very high potential of
congesting the `system_wq`. In turn, this introduces laggy desktop
performance, IO stalls (at least with btrfs), and input events may be
delayed.

So let's revert this. It's important to note that the semantics of
using `system_wq` previously mean that `btree_io_wq` should be created
before and destroyed after other bcache wqs to keep the same
assumptions.

Cc: Coly Li &lt;colyli@suse.de&gt;
Cc: stable@vger.kernel.org # 5.4+
Signed-off-by: Kai Krakow &lt;kai@kaishome.de&gt;
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bcache: fix a lost wake-up problem caused by mca_cannibalize_lock</title>
<updated>2020-10-01T11:17:18+00:00</updated>
<author>
<name>Guoju Fang</name>
<email>fangguoju@gmail.com</email>
</author>
<published>2019-11-13T08:03:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fd3572bd5bc1c18512a44d4e73ea39af957dc658'/>
<id>urn:sha1:fd3572bd5bc1c18512a44d4e73ea39af957dc658</id>
<content type='text'>
[ Upstream commit 34cf78bf34d48dddddfeeadb44f9841d7864997a ]

This patch fix a lost wake-up problem caused by the race between
mca_cannibalize_lock and bch_cannibalize_unlock.

Consider two processes, A and B. Process A is executing
mca_cannibalize_lock, while process B takes c-&gt;btree_cache_alloc_lock
and is executing bch_cannibalize_unlock. The problem happens that after
process A executes cmpxchg and will execute prepare_to_wait. In this
timeslice process B executes wake_up, but after that process A executes
prepare_to_wait and set the state to TASK_INTERRUPTIBLE. Then process A
goes to sleep but no one will wake up it. This problem may cause bcache
device to dead.

Signed-off-by: Guoju Fang &lt;fangguoju@gmail.com&gt;
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bcache: avoid nr_stripes overflow in bcache_device_init()</title>
<updated>2020-08-26T08:40:48+00:00</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2020-07-25T12:00:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b40753984979b1cd61a19d91b2044dd15a816e9f'/>
<id>urn:sha1:b40753984979b1cd61a19d91b2044dd15a816e9f</id>
<content type='text'>
[ Upstream commit 65f0f017e7be8c70330372df23bcb2a407ecf02d ]

For some block devices which large capacity (e.g. 8TB) but small io_opt
size (e.g. 8 sectors), in bcache_device_init() the stripes number calcu-
lated by,
	DIV_ROUND_UP_ULL(sectors, d-&gt;stripe_size);
might be overflow to the unsigned int bcache_device-&gt;nr_stripes.

This patch uses the uint64_t variable to store DIV_ROUND_UP_ULL()
and after the value is checked to be available in unsigned int range,
sets it to bache_device-&gt;nr_stripes. Then the overflow is avoided.

Reported-and-tested-by: Ken Raeburn &lt;raeburn@redhat.com&gt;
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Cc: stable@vger.kernel.org
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1783075
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bcache: fix overflow in offset_to_stripe()</title>
<updated>2020-08-21T11:05:25+00:00</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2020-07-25T12:00:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c573e8673dc1fb89012e95e793e783f7d5267f2f'/>
<id>urn:sha1:c573e8673dc1fb89012e95e793e783f7d5267f2f</id>
<content type='text'>
commit 7a1481267999c02abf4a624515c1b5c7c1fccbd6 upstream.

offset_to_stripe() returns the stripe number (in type unsigned int) from
an offset (in type uint64_t) by the following calculation,
	do_div(offset, d-&gt;stripe_size);
For large capacity backing device (e.g. 18TB) with small stripe size
(e.g. 4KB), the result is 4831838208 and exceeds UINT_MAX. The actual
returned value which caller receives is 536870912, due to the overflow.

Indeed in bcache_device_init(), bcache_device-&gt;nr_stripes is limited in
range [1, INT_MAX]. Therefore all valid stripe numbers in bcache are
in range [0, bcache_dev-&gt;nr_stripes - 1].

This patch adds a upper limition check in offset_to_stripe(): the max
valid stripe number should be less than bcache_device-&gt;nr_stripes. If
the calculated stripe number from do_div() is equal to or larger than
bcache_device-&gt;nr_stripe, -EINVAL will be returned. (Normally nr_stripes
is less than INT_MAX, exceeding upper limitation doesn't mean overflow,
therefore -EOVERFLOW is not used as error code.)

This patch also changes nr_stripes' type of struct bcache_device from
'unsigned int' to 'int', and return value type of offset_to_stripe()
from 'unsigned int' to 'int', to match their exact data ranges.

All locations where bcache_device-&gt;nr_stripes and offset_to_stripe() are
referenced also get updated for the above type change.

Reported-and-tested-by: Ken Raeburn &lt;raeburn@redhat.com&gt;
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Cc: stable@vger.kernel.org
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1783075
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>bcache: allocate meta data pages as compound pages</title>
<updated>2020-08-21T11:05:25+00:00</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2020-07-25T12:00:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=42dd8cc9e499a41537a8228523362d3f8ce649c1'/>
<id>urn:sha1:42dd8cc9e499a41537a8228523362d3f8ce649c1</id>
<content type='text'>
commit 5fe48867856367142d91a82f2cbf7a57a24cbb70 upstream.

There are some meta data of bcache are allocated by multiple pages,
and they are used as bio bv_page for I/Os to the cache device. for
example cache_set-&gt;uuids, cache-&gt;disk_buckets, journal_write-&gt;data,
bset_tree-&gt;data.

For such meta data memory, all the allocated pages should be treated
as a single memory block. Then the memory management and underlying I/O
code can treat them more clearly.

This patch adds __GFP_COMP flag to all the location allocating &gt;0 order
pages for the above mentioned meta data. Then their pages are treated
as compound pages now.

Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>bcache: fix super block seq numbers comparision in register_cache_set()</title>
<updated>2020-08-19T06:16:05+00:00</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2020-07-25T12:00:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ca6654d7da5932c42de7ea935c00452c505a5f71'/>
<id>urn:sha1:ca6654d7da5932c42de7ea935c00452c505a5f71</id>
<content type='text'>
[ Upstream commit 117f636ea695270fe492d0c0c9dfadc7a662af47 ]

In register_cache_set(), c is pointer to struct cache_set, and ca is
pointer to struct cache, if ca-&gt;sb.seq &gt; c-&gt;sb.seq, it means this
registering cache has up to date version and other members, the in-
memory version and other members should be updated to the newer value.

But current implementation makes a cache set only has a single cache
device, so the above assumption works well except for a special case.
The execption is when a cache device new created and both ca-&gt;sb.seq and
c-&gt;sb.seq are 0, because the super block is never flushed out yet. In
the location for the following if() check,
2156         if (ca-&gt;sb.seq &gt; c-&gt;sb.seq) {
2157                 c-&gt;sb.version           = ca-&gt;sb.version;
2158                 memcpy(c-&gt;sb.set_uuid, ca-&gt;sb.set_uuid, 16);
2159                 c-&gt;sb.flags             = ca-&gt;sb.flags;
2160                 c-&gt;sb.seq               = ca-&gt;sb.seq;
2161                 pr_debug("set version = %llu\n", c-&gt;sb.version);
2162         }
c-&gt;sb.version is not initialized yet and valued 0. When ca-&gt;sb.seq is 0,
the if() check will fail (because both values are 0), and the cache set
version, set_uuid, flags and seq won't be updated.

The above problem is hiden for current code, because the bucket size is
compatible among different super block version. And the next time when
running cache set again, ca-&gt;sb.seq will be larger than 0 and cache set
super block version will be updated properly.

But if the large bucket feature is enabled,  sb-&gt;bucket_size is the low
16bits of the bucket size. For a power of 2 value, when the actual
bucket size exceeds 16bit width, sb-&gt;bucket_size will always be 0. Then
read_super_common() will fail because the if() check to
is_power_of_2(sb-&gt;bucket_size) is false. This is how the long time
hidden bug is triggered.

This patch modifies the if() check to the following way,
2156         if (ca-&gt;sb.seq &gt; c-&gt;sb.seq || c-&gt;sb.seq == 0) {
Then cache set's version, set_uuid, flags and seq will always be updated
corectly including for a new created cache device.

Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bcache: fix potential deadlock problem in btree_gc_coalesce</title>
<updated>2020-06-24T15:50:45+00:00</updated>
<author>
<name>Zhiqiang Liu</name>
<email>liuzhiqiang26@huawei.com</email>
</author>
<published>2020-06-14T16:53:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f651e94899ed08b1766bda30f410d33fdd3970ff'/>
<id>urn:sha1:f651e94899ed08b1766bda30f410d33fdd3970ff</id>
<content type='text'>
[ Upstream commit be23e837333a914df3f24bf0b32e87b0331ab8d1 ]

coccicheck reports:
  drivers/md//bcache/btree.c:1538:1-7: preceding lock on line 1417

In btree_gc_coalesce func, if the coalescing process fails, we will goto
to out_nocoalesce tag directly without releasing new_nodes[i]-&gt;write_lock.
Then, it will cause a deadlock when trying to acquire new_nodes[i]-&gt;
write_lock for freeing new_nodes[i] before return.

btree_gc_coalesce func details as follows:
	if alloc new_nodes[i] fails:
		goto out_nocoalesce;
	// obtain new_nodes[i]-&gt;write_lock
	mutex_lock(&amp;new_nodes[i]-&gt;write_lock)
	// main coalescing process
	for (i = nodes - 1; i &gt; 0; --i)
		[snipped]
		if coalescing process fails:
			// Here, directly goto out_nocoalesce
			 // tag will cause a deadlock
			goto out_nocoalesce;
		[snipped]
	// release new_nodes[i]-&gt;write_lock
	mutex_unlock(&amp;new_nodes[i]-&gt;write_lock)
	// coalesing succ, return
	return;
out_nocoalesce:
	btree_node_free(new_nodes[i])	// free new_nodes[i]
	// obtain new_nodes[i]-&gt;write_lock
	mutex_lock(&amp;new_nodes[i]-&gt;write_lock);
	// set flag for reuse
	clear_bit(BTREE_NODE_dirty, &amp;ew_nodes[i]-&gt;flags);
	// release new_nodes[i]-&gt;write_lock
	mutex_unlock(&amp;new_nodes[i]-&gt;write_lock);

To fix the problem, we add a new tag 'out_unlock_nocoalesce' for
releasing new_nodes[i]-&gt;write_lock before out_nocoalesce tag. If
coalescing process fails, we will go to out_unlock_nocoalesce tag
for releasing new_nodes[i]-&gt;write_lock before free new_nodes[i] in
out_nocoalesce tag.

(Coly Li helps to clean up commit log format.)

Fixes: 2a285686c109816 ("bcache: btree locking rework")
Signed-off-by: Zhiqiang Liu &lt;liuzhiqiang26@huawei.com&gt;
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bcache: fix refcount underflow in bcache_device_free()</title>
<updated>2020-06-22T07:31:09+00:00</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2020-05-27T04:01:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=62e7e4f5976ce617d57453b01ec4ab24608499cd'/>
<id>urn:sha1:62e7e4f5976ce617d57453b01ec4ab24608499cd</id>
<content type='text'>
[ Upstream commit 86da9f736740eba602389908574dfbb0f517baa5 ]

The problematic code piece in bcache_device_free() is,

 785 static void bcache_device_free(struct bcache_device *d)
 786 {
 787     struct gendisk *disk = d-&gt;disk;
 [snipped]
 799     if (disk) {
 800             if (disk-&gt;flags &amp; GENHD_FL_UP)
 801                     del_gendisk(disk);
 802
 803             if (disk-&gt;queue)
 804                     blk_cleanup_queue(disk-&gt;queue);
 805
 806             ida_simple_remove(&amp;bcache_device_idx,
 807                               first_minor_to_idx(disk-&gt;first_minor));
 808             put_disk(disk);
 809         }
 [snipped]
 816 }

At line 808, put_disk(disk) may encounter kobject refcount of 'disk'
being underflow.

Here is how to reproduce the issue,
- Attche the backing device to a cache device and do random write to
  make the cache being dirty.
- Stop the bcache device while the cache device has dirty data of the
  backing device.
- Only register the backing device back, NOT register cache device.
- The bcache device node /dev/bcache0 won't show up, because backing
  device waits for the cache device shows up for the missing dirty
  data.
- Now echo 1 into /sys/fs/bcache/pendings_cleanup, to stop the pending
  backing device.
- After the pending backing device stopped, use 'dmesg' to check kernel
  message, a use-after-free warning from KASA reported the refcount of
  kobject linked to the 'disk' is underflow.

The dropping refcount at line 808 in the above code piece is added by
add_disk(d-&gt;disk) in bch_cached_dev_run(). But in the above condition
the cache device is not registered, bch_cached_dev_run() has no chance
to be called and the refcount is not added. The put_disk() for a non-
added refcount of gendisk kobject triggers a underflow warning.

This patch checks whether GENHD_FL_UP is set in disk-&gt;flags, if it is
not set then the bcache device was not added, don't call put_disk()
and the the underflow issue can be avoided.

Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
