<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/md/bcache, branch linux-5.9.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-5.9.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-5.9.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2020-08-23T22:36:59+00:00</updated>
<entry>
<title>treewide: Use fallthrough pseudo-keyword</title>
<updated>2020-08-23T22:36:59+00:00</updated>
<author>
<name>Gustavo A. R. Silva</name>
<email>gustavoars@kernel.org</email>
</author>
<published>2020-08-23T22:36:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=df561f6688fef775baa341a0f5d960becd248b11'/>
<id>urn:sha1:df561f6688fef775baa341a0f5d960becd248b11</id>
<content type='text'>
Replace the existing /* fall through */ comments and their variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove fall-through
markings where they are unnecessary.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
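
A before/after sketch of a typical conversion (the switch statement and
its identifiers are illustrative, not a specific call site from this
tree):

	switch (state) {
	case STATE_PREP:
		prepare();
		fallthrough;	/* replaces the old fall-through comment */
	case STATE_RUN:
		run();
		break;
	}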

Signed-off-by: Gustavo A. R. Silva &lt;gustavoars@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-5.9/drivers-20200803' of git://git.kernel.dk/linux-block</title>
<updated>2020-08-05T17:51:40+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-08-05T17:51:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e0fc99e21e6e299673f1640105ac0c1d829c2d93'/>
<id>urn:sha1:e0fc99e21e6e299673f1640105ac0c1d829c2d93</id>
<content type='text'>
Pull block driver updates from Jens Axboe:

 - NVMe:
      - ZNS support (Aravind, Keith, Matias, Niklas)
      - Misc cleanups, optimizations, fixes (Baolin, Chaitanya, David,
        Dongli, Max, Sagi)

 - null_blk zone capacity support (Aravind)

 - MD:
      - raid5/6 fixes (ChangSyun)
      - Warning fixes (Damien)
      - raid5 stripe fixes (Guoqing, Song, Yufen)
      - sysfs deadlock fix (Junxiao)
      - raid10 deadlock fix (Vitaly)

 - struct_size conversions (Gustavo)

 - Set of bcache updates/fixes (Coly)

* tag 'for-5.9/drivers-20200803' of git://git.kernel.dk/linux-block: (117 commits)
  md/raid5: Allow degraded raid6 to do rmw
  md/raid5: Fix Force reconstruct-write io stuck in degraded raid5
  raid5: don't duplicate code for different paths in handle_stripe
  raid5-cache: hold spinlock instead of mutex in r5c_journal_mode_show
  md: print errno in super_written
  md/raid5: remove the redundant setting of STRIPE_HANDLE
  md: register new md sysfs file 'uuid' read-only
  md: fix max sectors calculation for super 1.0
  nvme-loop: remove extra variable in create ctrl
  nvme-loop: set ctrl state connecting after init
  nvme-multipath: do not fall back to __nvme_find_path() for non-optimized paths
  nvme-multipath: fix logic for non-optimized paths
  nvme-rdma: fix controller reset hang during traffic
  nvme-tcp: fix controller reset hang during traffic
  nvmet: introduce the passthru Kconfig option
  nvmet: introduce the passthru configfs interface
  nvmet: Add passthru enable/disable helpers
  nvmet: add passthru code to process commands
  nvme: export nvme_find_get_ns() and nvme_put_ns()
  nvme: introduce nvme_ctrl_get_by_path()
  ...
</content>
</entry>
<entry>
<title>bcache: use disk_{start,end}_io_acct() to count I/O for bcache device</title>
<updated>2020-07-28T15:14:52+00:00</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2020-07-28T13:59:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c5be1f2c5bab1538aa29cd42e226d6b80391e3ff'/>
<id>urn:sha1:c5be1f2c5bab1538aa29cd42e226d6b80391e3ff</id>
<content type='text'>
This patch is a fix to the patch "bcache: fix bio_{start,end}_io_acct
with proper device". The previous patch used a hack that temporarily set
bi_disk to the bcache device, which was also mistaken.

As Christoph suggests, this patch uses disk_{start,end}_io_acct() to
count I/O for the bcache device in the correct way.
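
A minimal sketch of this accounting pattern (illustrative; d is assumed
to be the struct bcache_device the request targets, and the surrounding
submit path is omitted):

	unsigned long start_time;

	/* account against the bcache gendisk, not the backing device */
	start_time = disk_start_io_acct(d-&gt;disk, bio_sectors(bio),
					bio_op(bio));
	/* ... submit the bio to the cache or backing device ... */
	disk_end_io_acct(d-&gt;disk, bio_op(bio), start_time);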

Fixes: 85750aeb748f ("bcache: use bio_{start,end}_io_acct")
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>bcache: fix bio_{start,end}_io_acct with proper device</title>
<updated>2020-07-25T13:38:21+00:00</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2020-07-25T12:00:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a2f32ee8fd853cec8860f883d98afc3a339546de'/>
<id>urn:sha1:a2f32ee8fd853cec8860f883d98afc3a339546de</id>
<content type='text'>
Commit 85750aeb748f ("bcache: use bio_{start,end}_io_acct") moved the
I/O accounting code to a location after bio_set_dev(bio, dc-&gt;bdev) in
cached_dev_make_request(). As a result, the accounting is performed
incorrectly on the backing device, while the I/O should in fact be
counted against the bcache device, e.g. /dev/bcache0.

With this mistaken I/O accounting, iostat does not display I/O counts
for the bcache device and all the numbers go to the backing device. In
writeback mode, the hard drive may appear to have 340K+ IOPS, which is
impossible for a spinning disk.

This patch introduces bch_bio_start_io_acct() and bch_bio_end_io_acct(),
which switch bio-&gt;bi_disk to the bcache device before calling
bio_start_io_acct() or bio_end_io_acct(). Now the I/Os are counted
against the bcache device, and the bcache device, cache device and
backing device all report correct I/O counts again.
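
A sketch of how such a wrapper could look on the 5.9 code base, where
the generic accounting helpers key off bio-&gt;bi_disk (illustrative, not
the verbatim patch):

	static unsigned long bch_bio_start_io_acct(struct gendisk *disk,
						   struct bio *bio)
	{
		unsigned long start_time;
		struct gendisk *orig_disk = bio-&gt;bi_disk;

		/* temporarily point the bio at the bcache device */
		bio-&gt;bi_disk = disk;
		start_time = bio_start_io_acct(bio);
		bio-&gt;bi_disk = orig_disk;

		return start_time;
	}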

Fixes: 85750aeb748f ("bcache: use bio_{start,end}_io_acct")
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>bcache: avoid extra memory consumption in struct bbio for large bucket size</title>
<updated>2020-07-25T13:38:21+00:00</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2020-07-25T12:00:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4e4d4e0962262c97ba4580716d11623384effdfd'/>
<id>urn:sha1:4e4d4e0962262c97ba4580716d11623384effdfd</id>
<content type='text'>
Bcache uses struct bbio to do I/Os for meta data pages like uuids,
disk_buckets, prio_buckets, and btree nodes.

For example, when writing a btree node onto the cache device, the
process is,
- Allocate a struct bbio from mempool c-&gt;bio_meta.
- Initialize bi_inline_vecs for the struct bio embedded inside the
  struct bbio.
- Call bch_bio_map() to map each meta data page to a bvec from the
  inline bi_io_vec table.
- Call bch_submit_bbio() to submit the bio to the underlying block
  layer.
- When the I/O completes, only release the struct bbio; do not touch
  the reference counters of the meta data pages.

The struct bbio is defined as,
struct bbio {
	unsigned int		submit_time_us;
	[snipped]
	struct bio		bio;
};

Because struct bio is embedded at the end of struct bbio, the actual
allocation size of a struct bbio is sizeof(struct bbio) plus the size
of the embedded bio-&gt;bi_inline_vecs.

Now all meta data bucket sizes are limited by meta_bucket_pages(): if
the bucket size is larger than meta_bucket_pages() * PAGE_SECTORS, the
rest of the space in the bucket is unused. Therefore the maximum space
used in a meta bucket is (1&lt;&lt;MAX_ORDER) pages, or
(1&lt;&lt;CONFIG_FORCE_MAX_ZONEORDER) pages if that is configured.

Therefore for large bucket sizes it is unnecessary to calculate the
allocation size of mempool c-&gt;bio_meta as,
	mempool_init_kmalloc_pool(&amp;c-&gt;bio_meta, 2,
			sizeof(struct bbio) +
			sizeof(struct bio_vec) * bucket_pages(c))
This is too large: the Linux buddy allocator cannot allocate that many
contiguous pages, and even if it could, the extra pages would simply be
wasted.

This patch replaces bucket_pages() with meta_bucket_pages() in two
places (see the sketch after this list),
- In bch_cache_set_alloc(), when initializing mempool c-&gt;bio_meta, use
  sizeof(struct bbio) + sizeof(struct bio_vec) * meta_bucket_pages(&amp;c-&gt;sb)
  as the allocation object size.
- In bch_bbio_alloc(), when calling bio_init() to set up the inline
  bvec table bi_inline_vecs, use meta_bucket_pages() to indicate the
  number of inline bio vecs.
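
A sketch of the corrected mempool initialization in
bch_cache_set_alloc() (assuming the 5.9 layout where the cached super
block is c-&gt;sb; error handling condensed):

	/* reserve space for at most meta_bucket_pages() inline bio vecs */
	if (mempool_init_kmalloc_pool(&amp;c-&gt;bio_meta, 2,
			sizeof(struct bbio) +
			sizeof(struct bio_vec) * meta_bucket_pages(&amp;c-&gt;sb)))
		goto err;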

Now the maximum size of the bio embedded inside struct bbio exactly
matches the limit of meta_bucket_pages(), and no extra pages are
wasted.

Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>bcache: avoid extra memory allocation from mempool c-&gt;fill_iter</title>
<updated>2020-07-25T13:38:21+00:00</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2020-07-25T12:00:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6907dc498f79543e5a32e632135e51b960331316'/>
<id>urn:sha1:6907dc498f79543e5a32e632135e51b960331316</id>
<content type='text'>
Mempool c-&gt;fill_iter is used to allocate memory for struct btree_iter in
bch_btree_node_read_done() to iterate all keys of a read-in btree node.

The allocation size is defined in bch_cache_set_alloc() by,
  mempool_init_kmalloc_pool(&amp;c-&gt;fill_iter, 1, iter_size))
where iter_size is defined by a calculation,
  (sb-&gt;bucket_size / sb-&gt;block_size + 1) * sizeof(struct btree_iter_set)

For a 16bit bucket_size the calculation is fine, but now that the
bucket size is extended to 32bit, it can be as large as 2GB. With the
above calculation, iter_size can then reach 2048 pages (order 11, which
the buddy allocator still accepts).

But the actual space that holds the bkeys in a meta data bucket is
already limited by meta_bucket_pages(), which is 16MB. If
sb-&gt;bucket_size is replaced by meta_bucket_pages() * PAGE_SECTORS in
the above calculation, the result is 16 pages. This size is large
enough for the mempool allocation of struct btree_iter.

Therefore, in the worst case, every allocation from mempool
c-&gt;fill_iter wastes up to 4080 pages that are never used. This patch
uses meta_bucket_pages() * PAGE_SECTORS to calculate the iter size in
bch_cache_set_alloc(), to avoid the extra memory allocation from
mempool c-&gt;fill_iter.
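
A sketch of the adjusted iter_size calculation (illustrative; it
follows the formula quoted above with sb-&gt;bucket_size swapped for
meta_bucket_pages() * PAGE_SECTORS):

	iter_size = ((meta_bucket_pages(sb) * PAGE_SECTORS) /
		     sb-&gt;block_size + 1) *
			sizeof(struct btree_iter_set);
	if (mempool_init_kmalloc_pool(&amp;c-&gt;fill_iter, 1, iter_size))
		goto err;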

Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>bcache: add sysfs file to display feature sets information of cache set</title>
<updated>2020-07-25T13:38:21+00:00</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2020-07-25T12:00:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=092bd54d69230b4a4a7f640a4cdcdeb004d3546e'/>
<id>urn:sha1:092bd54d69230b4a4a7f640a4cdcdeb004d3546e</id>
<content type='text'>
The following three sysfs files are added by this patch to display the
feature set information of the cache set:
	/sys/fs/bcache/&lt;cache set UUID&gt;/internal/feature_compat
	/sys/fs/bcache/&lt;cache set UUID&gt;/internal/feature_ro_compat
	/sys/fs/bcache/&lt;cache set UUID&gt;/internal/feature_incompat

Currently only the incompat feature 'large_bucket' exists in bcache, so
the sysfs file content is:
        [large_bucket]
The string 'large_bucket' means the running bcache driver supports the
incompat feature 'large_bucket'; the wrapping [] means the
'large_bucket' feature is currently enabled on this cache set.
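
A minimal sketch of how an incompat feature show routine could render
this output (the function name, buffer handling and sysfs wiring are
illustrative assumptions, not the exact bcache implementation):

	static ssize_t feature_incompat_show_sketch(struct cache_set *c,
						    char *buf)
	{
		ssize_t n = 0;

		/* wrap the name in [] only when the feature is enabled */
		if (bch_has_feature_large_bucket(&amp;c-&gt;sb))
			n += snprintf(buf + n, PAGE_SIZE - n, "[large_bucket]");
		else
			n += snprintf(buf + n, PAGE_SIZE - n, "large_bucket");
		n += snprintf(buf + n, PAGE_SIZE - n, "\n");

		return n;
	}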

This patch is ready to display compat and ro_compat features, in future
once bcache code implements such feature sets, the according feature
strings will be displayed in their sysfs files too.

Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>bcache: add bucket_size_hi into struct cache_sb_disk for large bucket</title>
<updated>2020-07-25T13:38:21+00:00</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2020-07-25T12:00:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ffa470327572b8f85dceda48fd0676d9658cb8c5'/>
<id>urn:sha1:ffa470327572b8f85dceda48fd0676d9658cb8c5</id>
<content type='text'>
The large bucket feature extends bucket_size from 16bit to 32bit.

When creating a cache device on a zoned device (e.g. a zoned NVMe SSD),
making a single bucket cover one or more zones of the zoned device is
the simplest way for bcache to support zoned devices as cache.

But the current maximum bucket size is 16MB while a typical zone size
of a zoned device is 256MB; this is the major motivation to extend the
bucket size to a larger bit width.

This patch is the first and most basic change to support a large bucket
size; the major changes it makes are,
- Add BCH_FEATURE_INCOMPAT_LARGE_BUCKET for the large bucket feature,
  where INCOMPAT means it introduces an incompatible on-disk format
  change.
- Add the BCH_FEATURE_INCOMPAT_FUNCS(large_bucket, LARGE_BUCKET)
  routines.
- Add __le16 bucket_size_hi into struct cache_sb_disk at offset 0x8d0
  for the on-disk super block format.
- For the in-memory super block struct cache_sb, extend member
  bucket_size from __u16 to __u32.
- Add get_bucket_size() to combine bucket_size and bucket_size_hi from
  struct cache_sb_disk into an unsigned int value (see the sketch after
  this list).
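
A sketch of how the two halves might be combined; the
bch_has_feature_large_bucket() helper is assumed to be the one
generated by the BCH_FEATURE_INCOMPAT_FUNCS() macro above, the rest is
illustrative:

	static unsigned int get_bucket_size(struct cache_sb *sb,
					    struct cache_sb_disk *s)
	{
		unsigned int bucket_size = le16_to_cpu(s-&gt;bucket_size);

		/* only trust bucket_size_hi when large_bucket is set */
		if (bch_has_feature_large_bucket(sb))
			bucket_size |= le16_to_cpu(s-&gt;bucket_size_hi) &lt;&lt; 16;

		return bucket_size;
	}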

Since we already have the large bucket size helpers meta_bucket_pages(),
meta_bucket_bytes() and alloc_meta_bucket_pages(), they make sure that
when the bucket size is &gt; 8MB, the memory allocation for a bcache meta
data bucket won't fail no matter how far the bucket size is extended.
So the meta data buckets are handled properly when the bucket size
width increases from 16bit to 32bit, and we don't need to worry about
them.

Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>bcache: handle btree node memory allocation properly for bucket size &gt; 8MB</title>
<updated>2020-07-25T13:38:20+00:00</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2020-07-25T12:00:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f9c32a5a900c6a9032b064aae9a5784af9d24453'/>
<id>urn:sha1:f9c32a5a900c6a9032b064aae9a5784af9d24453</id>
<content type='text'>
Currently a bcache internal btree node occupies a whole bucket. When
loading a btree node from the cache device into memory,
mca_data_alloc() will call bch_btree_keys_alloc() to allocate memory
for the whole bucket size, and ilog2(b-&gt;c-&gt;btree_pages) is sent to
bch_btree_keys_alloc() as the parameter 'page_order'.

c-&gt;btree_pages is set to bucket_pages() in bch_cache_set_alloc(); for a
bucket size &gt; 8MB, ilog2(b-&gt;c-&gt;btree_pages) is 12 with a 4KB page size.
By default the maximum page order __get_free_pages() accepts is
MAX_ORDER (11), so in this condition bch_btree_keys_alloc() will always
fail.

Because another over-page-order allocation failure already fails the
cache device registration, this btree node allocation failure was not
observed at runtime. Once the other blocking page allocation failures
for bucket size &gt; 8MB are fixed, this btree node allocation issue may
trigger potential risks, e.g. an infinite loop retrying btree node
allocation after failure.

This patch fixes the potential problem by setting c-&gt;btree_pages to
meta_bucket_pages() in bch_cache_set_alloc(). When the bucket size is
&gt; 8MB, meta_bucket_pages() will always return a number which doesn't
exceed the maximum page order of the buddy allocator.
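
A minimal sketch of the change in bch_cache_set_alloc() (assuming the
5.9 layout where the cached super block lives at c-&gt;sb):

	/*
	 * cap the btree node page order at what the buddy allocator
	 * can satisfy, instead of using the raw bucket size
	 */
	c-&gt;btree_pages = meta_bucket_pages(&amp;c-&gt;sb);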

Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>bcache: handle cache set verify_ondisk properly for bucket size &gt; 8MB</title>
<updated>2020-07-25T13:38:20+00:00</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2020-07-25T12:00:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bf6af17065079d29b9bd4e59de27cc2965e6fabf'/>
<id>urn:sha1:bf6af17065079d29b9bd4e59de27cc2965e6fabf</id>
<content type='text'>
In bch_btree_cache_alloc(), when CONFIG_BCACHE_DEBUG is configured,
allocating memory for c-&gt;verify_ondisk may fail if the bucket size is
&gt; 8MB, because this requires __get_free_pages() to allocate contiguous
pages with order &gt; 11 (the default MAX_ORDER of the Linux buddy
allocator). Such an oversized allocation will fail and cause two
problems,
- When CONFIG_BCACHE_DEBUG is configured, bch_btree_verify() does not
  work, because c-&gt;verify_ondisk is NULL and bch_btree_verify() returns
  immediately.
- bch_btree_cache_alloc() will fail because the c-&gt;verify_ondisk
  allocation failed, so the whole cache device registration fails. And
  because of this failure, the first problem of bch_btree_verify() has
  no chance to be triggered.

This patch fixes the above problems in two ways,
1) If the page allocation for c-&gt;verify_ondisk fails, set it to NULL
   and return from bch_btree_cache_alloc() with -ENOMEM.
2) When calling __get_free_pages() to allocate the c-&gt;verify_ondisk
   pages, use ilog2(meta_bucket_pages(&amp;c-&gt;sb)) to make sure ilog2()
   always generates a page order &lt;= MAX_ORDER (or
   CONFIG_FORCE_MAX_ZONEORDER). Then the buddy system won't directly
   reject the allocation request (see the sketch after this list).
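
A condensed sketch of the adjusted allocation in bch_btree_cache_alloc()
(illustrative, not the verbatim patch):

	c-&gt;verify_ondisk = (void *)
		__get_free_pages(GFP_KERNEL|__GFP_NOWARN,
				 ilog2(meta_bucket_pages(&amp;c-&gt;sb)));
	if (!c-&gt;verify_ondisk)
		/* leave it NULL; fail cache set allocation with -ENOMEM */
		return -ENOMEM;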

Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
