kernel/linux.git/block/blk-sysfs.c, branch v5.10.257

block: introduce zone_write_granularity limit

2024-04-13T10:58:07+00:00

[ Upstream commit a805a4fa4fa376bbc145762bb8b09caa2fa8af48 ] Per ZBC and ZAC specifications, host-managed SMR hard-disks mandate that all writes into sequential write required zones be aligned to the device physical block size. However, NVMe ZNS does not have this constraint and allows write operations into sequential zones to be aligned to the device logical block size. This inconsistency does not help with software portability across device types. To solve this, introduce the zone_write_granularity queue limit to indicate the alignment constraint, in bytes, of write operations into zones of a zoned block device. This new limit is exported as a read-only sysfs queue attribute and the helper blk_queue_zone_write_granularity() introduced for drivers to set this limit. The function blk_queue_set_zoned() is modified to set this new limit to the device logical block size by default. NVMe ZNS devices as well as zoned nullb devices use this default value as is. The scsi disk driver is modified to execute the blk_queue_zone_write_granularity() helper to set the zone write granularity of host-managed SMR disks to the disk physical block size. The accessor functions queue_zone_write_granularity() and bdev_zone_write_granularity() are also introduced. Signed-off-by: Damien Le Moal Reviewed-by: Christoph Hellwig Reviewed-by: Martin K. Petersen Reviewed-by: Chaitanya Kulkarni Reviewed-by: Johannes Thumshirn Signed-off-by: Jens Axboe Stable-dep-of: c8f6f88d2592 ("block: Clear zone limits for a non-zoned stacked queue") Signed-off-by: Sasha Levin

block: fix use-after-free of q->q_usage_counter

2023-10-10T19:53:36+00:00

commit d36a9ea5e7766961e753ee38d4c331bbe6ef659b upstream. For blk-mq, queue release handler is usually called after blk_mq_freeze_queue_wait() returns. However, the q_usage_counter->release() handler may not be run yet at that time, so this can cause a use-after-free. Fix the issue by moving percpu_ref_exit() into blk_free_queue_rcu(). Since ->release() is called with rcu read lock held, it is agreed that the race should be covered in caller per discussion from the two links. Reported-by: Zhang Wensheng Reported-by: Zhong Jinghua Link: https://lore.kernel.org/linux-block/Y5prfOjyyjQKUrtH@T590/T/#u Link: https://lore.kernel.org/lkml/Y4%2FmzMd4evRg9yDi@fedora/ Cc: Hillf Danton Cc: Yu Kuai Cc: Dennis Zhou Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path") Signed-off-by: Ming Lei Link: https://lore.kernel.org/r/20221215021629.74870-1-ming.lei@redhat.com Signed-off-by: Jens Axboe Signed-off-by: Saranya Muruganandam Signed-off-by: Greg Kroah-Hartman

block: don't delete queue kobject before its children

2022-04-08T12:40:00+00:00

[ Upstream commit 0f69288253e9fc7c495047720e523b9f1aba5712 ] kobjects aren't supposed to be deleted before their child kobjects are deleted. Apparently this is usually benign; however, a WARN will be triggered if one of the child kobjects has a named attribute group: sysfs group 'modes' not found for kobject 'crypto' WARNING: CPU: 0 PID: 1 at fs/sysfs/group.c:278 sysfs_remove_group+0x72/0x80 ... Call Trace: sysfs_remove_groups+0x29/0x40 fs/sysfs/group.c:312 __kobject_del+0x20/0x80 lib/kobject.c:611 kobject_cleanup+0xa4/0x140 lib/kobject.c:696 kobject_release lib/kobject.c:736 [inline] kref_put include/linux/kref.h:65 [inline] kobject_put+0x53/0x70 lib/kobject.c:753 blk_crypto_sysfs_unregister+0x10/0x20 block/blk-crypto-sysfs.c:159 blk_unregister_queue+0xb0/0x110 block/blk-sysfs.c:962 del_gendisk+0x117/0x250 block/genhd.c:610 Fix this by moving the kobject_del() and the corresponding kobject_uevent() to the correct place. Fixes: 2c2086afc2b8 ("block: Protect less code with sysfs_lock in blk_{un,}register_queue()") Reviewed-by: Hannes Reinecke Reviewed-by: Greg Kroah-Hartman Reviewed-by: Bart Van Assche Signed-off-by: Eric Biggers Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220124215938.2769-3-ebiggers@kernel.org Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue

2020-10-09T18:46:28+00:00

blk_exit_queue will free elevator_data, while blk_mq_run_work_fn will access it. Move cancel of hctx->run_work to the front of blk_exit_queue to avoid use-after-free. Fixes: 1b97871b501f ("blk-mq: move cancel of hctx->run_work into blk_mq_hw_sysfs_release") Signed-off-by: Yang Yang Reviewed-by: Ming Lei Signed-off-by: Jens Axboe

block: get rid of unnecessary local variable

2020-10-09T18:34:06+00:00

Since whole elevator register is protectd by sysfs_lock, we don't need extras 'has_elevator'. Just use q->elevator directly. Signed-off-by: Yufen Yu Signed-off-by: Jens Axboe

block: invoke blk_mq_exit_sched no matter whether have .exit_sched

2020-10-09T18:34:06+00:00

We will register debugfs for scheduler no matter whether it have defined callback funciton .exit_sched. So, blk_mq_exit_sched() is always needed to unregister debugfs. Also, q->elevator should be set as NULL after exiting scheduler. For now, since all register scheduler have defined .exit_sched, it will not cause any actual problem. But It will be more reasonable to do this change. Signed-off-by: Yufen Yu Signed-off-by: Jens Axboe

bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flag

2020-09-24T19:43:39+00:00

The BDI_CAP_STABLE_WRITES is one of the few bits of information in the backing_dev_info shared between the block drivers and the writeback code. To help untangling the dependency replace it with a queue flag and a superblock flag derived from it. This also helps with the case of e.g. a file system requiring stable writes due to its own checksumming, but not forcing it on other users of the block device like the swap code. One downside is that we an't support the stable_pages_required bdi attribute in sysfs anymore. It is replaced with a queue attribute which also is writable for easier testing. Signed-off-by: Christoph Hellwig Reviewed-by: Jan Kara Reviewed-by: Johannes Thumshirn Signed-off-by: Jens Axboe

block: lift setting the readahead size into the block layer

2020-09-24T19:43:39+00:00

Drivers shouldn't really mess with the readahead size, as that is a VM concept. Instead set it based on the optimal I/O size by lifting the algorithm from the md driver when registering the disk. Also set bdi->io_pages there as well by applying the same scheme based on max_sectors. To ensure the limits work well for stacking drivers a new helper is added to update the readahead limits from the block limits, which is also called from disk_stack_limits. Signed-off-by: Christoph Hellwig Reviewed-by: Johannes Thumshirn Reviewed-by: Jan Kara Reviewed-by: Mike Snitzer Reviewed-by: Martin K. Petersen Acked-by: Coly Li Signed-off-by: Jens Axboe

block: make QUEUE_SYSFS_BIT_FNS more useful

2020-09-08T15:01:10+00:00

Switch to the naming used by the other entries so that we can use the QUEUE_RW_ENTRY helper. Signed-off-by: Christoph Hellwig Signed-off-by: Jens Axboe

block: add helper macros for queue sysfs entries

2020-09-08T15:01:10+00:00

Add two helpers macros to avoid boilerplate code for the queue sysfs entries. Signed-off-by: Christoph Hellwig Reviewed-by: Johannes Thumshirn Signed-off-by: Jens Axboe