starfive-tech/linux.git - StarFive Tech Linux Kernel for VisionFive (JH7110) boards (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2022-12-06	nvme-pci: remove nvme_disable_admin_queue	Christoph Hellwig	1	-9/+2
	nvme_disable_admin_queue has only a single caller, and just calls two other functions, so remove it to clean up the remove path a little more. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Eric Curtin <ecurtin@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2022-12-06	nvme: merge nvme_shutdown_ctrl into nvme_disable_ctrl	Christoph Hellwig	8	-45/+25
	Many of the callers decide which one to use based on a bool argument and there is at least some code to be shared, so merge these two. Also move a comment specific to a single callsite to that callsite. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hector Martin <marcan@marcan.st>
2022-12-06	nvme: use nvme_wait_ready in nvme_shutdown_ctrl	Christoph Hellwig	1	-26/+12
	Refactor the code to wait for CSTS state changes so that it can be reused by nvme_shutdown_ctrl. This reduces the delay between each iteration that checks CSTS from 100ms in the shutdown code to the 1 to 2ms range done during enable, matching the changes from commit 3e98c2443f5c that were only applied to the enable/disable path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
2022-12-06	nvme-apple: fix controller shutdown in apple_nvme_disable	Christoph Hellwig	1	-1/+2
	nvme_shutdown_ctrl already shuts the controller down, there is no need to also call nvme_disable_ctrl for the shutdown case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Eric Curtin <ecurtin@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hector Martin <marcan@marcan.st>
2022-12-06	nvme-fc: move common code into helper	Chaitanya Kulkarni	1	-8/+11
	Add a helper to move the duplicate code for error message from nvme_fc_rcv_ls_req() to nvme_fc_rcv_ls_req_err_msg(). Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-12-06	nvme-fc: avoid null pointer dereference	Chaitanya Kulkarni	1	-1/+10
	Before using the dynamically allocated variable lsop in nvme_fc_rcv_ls_req(), add a check for NULL and error out early. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-12-06	nvme: allow unprivileged passthrough of Identify Controller	Joel Granados	1	-0/+2
	Add unprivileged passthrough of the I/O Command Set Independent and I/O Command Set Specific Identify Controller sub-command. This will allow access to attributes (e.g. MDTS and WZSL) that are needed to effectively form passthrough I/O to the /dev/ng* character devices. Signed-off-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-12-06	nvme-multipath: support io stats on the mpath device	Sagi Grimberg	3	-0/+42
	Our mpath stack device is just a shim that selects a bottom namespace and submits the bio to it without any fancy splitting. This also means that we don't clone the bio or have any context to the bio beyond submission. However it really sucks that we don't see the mpath device io stats. Given that the mpath device can't do that without adding some context to it, we let the bottom device do it on its behalf (somewhat similar to the approach taken in nvme_trace_bio_complete). When the IO starts, we account the request for multipath IO stats using REQ_NVME_MPATH_IO_STATS nvme_request flag to avoid queue io stats disable in the middle of the request. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org>
2022-12-06	nvme: introduce nvme_start_request	Sagi Grimberg	7	-6/+11
	In preparation for nvme-multipath IO stats accounting, we want the accounting to happen in a centralized place. The request completion is already centralized, but we need a common helper to request I/O start. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de>
2022-12-06	nvme: use kstrtobool() instead of strtobool()	Christophe JAILLET	2	-9/+11
	strtobool() is the same as kstrtobool(). However, the latter is more used within the kernel. In order to remove strtobool() and slightly simplify kstrtox.h, switch to the other function name. While at it, include the corresponding header file (<linux/kstrtox.h>) Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-12-06	nvme: don't call blk_mq_{,un}quiesce_tagset when ctrl->tagset is NULL	Christoph Hellwig	1	-0/+4
	The NVMe drivers support a mode where no tagset is allocated for the I/O queues and only the admin queue is usable. In that case ctrl->tagset is NULL and we must not call the block per-tagset quiesce helpers that dereference it. Fixes: 98d81f0df70c ("nvme: use blk_mq_[un]quiesce_tagset") Reported-by: Gerd Bayer <gbayer@linux.ibm.com> Reported-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Leng <lengchao@huawei.com>
2022-12-05	blk-throttle: Use more suitable time_after check for update of slice_start	Kemeng Shi	1	-1/+1
	There is no need to update tg->slice_start[rw] to start when they are equal already. So remove "eq" part of check before update slice_start. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-10-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
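	The difference between the two comparisons can be sketched in userspace. The helpers below are simplified copies of the kernel's wrap-safe jiffies comparisons (the real macros also type-check their arguments), and update_slice_start is a hypothetical stand-in for the update site, not the actual blk-throttle code:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Simplified userspace versions of the kernel's wrap-safe comparisons. */
static bool time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;	/* true if a is strictly later than b */
}

static bool time_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;	/* true if a is later than or equal to b */
}

/*
 * Hypothetical update site: returns true only when slice_start was written.
 * With time_after_eq() an equal value would be rewritten for no benefit.
 */
static bool update_slice_start(unsigned long *slice_start, unsigned long start)
{
	assert(slice_start);
	if (!time_after(start, *slice_start))
		return false;	/* equal or older: nothing to update */
	*slice_start = start;
	return true;
}
```

	Calling update_slice_start(&s, s) leaves s untouched and returns false, while time_after_eq(s, s) is true, which is the redundant store the commit avoids.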
2022-12-05	blk-throttle: remove repeat check of elapsed time	Kemeng Shi	1	-2/+6
	There is no need to check the elapsed time since the last upgrade for each node in the hierarchy. Move this check before the traversal, as throtl_can_upgrade does, to remove the repeated check. Reported-by: kernel test robot <lkp@intel.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-9-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-05	blk-throttle: remove incorrect comment for tg_last_low_overflow_time	Kemeng Shi	1	-1/+0
	Function tg_last_low_overflow_time is called with intermediate nodes, as follows:
	  throtl_hierarchy_can_downgrade
	    throtl_tg_can_downgrade
	      tg_last_low_overflow_time
	  throtl_hierarchy_can_upgrade
	    throtl_tg_can_upgrade
	      tg_last_low_overflow_time
	throtl_hierarchy_can_downgrade/throtl_hierarchy_can_upgrade traverse from the leaf node to the sub-root node and pass each traversed intermediate node to tg_last_low_overflow_time. No such limit can be found in the context or implementation of tg_last_low_overflow_time, so remove this limit from the comment. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-8-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-05	blk-throttle: fix typo in comment of throtl_adjusted_limit	Kemeng Shi	1	-1/+1
	lapsed time -> elapsed time Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-7-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-05	blk-throttle: simpfy low limit reached check in throtl_tg_can_upgrade	Kemeng Shi	1	-13/+18
	Commit c79892c557616 ("blk-throttle: add upgrade logic for LIMIT_LOW state") added upgrade logic for the low limit and mentioned that 1. "To determine if a cgroup exceeds its limitation, we check if the cgroup has pending request. Since cgroup is throttled according to the limit, pending request means the cgroup reaches the limit." 2. "If a cgroup has limit set for both read and write, we consider the combination of them for upgrade. The reason is read IO and write IO can interfere with each other. If we do the upgrade based in one direction IO, the other direction IO could be severely harmed." Besides, we also determine that a cgroup reaches its low limit if the low limit is 0; see the comment in throtl_tg_can_upgrade. Collecting the information above, the design of the upgrade check is as follows: 1. The low limit is reached if the limit is zero or IO is already queued. 2. A cgroup passes the upgrade check if the low limits of READ and WRITE are both reached. Simplify the check code described above to remove the repeated check and improve readability. There is no functional change.
	Detailed equivalence proof: all replaced conditions that return true are as follows:
	condition 1: (!read_limit && !write_limit)
	condition 2: read_limit && sq->nr_queued[READ] && (!write_limit || sq->nr_queued[WRITE])
	condition 3: write_limit && sq->nr_queued[WRITE] && (!read_limit || sq->nr_queued[READ])
	Transforming condition 2: (read_limit && sq->nr_queued[READ]) && (!write_limit || sq->nr_queued[WRITE]) is equivalent to (read_limit && sq->nr_queued[READ]) && (!write_limit || (write_limit && sq->nr_queued[WRITE])), which is equivalent to
	condition 2.1: (read_limit && sq->nr_queued[READ] && !write_limit) ||
	condition 2.2: (read_limit && sq->nr_queued[READ] && (write_limit && sq->nr_queued[WRITE]))
	Transforming condition 3: write_limit && sq->nr_queued[WRITE] && (!read_limit || sq->nr_queued[READ]) is equivalent to (write_limit && sq->nr_queued[WRITE]) && (!read_limit || (read_limit && sq->nr_queued[READ])), which is equivalent to
	condition 3.1: ((write_limit && sq->nr_queued[WRITE]) && !read_limit) ||
	condition 3.2: ((write_limit && sq->nr_queued[WRITE]) && (read_limit && sq->nr_queued[READ]))
	Condition 3.2 is the same as condition 2.2, so all the conditions that return true are:
	(!read_limit && !write_limit) (1)
	(!read_limit && (write_limit && sq->nr_queued[WRITE])) (3.1)
	((read_limit && sq->nr_queued[READ]) && !write_limit) (2.1)
	((write_limit && sq->nr_queued[WRITE]) && (read_limit && sq->nr_queued[READ])) (2.2)
	As we can expand "(a1 || a2) && (b1 || b2)" to: a1 && b1, a1 && b2, a2 && b1, a2 && b2, and considering that:
	a1 = !read_limit
	a2 = read_limit && sq->nr_queued[READ]
	b1 = !write_limit
	b2 = write_limit && sq->nr_queued[WRITE]
	we can pack the replaced conditions into (!read_limit || (read_limit && sq->nr_queued[READ])) && (!write_limit || (write_limit && sq->nr_queued[WRITE])), which is equivalent to (!read_limit || sq->nr_queued[READ]) && (!write_limit || sq->nr_queued[WRITE])
	Reported-by: kernel test robot <lkp@intel.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-6-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
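	The equivalence argued above can also be checked by brute force, since only four truth values are involved. The following is a minimal userspace sketch, assuming the limits and queue counts are reduced to booleans (zero vs. non-zero); the function names old_check, new_check and count_mismatches are hypothetical and not taken from the kernel source:

```c
#include <assert.h>
#include <stdbool.h>

/* The three replaced conditions, as listed in the proof. */
static bool old_check(bool read_limit, bool write_limit, bool rq, bool wq)
{
	if (!read_limit && !write_limit)
		return true;				/* condition 1 */
	if (read_limit && rq && (!write_limit || wq))
		return true;				/* condition 2 */
	if (write_limit && wq && (!read_limit || rq))
		return true;				/* condition 3 */
	return false;
}

/* The packed form the proof arrives at. */
static bool new_check(bool read_limit, bool write_limit, bool rq, bool wq)
{
	return (!read_limit || rq) && (!write_limit || wq);
}

/* Counts the input combinations (out of 16) on which the two forms disagree. */
static int count_mismatches(void)
{
	int mismatches = 0;

	for (int bits = 0; bits < 16; bits++) {
		bool rl = bits & 1, wl = bits & 2;
		bool rq = bits & 4, wq = bits & 8;

		if (old_check(rl, wl, rq, wq) != new_check(rl, wl, rq, wq))
			mismatches++;
	}
	return mismatches;
}
```

	If the proof holds, count_mismatches() returns 0 for all 16 combinations of rl, wl, rq and wq, where rq and wq stand for a non-zero sq->nr_queued[READ] and sq->nr_queued[WRITE].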
2022-12-05	blk-throttle: correct calculation of wait time in tg_may_dispatch	Kemeng Shi	1	-25/+13
	In C, when executing "if (expression1 && expression2)" and expression1 evaluates to false, expression2 is not evaluated. For "tg_within_bps_limit(tg, bio, bps_limit, &bps_wait) && tg_within_iops_limit(tg, bio, iops_limit, &iops_wait)", if bps is limited, tg_within_bps_limit will return false and tg_within_iops_limit will not be called. So even if bps and iops are both limited, iops_wait will not be calculated and is always zero, and the wait time of iops is always ignored. Fix this by always calling tg_within_bps_limit and tg_within_iops_limit to get the wait time for both bps and iops. Observed that: 1. The wait time in tg_within_iops_limit/tg_within_bps_limit always needs to be stored, as the wait argument is always passed. 2. The wait time is stored as zero if the bio is within the iops/bps limit; otherwise a non-zero value is stored. Simplify tg_within_iops_limit/tg_within_bps_limit by removing the wait argument and returning the wait time directly. The caller tg_may_dispatch checks whether the wait time is zero to find out whether iops/bps is limited. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-5-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
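	The short-circuit problem described above can be reproduced outside the kernel. This is a minimal sketch, not the blk-throttle code: the helpers are hypothetical stand-ins that take the wait a limit would impose directly as an argument ("need", where 0 means within the limit), so that only the control flow is illustrated:

```c
#include <assert.h>
#include <stdbool.h>

/* Old-style stand-ins: report "within limit" and store the wait via *wait. */
static bool within_bps_old(unsigned long need, unsigned long *wait)
{
	*wait = need;
	return need == 0;
}

static bool within_iops_old(unsigned long need, unsigned long *wait)
{
	*wait = need;
	return need == 0;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Old pattern: if the bps check fails, && skips the iops check entirely. */
static unsigned long max_wait_old(unsigned long bps_need, unsigned long iops_need)
{
	unsigned long bps_wait = 0, iops_wait = 0;

	if (within_bps_old(bps_need, &bps_wait) &&
	    within_iops_old(iops_need, &iops_wait))
		return 0;
	return max_ul(bps_wait, iops_wait);	/* iops_wait may still be 0 here */
}

/* New pattern: both waits are always computed before being compared. */
static unsigned long max_wait_new(unsigned long bps_need, unsigned long iops_need)
{
	unsigned long bps_wait = bps_need;	/* stands in for the bps helper */
	unsigned long iops_wait = iops_need;	/* stands in for the iops helper */

	if (bps_wait + iops_wait == 0)
		return 0;
	return max_ul(bps_wait, iops_wait);
}
```

	With a bps wait of 10 and an iops wait of 50, the old pattern reports 10 because the iops helper never runs, while the new pattern reports 50.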
2022-12-05	blk-throttle: ignore cgroup without io queued in blk_throtl_cancel_bios	Kemeng Shi	1	-1/+12
	Ignore cgroups without IO queued in blk_throtl_cancel_bios for two reasons: 1. Save CPU cycles otherwise spent trying to dispatch a cgroup that has no IO queued. 2. Avoid the inconsistent state in which a cgroup is inserted into the service queue without THROTL_TG_PENDING set, as tg_update_disptime will unconditionally re-insert the cgroup into the service queue. If we are on the default hierarchy, IO dispatched from a child in tg_dispatch_one_bio will trigger inserting the cgroup into the service queue without erasing it first and ruin the tree. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-4-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-05	blk-throttle: Fix that bps of child could exceed bps limited in parent	Kemeng Shi	1	-1/+1
	Consider the following situation (on the default hierarchy):
	  HDD
	   |
	  root (bps limit: 4k)
	   |
	  child (bps limit: 8k)
	   |
	  fio bs=8k
	The rate of fio is supposed to be 4k, but the result is 8k. The reason is as follows: the size of a single IO from fio is larger than the bytes allowed in one throtl_slice in the child, so IOs are always queued in the child group first. When queued IOs in the child are dispatched to the parent group, BIO_BPS_THROTTLED is set and these IOs will not be limited by tg_within_bps_limit anymore. Fix this by only setting BIO_BPS_THROTTLED when the bio has traversed the entire tree. This patch has no influence on situations that are not on the default hierarchy, as each group is a single root group without a parent. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-3-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-05	blk-throttle: correct stale comment in throtl_pd_init	Kemeng Shi	1	-2/+3
	On the default hierarchy (cgroup2), the throttle interface files don't exist in the root cgroup, so the ability to limit the whole system by configuring the root group no longer exists. In general, cgroup doesn't wanna be in the business of restricting resources at the system level, so correct the stale comment from "we can limit the whole system" to "we can only limit a subtree". Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-2-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-04	Merge tag 'floppy-for-6.2' of https://github.com/evdenis/linux-floppy into for-6.2/block	Jens Axboe	1	-1/+3
	Pull floppy fix from Denis: "Floppy patch for 6.2 The patch from Yuan Can fixes a memory leak in floppy init code. Signed-off-by: Denis Efremov <efremov@linux.com>" * tag 'floppy-for-6.2' of https://github.com/evdenis/linux-floppy: floppy: Fix memory leak in do_floppy_init()
2022-12-04	floppy: Fix memory leak in do_floppy_init()	Yuan Can	1	-1/+3
	A memory leak was reported when floppy_alloc_disk() failed in do_floppy_init(). unreferenced object 0xffff888115ed25a0 (size 8): comm "modprobe", pid 727, jiffies 4295051278 (age 25.529s) hex dump (first 8 bytes): 00 ac 67 5b 81 88 ff ff ..g[.... backtrace: [<000000007f457abb>] __kmalloc_node+0x4c/0xc0 [<00000000a87bfa9e>] blk_mq_realloc_tag_set_tags.part.0+0x6f/0x180 [<000000006f02e8b1>] blk_mq_alloc_tag_set+0x573/0x1130 [<0000000066007fd7>] 0xffffffffc06b8b08 [<0000000081f5ac40>] do_one_initcall+0xd0/0x4f0 [<00000000e26d04ee>] do_init_module+0x1a4/0x680 [<000000001bb22407>] load_module+0x6249/0x7110 [<00000000ad31ac4d>] __do_sys_finit_module+0x140/0x200 [<000000007bddca46>] do_syscall_64+0x35/0x80 [<00000000b5afec39>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 unreferenced object 0xffff88810fc30540 (size 32): comm "modprobe", pid 727, jiffies 4295051278 (age 25.529s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000007f457abb>] __kmalloc_node+0x4c/0xc0 [<000000006b91eab4>] blk_mq_alloc_tag_set+0x393/0x1130 [<0000000066007fd7>] 0xffffffffc06b8b08 [<0000000081f5ac40>] do_one_initcall+0xd0/0x4f0 [<00000000e26d04ee>] do_init_module+0x1a4/0x680 [<000000001bb22407>] load_module+0x6249/0x7110 [<00000000ad31ac4d>] __do_sys_finit_module+0x140/0x200 [<000000007bddca46>] do_syscall_64+0x35/0x80 [<00000000b5afec39>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 If floppy_alloc_disk() fails, the disks of the current drive will not be set, thus the latest allocated set->tag cannot be freed in the error handling path.
	A simple call graph is shown below:
	  floppy_module_init()
	    floppy_init()
	      do_floppy_init()
	        for (drive = 0; drive < N_DRIVE; drive++)
	          blk_mq_alloc_tag_set()
	            blk_mq_alloc_tag_set_tags()
	              blk_mq_realloc_tag_set_tags() # set->tag allocated
	          floppy_alloc_disk()
	            blk_mq_alloc_disk() # error occurred, disks failed to be allocated
	        ->out_put_disk:
	        for (drive = 0; drive < N_DRIVE; drive++)
	          if (!disks[drive][0]) # the last disks is not set and the loop breaks
	            break;
	          blk_mq_free_tag_set() # the latest allocated set->tag leaked
	Fix this problem by freeing the set->tag of the current drive before jumping to the error handling path. Cc: stable@vger.kernel.org Fixes: 302cfee15029 ("floppy: use a separate gendisk for each media format") Signed-off-by: Yuan Can <yuancan@huawei.com> [efremov: added stable list, changed title] Signed-off-by: Denis Efremov <efremov@linux.com>
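	The unwinding problem in the call graph above can be modelled with a few flags. This is only a sketch under stated assumptions: allocations are simulated with booleans rather than real block-layer calls, and leaked_tag_sets, failing_drive and free_before_jump are hypothetical names introduced for illustration:

```c
#include <assert.h>
#include <stdbool.h>

#define N_DRIVE 4

/*
 * Simulates the init loop and its error path. failing_drive is the drive
 * whose disk allocation fails (-1 for no failure). Returns how many
 * simulated tag sets are still allocated after the error path has run.
 */
static int leaked_tag_sets(int failing_drive, bool free_before_jump)
{
	bool tag_set[N_DRIVE] = { false };
	bool disk[N_DRIVE] = { false };
	int drive, leaked = 0;

	assert(failing_drive < N_DRIVE);

	for (drive = 0; drive < N_DRIVE; drive++) {
		tag_set[drive] = true;		/* models blk_mq_alloc_tag_set() */
		if (drive == failing_drive) {	/* models floppy_alloc_disk() failing */
			if (free_before_jump)
				tag_set[drive] = false;	/* the fix: free before the jump */
			goto out_put_disk;
		}
		disk[drive] = true;
	}
	return 0;	/* init succeeded, nothing to unwind */

out_put_disk:
	for (drive = 0; drive < N_DRIVE; drive++) {
		if (!disk[drive])
			break;			/* stops at the drive that failed */
		disk[drive] = false;
		tag_set[drive] = false;		/* models blk_mq_free_tag_set() */
	}
	for (drive = 0; drive < N_DRIVE; drive++)
		leaked += tag_set[drive];
	return leaked;
}
```

	In this model leaked_tag_sets(2, false) reports one leaked tag set and leaked_tag_sets(2, true) reports none, mirroring the before and after of the fix.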
2022-12-03	block: remove devnode callback from struct block_device_operations	Greg Kroah-Hartman	2	-12/+0
	With the removal of the pktcdvd driver, there are no in-kernel users of the devnode callback in struct block_device_operations, so it can be safely removed. If it is needed for new block drivers in the future, it can be brought back. Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20221203140747.1942969-1-gregkh@linuxfoundation.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-02	Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.2/block	Jens Axboe	2	-62/+36
	Pull MD fixes from Song: "This contains code cleanup by Christoph." * 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: fold unbind_rdev_from_array into md_kick_rdev_from_array md: mark md_kick_rdev_from_array static md: remove lock_bdev / unlock_bdev
2022-12-02	pktcdvd: remove driver.	Greg Kroah-Hartman	8	-3419/+0
	Way back in 2016 in commit 5a8b187c61e9 ("pktcdvd: mark as unmaintained and deprecated") this driver was marked as "will be removed soon". 5 years seems long enough to have it stick around after that, so finally remove the thing now. Reported-by: Christoph Hellwig <hch@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Thomas Maier <balagi@justmail.de> Cc: Peter Osterlund <petero2@telia.com> Cc: linux-block@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20221202182758.1339039-1-gregkh@linuxfoundation.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-02	md: fold unbind_rdev_from_array into md_kick_rdev_from_array	Christoph Hellwig	1	-21/+16
	unbind_rdev_from_array is only called from md_kick_rdev_from_array, so merge it into its only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org>
2022-12-02	md: mark md_kick_rdev_from_array static	Christoph Hellwig	2	-3/+1
	md_kick_rdev_from_array is only used in md.c, so unexport it and mark the symbol static. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org>
2022-12-02	md: remove lock_bdev / unlock_bdev	Christoph Hellwig	1	-41/+22
	These wrappers for blkdev_get / blkdev_put just horribly confuse the code with their odd naming. Remove them and improve the error unwinding in md_import_device with the now folded code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org>
2022-12-02	blk-cgroup: Fix some kernel-doc comments	Yang Li	1	-1/+1
	Make the description of @gendisk to @disk in blkcg_schedule_throttle() to clear the below warnings: block/blk-cgroup.c:1850: warning: Function parameter or member 'disk' not described in 'blkcg_schedule_throttle' block/blk-cgroup.c:1850: warning: Excess function parameter 'gendisk' description in 'blkcg_schedule_throttle' Fixes: de185b56e8a6 ("blk-cgroup: pass a gendisk to blkcg_schedule_throttle") Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3338 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Link: https://lore.kernel.org/r/20221202011713.14834-1-yang.lee@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-02	null_blk: support read-only and offline zone conditions	Shin'ichiro Kawasaki	3	-4/+121
	In zoned mode, zones with write pointers can have the conditions "read-only" or "offline". In the read-only condition, zones cannot be written. In the offline condition, zones can be neither written nor read. These conditions are intended for zones with media failures, so it is difficult to set them on zones of real devices. To test the handling of zones in these conditions, add a feature to null_blk to set up zones in the read-only or offline condition. Add new configuration attributes "zone_readonly" and "zone_offline". Write a sector to the attribute files to specify the target zone whose condition is to be set. For example, the following command lines do it: echo 0 > nullb1/zone_readonly echo 524288 > nullb1/zone_offline When the specified zones are already in the read-only or offline condition, the normal empty condition is restored to the zones. These condition changes can be done only after the null_blk device is powered on, since the status area of each zone is not yet allocated before power-on. Also improve zone condition checks to inhibit all commands for zones in the offline condition. In the same manner, inhibit write and zone management commands for zones in the read-only condition. Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20221201061036.2342206-1-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-01	drbd: add context parameter to expect() macro	Christoph Böhmwalder	6	-42/+42
	Originally-from: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20221201110349.1282687-6-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-01	drbd: introduce drbd_ratelimit()	Christoph Böhmwalder	8	-18/+26
	Use call site specific ratelimit instead of one single static global. Also ratelimit ASSERTION messages generated by expect(). Originally-from: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20221201110349.1282687-5-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-01	drbd: introduce dynamic debug	Christoph Böhmwalder	1	-36/+97
	Incorporate as many out-of-tree changes as possible without changing the genl API. Over the years, we restructured this several times, and also changed the log format. One breaking change is that DRBD 9 gained "implicit options", like a connection name. This cannot be replayed here without changing the API, so save it for later. Originally-from: Andreas Gruenbacher <agruen@linbit.com> Originally-from: Philipp Reisner <philipp.reisner@linbit.com> Originally-from: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20221201110349.1282687-4-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-01	drbd: split polymorph printk to its own file	Christoph Böhmwalder	2	-67/+73
	Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20221201110349.1282687-3-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-01	drbd: unify how failed assertions are logged	Christoph Böhmwalder	1	-3/+5
	Unify how failed assertions from D_ASSERT() and expect() are logged. Originally-from: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20221201110349.1282687-2-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-01	block: bdev & blktrace: use consistent function doc. notation	Randy Dunlap	2	-4/+4
	Use only one hyphen in kernel-doc notation between the function name and its short description. This is the documented kernel-doc format. It also fixes the HTML presentation to be consistent with other functions. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Link: https://lore.kernel.org/r/20221201070331.25685-1-rdunlap@infradead.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-01	blk-iocost: Correct comment in blk_iocost_init	Kemeng Shi	1	-1/+1
	There is no iocg_pd_init function. The pd_alloc_fn function pointer of iocost policy is set with ioc_pd_init. Just correct it. Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20221018121932.10792-6-shikemeng@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-01	blk-iocost: Remove vrate member in struct ioc_now	Kemeng Shi	1	-3/+3
	If we trace vtime_base_rate instead of vtime_rate, there is nowhere which accesses now->vrate except function ioc_now using now->vrate locally. Just remove it. Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20221018121932.10792-5-shikemeng@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-01	blk-iocost: Trace vtime_base_rate instead of vtime_rate	Kemeng Shi	2	-3/+3
	Since commit ac33e91e2daca ("blk-iocost: implement vtime loss compensation") renamed the original vtime_rate to vtime_base_rate, the current vtime_rate is the original vtime_rate with compensation. The rate shown in tracepoints is a mix of vtime_rate and vtime_base_rate: 1) In function ioc_adjust_base_vrate, the first trace_iocost_ioc_vrate_adj shows vtime_rate, the second trace_iocost_ioc_vrate_adj shows vtime_base_rate. 2) Function iocg_activate shows vtime_rate by calling TRACE_IOCG_PATH(iocg_activate... 3) Function ioc_check_iocgs shows vtime_rate by calling TRACE_IOCG_PATH(iocg_idle... Trace vtime_base_rate instead of vtime_rate because: 1) Before commit ac33e91e2daca ("blk-iocost: implement vtime loss compensation"), the traced rate was without compensation, so still show the rate without compensation. 2) The vtime_base_rate is more stable, while vtime_rate heavily depends on the excess budget of the current period, which may change abruptly in the next period. Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20221018121932.10792-4-shikemeng@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-01	blk-iocost: Reset vtime_base_rate in ioc_refresh_params	Kemeng Shi	1	-1/+3
	Since commit ac33e91e2daca ("blk-iocost: implement vtime loss compensation") split vtime_rate into vtime_rate and vtime_base_rate, we need to reset both vtime_base_rate and vtime_rate when device parameters are refreshed. If vtime_base_rate is not reset here, vtime_rate will soon be overwritten with the old vtime_base_rate in ioc_refresh_vrate. Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20221018121932.10792-3-shikemeng@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-01	blk-iocost: Fix typo in comment	Kemeng Shi	1	-1/+1
	soley -> solely Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20221018121932.10792-2-shikemeng@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-01	block: Do not reread partition table on exclusively open device	Jan Kara	3	-8/+13
	Since commit 10c70d95c0f2 ("block: remove the bd_openers checks in blk_drop_partitions") we allow rereading of partition table although there are users of the block device. This has an undesirable consequence that e.g. if sda and sdb are assembled to a RAID1 device md0 with partitions, BLKRRPART ioctl on sda will rescan partition table and create sda1 device. This partition device under a raid device confuses some programs (such as libstorage-ng used for initial partitioning for distribution installation) leading to failures. Fix the problem refusing to rescan partitions if there is another user that has the block device exclusively open. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20221130135344.2ul4cyfstfs3znxg@quack3 Fixes: 10c70d95c0f2 ("block: remove the bd_openers checks in blk_drop_partitions") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221130175653.24299-1-jack@suse.cz [axboe: fold in followup fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-01	virtio-blk: replace ida_simple[get|remove] with ida_[alloc_range|free]	Pankaj Raghav	1	-4/+4
	ida_simple[get|remove] are deprecated, and are just wrappers to ida_[alloc_range|free]. Replace ida_simple[get|remove] with their corresponding counterparts. No functional changes. Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20221130123001.25473-1-p.raghav@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-30	block: mark blk_put_queue as potentially blocking	Christoph Hellwig	1	-4/+2
	We can't just say that the last reference release may block, as any reference dropped could be the last one. So move the might_sleep() from blk_free_queue to blk_put_queue and update the documentation. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221114042637.1009333-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-30	block: untangle request_queue refcounting from sysfs	Christoph Hellwig	8	-87/+71
	The kobject embedded into the request_queue is used for the queue directory in sysfs, but that is a child of the gendisks directory and is intimately tied to it. Move this kobject to the gendisk and use a refcount_t in the request_queue for the actual request_queue refcounting that is completely unrelated to the device model. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221114042637.1009333-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-30	block: fix error unwinding in blk_register_queue	Christoph Hellwig	1	-12/+16
	blk_register_queue fails to handle errors from blk_mq_sysfs_register, leaks various resources on errors and accidentally sets queue refs percpu refcount to percpu mode on kobject_add failure. Fix all that by properly unwinding on errors. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221114042637.1009333-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-30	block: factor out a blk_debugfs_remove helper	Christoph Hellwig	1	-7/+14
	Split the debugfs removal from blk_unregister_queue into a helper so that it can be reused for blk_register_queue error handling. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221114042637.1009333-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-30	blk-crypto: pass a gendisk to blk_crypto_sysfs_{,un}register	Christoph Hellwig	3	-9/+12
	Prepare for changes to the block layer sysfs handling by passing the readily available gendisk to blk_crypto_sysfs_{,un}register. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221114042637.1009333-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-30	Revert "blk-cgroup: Flush stats at blkgs destruction path"	Jens Axboe	3	-35/+1
	This reverts commit dae590a6c96c799434e0ff8156ef29b88c257e60. We've had a few reports on this causing a crash at boot time, because of a reference issue. While this problem seemingly did exist before the patch and needs solving separately, this patch makes it a lot easier to trigger. Link: https://lore.kernel.org/linux-block/CA+QYu4oxiRKC6hJ7F27whXy-PRBx=Tvb+-7TQTONN8qTtV3aDA@mail.gmail.com/ Link: https://lore.kernel.org/linux-block/69af7ccb-6901-c84c-0e95-5682ccfb750c@acm.org/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-29	block: use bool as the return type of elv_iosched_allow_bio_merge	Jinlong Chen	1	-2/+2
	We have bool type now, update the old signature. Signed-off-by: Jinlong Chen <nickyc975@zju.edu.cn> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/0db0a0298758d60d0f4df8b7126ac6a381e5a5bb.1669736350.git.nickyc975@zju.edu.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>