path: root/block
Age | Commit message | Author | Files | Lines
2021-10-20 | blk-mq: support concurrent queue quiesce/unquiesce | Ming Lei | 1 | -3/+19
blk_mq_quiesce_queue() is now used quite widely, but so far we don't support concurrent/nested quiesce. The biggest issue is that unquiesce can happen unexpectedly when quiesce/unquiesce are run concurrently from more than one context. This patch introduces q->mq_quiesce_depth to deal with concurrent quiesce, and we only unquiesce the queue when it is the last/outer-most one of all contexts. Several kernel panic issues have been reported[1][2][3] when running stress quiesce tests, and this patch has been verified against these reports. [1] https://lore.kernel.org/linux-block/9b21c797-e505-3821-4f5b-df7bf9380328@huawei.com/T/#m1fc52431fad7f33b1ffc3f12c4450e4238540787 [2] https://lore.kernel.org/linux-block/9b21c797-e505-3821-4f5b-df7bf9380328@huawei.com/T/#m10ad90afeb9c8cc318334190a7c24c8b5c5e0722 [3] https://listman.redhat.com/archives/dm-devel/2021-September/msg00189.html Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211014081710.1871747-7-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
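The nesting rule described in this commit can be illustrated with a depth counter. The following is only a minimal user-space sketch, not the kernel code: `struct qd_queue`, `qd_quiesce()` and `qd_unquiesce()` are hypothetical names, the `depth` field merely plays the role of q->mq_quiesce_depth, and the sketch is single-threaded, leaving out whatever locking the real implementation needs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request queue; only the fields needed here. */
struct qd_queue {
    int  depth;     /* plays the role of q->mq_quiesce_depth */
    bool quiesced;  /* true while dispatch must stay stopped */
};

/* Every quiesce bumps the depth; the first one stops dispatch. */
static inline void qd_quiesce(struct qd_queue *q)
{
    if (q->depth++ == 0)
        q->quiesced = true;
}

/* Only the last (outermost) unquiesce resumes dispatch. */
static inline void qd_unquiesce(struct qd_queue *q)
{
    assert(q->depth > 0);  /* an unbalanced unquiesce is a caller bug */
    if (--q->depth == 0)
        q->quiesced = false;
}
```

With two overlapping quiesce calls, the first unquiesce leaves the queue stopped and only the second one restarts it, which is the behavior the commit message asks for.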
2021-10-20 | block, bfq: fix UAF problem in bfqg_stats_init() | Zheng Liang | 1 | -5/+7
In bfq_pd_alloc(), the function bfqg_stats_init() initializes bfqg. If blkg_rwstat_init() initializes bfqg_stats->bytes successfully but fails to initialize bfqg_stats->ios, bfqg_stats_init() returns failure and bfqg is freed. But blkg_rwstat->cpu_cnt is not deleted from the list of percpu_counters, so traversing that list later triggers a use-after-free (UAF). We should use blkg_rwstat_exit() to clean up bfqg_stats->bytes in the above scenario. Fixes: commit fd41e60331b ("bfq-iosched: stop using blkg->stat_bytes and ->stat_ios") Signed-off-by: Zheng Liang <zhengliang6@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20211018024225.1493938-1-zhengliang6@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
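The bug pattern here is a two-step initialization that does not undo step one when step two fails. A minimal sketch of the corrected unwinding follows; all `st_*` names are hypothetical, and the global `st_registered` counter is an assumed stand-in for the list of percpu_counters that the commit message mentions.

```c
#include <assert.h>
#include <stdbool.h>

static int st_registered;  /* stand-in for the global percpu_counters list */

struct st_stat { bool on_list; };

/* Puts the counter on the global list, unless asked to simulate a failure. */
static inline int st_stat_init(struct st_stat *s, bool fail)
{
    assert(!s->on_list);  /* double init would corrupt the list */
    if (fail)
        return -1;
    s->on_list = true;
    st_registered++;
    return 0;
}

/* Takes the counter off the global list again. */
static inline void st_stat_exit(struct st_stat *s)
{
    if (s->on_list) {
        s->on_list = false;
        st_registered--;
    }
}

struct st_group_stats { struct st_stat bytes, ios; };

static inline int st_group_stats_init(struct st_group_stats *st, bool fail_ios)
{
    if (st_stat_init(&st->bytes, false))
        return -1;
    if (st_stat_init(&st->ios, fail_ios)) {
        /* Undo the first init so nothing stale stays on the list. */
        st_stat_exit(&st->bytes);
        return -1;
    }
    return 0;
}
```

Without the `st_stat_exit()` call on the failure path, `bytes` would remain registered after its owner is freed, which is the use-after-free scenario described above.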
2021-10-20 | block: inline fast path of driver tag allocation | Jens Axboe | 2 | -6/+17
If we don't use an IO scheduler or have shared tags, then we don't need to call into this external function at all. This saves ~2% for such a setup. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 | blk-mq: don't handle non-flush requests in blk_insert_flush | Christoph Hellwig | 3 | -15/+13
Return to the normal blk_mq_submit_bio flow if the bio did not end up actually being a flush because the device didn't support it. Note that this is basically impossible to hit without special instrumentation given that submit_bio_checks already clears these flags usually, so we'd need a tight race to actually hit this code path. With this the call to blk_mq_run_hw_queue for the flush requests can be removed given that the actual flush requests are always issued via the requeue workqueue which runs the queue unconditionally. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019122553.2467817-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 | block: attempt direct issue of plug list | Jens Axboe | 2 | -0/+61
If we have just one queue type in the plug list, then we can extend our direct issue to cover a full plug list as well. This allows sending a batch of requests for direct issue, which is more efficient than doing one-at-a-time kind of issue. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 | block: change plugging to use a singly linked list | Jens Axboe | 3 | -39/+49
Use a singly linked list for the blk_plug. This saves 8 bytes in the blk_plug struct, and makes for faster list manipulations than doubly linked lists. As we don't use the doubly linked lists for anything, singly linked is just fine. This yields a bump in default (merging enabled) performance from 7.0 to 7.1M IOPS, and ~7.5M IOPS with merging disabled. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
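As a rough illustration of why the singly linked form is smaller and cheaper: a doubly linked list head carries two pointers, while the singly linked head below carries one, and adding an entry is two stores. The `pl_*` names are hypothetical and this is a user-space sketch, not the blk_plug code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request with a single forward link. */
struct pl_rq {
    int tag;
    struct pl_rq *next;
};

/* The plug only needs a head pointer. */
struct pl_plug {
    struct pl_rq *head;
};

/* Push to the front: the newest request becomes the head. */
static inline void pl_add(struct pl_plug *plug, struct pl_rq *rq)
{
    assert(rq != NULL);
    rq->next = plug->head;
    plug->head = rq;
}

/* Pop from the front; returns NULL once the list is empty. */
static inline struct pl_rq *pl_pop(struct pl_plug *plug)
{
    struct pl_rq *rq = plug->head;

    if (rq) {
        plug->head = rq->next;
        rq->next = NULL;
    }
    return rq;
}
```

Entries come back in last-in, first-out order, which is acceptable when, as the commit says, nothing relies on the doubly linked properties.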
2021-10-19 | blk-wbt: prevent NULL pointer dereference in wb_timer_fn | Andrea Righi | 1 | -0/+3
The timer callback used to evaluate if the latency is exceeded can be executed after the corresponding disk has been released, causing the following NULL pointer dereference:
[ 119.987108] BUG: kernel NULL pointer dereference, address: 0000000000000098
[ 119.987617] #PF: supervisor read access in kernel mode
[ 119.987971] #PF: error_code(0x0000) - not-present page
[ 119.988325] PGD 7c4a4067 P4D 7c4a4067 PUD 7bf63067 PMD 0
[ 119.988697] Oops: 0000 [#1] SMP NOPTI
[ 119.988959] CPU: 1 PID: 9353 Comm: cloud-init Not tainted 5.15-rc5+arighi #rc5+arighi
[ 119.989520] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
[ 119.990055] RIP: 0010:wb_timer_fn+0x44/0x3c0
[ 119.990376] Code: 41 8b 9c 24 98 00 00 00 41 8b 94 24 b8 00 00 00 41 8b 84 24 d8 00 00 00 4d 8b 74 24 28 01 d3 01 c3 49 8b 44 24 60 48 8b 40 78 <4c> 8b b8 98 00 00 00 4d 85 f6 0f 84 c4 00 00 00 49 83 7c 24 30 00
[ 119.991578] RSP: 0000:ffffb5f580957da8 EFLAGS: 00010246
[ 119.991937] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
[ 119.992412] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88f476d7f780
[ 119.992895] RBP: ffffb5f580957dd0 R08: 0000000000000000 R09: 0000000000000000
[ 119.993371] R10: 0000000000000004 R11: 0000000000000002 R12: ffff88f476c84500
[ 119.993847] R13: ffff88f4434390c0 R14: 0000000000000000 R15: ffff88f4bdc98c00
[ 119.994323] FS: 00007fb90bcd9c00(0000) GS:ffff88f4bdc80000(0000) knlGS:0000000000000000
[ 119.994952] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 119.995380] CR2: 0000000000000098 CR3: 000000007c0d6000 CR4: 00000000000006e0
[ 119.995906] Call Trace:
[ 119.996130] ? blk_stat_free_callback_rcu+0x30/0x30
[ 119.996505] blk_stat_timer_fn+0x138/0x140
[ 119.996830] call_timer_fn+0x2b/0x100
[ 119.997136] __run_timers.part.0+0x1d1/0x240
[ 119.997470] ? kvm_clock_get_cycles+0x11/0x20
[ 119.997826] ? ktime_get+0x3e/0xa0
[ 119.998110] ? native_apic_msr_write+0x2c/0x30
[ 119.998456] ? lapic_next_event+0x20/0x30
[ 119.998779] ? clockevents_program_event+0x94/0xf0
[ 119.999150] run_timer_softirq+0x2a/0x50
[ 119.999465] __do_softirq+0xcb/0x26f
[ 119.999764] irq_exit_rcu+0x8c/0xb0
[ 120.000057] sysvec_apic_timer_interrupt+0x43/0x90
[ 120.000429] ? asm_sysvec_apic_timer_interrupt+0xa/0x20
[ 120.000836] asm_sysvec_apic_timer_interrupt+0x12/0x20
In this case simply return from the timer callback (no action required) to prevent the NULL pointer dereference. BugLink: https://bugs.launchpad.net/bugs/1947557 Link: https://lore.kernel.org/linux-mm/YWRNVTk9N8K0RMst@arighi-desktop/ Fixes: 34dbad5d26e2 ("blk-stat: convert to callback-based statistics reporting") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Link: https://lore.kernel.org/r/YW6N2qXpBU3oc50q@arighi-desktop Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 | block: align blkdev_dio inlined bio to a cacheline | Jens Axboe | 1 | -1/+1
We get all sorts of unreliable and funky results since the bio is designed to align on a cacheline, which it does not when inlined like this. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 | block: move blk_mq_tag_to_rq() inline | Jens Axboe | 2 | -34/+0
This is in the fast path of driver issue or completion, and it's a single array index operation. Move it inline to avoid a function call for it. This does mean making struct blk_mq_tags block layer public, but there's not really much in there. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
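To make the "single array index operation" concrete, a tag lookup of this kind can be sketched as a bounds check followed by an array load. The `tg_*` types below are hypothetical simplifications and do not reproduce the layout of struct blk_mq_tags.

```c
#include <assert.h>
#include <stddef.h>

struct tg_rq { int data; };

/* Hypothetical tag table: one request pointer per tag. */
struct tg_tags {
    unsigned int nr_tags;
    struct tg_rq **rqs;
};

/* Small enough to live in a header and be inlined at every call site. */
static inline struct tg_rq *tg_tag_to_rq(const struct tg_tags *tags,
                                         unsigned int tag)
{
    if (tag < tags->nr_tags)
        return tags->rqs[tag];
    return NULL;  /* out-of-range tag */
}
```

Because the body is this small, a call and return would cost about as much as the lookup itself, which is the motivation for inlining it.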
2021-10-19 | block: get rid of plug list sorting | Jens Axboe | 1 | -19/+0
Even if we have multiple queues in the plug list, the chance that they are heavily interspersed is minimal. Don't bother spending CPU cycles sorting the list. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 | block: return whether or not to unplug through boolean | Jens Axboe | 3 | -14/+15
Instead of returning the same queue request through a request pointer, use a boolean to accomplish the same. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 | block: don't call blk_status_to_errno in blk_update_request | Christoph Hellwig | 1 | -1/+1
We only need to call it to resolve the blk_status_t -> errno mapping for tracing, so move the conversion into the tracepoints that are not called at all when tracing isn't enabled. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 | block: move bdev_read_only() into the header | Jens Axboe | 1 | -6/+0
This is called for every write in the fast path, move it inline next to get_disk_ro() which is called internally. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 | block: fix too broad elevator check in blk_mq_free_request() | Jens Axboe | 1 | -1/+1
We added RQF_ELV to tell whether there's an IO scheduler attached, and RQF_ELVPRIV tells us whether there's an IO scheduler with private data attached. Don't check RQF_ELV in blk_mq_free_request(); what we care about here is just whether we have scheduler private data attached. This fixes a boot crash. Fixes: 2ff0682da6e0 ("block: store elevator state in request") Reported-by: Yi Zhang <yi.zhang@redhat.com> Reported-by: syzbot+eb8104072aeab6cc1195@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
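The distinction between the two flags can be sketched as follows. The bit values and the `ef_needs_sched_finish()` helper are assumptions made for illustration; only the flag names and their meanings come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; only the names mirror the commit message. */
#define EF_RQF_ELV      (1u << 0)  /* queue has an IO scheduler attached */
#define EF_RQF_ELVPRIV  (1u << 1)  /* scheduler attached private data */

/*
 * On free, scheduler cleanup is needed only when private data exists.
 * Testing EF_RQF_ELV here would be too broad: a scheduler can be
 * attached without this request carrying any private data.
 */
static inline bool ef_needs_sched_finish(unsigned int rq_flags)
{
    return (rq_flags & EF_RQF_ELVPRIV) != 0;
}
```

A request with only the "scheduler attached" bit set therefore skips the scheduler cleanup path.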
2021-10-18 | block: add support for blk_mq_end_request_batch() | Jens Axboe | 3 | -19/+70
Instead of calling blk_mq_end_request() on a single request, add a helper that takes the new struct io_comp_batch and completes any request stored in there. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: add a struct io_comp_batch argument to fops->iopoll() | Jens Axboe | 5 | -12/+15
struct io_comp_batch contains a list head and a completion handler, which will allow batches of IO to be completed more efficiently. For now, no functional changes in this patch, we just define the io_comp_batch structure and add the argument to the file_operations iopoll handler. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: provide helpers for rq_list manipulation | Jens Axboe | 1 | -14/+5
Instead of open-coding the list additions, traversal, and removal, provide a basic set of helpers. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: remove some blk_mq_hw_ctx debugfs entries | Jens Axboe | 2 | -83/+0
Just like the blk_mq_ctx counterparts, we've got a bunch of counters in here that are only for debugfs and are of questionable value. They are:
- dispatched, index of how many requests were dispatched in one go
- poll_{considered,invoked,success}, which track poll success rates. We're confident in the iopoll implementation at this point, don't bother tracking these.
As a bonus, this shrinks each hardware queue from 576 bytes to 512 bytes, dropping a whole cacheline. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: remove debugfs blk_mq_ctx dispatched/merged/completed attributes | Jens Axboe | 4 | -68/+1
These were added as part of early days debugging for blk-mq, and they are not really useful anymore. Rather than spend cycles updating them, just get rid of them. As a bonus, this shrinks the per-cpu software queue size from 256b to 192b. That's a whole cacheline less. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: cache rq_flags inside blk_mq_rq_ctx_init() | Pavel Begunkov | 1 | -6/+8
Add a local variable for rq_flags, it helps to compile out some of rq_flags reloads. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: blk_mq_rq_ctx_init cache ctx/q/hctx | Pavel Begunkov | 1 | -5/+9
We should have enough registers in blk_mq_rq_ctx_init(), so store them in local vars to avoid reloading them. Note: keeping q->elevator may look unnecessary, but it's also used inside the inlined blk_mq_tags_from_data(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: skip elevator fields init for non-elv queue | Pavel Begunkov | 1 | -14/+14
Don't init rq->hash and rq->rb_node in blk_mq_rq_ctx_init() if there is no elevator. Also, move some other initialisers that imply barriers to the end, so the compiler is free to rearrange and optimise the rest of them. Note: fold in a change from Jens leaving queue_list unconditional, as it might lead to problems otherwise. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: store elevator state in request | Jens Axboe | 2 | -20/+27
Add an rq private RQF_ELV flag, which tells the block layer that this request was initialized on a queue that has an IO scheduler attached. This allows for faster checking in the fast path, rather than having to dereference rq->q later on. Elevator switching does full quiesce of the queue before detaching an IO scheduler, so it's safe to cache this in the request itself. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: only mark bio as tracked if it really is tracked | Jens Axboe | 1 | -2/+3
We set BIO_TRACKED unconditionally when rq_qos_throttle() is called, even though we may not even have an rq_qos handler. Only mark it as TRACKED if it really is potentially tracked. This saves considerable time for the case where the bio isn't tracked: 2.64% -1.65% [kernel.vmlinux] [k] bio_endio Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: move update request helpers into blk-mq.c | Jens Axboe | 3 | -145/+146
For some reason we still have them in blk-core, with the rest of the request completion being in blk-mq. That causes an out-of-line call for each completion. Move them into blk-mq.c instead, where they belong. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: remove useless caller argument to print_req_error() | Jens Axboe | 1 | -5/+4
We have exactly one caller of this, just get rid of adding the useless function name to the output. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: don't bother iter advancing a fully done bio | Jens Axboe | 1 | -13/+2
If we're completing nbytes and nbytes is the size of the bio, don't bother with calling into the iterator increment helpers. Just clear the bio size and we're done. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
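The shortcut can be sketched with a toy iterator. Everything below is a hypothetical model (`struct adv_bio` tracks a segment index and an offset, which is an assumption about the general shape, not the real struct bio layout); it only shows that when the completed byte count equals the remaining size, the per-segment walk can be skipped entirely.

```c
#include <assert.h>

/* Hypothetical bio model: an array of segment lengths plus a cursor. */
struct adv_bio {
    const unsigned int *seg_len;  /* length of each segment in bytes */
    unsigned int idx;             /* current segment */
    unsigned int seg_done;        /* bytes consumed in the current segment */
    unsigned int size;            /* bytes remaining in the bio */
};

static inline void adv_bio_complete(struct adv_bio *bio, unsigned int nbytes)
{
    assert(nbytes <= bio->size);

    /* Fast path: the whole bio is done, the cursor no longer matters. */
    if (nbytes == bio->size) {
        bio->size = 0;
        return;
    }

    /* Slow path: walk the segments to move the cursor forward. */
    bio->size -= nbytes;
    while (nbytes) {
        unsigned int left = bio->seg_len[bio->idx] - bio->seg_done;
        unsigned int step = nbytes < left ? nbytes : left;

        bio->seg_done += step;
        nbytes -= step;
        if (bio->seg_done == bio->seg_len[bio->idx]) {
            bio->idx++;
            bio->seg_done = 0;
        }
    }
}
```

In the fast path the cursor fields are deliberately left stale, since a bio with zero bytes remaining is not iterated again in this model.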
2021-10-18 | block: convert the rest of block to bdev_get_queue | Pavel Begunkov | 8 | -22/+22
Convert bdev->bd_disk->queue to bdev_get_queue(); it uses a cached queue pointer and so is faster. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/addf6ea988c04213697ba3684c853e4ed7642a39.1634219547.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: use bdev_get_queue() in blk-core.c | Pavel Begunkov | 1 | -6/+7
Convert bdev->bd_disk->queue to bdev_get_queue(); it uses a cached queue pointer and so is faster. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/efc41f880262517c8dc32f932f1b23112f21b255.1634219547.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: use bdev_get_queue() in bio.c | Pavel Begunkov | 1 | -5/+5
Convert bdev->bd_disk->queue to bdev_get_queue(); it uses a cached queue pointer and so is faster. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/85c36ea784d285a5075baa10049e6b59e15fb484.1634219547.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: use bdev_get_queue() in bdev.c | Pavel Begunkov | 1 | -4/+4
Convert bdev->bd_disk->queue to bdev_get_queue(); it uses a cached queue pointer and so is faster. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a352936ce5d9ac719645b1e29b173d931ebcdc02.1634219547.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: cache request queue in bdev | Pavel Begunkov | 2 | -1/+4
There are many places where we need to get a request_queue with only a bdev in hand, which turns into bdev->bd_disk->queue. There are probably a hundred such places counting inline helpers, and enough of them are in hot paths. Cache the queue pointer in struct block_device and make use of it in bdev_get_queue(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a3bfaecdd28956f03629d0ca5c63ebc096e1c809.1634219547.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
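The caching idea amounts to trading one extra pointer of storage for one fewer dependent load on each lookup. A minimal sketch, with hypothetical `cq_*` types and an assumed `bd_queue` field name for the cached copy:

```c
#include <assert.h>
#include <stddef.h>

struct cq_queue { int id; };
struct cq_disk  { struct cq_queue *queue; };

struct cq_bdev {
    struct cq_disk  *bd_disk;
    struct cq_queue *bd_queue;  /* assumed name: cached bd_disk->queue */
};

/* Fill the cache once, when the bdev is tied to its disk. */
static inline void cq_bdev_attach(struct cq_bdev *bdev, struct cq_disk *disk)
{
    assert(disk != NULL);
    bdev->bd_disk = disk;
    bdev->bd_queue = disk->queue;
}

/* One load from the bdev instead of bdev -> disk -> queue. */
static inline struct cq_queue *cq_bdev_get_queue(const struct cq_bdev *bdev)
{
    return bdev->bd_queue;
}
```

The cached pointer is only valid as long as the disk's queue pointer does not change after attach; this sketch simply assumes that it does not.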
2021-10-18 | block: handle fast path of bio splitting inline | Jens Axboe | 3 | -21/+35
The fast path is no splitting needed. Separate the handling into a check part we can inline, and an out-of-line handling path if we do need to split. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: use flags instead of bit fields for blkdev_dio | Jens Axboe | 1 | -14/+20
This generates a lot better code for me, and bumps performance from 7650K IOPS to 7750K IOPS. Looking at profiles for the run and running perf diff, it confirms that we're now spending a lot less time there: 6.38% -2.80% [kernel.vmlinux] [k] blkdev_direct_IO Taking it from the 2nd most cycle consumer to only the 9th most at 3.35% of the CPU time. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
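For readers unfamiliar with the pattern, replacing one-bit bit fields with a flags word looks roughly like this. The `DF_*` constants and `df_*` helpers are hypothetical names chosen for the sketch; they are not claimed to match the identifiers used in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits replacing individual one-bit bit fields. */
enum {
    DF_MULTI_BIO    = 1 << 0,
    DF_SHOULD_DIRTY = 1 << 1,
    DF_IS_SYNC      = 1 << 2,
};

struct df_dio {
    unsigned int flags;  /* one plain word instead of several bit fields */
};

/* Compute all flags up front and store them with a single write. */
static inline unsigned int df_make_flags(bool is_sync, bool should_dirty)
{
    unsigned int flags = 0;

    if (is_sync)
        flags |= DF_IS_SYNC;
    if (should_dirty)
        flags |= DF_SHOULD_DIRTY;
    return flags;
}

static inline bool df_test(const struct df_dio *dio, unsigned int flag)
{
    return (dio->flags & flag) != 0;
}
```

One plausible reason for the better code generation is that each bit-field store is a read-modify-write of the containing word, whereas a flags word can be built in a register and stored once.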
2021-10-18 | block: cache bdev in struct file for raw bdev IO | Pavel Begunkov | 1 | -15/+12
bdev = &BDEV_I(file->f_mapping->host)->bdev Getting struct block_device from a file requires 2 memory dereferences as illustrated above, which takes a toll on performance, so cache it in the as yet unused file->private_data. That gives a noticeable peak performance improvement. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/8415f9fe12e544b9da89593dfbca8de2b52efe03.1634115360.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: don't allow writing to the poll queue attribute | Christoph Hellwig | 1 | -19/+4
The poll attribute is a historic artefact from before when we had explicit poll queues that require driver specific configuration. Just print a warning when writing to the attribute. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: switch polling to be bio based | Christoph Hellwig | 6 | -100/+137
Replace the blk_poll interface that requires the caller to keep a queue and cookie from the submissions with polling based on the bio. Polling for the bio itself leads to a few advantages:
- the cookie construction can be made entirely private in blk-mq.c
- the caller does not need to remember the request_queue and cookie separately and thus sidesteps their lifetime issues
- keeping the device and the cookie inside the bio makes it trivial to support polling of BIOs remapped by stacking drivers
- a lot of code to propagate the cookie back up the submission path can be removed entirely.
Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: use SLAB_TYPESAFE_BY_RCU for the bio slab | Christoph Hellwig | 1 | -1/+2
This flag ensures that the pages will not be reused for non-bio allocations before the end of an RCU grace period. With that we can safely use an RCU lookup for bio polling as long as we are fine with occasionally polling the wrong device. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: rename REQ_HIPRI to REQ_POLLED | Christoph Hellwig | 6 | -10/+9
Unlike the RWF_HIPRI userspace ABI which is intentionally kept vague, the bio flag is specific to the polling implementation, so rename and document it properly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | io_uring: don't sleep when polling for I/O | Christoph Hellwig | 1 | -1/+2
There is no point in sleeping for the expected I/O completion timeout in the io_uring async polling model as we never poll for a specific I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: replace the spin argument to blk_iopoll with a flags argument | Christoph Hellwig | 3 | -15/+12
Replace the boolean spin argument to blk_poll with a set of flags. This will allow controlling polling behavior in a more fine-grained way. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-10-hch@lst.de [axboe: adapt to changed io_uring iopoll] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | blk-mq: remove blk_qc_t_valid | Christoph Hellwig | 1 | -1/+1
Move the trivial check into the only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | blk-mq: remove blk_qc_t_to_tag and blk_qc_t_is_internal | Christoph Hellwig | 1 | -3/+5
Merge both functions into their only caller to keep the blk-mq tag to blk_qc_t mapping as private as possible in blk-mq.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | blk-mq: factor out a "classic" poll helper | Christoph Hellwig | 1 | -64/+56
Factor the code to do the classic full metal polling out of blk_poll into a separate blk_mq_poll_classic helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | blk-mq: factor out a blk_qc_to_hctx helper | Christoph Hellwig | 1 | -1/+7
Add a helper to get the hctx from a request_queue and cookie, and fold the blk_qc_t_to_queue_num helper into it as no other callers are left. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: don't try to poll multi-bio I/Os in __blkdev_direct_IO | Christoph Hellwig | 1 | -14/+7
If an iocb is split into multiple bios we can't poll for both. So don't even bother to try to poll in that case. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211012111226.760968-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: only check previous entry for plug merge attempt | Jens Axboe | 1 | -23/+13
Currently we scan the entire plug list, which is potentially very expensive. In an IOPS bound workload, we can drive about 5.6M IOPS with merging enabled, and profiling shows that the plug merge check is the (by far) most expensive thing we're doing:
Overhead  Command   Shared Object     Symbol
+ 20.89%  io_uring  [kernel.vmlinux]  [k] blk_attempt_plug_merge
+  4.98%  io_uring  [kernel.vmlinux]  [k] io_submit_sqes
+  4.78%  io_uring  [kernel.vmlinux]  [k] blkdev_direct_IO
+  4.61%  io_uring  [kernel.vmlinux]  [k] blk_mq_submit_bio
Instead of browsing the whole list, just check the previously inserted entry. That is enough for a naive merge check and will catch most cases, and for devices that need full merging, the IO scheduler attached to such devices will do that anyway. The plug merge is meant to be an inexpensive check to avoid getting a request, but if we repeatedly scan the list for every single insert, it is very much not a cheap check. With this patch, the workload instead runs at ~7.0M IOPS, providing a 25% improvement. Disabling merging entirely yields another 5% improvement. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
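The change from a full scan to a single comparison can be sketched as below. This is a simplified model that only considers back merges of contiguous sectors; the `pm_*` names are hypothetical and real merge eligibility involves more checks than sector adjacency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical plugged request covering [sector, sector + nr_sectors). */
struct pm_rq {
    unsigned long long sector;
    unsigned int nr_sectors;
    struct pm_rq *next;  /* older entries; never visited by the merge check */
};

/* Back merge: the new IO must start exactly where the request ends. */
static inline bool pm_back_merge(struct pm_rq *rq, unsigned long long sector,
                                 unsigned int nr_sectors)
{
    if (rq->sector + rq->nr_sectors != sector)
        return false;
    rq->nr_sectors += nr_sectors;
    return true;
}

/* O(1): look only at the most recently inserted request. */
static inline bool pm_attempt_plug_merge(struct pm_rq *newest,
                                         unsigned long long sector,
                                         unsigned int nr_sectors)
{
    if (!newest)
        return false;
    return pm_back_merge(newest, sector, nr_sectors);
}
```

An IO that would have merged with an older entry is now simply not merged at the plug level, which is the trade-off the commit message accepts in exchange for a constant-time check.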
2021-10-18 | block: move CONFIG_BLOCK guard to top Makefile | Masahiro Yamada | 1 | -1/+1
Every object under block/ depends on CONFIG_BLOCK. Move the guard to the top Makefile since there is no point in descending into block/ if CONFIG_BLOCK=n. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210927140000.866249-5-masahiroy@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: move menu "Partition type" to block/partitions/Kconfig | Masahiro Yamada | 2 | -4/+4
Move the menu to the relevant place. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210927140000.866249-4-masahiroy@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 | block: simplify Kconfig files | Masahiro Yamada | 2 | -15/+7
Everything under block/ depends on BLOCK. BLOCK_HOLDER_DEPRECATED is selected from drivers/md/Kconfig, which is entirely dependent on BLOCK. Extend the 'if BLOCK' ... 'endif' so it covers the whole block/Kconfig. Also, clean up the definition of BLOCK_COMPAT and BLK_MQ_PCI because COMPAT and PCI are boolean. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210927140000.866249-3-masahiroy@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>