Age | Commit message (Collapse) | Author | Files | Lines |
|
* for-5.18/block: (96 commits)
block: remove bio_devname
ext4: stop using bio_devname
raid5-ppl: stop using bio_devname
raid1: stop using bio_devname
md-multipath: stop using bio_devname
dm-integrity: stop using bio_devname
dm-crypt: stop using bio_devname
pktcdvd: remove a pointless debug check in pkt_submit_bio
block: remove handle_bad_sector
block: fix and cleanup bio_check_ro
bfq: fix use-after-free in bfq_dispatch_request
blk-crypto: show crypto capabilities in sysfs
block: don't delete queue kobject before its children
block: simplify calling convention of elv_unregister_queue()
block: remove redundant semicolon
block: default BLOCK_LEGACY_AUTOLOAD to y
block: update io_ticks when io hang
block, bfq: don't move oom_bfqq
block, bfq: avoid moving bfqq to it's parent bfqg
block, bfq: cleanup bfq_bfqq_to_bfqg()
...
|
|
iocb_bio_iopoll() expects iocb->private to be cleared before
releasing the bio.
We already do this in blkdev_bio_end_io(), but we forgot in the
recently added blkdev_bio_end_io_async().
Fixes: 54a88eb838d3 ("block: add single bio async direct IO helper")
Cc: asml.silence@gmail.com
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220211090136.44471-1-sgarzare@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pass the block_device that we plan to use this bio for and the
operation to bio_init to optimize the assignment. A NULL block_device
can be passed, both for the passthrough case on a raw request_queue and
to temporarily avoid refactoring some nasty code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-19-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pass the block_device and operation that we plan to use this bio for to
bio_alloc to optimize the assignment. NULL/0 can be passed, both for the
passthrough case on a raw request_queue and to temporarily avoid
refactoring some nasty code.
Also move the gfp_mask argument after the nr_vecs argument for a much
more logical calling convention matching what most of the kernel does.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pass the block_device and operation that we plan to use this bio for to
bio_alloc_kiocb to optimize the assigment.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit ceaa762527f4 ("block: move direct_IO into our own read_iter
handler") introduced several regressions for bdev DIO:
1. read spanning EOF always returns 0 instead of the number of bytes
read. This is because "count" is assigned early and isn't updated
when the iterator is truncated:
$ lsblk -o name,size /dev/vdb
NAME SIZE
vdb 1G
$ xfs_io -d -c 'pread -b 4M 1021M 4M' /dev/vdb
read 0/4194304 bytes at offset 1070596096
0.000000 bytes, 0 ops; 0.0007 sec (0.000000 bytes/sec and 0.0000 ops/sec)
instead of
$ xfs_io -d -c 'pread -b 4M 1021M 4M' /dev/vdb
read 3145728/4194304 bytes at offset 1070596096
3 MiB, 1 ops; 0.0007 sec (3.865 GiB/sec and 1319.2612 ops/sec)
2. truncated iterator isn't reexpanded
3. iterator isn't reverted on blkdev_direct_IO() error
4. zero size read no longer skips atime update
Fixes: ceaa762527f4 ("block: move direct_IO into our own read_iter handler")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220201100420.25875-1-idryomov@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull block updates from Jens Axboe:
- Unify where the struct request handling code is located in the blk-mq
code (Christoph)
- Header cleanups (Christoph)
- Clean up the io_context handling code (Christoph, me)
- Get rid of ->rq_disk in struct request (Christoph)
- Error handling fix for add_disk() (Christoph)
- request allocation cleanusp (Christoph)
- Documentation updates (Eric, Matthew)
- Remove trivial crypto unregister helper (Eric)
- Reduce shared tag overhead (John)
- Reduce poll_stats memory overhead (me)
- Known indirect function call for dio (me)
- Use atomic references for struct request (me)
- Support request list issue for block and NVMe (me)
- Improve queue dispatch pinning (Ming)
- Improve the direct list issue code (Keith)
- BFQ improvements (Jan)
- Direct completion helper and use it in mmc block (Sebastian)
- Use raw spinlock for the blktrace code (Wander)
- fsync error handling fix (Ye)
- Various fixes and cleanups (Lukas, Randy, Yang, Tetsuo, Ming, me)
* tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block: (132 commits)
MAINTAINERS: add entries for block layer documentation
docs: block: remove queue-sysfs.rst
docs: sysfs-block: document virt_boundary_mask
docs: sysfs-block: document stable_writes
docs: sysfs-block: fill in missing documentation from queue-sysfs.rst
docs: sysfs-block: add contact for nomerges
docs: sysfs-block: sort alphabetically
docs: sysfs-block: move to stable directory
block: don't protect submit_bio_checks by q_usage_counter
block: fix old-style declaration
nvme-pci: fix queue_rqs list splitting
block: introduce rq_list_move
block: introduce rq_list_for_each_safe macro
block: move rq_list macros to blk-mq.h
block: drop needless assignment in set_task_ioprio()
block: remove unnecessary trailing '\'
bio.h: fix kernel-doc warnings
block: check minor range in device_add_disk()
block: use "unsigned long" for blk_validate_block_size().
block: fix error unwinding in device_add_disk
...
|
|
Pull block fixes from Jens Axboe:
"A few block fixes that should go into this release:
- NVMe pull request:
- set ana_log_size to 0 after freeing ana_log_buf (Hou Tao)
- show subsys nqn for duplicate cntlids (Keith Busch)
- disable namespace access for unsupported metadata (Keith
Busch)
- report write pointer for a full zone as zone start + zone len
(Niklas Cassel)
- fix use after free when disconnecting a reconnecting ctrl
(Ruozhu Li)
- fix a list corruption in nvmet-tcp (Sagi Grimberg)
- Fix for a regression on DIO single bio async IO (Pavel)
- ioprio seteuid fix (Davidlohr)
- mtd fix that subsequently got reverted as it was broken, will get
re-done and submitted for the next round
- Two MD fixes via Song (Markus, zhangyue)"
* tag 'block-5.16-2021-12-10' of git://git.kernel.dk/linux-block:
Revert "mtd_blkdevs: don't scan partitions for plain mtdblock"
block: fix ioprio_get(IOPRIO_WHO_PGRP) vs setuid(2)
md: fix double free of mddev->private in autorun_array()
md: fix update super 1.0 on rdev size change
nvmet-tcp: fix possible list corruption for unexpected command failure
block: fix single bio async DIO error handling
nvme: fix use after free when disconnecting a reconnecting ctrl
nvme-multipath: set ana_log_size to 0 after free ana_log_buf
mtd_blkdevs: don't scan partitions for plain mtdblock
nvme: report write pointer for a full zone as zone start + zone len
nvme: disable namespace access for unsupported metadata
nvme: show subsys nqn for duplicate cntlids
|
|
BUG: KASAN: use-after-free in io_submit_one+0x496/0x2fe0 fs/aio.c:1882
CPU: 2 PID: 15100 Comm: syz-executor873 Not tainted 5.16.0-rc1-syzk #1
Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29
04/01/2014
Call Trace:
[...]
refcount_dec_and_test include/linux/refcount.h:333 [inline]
iocb_put fs/aio.c:1161 [inline]
io_submit_one+0x496/0x2fe0 fs/aio.c:1882
__do_sys_io_submit fs/aio.c:1938 [inline]
__se_sys_io_submit fs/aio.c:1908 [inline]
__x64_sys_io_submit+0x1c7/0x4a0 fs/aio.c:1908
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3a/0x80 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
__blkdev_direct_IO_async() returns errors from bio_iov_iter_get_pages()
directly, in which case upper layers won't be expecting ->ki_complete
to be called by the block layer and will terminate the request. However,
there is also bio_endio() leading to a second ->ki_complete and a double
free.
Fixes: 54a88eb838d37 ("block: add single bio async direct IO helper")
Reported-by: George Kennedy <george.kennedy@oracle.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c9eb786f6cef041e159e6287de131bec0719ad5c.1638907997.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Don't call into generic_file_read_iter() if we know it's O_DIRECT, just
set it up ourselves and call our own handler. This avoids an indirect call
for O_DIRECT.
Fall back to filemap_read() if we fail.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
cgroup.h (therefore swap.h, therefore half of the universe)
includes bpf.h which in turn includes module.h and slab.h.
Since we're about to get rid of that dependency we need
to clean things up.
v2: drop the cpu.h include from cacheinfo.h, it's not necessary
and it makes riscv sensitive to ordering of include files.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Krzysztof WilczyĆski <kw@linux.com>
Acked-by: Peter Chen <peter.chen@kernel.org>
Acked-by: SeongJae Park <sj@kernel.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/all/20211120035253.72074-1-kuba@kernel.org/ # v1
Link: https://lore.kernel.org/all/20211120165528.197359-1-kuba@kernel.org/ # cacheinfo discussion
Link: https://lore.kernel.org/bpf/20211202203400.1208663-1-kuba@kernel.org
|
|
Pull more bdev size updates from Jens Axboe:
"Two followup changes for the bdev-size series from this merge window:
- Add loff_t cast to bdev_nr_bytes() (Christoph)
- Use bdev_nr_bytes() consistently for the block parts at least (me)"
* tag 'for-5.16/bdev-size-2021-11-09' of git://git.kernel.dk/linux-block:
block: use new bdev_nr_bytes() helper for blkdev_{read,write}_iter()
block: add a loff_t cast to bdev_nr_bytes
|
|
We have new helpers for this, use them rather than the slower inode
size reads. This makes the read/write path consistent with most of
the rest of block as well.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/a72767cd-3c6d-47f7-80f4-aa025a17b2cb@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull kiocb->ki_complete() cleanup from Jens Axboe:
"This removes the res2 argument from kiocb->ki_complete().
Only the USB gadget code used it, everybody else passes 0. The USB
guys checked the user gadget code they could find, and everybody just
uses res as expected for the async interface"
* tag 'for-5.16/ki_complete-2021-10-29' of git://git.kernel.dk/linux-block:
fs: get rid of the res2 iocb->ki_complete argument
usb: remove res2 argument from gadget code completions
|
|
Pull bdev size cleanups from Jens Axboe:
"Clean up the bdev size handling with new bdev_nr_bytes() helper"
* tag 'for-5.16/bdev-size-2021-10-29' of git://git.kernel.dk/linux-block: (34 commits)
partitions/ibm: use bdev_nr_sectors instead of open coding it
partitions/efi: use bdev_nr_bytes instead of open coding it
block/ioctl: use bdev_nr_sectors and bdev_nr_bytes
block: cache inode size in bdev
udf: use sb_bdev_nr_blocks
reiserfs: use sb_bdev_nr_blocks
ntfs: use sb_bdev_nr_blocks
jfs: use sb_bdev_nr_blocks
ext4: use sb_bdev_nr_blocks
block: add a sb_bdev_nr_blocks helper
block: use bdev_nr_bytes instead of open coding it in blkdev_fallocate
squashfs: use bdev_nr_bytes instead of open coding it
reiserfs: use bdev_nr_bytes instead of open coding it
pstore/blk: use bdev_nr_bytes instead of open coding it
ntfs3: use bdev_nr_bytes instead of open coding it
nilfs2: use bdev_nr_bytes instead of open coding it
nfs/blocklayout: use bdev_nr_bytes instead of open coding it
jfs: use bdev_nr_bytes instead of open coding it
hfsplus: use bdev_nr_sectors instead of open coding it
hfs: use bdev_nr_sectors instead of open coding it
...
|
|
If we know that a iocb is async we can optimise bio_set_polled() a bit,
add a new helper bio_set_polled_async().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8fa137885164a5d05fadcff4c3521da8d5a83d00.1635337135.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Now __blkdev_direct_IO() serves only multi-bio I/O, thus remove
not used anymore single bio refcounting optimisations.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/88eb488aae9ed4852a30f3a7132f296f56e43b80.1635337135.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
With addition of __blkdev_direct_IO_async(), __blkdev_direct_IO() now
serves only multio-bio I/O, which we don't poll. Now we can remove
anything related to I/O polling from it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b8c597a6b7ee612df394853bfd24726aee5b898e.1635337135.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Nobody cares about iov iterators state if we return -EIOCBQUEUED, so as
the we now have __blkdev_direct_IO_async(), which gets pages only once,
we can skip expensive iov_iter_advance(). It's around 1-2% of all CPU
spent.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a6158edfbfa2ae3bc24aed29a72f035df18fad2f.1635337135.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.
Now that everybody is passing in zero, kill off the extra argument.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
As with __blkdev_direct_IO_simple(), we can implement direct IO more
efficiently if there is only one bio. Add __blkdev_direct_IO_async() and
blkdev_bio_end_io_async(). This patch brings me from 4.45-4.5 MIOPS with
nullblk to 4.7+.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f0ae4109b7a6934adede490f84d188d53b97051b.1635006010.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Don't use shifting by a magic number 9 but replace with a more
descriptive SHIFT_SECTOR.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/068782b9f7e97569fb59a99529b23bb17ea4c5e2.1634755800.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Combine pos and len checks and mark unlikely. Also, don't reexpand if
it's not truncated.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/fff34e613aeaae1ad12977dc4592cb1a1f5d3190.1634755800.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We get all sorts of unreliable and funky results since the bio is
designed to align on a cacheline, which it does not when inlined like
this.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use the proper helper to read the block device size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20211018101130.1838532-25-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
struct io_comp_batch contains a list head and a completion handler, which
will allow completions to more effciently completed batches of IO.
For now, no functional changes in this patch, we just define the
io_comp_batch structure and add the argument to the file_operations iopoll
handler.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This generates a lot better code for me, and bumps performance from
7650K IOPS to 7750K IOPS. Looking at profiles for the run and running
perf diff, it confirms that we're now sending a lot less time there:
6.38% -2.80% [kernel.vmlinux] [k] blkdev_direct_IO
Taking it from the 2nd most cycle consumer to only the 9th most at
3.35% of the CPU time.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
bdev = &BDEV_I(file->f_mapping->host)->bdev
Getting struct block_device from a file requires 2 memory dereferences
as illustrated above, that takes a toll on performance, so cache it in
yet unused file->private_data. That gives a noticeable peak performance
improvement.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/8415f9fe12e544b9da89593dfbca8de2b52efe03.1634115360.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Replace the blk_poll interface that requires the caller to keep a queue
and cookie from the submissions with polling based on the bio.
Polling for the bio itself leads to a few advantages:
- the cookie construction can made entirely private in blk-mq.c
- the caller does not need to remember the request_queue and cookie
separately and thus sidesteps their lifetime issues
- keeping the device and the cookie inside the bio allows to trivially
support polling BIOs remapping by stacking drivers
- a lot of code to propagate the cookie back up the submission path can
be removed entirely.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Switch the boolean spin argument to blk_poll to passing a set of flags
instead. This will allow to control polling behavior in a more fine
grained way.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-10-hch@lst.de
[axboe: adapt to changed io_uring iopoll]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If an iocb is split into multiple bios we can't poll for both. So don't
even bother to try to poll in that case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012111226.760968-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Simplify the ioctl path and match the code structure on the compat side.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012104450.659013-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When running ->fallocate(), blkdev_fallocate() should hold
mapping->invalidate_lock to prevent page cache from being accessed,
otherwise stale data may be read in page cache.
Without this patch, blktests block/009 fails sometimes. With this patch,
block/009 can pass always.
Also as Jan pointed out, no pages can be created in the discarded area
while you are holding the invalidate_lock, so remove the 2nd
truncate_bdev_range().
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210923023751.1441091-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a new block/fops.c for all the file and address_space operations
that provide the block special file support.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210907141303.1371844-2-hch@lst.de
[axboe: correct trailing whitespace while at it]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|