summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-05-29null_blk: force complete for timeout requestDongli Zhang1-1/+1
The commit 7b11eab041da ("blk-mq: blk-mq: provide forced completion method") exports new API to force a request to complete without error injection. There should be no error injection when completing a request by timeout handler. Otherwise, the below would hang because timeout handler is failed. echo 100 > /sys/kernel/debug/fail_io_timeout/probability echo 1000 > /sys/kernel/debug/fail_io_timeout/times echo 1 > /sys/block/nullb0/io-timeout-fail dd if=/dev/zero of=/dev/nullb0 bs=512 count=1 oflag=direct With this patch, the timeout handler is able to complete the IO. Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-29blk-mq: drain I/O when all CPUs in a hctx are offlineMing Lei7-4/+133
Most of blk-mq drivers depend on managed IRQ's auto-affinity to setup up queue mapping. Thomas mentioned the following point[1]: "That was the constraint of managed interrupts from the very beginning: The driver/subsystem has to quiesce the interrupt line and the associated queue _before_ it gets shutdown in CPU unplug and not fiddle with it until it's restarted by the core when the CPU is plugged in again." However, current blk-mq implementation doesn't quiesce hw queue before the last CPU in the hctx is shutdown. Even worse, CPUHP_BLK_MQ_DEAD is a cpuhp state handled after the CPU is down, so there isn't any chance to quiesce the hctx before shutting down the CPU. Add new CPUHP_AP_BLK_MQ_ONLINE state to stop allocating from blk-mq hctxs where the last CPU goes away, and wait for completion of in-flight requests. This guarantees that there is no inflight I/O before shutting down the managed IRQ. Add a BLK_MQ_F_STACKING and set it for dm-rq and loop, so we don't need to wait for completion of in-flight requests from these drivers to avoid a potential dead-lock. It is safe to do this for stacking drivers as those do not use interrupts at all and their I/O completions are triggered by underlying devices I/O completion. [1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/ [hch: different retry mechanism, merged two patches, minor cleanups] Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-29blk-mq: add blk_mq_all_tag_iterMing Lei2-18/+34
Add a new blk_mq_all_tag_iter function to iterate over all allocated scheduler tags and driver tags. This is more flexible than the existing blk_mq_all_tag_busy_iter function as it allows the callers to do whatever they want on allocated request instead of being limited to started requests. It will be used to implement draining allocated requests on specified hctx in this patchset. [hch: switch from the two booleans to a more readable flags field and consolidate the tags iter functions] Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Bart van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-29blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctxChristoph Hellwig1-21/+23
blk_mq_alloc_request_hctx is only used for NVMeoF connect commands, so tailor it to the specific requirements, and don't bother the general fast path code with its special twinkles. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-29blk-mq: use BLK_MQ_NO_TAG in more placesChristoph Hellwig3-13/+13
Replace various magic -1 constants for tags with BLK_MQ_NO_TAG. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-29blk-mq: rename BLK_MQ_TAG_FAIL to BLK_MQ_NO_TAGChristoph Hellwig3-5/+5
To prepare for wider use of this constant give it a more applicable name. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-29blk-mq: move more request initialization to blk_mq_rq_ctx_initChristoph Hellwig1-17/+19
Don't split request initialization between __blk_mq_alloc_request and blk_mq_rq_ctx_init. Also remove the op argument as it can be derived from the blk_mq_alloc_data structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-29blk-mq: simplify the blk_mq_get_request calling conventionChristoph Hellwig1-14/+22
The bio argument is entirely unused, and the request_queue can be passed through the alloc_data, given that it needs to be filled out for the low-level tag allocation anyway. Also rename the function to __blk_mq_alloc_request as the switch between get and alloc in the call chains is rather confusing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-29blk-mq: remove the bio argument to ->prepare_requestChristoph Hellwig5-5/+5
None of the I/O schedulers actually needs it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-29nvme: force complete cancelled requestsKeith Busch1-1/+1
Use blk_mq_foce_complete_rq() to bypass fake timeout error injection so that request reclaim may proceed. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-29blk-mq: blk-mq: provide forced completion methodKeith Busch2-2/+14
Drivers may need to bypass error injection for error recovery. Rename __blk_mq_complete_request() to blk_mq_force_complete_rq() and export that function so drivers may skip potential fake timeouts after they've reclaimed lost requests. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-28block: fix a warning when blkdev.h is included for !CONFIG_BLOCK buildsChristoph Hellwig1-1/+1
disk_start_io_acct and disk_end_io_acct need at least a struct gendisk forward declaration, but for weird historic reasons much of blkdev.h is stubbed out for CONFIG_BLOCK=n. Fix this by stubbing more out for now, but eventually this header will need a massive cleanup. Fixes: 956d510ee78 ("block: add disk/bio-based accounting helpers") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-27block: blk-crypto-fallback: remove redundant initialization of variable errColin Ian King1-1/+1
The variable err is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Satya Tangirala <satyat@google.com> Addresses-Coverity: ("Unused value") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-27block: reduce part_stat_lock() scopeChristoph Hellwig2-3/+5
We only need the stats lock (aka preempt_disable()) for updating the states, not for looking up or dropping the hd_struct reference. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-27block: use __this_cpu_add() instead of access by smp_processor_id()Konstantin Khlebnikov1-1/+1
Most architectures have fast path to access percpu for current cpu. The required preempt_disable() is provided by part_stat_lock(). [hch: rebased] Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-27block: remove rcu_read_lock() from part_stat_lock()Konstantin Khlebnikov2-5/+10
The RCU lock is required only in disk_map_sector_rcu() to lookup the partition. After that request holds reference to related hd_struct. Replace get_cpu() with preempt_disable() - returned cpu index is unused. [hch: rebased] Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-27block: add a blk_account_io_merge_bio helperKonstantin Khlebnikov4-12/+19
Move the non-"new_io" branch of blk_account_io_start() into separate function. Fix merge accounting for discards (they were counted as write merges). The new blk_account_io_merge_bio() doesn't call update_io_ticks() unlike blk_account_io_start(), as there is no reason for that. [hch: rebased] Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-27block: account merge of two requestsKonstantin Khlebnikov1-7/+5
Also rename blk_account_io_merge() into blk_account_io_merge_request() to distinguish it from merging request and bio. [hch: rebased] Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-27block: always use a percpu variable for disk statsChristoph Hellwig5-69/+18
percpu variables have a perfectly fine working stub implementation for UP kernels, so use that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-27block: move update_io_ticks to blk-core.cChristoph Hellwig3-17/+15
All callers are in blk-core.c, so move update_io_ticks over. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-27block: remove generic_{start,end}_io_acctChristoph Hellwig2-45/+0
Remove these now unused functions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-27zram: nvdimm: use bio_{start,end}_io_acct and disk_{start,end}_io_acctChristoph Hellwig1-14/+10
Switch zram to use the nicer bio accounting helpers, and as part of that ensure each bio is counted as a single I/O request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-27nvdimm: use bio_{start,end}_io_acctChristoph Hellwig4-25/+12
Switch dm to use the nicer bio accounting helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-27dm: use bio_{start,end}_io_acctChristoph Hellwig1-7/+2
Switch dm to use the nicer bio accounting helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-27bcache: use bio_{start,end}_io_acctChristoph Hellwig1-14/+4
Switch bcache to use the nicer bio accounting helpers, and call the routines where we also sample the start time to give coherent accounting results. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-27lightnvm/pblk: use bio_{start,end}_io_acctChristoph Hellwig2-12/+7
Switch rsxx to use the nicer bio accounting helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-27rsxx: use bio_{start,end}_io_acctChristoph Hellwig1-17/+2
Switch rsxx to use the nicer bio accounting helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-27drbd: use bio_{start,end}_io_acctChristoph Hellwig1-23/+4
Switch drbd to use the nicer bio accounting helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-27block: add disk/bio-based accounting helpersChristoph Hellwig2-0/+62
Add two new helpers to simplify I/O accounting for bio based drivers. Currently these drivers use the generic_start_io_acct and generic_end_io_acct helpers which have very cumbersome calling conventions, don't actually return the time they started accounting, and try to deal with accounting for partitions, which can't happen for bio based drivers. The new helpers will be used to subsequently replace uses of the old helpers. The main API is the bio based wrappes in blkdev.h, but for zram which wants to account rw_page based I/O lower level routines are provided as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-22block: remove the disk and queue NULL checks in blkdev_issue_flushChristoph Hellwig1-8/+0
Both of these never can be NULL for a live block device. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-22block: remove the error_sector argument to blkdev_issue_flushChristoph Hellwig22-41/+27
The argument isn't used by any caller, and drivers don't fill out bi_sector for flush requests either. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-19block: Remove unused flush_queue_delayed in struct blk_flush_queueBaolin Wang2-2/+0
The flush_queue_delayed was introdued to hold queue if flush is running for non-queueable flush drive by commit 3ac0cc450870 ("hold queue if flush is running for non-queueable flush drive"), but the non mq parts of the flush code had been removed by commit 7e992f847a08 ("block: remove non mq parts from the flush code"), as well as removing the usage of the flush_queue_delayed flag. Thus remove the unused flush_queue_delayed flag. Signed-off-by: Baolin Wang <baolin.wang7@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-19null_blk: Zero-initialize read buffers in non-memory-backed modeBart Van Assche1-0/+26
This patch suppresses an uninteresting KMSAN complaint without affecting performance of the null_blk driver if CONFIG_KMSAN is disabled. Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Alexander Potapenko <glider@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-19block: Document the bio_vec propertiesBart Van Assche1-2/+11
Since it is nontrivial that nth_page() does not have to be used for a bio_vec, document this. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> CC: Christoph Hellwig <hch@infradead.org> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-19bio.h: Declare the arguments of the bio iteration functions constBart Van Assche1-3/+3
This change makes it possible to pass 'const struct bio *' arguments to these functions. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-19block: Fix type of first compat_put_{,u}long() argumentBart Van Assche1-2/+2
This patch fixes the following sparse warnings: block/ioctl.c:209:16: warning: incorrect type in argument 1 (different address spaces) block/ioctl.c:209:16: expected void const volatile [noderef] <asn:1> * block/ioctl.c:209:16: got signed int [usertype] *argp block/ioctl.c:214:16: warning: incorrect type in argument 1 (different address spaces) block/ioctl.c:214:16: expected void const volatile [noderef] <asn:1> * block/ioctl.c:214:16: got unsigned int [usertype] *argp block/ioctl.c:666:40: warning: incorrect type in argument 1 (different address spaces) block/ioctl.c:666:40: expected signed int [usertype] *argp block/ioctl.c:666:40: got void [noderef] <asn:1> *argp block/ioctl.c:672:41: warning: incorrect type in argument 1 (different address spaces) block/ioctl.c:672:41: expected unsigned int [usertype] *argp block/ioctl.c:672:41: got void [noderef] <asn:1> *argp Fixes: 9b81648cb5e3 ("compat_ioctl: simplify up block/ioctl.c") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-19block: merge part_{inc,dev}_in_flight into their only callersChristoph Hellwig3-26/+8
part_inc_in_flight and part_dec_in_flight only have one caller each, and those callers are purely for bio based drivers. Merge each function into the only caller, and remove the superflous blk-mq checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-19block: don't call part_{inc,dec}_in_flight for blk-mq devicesChristoph Hellwig2-18/+5
part_inc_in_flight and part_dec_in_flight are no-ops for blk-mq queues, so remove the calls in purely blk-mq callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-19block: move the blk-mq calls out of part_in_flight{,_rw}Christoph Hellwig1-14/+14
Don't bother to call part_in_flight / part_in_flight_rw on blk-mq devices, just call the blk-mq versions directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-19block: mark blk_account_io_completion staticChristoph Hellwig2-2/+1
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-19blk-mq: allow blk_mq_make_request to consume the q_usage_counter referenceChristoph Hellwig4-31/+37
blk_mq_make_request currently needs to grab an q_usage_counter reference when allocating a request. This is because the block layer grabs one before calling blk_mq_make_request, but also releases it as soon as blk_mq_make_request returns. Remove the blk_queue_exit call after blk_mq_make_request returns, and instead let it consume the reference. This works perfectly fine for the block layer caller, just device mapper needs an extra reference as the old problem still persists there. Open code blk_queue_enter_live in device mapper, as there should be no other callers and this allows better documenting why we do a non-try get. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-19blk-mq: remove a pointless queue enter pair in blk_mq_alloc_request_hctxChristoph Hellwig1-3/+0
No need for two queue references. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-19blk-mq: remove a pointless queue enter pair in blk_mq_alloc_requestChristoph Hellwig1-3/+0
No need for two queue references. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-19blk-mq: move the call to blk_queue_enter_live out of blk_mq_get_requestChristoph Hellwig1-11/+16
Move the blk_queue_enter_live calls into the callers, where they can successively be cleaned up. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-16blktrace: Report pid with note messagesJan Kara1-2/+2
Currently informational messages within block trace do not have PID information of the process reporting the message included. With BFQ it is sometimes useful to have the information and there's no good reason to omit the information from the trace. So just fill in pid information when generating note message. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-16block: remove the REQ_NOWAIT_INLINE flagChristoph Hellwig1-2/+0
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-14block: blk-crypto-fallback for Inline EncryptionSatya Tangirala6-21/+752
Blk-crypto delegates crypto operations to inline encryption hardware when available. The separately configurable blk-crypto-fallback contains a software fallback to the kernel crypto API - when enabled, blk-crypto will use this fallback for en/decryption when inline encryption hardware is not available. This lets upper layers not have to worry about whether or not the underlying device has support for inline encryption before deciding to specify an encryption context for a bio. It also allows for testing without actual inline encryption hardware - in particular, it makes it possible to test the inline encryption code in ext4 and f2fs simply by running xfstests with the inlinecrypt mount option, which in turn allows for things like the regular upstream regression testing of ext4 to cover the inline encryption code paths. For more details, refer to Documentation/block/inline-encryption.rst. Signed-off-by: Satya Tangirala <satyat@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-14block: Make blk-integrity preclude hardware inline encryptionSatya Tangirala4-0/+59
Whenever a device supports blk-integrity, make the kernel pretend that the device doesn't support inline encryption (essentially by setting the keyslot manager in the request queue to NULL). There's no hardware currently that supports both integrity and inline encryption. However, it seems possible that there will be such hardware in the near future (like the NVMe key per I/O support that might support both inline encryption and PI). But properly integrating both features is not trivial, and without real hardware that implements both, it is difficult to tell if it will be done correctly by the majority of hardware that support both. So it seems best not to support both features together right now, and to decide what to do at probe time. Signed-off-by: Satya Tangirala <satyat@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-14block: Inline encryption support for blk-mqSatya Tangirala14-7/+684
We must have some way of letting a storage device driver know what encryption context it should use for en/decrypting a request. However, it's the upper layers (like the filesystem/fscrypt) that know about and manages encryption contexts. As such, when the upper layer submits a bio to the block layer, and this bio eventually reaches a device driver with support for inline encryption, the device driver will need to have been told the encryption context for that bio. We want to communicate the encryption context from the upper layer to the storage device along with the bio, when the bio is submitted to the block layer. To do this, we add a struct bio_crypt_ctx to struct bio, which can represent an encryption context (note that we can't use the bi_private field in struct bio to do this because that field does not function to pass information across layers in the storage stack). We also introduce various functions to manipulate the bio_crypt_ctx and make the bio/request merging logic aware of the bio_crypt_ctx. We also make changes to blk-mq to make it handle bios with encryption contexts. blk-mq can merge many bios into the same request. These bios need to have contiguous data unit numbers (the necessary changes to blk-merge are also made to ensure this) - as such, it suffices to keep the data unit number of just the first bio, since that's all a storage driver needs to infer the data unit number to use for each data block in each bio in a request. blk-mq keeps track of the encryption context to be used for all the bios in a request with the request's rq_crypt_ctx. When the first bio is added to an empty request, blk-mq will program the encryption context of that bio into the request_queue's keyslot manager, and store the returned keyslot in the request's rq_crypt_ctx. All the functions to operate on encryption contexts are in blk-crypto.c. Upper layers only need to call bio_crypt_set_ctx with the encryption key, algorithm and data_unit_num; they don't have to worry about getting a keyslot for each encryption context, as blk-mq/blk-crypto handles that. Blk-crypto also makes it possible for request-based layered devices like dm-rq to make use of inline encryption hardware by cloning the rq_crypt_ctx and programming a keyslot in the new request_queue when necessary. Note that any user of the block layer can submit bios with an encryption context, such as filesystems, device-mapper targets, etc. Signed-off-by: Satya Tangirala <satyat@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-14block: Keyslot Manager for Inline EncryptionSatya Tangirala6-0/+550
Inline Encryption hardware allows software to specify an encryption context (an encryption key, crypto algorithm, data unit num, data unit size) along with a data transfer request to a storage device, and the inline encryption hardware will use that context to en/decrypt the data. The inline encryption hardware is part of the storage device, and it conceptually sits on the data path between system memory and the storage device. Inline Encryption hardware implementations often function around the concept of "keyslots". These implementations often have a limited number of "keyslots", each of which can hold a key (we say that a key can be "programmed" into a keyslot). Requests made to the storage device may have a keyslot and a data unit number associated with them, and the inline encryption hardware will en/decrypt the data in the requests using the key programmed into that associated keyslot and the data unit number specified with the request. As keyslots are limited, and programming keys may be expensive in many implementations, and multiple requests may use exactly the same encryption contexts, we introduce a Keyslot Manager to efficiently manage keyslots. We also introduce a blk_crypto_key, which will represent the key that's programmed into keyslots managed by keyslot managers. The keyslot manager also functions as the interface that upper layers will use to program keys into inline encryption hardware. For more information on the Keyslot Manager, refer to documentation found in block/keyslot-manager.c and linux/keyslot-manager.h. Co-developed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Satya Tangirala <satyat@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>