Age | Commit message (Collapse) | Author | Files | Lines |
|
Preparation for adding more declarations.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
In commit e869b5462f83 ("blk-mq: Unregister debugfs attributes
earlier"), we shuffled the debugfs cleanup around so that the "state"
attribute was removed before we freed the blk-mq data structures.
However, later changes are going to undo that, so we need to explicitly
disallow running a dead queue.
[Omar: rebased and updated commit message]
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
A large part of blk-mq-debugfs.c is file_operations and seq_file
boilerplate. This sucks as is but will suck even more when schedulers
can define their own debugfs entries. Factor it all out into a single
blk_mq_debugfs_fops which multiplexes as needed. We store the
request_queue, blk_mq_hw_ctx, or blk_mq_ctx in the parent directory
dentry, which is kind of hacky, but it works.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
It's not clear what these numbered directories represent unless you
consult the code. We're about to get rid of the intermediate "mq"
directory, so these would be even more confusing without that context.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Slightly more readable, plus we also strip leading spaces.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
blk_queue_flags_store() currently truncates and returns a short write if
the operation being written is too long. This can give us weird results,
like here:
$ echo "run bar"
echo: write error: invalid argument
$ dmesg
[ 1103.075435] blk_queue_flags_store: unsupported operation bar. Use either 'run' or 'start'
Instead, return an error if the user does this. While we're here, make
the argument names consistent with everywhere else in this file.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Make sure the spelled out flag names match the definition. This also
adds a missing hctx state, BLK_MQ_S_START_ON_RUN, and a missing
cmd_flag, __REQ_NOUNMAP.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This reads more naturally than spaces.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
By poking at /debug/sched_features I triggered the following splat:
[] ======================================================
[] WARNING: possible circular locking dependency detected
[] 4.11.0-00873-g964c8b7-dirty #694 Not tainted
[] ------------------------------------------------------
[] bash/2109 is trying to acquire lock:
[] (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff8120cb8b>] static_key_slow_dec+0x1b/0x50
[]
[] but task is already holding lock:
[] (&sb->s_type->i_mutex_key#4){+++++.}, at: [<ffffffff81140216>] sched_feat_write+0x86/0x170
[]
[] which lock already depends on the new lock.
[]
[]
[] the existing dependency chain (in reverse order) is:
[]
[] -> #2 (&sb->s_type->i_mutex_key#4){+++++.}:
[] lock_acquire+0x100/0x210
[] down_write+0x28/0x60
[] start_creating+0x5e/0xf0
[] debugfs_create_dir+0x13/0x110
[] blk_mq_debugfs_register+0x21/0x70
[] blk_mq_register_dev+0x64/0xd0
[] blk_register_queue+0x6a/0x170
[] device_add_disk+0x22d/0x440
[] loop_add+0x1f3/0x280
[] loop_init+0x104/0x142
[] do_one_initcall+0x43/0x180
[] kernel_init_freeable+0x1de/0x266
[] kernel_init+0xe/0x100
[] ret_from_fork+0x31/0x40
[]
[] -> #1 (all_q_mutex){+.+.+.}:
[] lock_acquire+0x100/0x210
[] __mutex_lock+0x6c/0x960
[] mutex_lock_nested+0x1b/0x20
[] blk_mq_init_allocated_queue+0x37c/0x4e0
[] blk_mq_init_queue+0x3a/0x60
[] loop_add+0xe5/0x280
[] loop_init+0x104/0x142
[] do_one_initcall+0x43/0x180
[] kernel_init_freeable+0x1de/0x266
[] kernel_init+0xe/0x100
[] ret_from_fork+0x31/0x40
[] *** DEADLOCK ***
[]
[] 3 locks held by bash/2109:
[] #0: (sb_writers#11){.+.+.+}, at: [<ffffffff81292bcd>] vfs_write+0x17d/0x1a0
[] #1: (debugfs_srcu){......}, at: [<ffffffff8155a90d>] full_proxy_write+0x5d/0xd0
[] #2: (&sb->s_type->i_mutex_key#4){+++++.}, at: [<ffffffff81140216>] sched_feat_write+0x86/0x170
[]
[] stack backtrace:
[] CPU: 9 PID: 2109 Comm: bash Not tainted 4.11.0-00873-g964c8b7-dirty #694
[] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[] Call Trace:
[] lock_acquire+0x100/0x210
[] get_online_cpus+0x2a/0x90
[] static_key_slow_dec+0x1b/0x50
[] static_key_disable+0x20/0x30
[] sched_feat_write+0x131/0x170
[] full_proxy_write+0x97/0xd0
[] __vfs_write+0x28/0x120
[] vfs_write+0xb5/0x1a0
[] SyS_write+0x49/0xa0
[] entry_SYSCALL_64_fastpath+0x23/0xc2
This is because of the cpu hotplug lock rework. Break the chain at #1
by reversing the lock acquisition order. This way i_mutex_key#4 no
longer depends on cpu_hotplug_lock and things are good.
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
A previous commit introduced the sync flush, which we need from
internal callers like blk_mq_quiesce_queue(). However, we also
call the stop helpers from drivers, particularly from ->queue_rq()
when we have to stop processing for a bit. We can't block from
those locations, and we don't have to guarantee that we're
fully flushed.
Fixes: 9f993737906b ("blk-mq: unify hctx delayed_run_work and run_work")
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
After queue is frozen, no request in this queue can be in use at all, so
there can't be any .queue_rq() running on this queue. It isn't
necessary to call blk_mq_quiesce_queue() any more, so remove it in both
elevator_switch_mq() and blk_mq_update_nr_requests().
Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Fixed up the description a bit.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Remove the request_idx parameter, which can't be used safely now that we
support I/O schedulers with blk-mq. Except for a superflous check in
mtip32xx it was unused anyway.
Also pass the tag_set instead of just the driver data - this allows drivers
to avoid some code duplication in a follow on cleanup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
We have update the troublesome driver (mtip32xx) to deal with this
appropriately. So kill the hack that bypassed scheduler allocation
and insertion for reserved requests.
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Since commit 84253394927c ("remove the mg_disk driver") removed the
only caller of elevator_change(), also remove the elevator_change()
function itself.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Pull block layer updates from Jens Axboe:
- Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ
was initially a fork of CFQ, but subsequently changed to implement
fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant
to be used on desktop type single drives, providing good fairness.
From Paolo.
- Add Kyber IO scheduler. This is a full multiqueue aware scheduler,
using a scalable token based algorithm that throttles IO based on
live completion IO stats, similary to blk-wbt. From Omar.
- A series from Jan, moving users to separately allocated backing
devices. This continues the work of separating backing device life
times, solving various problems with hot removal.
- A series of updates for lightnvm, mostly from Javier. Includes a
'pblk' target that exposes an open channel SSD as a physical block
device.
- A series of fixes and improvements for nbd from Josef.
- A series from Omar, removing queue sharing between devices on mostly
legacy drivers. This helps us clean up other bits, if we know that a
queue only has a single device backing. This has been overdue for
more than a decade.
- Fixes for the blk-stats, and improvements to unify the stats and user
windows. This both improves blk-wbt, and enables other users to
register a need to receive IO stats for a device. From Omar.
- blk-throttle improvements from Shaohua. This provides a scalable
framework for implementing scalable priotization - particularly for
blk-mq, but applicable to any type of block device. The interface is
marked experimental for now.
- Bucketized IO stats for IO polling from Stephen Bates. This improves
efficiency of polled workloads in the presence of mixed block size
IO.
- A few fixes for opal, from Scott.
- A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics.
From a variety of folks, mostly Sagi and James Smart.
- A series from Bart, improving our exposed info and capabilities from
the blk-mq debugfs support.
- A series from Christoph, cleaning up how handle WRITE_ZEROES.
- A series from Christoph, cleaning up the block layer handling of how
we track errors in a request. On top of being a nice cleanup, it also
shrinks the size of struct request a bit.
- Removal of mg_disk and hd (sorry Linus) by Christoph. The former was
never used by platforms, and the latter has outlived it's usefulness.
- Various little bug fixes and cleanups from a wide variety of folks.
* 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits)
block: hide badblocks attribute by default
blk-mq: unify hctx delay_work and run_work
block: add kblock_mod_delayed_work_on()
blk-mq: unify hctx delayed_run_work and run_work
nbd: fix use after free on module unload
MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler
blk-mq-sched: alloate reserved tags out of normal pool
mtip32xx: use runtime tag to initialize command header
scsi: Implement blk_mq_ops.show_rq()
blk-mq: Add blk_mq_ops.show_rq()
blk-mq: Show operation, cmd_flags and rq_flags names
blk-mq: Make blk_flags_show() callers append a newline character
blk-mq: Move the "state" debugfs attribute one level down
blk-mq: Unregister debugfs attributes earlier
blk-mq: Only unregister hctxs for which registration succeeded
blk-mq-debugfs: Rename functions for registering and unregistering the mq directory
blk-mq: Let blk_mq_debugfs_register() look up the queue name
blk-mq: Register <dev>/queue/mq after having registered <dev>/queue
ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset
ide-pm: always pass 0 error to __blk_end_request_all
..
|
|
Commit 99e6608c9e74 "block: Add badblock management for gendisks"
allowed for drivers like pmem and software-raid to advertise a list of
bad media areas. However, it inadvertently added a 'badblocks' to all
block devices. Lets clean this up by having the 'badblocks' attribute
not be visible when the driver has not populated a 'struct badblocks'
instance in the gendisk.
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The only difference between ->run_work and ->delay_work, is that
the latter is used to defer running a queue. This is done by
marking the queue stopped, and scheduling ->delay_work to run
sometime in the future. While the queue is stopped, direct runs
or runs through ->run_work will not run the queue.
If we combine the handlers, then we need to handle two things:
1) If a delayed/stopped run is scheduled, then we should not run
the queue before that has been completed.
2) If a queue is delayed/stopped, the handler needs to restart
the queue. Normally a run of a queue with the stopped bit set
would be a no-op.
Case 1 is handled by modifying a currently pending queue run
to the deadline set by the caller of blk_mq_delay_queue().
Subsequent attempts to queue a queue run will find the work
item already pending, and direct runs will see a stopped queue
as before.
Case 2 is handled by adding a new bit, BLK_MQ_S_START_ON_RUN,
that tells the work handler that it should clear a stopped
queue and run the handler.
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This modifies (or adds, if not currently pending) an existing
delayed work item.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
They serve the exact same purpose. Get rid of the non-delayed
work variant, and just run it without delay for the normal case.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
At least one driver, mtip32xx, has a hard coded dependency on
the value of the reserved tag used for internal commands. While
that should really be fixed up, for now let's ensure that we just
bypass the scheduler tags an allocation marked as reserved. They
are used for house keeping or error handling, so we can safely
ignore them in the scheduler.
Tested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This new callback function will be used in the next patch to show
more information about SCSI requests.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Show the operation name, .cmd_flags and .rq_flags as names instead
of numbers.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This patch does not change any functionality but makes it possible
to produce a single line of output with multiple flag-to-name
translations.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Move the "state" attribute from the top level to the "mq" directory
as requested by Omar.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
We currently call blk_mq_free_queue() from blk_cleanup_queue()
before we unregister the debugfs attributes for that queue in
blk_release_queue(). This leaves a window open during which
accessing most of the mq debugfs attributes would cause a
use-after-free. Additionally, the "state" attribute allows
running the queue, which we should not do after the queue has
entered the "dead" state. Fix both cases by unregistering the
debugfs attributes before freeing queue resources starts.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Hctx unregistration involves calling kobject_del(). kobject_del()
must not be called if kobject_add() has not been called. Hence in
the error path only unregister hctxs for which registration succeeded.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
directory
Since the blk_mq_debugfs_*register_hctxs() functions register and
unregister all attributes under the "mq" directory, rename these
into blk_mq_debugfs_*register_mq().
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
A later patch will move the call of blk_mq_debugfs_register() to
a function to which the queue name is not passed as an argument.
To avoid having to add a 'name' argument to multiple callers, let
blk_mq_debugfs_register() look up the queue name.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
A later patch in this series will modify blk_mq_debugfs_register()
such that it uses q->kobj.parent to determine the name of a
request queue. Hence make sure that that pointer is initialized
before blk_mq_debugfs_register() is called. To avoid lock inversion,
protect sysfs / debugfs registration with the queue sysfs_lock
instead of the global mutex all_q_mutex.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
When registering an integrity profile: if the template's interval_exp is
not 0 use it, otherwise use the ilog2() of logical block size of the
provided gendisk.
This fixes a long-standing DM linear target bug where it cannot pass
integrity data to the underlying device if its logical block size
conflicts with the underlying device's logical block size.
Cc: stable@vger.kernel.org
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Commit 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
introduced blk_integrity_revalidate(), which seems to assume ownership
of the stable pages flag and unilaterally clears it if no blk_integrity
profile is registered:
if (bi->profile)
disk->queue->backing_dev_info->capabilities |=
BDI_CAP_STABLE_WRITES;
else
disk->queue->backing_dev_info->capabilities &=
~BDI_CAP_STABLE_WRITES;
It's called from revalidate_disk() and rescan_partitions(), making it
impossible to enable stable pages for drivers that support partitions
and don't use blk_integrity: while the call in revalidate_disk() can be
trivially worked around (see zram, which doesn't support partitions and
hence gets away with zram_revalidate_disk()), rescan_partitions() can
be triggered from userspace at any time. This breaks rbd, where the
ceph messenger is responsible for generating/verifying CRCs.
Since blk_integrity_{un,}register() "must" be used for (un)registering
the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
setting there. This way drivers that call blk_integrity_register() and
use integrity infrastructure won't interfere with drivers that don't
but still want stable pages.
Fixes: 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 4.4+, needs backporting
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Avoid that the following kernel bug gets triggered:
BUG: sleeping function called from invalid context at ./include/linux/buffer_head.h:349
in_atomic(): 1, irqs_disabled(): 0, pid: 8019, name: find
CPU: 10 PID: 8019 Comm: find Tainted: G W I 4.11.0-rc4-dbg+ #2
Call Trace:
dump_stack+0x68/0x93
___might_sleep+0x16e/0x230
__might_sleep+0x4a/0x80
__ext4_get_inode_loc+0x1e0/0x4e0
ext4_iget+0x70/0xbc0
ext4_iget_normal+0x2f/0x40
ext4_lookup+0xb6/0x1f0
lookup_slow+0x104/0x1e0
walk_component+0x19a/0x330
path_lookupat+0x4b/0x100
filename_lookup+0x9a/0x110
user_path_at_empty+0x36/0x40
vfs_statx+0x67/0xc0
SYSC_newfstatat+0x20/0x40
SyS_newfstatat+0xe/0x10
entry_SYSCALL_64_fastpath+0x18/0xad
This happens since the big if/else in blk_mq_make_request() doesn't
have final else section that also drops the ctx. Add that.
Fixes: b00c53e8f411 ("blk-mq: fix schedule-while-atomic with scheduler attached")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Omar Sandoval <osandov@fb.com>
Added a bit more to the commit log.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
No point in providing and exporting this helper. There's just
one (real) user of it, just use rq_data_dir().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
commit c13660a08c8b ("blk-mq-sched: change ->dispatch_requests()
to ->dispatch_request()") removed the last user of this function.
Hence also remove the function itself.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
If the caller passes in wait=true, it has to be able to block
for a driver tag. We just had a bug where flush insertion
would block on tag allocation, while we had preempt disabled.
Ensure that we catch cases like that earlier next time.
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Fixes an issue where the size of the poll_stat array in request_queue
does not match the size expected by the new size based bucketing for
IO completion polling.
Fixes: 720b8ccc4500 ("blk-mq: Add a polling specific stats function")
Signed-off-by: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
We must have dropped the ctx before we call
blk_mq_sched_insert_request() with can_block=true, otherwise we risk
that a flush request can block on insertion if we are currently out of
tags.
[ 47.667190] BUG: scheduling while atomic: jbd2/sda2-8/2089/0x00000002
[ 47.674493] Modules linked in: x86_pkg_temp_thermal btrfs xor zlib_deflate raid6_pq sr_mod cdre
[ 47.690572] Preemption disabled at:
[ 47.690584] [<ffffffff81326c7c>] blk_mq_sched_get_request+0x6c/0x280
[ 47.701764] CPU: 1 PID: 2089 Comm: jbd2/sda2-8 Not tainted 4.11.0-rc7+ #271
[ 47.709630] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016
[ 47.718081] Call Trace:
[ 47.720903] dump_stack+0x4f/0x73
[ 47.724694] ? blk_mq_sched_get_request+0x6c/0x280
[ 47.730137] __schedule_bug+0x6c/0xc0
[ 47.734314] __schedule+0x559/0x780
[ 47.738302] schedule+0x3b/0x90
[ 47.741899] io_schedule+0x11/0x40
[ 47.745788] blk_mq_get_tag+0x167/0x2a0
[ 47.750162] ? remove_wait_queue+0x70/0x70
[ 47.754901] blk_mq_get_driver_tag+0x92/0xf0
[ 47.759758] blk_mq_sched_insert_request+0x134/0x170
[ 47.765398] ? blk_account_io_start+0xd0/0x270
[ 47.770679] blk_mq_make_request+0x1b2/0x850
[ 47.775766] generic_make_request+0xf7/0x2d0
[ 47.780860] submit_bio+0x5f/0x120
[ 47.784979] ? submit_bio+0x5f/0x120
[ 47.789631] submit_bh_wbc.isra.46+0x10d/0x130
[ 47.794902] submit_bh+0xb/0x10
[ 47.798719] journal_submit_commit_record+0x190/0x210
[ 47.804686] ? _raw_spin_unlock+0x13/0x30
[ 47.809480] jbd2_journal_commit_transaction+0x180a/0x1d00
[ 47.815925] kjournald2+0xb6/0x250
[ 47.820022] ? kjournald2+0xb6/0x250
[ 47.824328] ? remove_wait_queue+0x70/0x70
[ 47.829223] kthread+0x10e/0x140
[ 47.833147] ? commit_timeout+0x10/0x10
[ 47.837742] ? kthread_create_on_node+0x40/0x40
[ 47.843122] ret_from_fork+0x29/0x40
Fixes: a4d907b6a33b ("blk-mq: streamline blk_mq_make_request")
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Rather than bucketing IO statisics based on direction only we also
bucket based on the IO size. This leads to improved polling
performance. Update the bucket callback function and use it in the
polling latency estimation.
Signed-off-by: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
In order to allow for filtering of IO based on some other properties
of the request than direction we allow the bucket function to return
an int.
If the bucket callback returns a negative do no count it in the stats
accumulation.
Signed-off-by: Stephen Bates <sbates@raithlin.com>
Fixed up Kyber scheduler stat callback.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
If we have a scheduler attached, blk_mq_tag_to_rq() on the
scheduled tags will return NULL if a request is no longer
in flight. This is different than using the normal tags,
where it will always return the fixed request. Check for
this condition for polling, in case we happen to enter
polling for a completed request.
The request address remains valid, so this check and return
should be perfectly safe.
Fixes: bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers")
Tested-by: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Merge blk_mq_ipi_complete_request and blk_mq_stat_add into their only
caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Now that all drivers that call blk_mq_complete_requests have a
->complete callback we can remove the direct call to blk_mq_end_request,
as well as the error argument to blk_mq_complete_request.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This passes on the scsi_cmnd result field to users of passthrough
requests. Currently we abuse req->errors for this purpose, but that
field will go away in its current form.
Note that the old IDE code abuses the errors field in very creative
ways and stores all kinds of different values in it. I didn't dare
to touch this magic, so the abuses are brought forward 1:1.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The function only returns -EIO if rq->errors is non-zero, which is not
very useful and lets a large number of callers ignore the return value.
Just let the callers figure out their error themselves.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
We trigger this warning:
block/blk-throttle.c: In function ‘blk_throtl_bio’:
block/blk-throttle.c:2042:6: warning: variable ‘ret’ set but not used [-Wunused-but-set-variable]
int ret;
^~~
since we only assign 'ret' if BLK_DEV_THROTTLING_LOW is off, we never
check it.
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
If we don't have CGROUPS enabled, the compile ends in the
following misery:
In file included from ../block/bfq-iosched.c:105:0:
../block/bfq-iosched.h:819:22: error: array type has incomplete element type
extern struct cftype bfq_blkcg_legacy_files[];
^
../block/bfq-iosched.h:820:22: error: array type has incomplete element type
extern struct cftype bfq_blkg_files[];
^
Move the declarations under the right ifdef.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The call to bfq_check_ioprio_change will dereference bic, however,
the null check for bic is after this call. Move the the null
check on bic to before the call to avoid any potential null
pointer dereference issues.
Detected by CoverityScan, CID#1430138 ("Dereference before null check")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Since ioprio_best() translates IOPRIO_CLASS_NONE into IOPRIO_CLASS_BE
and since lower numerical priority values represent a higher priority
a simple numerical comparison is sufficient.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Adam Manzanares <adam.manzanares@wdc.com>
Tested-by: Adam Manzanares <adam.manzanares@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Since only a single caller remains, inline blk_rq_set_prio(). Initialize
req->ioprio even if no I/O priority has been set in the bio nor in the
I/O context.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Adam Manzanares <adam.manzanares@wdc.com>
Tested-by: Adam Manzanares <adam.manzanares@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|