Age | Commit message (Collapse) | Author | Files | Lines |
|
Since we mainly use soffset in device sector size, we therefore store
this value in rrpc->soffset, instead of the offset in 512byte sector
size. This eliminates the "(ilog2(dev->sec_size) - 9)" calculation on
each I/O.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Updated patch description.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Calculate rrpc total blocks and sectors up front, make sense
to use them. For example, we use rrpc->nr_sects to calculate rrpc
area size, but it makes no sense if we don't initialize it up front,
since it would be zero until we finish rrpc luns init.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
A memory leak occurs if the lower page table is initialized and the
following dev->lun_map fails on allocation.
Rearrange the initialization of lower page table to allow dev->lun_map
to fail gracefully without memory leak.
Reviewed by: Johannes Thumshirn <jthumshirn@suse.de>
Move kfree of dev->lun_map to nvm_free()
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The get block table command returns a list of blocks and planes
with their associated state. Users, such as gennvm and sysblk,
manages all planes as a single virtual block.
It was therefore natural to fold the bad block list before it is
returned. However, to allow users, which manages on a per-plane
block level, to also use the interface, the get_bb_tbl interface is
changed to not fold by default and instead let the caller fold if
necessary.
Reviewed by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The flash page size (fpg) and size across planes (pfpg) are convenient
to know when allocating buffer sizes. This has previously been a
calculated in various places. Replace with the pre-calculated values.
Reviewed by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The nvm_submit_ppa function assumes that users manage all plane
blocks as a single block. Extend the API with nvm_submit_ppa_list
to allow the user to send its own ppa list. If the user submits more
than a single PPA, the user must take care to allocate and free
the corresponding ppa list.
Reviewed by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The device ->submit_io() callback might fail to submit I/O to device.
In that case, the nvm_submit_ppa function should not wait for
completion. Instead return the ->submit_io() error.
Reviewed by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This fixes the following warnings:
drivers/lightnvm/sysblk.c:125:9: warning: ‘ret’ may be used
uninitialized in this function
drivers/lightnvm/sysblk.c:275:15: warning: ‘ret’ may be used
uninitialized in this function
In both cases, ret is only set from within a loop that may not be entered.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This fixes a scenario where device is present and being reset, but a
request to unbind the driver occurs.
A previous patch series addressing a device failure removal scenario
flushed reset_work after controller disable to unblock reset_work waiting
on a completion that wouldn't occur. This isn't safe as-is. The broken
scenario can potentially be induced with:
modprobe nvme && modprobe -r nvme
To fix, the reset work is flushed immediately after setting the controller
removing flag, and any subsequent reset will not proceed with controller
initialization if the flag is set.
The controller status must be polled while active, so the watchdog timer
is also left active until the controller is disabled to cleanup requests
that may be stuck during namespace removal.
[Fixes: ff23a2a15a2117245b4599c1352343c8b8fb4c43]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
On receipt of a namespace attribute changed AER, we acquire the
namespace mutex lock before proceeding to scan and validate the
namespace list. In case of namespace detach/delete command,
nvme_ns_remove function deadlocks trying to acquire the already held
lock.
All callers, except nvme_remove_namespaces(), of nvme_ns_remove()
already held namespaces_mutex. So we can simply fix the deadlock by
not acquiring the mutex in nvme_ns_remove() and acquiring it in
nvme_remove_namespaces().
Reported-by: Sunad Bhandary S <sunad.s@samsung.com>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimerg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Switch to RCU freeing the namespace structure so that
nvme_start_queues, nvme_stop_queues and nvme_kill_queues would
be able to get away with only a RCU read side critical section.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimerg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This hides command cleanup into nvme.h and fabrics drivers will
also use it.
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The transport driver still needs to do the actual submission, but all the
higher level code can be shared.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Move the scan work item and surrounding code to the common code. For now
we need a new finish_scan method to allow the PCI driver to set the
irq affinity hints, but I have plans in the works to obsolete this as well.
Note that this moves the namespace scanning from nvme_wq to the system
workqueue, but as we don't rely on namespace scanning to finish from reset
or I/O this should be fine.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by Jon Derrick: <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
We only should be scanning namespaces if the controller is live. Currently
we call the function just before setting it live, so fix the code up to
move the call to nvme_queue_scan to just below the state change.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by Jon Derrick: <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Replace the adhoc flags in the PCI driver with a state machine in the
core code. Based on code from Sagi Grimberg for the Fabrics driver.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by Jon Derrick: <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
It's unused since "NVMe: Move error handling to failed reset handler".
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
nvme_core_init does
nvme_core_init does:
1) register_blkdev
2) __register_chrdev
3) class_create
nvme_core_exit should do cleanup in the reverse order.
Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
If the controller fails and is degraded after a reset, we need to kill
off all requests queues before removing the inaccessble namespaces. This
will prevent del_gendisk from syncing dirty data, which we can't due
from a WQ_MEM_RECLAIM work queue.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
"as well as " is miss typed "as well a " in section
"config BLK_DEV_NVME_SCSI"
Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Controller IDs in NVMe are unsigned 16-bit types. In the Fabrics driver we
actually pass ctrl->id by reference, so we need it to have the correct type.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Simply creating a file system on an skd device, followed by mount and
fstrim will result in errors in the logs and then a BUG(). Let's remove
discard support from that driver. As far as I can tell, it hasn't
worked right since it was merged. This patch also has a side-effect of
cleaning up an unintentional shadowed declaration inside of
skd_end_request.
I tested to ensure that I can still do I/O to the device using xfstests
./check -g quick. I didn't do anything more extensive than that,
though.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Now that we converted everything to the newer block write cache
interface, kill off the queue flush_flags and queueable flush
entries.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This patch adds a check on nvme_watchdog_timer() function to avoid the
call to reset_work() when an error recovery process is ongoing on
controller. The check is made by looking at pci_channel_offline()
result.
If we don't check for this on nvme_watchdog_timer(), error recovery
mechanism can't recover well, because reset_work() won't be able to
do its job (since we're in the middle of an error) and so the
controller is removed from the system before error recovery mechanism
can perform slot reset (which would allow the adapter to recover).
In this patch we also have split the huge condition expression on
nvme_watchdog_timer() by introducing an auxiliary function to help
make the code more readable.
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Depending on options, we might not be using dev in nvme_cancel_io():
drivers/nvme/host/pci.c: In function ‘nvme_cancel_io’:
drivers/nvme/host/pci.c:970:19: warning: unused variable ‘dev’ [-Wunused-variable]
struct nvme_dev *dev = data;
^
So get rid of it, and just cast for the dev_dbg_ratelimited() call.
Fixes: 82b4552b91c4 ("nvme: Use blk-mq helper for IO termination")
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
The driver calls it with 0 for flags, since it doesn't have a writeback
cache. Just remove the call, as it's a no-op right now.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Switch to the newer interface, instead of using blk_queue_flush()
directly.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Only a single tags array anyway.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
blk-mq offers a tagset iterator so let's use that
instead of using nvme_clear_queues.
Note, we changed nvme_queue_cancel_ios name to nvme_cancel_io
as there is no concept of a queue now in this function (we
also lost the print).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
If the controller is degraded, the driver should stay out of the way so
the user can recover the drive. This patch skips driver initiated async
event requests when the drive is in this state.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This moves nvme_setup_{flush,discard,rw} calls into a common
nvme_setup_cmd() helper. So we can eventually hide all the command
setup in the core module and don't even need to update the fabrics
drivers for any specific command type.
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This rewrites nvme_setup_discard() with blk_add_request_payload().
It allocates only the necessary amount(16 bytes) for the payload.
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The helper returns the number of bytes that need to be mapped
using PRPs/SGL entries.
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
When unloading driver, nvme_disable_io_queues() calls nvme_delete_queue()
that sends nvme_admin_delete_cq command to admin sq. So when the command
completed, the lock acquired by nvme_irq() actually belongs to admin queue.
While the lock that nvme_del_cq_end() trying to acquire belongs to io queue.
So it will not deadlock.
This patch adds lock nesting notation to fix following report.
[ 109.840952] =============================================
[ 109.846379] [ INFO: possible recursive locking detected ]
[ 109.851806] 4.5.0+ #180 Tainted: G E
[ 109.856533] ---------------------------------------------
[ 109.861958] swapper/0/0 is trying to acquire lock:
[ 109.866771] (&(&nvmeq->q_lock)->rlock){-.....}, at: [<ffffffffc0820bc6>] nvme_del_cq_end+0x26/0x70 [nvme]
[ 109.876535]
[ 109.876535] but task is already holding lock:
[ 109.882398] (&(&nvmeq->q_lock)->rlock){-.....}, at: [<ffffffffc0820c2b>] nvme_irq+0x1b/0x50 [nvme]
[ 109.891547]
[ 109.891547] other info that might help us debug this:
[ 109.898107] Possible unsafe locking scenario:
[ 109.898107]
[ 109.904056] CPU0
[ 109.906515] ----
[ 109.908974] lock(&(&nvmeq->q_lock)->rlock);
[ 109.913381] lock(&(&nvmeq->q_lock)->rlock);
[ 109.917787]
[ 109.917787] *** DEADLOCK ***
[ 109.917787]
[ 109.923738] May be due to missing lock nesting notation
[ 109.923738]
[ 109.930558] 1 lock held by swapper/0/0:
[ 109.934413] #0: (&(&nvmeq->q_lock)->rlock){-.....}, at: [<ffffffffc0820c2b>] nvme_irq+0x1b/0x50 [nvme]
[ 109.944010]
[ 109.944010] stack backtrace:
[ 109.948389] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G E 4.5.0+ #180
[ 109.955734] Hardware name: Dell Inc. OptiPlex 7010/0YXT71, BIOS A15 08/12/2013
[ 109.962989] 0000000000000000 ffff88011e203c38 ffffffff81383d9c ffffffff81c13540
[ 109.970478] ffffffff826711d0 ffff88011e203ce8 ffffffff810bb429 0000000000000046
[ 109.977964] 0000000000000046 0000000000000000 0000000000b2e597 ffffffff81f4cb00
[ 109.985453] Call Trace:
[ 109.987911] <IRQ> [<ffffffff81383d9c>] dump_stack+0x85/0xc9
[ 109.993711] [<ffffffff810bb429>] __lock_acquire+0x19b9/0x1c60
[ 109.999575] [<ffffffff810b6d1d>] ? trace_hardirqs_off+0xd/0x10
[ 110.005524] [<ffffffff810b386d>] ? complete+0x3d/0x50
[ 110.010688] [<ffffffff810bb760>] lock_acquire+0x90/0xf0
[ 110.016029] [<ffffffffc0820bc6>] ? nvme_del_cq_end+0x26/0x70 [nvme]
[ 110.022418] [<ffffffff81772afb>] _raw_spin_lock_irqsave+0x4b/0x60
[ 110.028632] [<ffffffffc0820bc6>] ? nvme_del_cq_end+0x26/0x70 [nvme]
[ 110.035019] [<ffffffffc0820bc6>] nvme_del_cq_end+0x26/0x70 [nvme]
[ 110.041232] [<ffffffff8135b485>] blk_mq_end_request+0x35/0x60
[ 110.047095] [<ffffffffc0821ad8>] nvme_complete_rq+0x68/0x190 [nvme]
[ 110.053481] [<ffffffff8135b53f>] __blk_mq_complete_request+0x8f/0x130
[ 110.060043] [<ffffffff8135b611>] blk_mq_complete_request+0x31/0x40
[ 110.066343] [<ffffffffc08209e3>] __nvme_process_cq+0x83/0x240 [nvme]
[ 110.072818] [<ffffffffc0820c35>] nvme_irq+0x25/0x50 [nvme]
[ 110.078419] [<ffffffff810cdb66>] handle_irq_event_percpu+0x36/0x110
[ 110.084804] [<ffffffff810cdc77>] handle_irq_event+0x37/0x60
[ 110.090491] [<ffffffff810d0ea3>] handle_edge_irq+0x93/0x150
[ 110.096180] [<ffffffff81012306>] handle_irq+0xa6/0x130
[ 110.101431] [<ffffffff81011abe>] do_IRQ+0x5e/0x120
[ 110.106333] [<ffffffff8177384c>] common_interrupt+0x8c/0x8c
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Multiple users have reported device initialization failure due the driver
not receiving legacy PCI interrupts. This is not unique to any particular
controller, but has been observed on multiple platforms.
There have been no issues reported or observed when with message signaled
interrupts, so this patch attempts to use MSI-x during initialization,
falling back to MSI. If that fails, legacy would become the default.
The setup_io_queues error handling had to change as a result: the admin
queue's msix_entry used to be initialized to the legacy IRQ. The case
where nr_io_queues is 0 would fail request_irq when setting up the admin
queue's interrupt since re-enabling MSI-x fails with 0 vectors, leaving
the admin queue's msix_entry invalid. Instead, return success immediately.
Reported-by: Tim Muhlemmer <muhlemmer@gmail.com>
Reported-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|