path: root/drivers/nvme/target
Age  Commit message  (Author, files changed, lines -/+)
2023-10-25  nvmet-tcp: Fix a possible UAF in queue initialization setup  (Sagi Grimberg, 1 file changed, -5/+2)
commit d920abd1e7c4884f9ecd0749d1921b7ab19ddfbd upstream. From Alon: "Due to a logical bug in the NVMe-oF/TCP subsystem in the Linux kernel, a malicious user can cause a UAF and a double free, which may lead to RCE (may also lead to an LPE in case the attacker already has local privileges)." Hence, when a queue initialization fails after the ahash requests are allocated, it is guaranteed that the queue removal async work will be called, hence leave the deallocation to the queue removal. Also, be extra careful not to continue processing the socket, so set queue rcv_state to NVMET_TCP_RECV_ERR upon a socket error. Cc: stable@vger.kernel.org Reported-by: Alon Zahavi <zahavi.alon@gmail.com> Tested-by: Alon Zahavi <zahavi.alon@gmail.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-17  nvme-fcloop: fix "inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage"  (Ming Lei, 1 file changed, -21/+27)
[ Upstream commit 4f86a6ff6fbd891232dda3ca97fd1b9630b59809 ] fcloop_fcp_op() could be called from flush request's ->end_io(flush_end_io) in which the spinlock of fq->mq_flush_lock is grabbed with irq saved/disabled. So fcloop_fcp_op() can't call spin_unlock_irq(&tfcp_req->reqlock) simply which enables irq unconditionally. Fixes the warning by switching to spin_lock_irqsave()/spin_unlock_irqrestore() Fixes: c38dbbfab1bc ("nvme-fcloop: fix inconsistent lock state warnings") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-22  nvmet: avoid potential UAF in nvmet_req_complete()  (Damien Le Moal, 1 file changed, -1/+3)
[ Upstream commit 6173a77b7e9d3e202bdb9897b23f2a8afe7bf286 ] An nvme target ->queue_response() operation implementation may free the request passed as argument. Such implementation potentially could result in a use after free of the request pointer when percpu_ref_put() is called in nvmet_req_complete(). Avoid such problem by using a local variable to save the sq pointer before calling __nvmet_req_complete(), thus avoiding dereferencing the req pointer after that function call. Fixes: a07b4970f464 ("nvmet: add a generic NVMe target") Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
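The pattern behind this fix generalizes: save whatever you still need from an object before calling anything that may free it. A minimal user-space sketch of that pattern, with hypothetical types and names rather than the actual nvmet code:

    #include <stdio.h>
    #include <stdlib.h>

    struct sq { int ref; };
    struct req { struct sq *sq; int status; };

    /* A backend's ->queue_response() implementation may free the request. */
    static void queue_response(struct req *req)
    {
        free(req);
    }

    static void req_complete(struct req *req)
    {
        struct sq *sq = req->sq;   /* save what is still needed before req may be freed */

        queue_response(req);
        sq->ref--;                 /* safe: only the saved local is used from here on */
    }

    int main(void)
    {
        struct sq sq = { .ref = 1 };
        struct req *req = malloc(sizeof(*req));

        if (!req)
            return 1;
        req->sq = &sq;
        req->status = 0;
        req_complete(req);
        printf("sq ref now %d\n", sq.ref);
        return 0;
    }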
2023-02-22  nvme-fc: fix a missing queue put in nvmet_fc_ls_create_association  (Amit Engel, 1 file changed, -1/+3)
[ Upstream commit 0cab4404874f2de52617de8400c844891c6ea1ce ] As part of nvmet_fc_ls_create_association there is a case where nvmet_fc_alloc_target_queue fails right after a new association with an admin queue is created. In this case, no one releases the get taken in nvmet_fc_alloc_target_assoc. This fix is adding the missing put. Signed-off-by: Amit Engel <Amit.Engel@dell.com> Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-14  nvmet: don't defer passthrough commands with trivial effects to the workqueue  (Christoph Hellwig, 1 file changed, -6/+5)
[ Upstream commit 2a459f6933e1c459bffb7cc73fd6c900edc714bd ] Mask out the "Command Supported" and "Logical Block Content Change" bits and only defer execution of commands that have non-trivial effects to the workqueue for synchronous execution. This allows admin commands to be executed asynchronously on controllers that provide a Command Supported and Effects log page, and will keep allowing Write commands to be executed asynchronously once command effects on I/O commands are taken into account. Fixes: c1fef73f793b ("nvmet: add passthru code to process commands") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-26  nvmet-tcp: add bounds check on Transfer Tag  (Varun Prakash, 1 file changed, -2/+9)
[ Upstream commit b6a545ffa2c192b1e6da4a7924edac5ba9f4ea2b ] ttag is used as an index to get cmd in nvmet_tcp_handle_h2c_data_pdu(), add a bounds check to avoid out-of-bounds access. Signed-off-by: Varun Prakash <varun@chelsio.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
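A minimal sketch of the kind of bounds check described above, using hypothetical names and a user-space array in place of the queue's command table:

    #include <stdio.h>

    #define NR_CMDS 128

    struct cmd { int in_use; };

    static struct cmd cmds[NR_CMDS];

    /* Look up the command for a peer-supplied ttag; reject out-of-range values. */
    static struct cmd *lookup_cmd(unsigned int ttag)
    {
        if (ttag >= NR_CMDS)
            return NULL;
        return &cmds[ttag];
    }

    int main(void)
    {
        printf("ttag 5    -> %p\n", (void *)lookup_cmd(5));
        printf("ttag 4096 -> %p\n", (void *)lookup_cmd(4096));  /* NULL: rejected */
        return 0;
    }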
2022-09-20  nvmet-tcp: fix unhandled tcp states in nvmet_tcp_state_change()  (Maurizio Lombardi, 1 file changed, -0/+3)
[ Upstream commit 478814a5584197fa1fb18377653626e3416e7cd6 ] TCP_FIN_WAIT2 and TCP_LAST_ACK were not handled, the connection is closing so we can ignore them and avoid printing the "unhandled state" warning message. [ 1298.852386] nvmet_tcp: queue 2 unhandled state 5 [ 1298.879112] nvmet_tcp: queue 7 unhandled state 5 [ 1298.884253] nvmet_tcp: queue 8 unhandled state 5 [ 1298.889475] nvmet_tcp: queue 9 unhandled state 5 v2: Do not call nvmet_tcp_schedule_release_queue(), just ignore the fin_wait2 and last_ack states. Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15  nvmet: fix a use-after-free  (Bart Van Assche, 1 file changed, -2/+4)
commit 6a02a61e81c231cc5c680c5dbf8665275147ac52 upstream. Fix the following use-after-free complaint triggered by blktests nvme/004: BUG: KASAN: user-memory-access in blk_mq_complete_request_remote+0xac/0x350 Read of size 4 at addr 0000607bd1835943 by task kworker/13:1/460 Workqueue: nvmet-wq nvme_loop_execute_work [nvme_loop] Call Trace: show_stack+0x52/0x58 dump_stack_lvl+0x49/0x5e print_report.cold+0x36/0x1e2 kasan_report+0xb9/0xf0 __asan_load4+0x6b/0x80 blk_mq_complete_request_remote+0xac/0x350 nvme_loop_queue_response+0x1df/0x275 [nvme_loop] __nvmet_req_complete+0x132/0x4f0 [nvmet] nvmet_req_complete+0x15/0x40 [nvmet] nvmet_execute_io_connect+0x18a/0x1f0 [nvmet] nvme_loop_execute_work+0x20/0x30 [nvme_loop] process_one_work+0x56e/0xa70 worker_thread+0x2d1/0x640 kthread+0x183/0x1c0 ret_from_fork+0x1f/0x30 Cc: stable@vger.kernel.org Fixes: a07b4970f464 ("nvmet: add a generic NVMe target") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25  nvmet-tcp: fix lockdep complaint on nvmet_tcp_wq flush during queue teardown  (Sagi Grimberg, 1 file changed, -1/+2)
[ Upstream commit 533d2e8b4d5e4c89772a0adce913525fb86cbbee ] We probably need nvmet_tcp_wq to have MEM_RECLAIM as we are sending/receiving for the socket from works on this workqueue. Also this eliminates lockdep complaints: -- [ 6174.010200] workqueue: WQ_MEM_RECLAIM nvmet-wq:nvmet_tcp_release_queue_work [nvmet_tcp] is flushing !WQ_MEM_RECLAIM nvmet_tcp_wq:nvmet_tcp_io_work [nvmet_tcp] [ 6174.010216] WARNING: CPU: 20 PID: 14456 at kernel/workqueue.c:2628 check_flush_dependency+0x110/0x14c Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-29  nvme: split nvme_alloc_request()  (Chaitanya Kulkarni, 1 file changed, -1/+1)
[ Upstream commit 39dfe84451b4526a8054cc5a127337bca980dfa3 ] Right now nvme_alloc_request() allocates a request from the block layer based on the value of the qid. When qid is set to NVME_QID_ANY it uses blk_mq_alloc_request(), else blk_mq_alloc_request_hctx(). The function nvme_alloc_request() is called from different contexts; the only place where it uses a non-NVME_QID_ANY value is for fabrics connect commands:
  nvme_submit_sync_cmd()         NVME_QID_ANY
  nvme_features()                NVME_QID_ANY
  nvme_sec_submit()              NVME_QID_ANY
  nvmf_reg_read32()              NVME_QID_ANY
  nvmf_reg_read64()              NVME_QID_ANY
  nvmf_reg_write32()             NVME_QID_ANY
  nvmf_connect_admin_queue()     NVME_QID_ANY
  nvme_submit_user_cmd()         NVME_QID_ANY    nvme_alloc_request()
  nvme_keep_alive()              NVME_QID_ANY    nvme_alloc_request()
  nvme_timeout()                 NVME_QID_ANY    nvme_alloc_request()
  nvme_delete_queue()            NVME_QID_ANY    nvme_alloc_request()
  nvmet_passthru_execute_cmd()   NVME_QID_ANY    nvme_alloc_request()
  nvmf_connect_io_queue()        QID             __nvme_submit_sync_cmd()
With passthru, nvme_alloc_request() now falls into the I/O fast path such that blk_mq_alloc_request_hctx() never gets called, and that adds an additional branch check in the fast path. Split nvme_alloc_request() into nvme_alloc_request() and nvme_alloc_request_qid(). Replace each call of nvme_alloc_request() with the NVME_QID_ANY param with a call to the newly added nvme_alloc_request() without NVME_QID_ANY. Replace the call to nvme_alloc_request() with the QID param with a call to the newly added nvme_alloc_request() or nvme_alloc_request_qid() based on the qid value set in __nvme_submit_sync_cmd(). Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-01  nvmet: use IOCB_NOWAIT only if the filesystem supports it  (Maurizio Lombardi, 1 file changed, -1/+3)
[ Upstream commit c024b226a417c4eb9353ff500b1c823165d4d508 ] Submit I/O requests with the IOCB_NOWAIT flag set only if the underlying filesystem supports it. Fixes: 50a909db36f2 ("nvmet: use IOCB_NOWAIT for file-ns buffered I/O") Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-01  nvmet-tcp: fix incomplete data digest send  (Varun Prakash, 1 file changed, -1/+6)
[ Upstream commit 102110efdff6beedece6ab9b51664c32ac01e2db ] Current nvmet_try_send_ddgst() code does not check whether all data digest bytes are transmitted, fix this by returning -EAGAIN if all data digest bytes are not transmitted. Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver") Signed-off-by: Varun Prakash <varun@chelsio.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
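A rough user-space sketch of the partial-send pattern described above, with a fake one-byte-at-a-time transport standing in for the socket and hypothetical names throughout; the point is that the sender tracks an offset and keeps returning -EAGAIN until every digest byte is out:

    #include <errno.h>
    #include <stdio.h>

    struct cmd { unsigned char ddgst[4]; size_t offset; };

    /* Pretend transport that accepts at most one byte per call. */
    static long fake_send(const void *buf, size_t len)
    {
        (void)buf;
        return len ? 1 : 0;
    }

    /* Returns 1 once every digest byte is on the wire, -EAGAIN otherwise. */
    static int try_send_ddgst(struct cmd *cmd)
    {
        size_t left = sizeof(cmd->ddgst) - cmd->offset;
        long ret = fake_send(cmd->ddgst + cmd->offset, left);

        if (ret <= 0)
            return -EAGAIN;
        cmd->offset += (size_t)ret;
        if (cmd->offset < sizeof(cmd->ddgst))
            return -EAGAIN;        /* not all digest bytes transmitted yet */
        return 1;
    }

    int main(void)
    {
        struct cmd cmd = { .ddgst = { 1, 2, 3, 4 }, .offset = 0 };

        while (try_send_ddgst(&cmd) == -EAGAIN)
            printf("sent %zu/4 digest bytes, will retry\n", cmd.offset);
        printf("digest fully sent\n");
        return 0;
    }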
2021-11-18  nvmet-tcp: fix use-after-free when a port is removed  (Israel Rukshin, 1 file changed, -0/+16)
[ Upstream commit 2351ead99ce9164fb42555aee3f96af84c4839e9 ] When removing a port, all its controllers are being removed, but there are queues on the port that don't belong to any controller (during connection time). This causes a use-after-free bug for any command that dereferences req->port (like in nvmet_alloc_ctrl). Those queues should be destroyed before freeing the port via configfs. Destroying the remaining queues after the accept_work has been cancelled guarantees that no new queue will be created. Signed-off-by: Israel Rukshin <israelr@nvidia.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18  nvmet-rdma: fix use-after-free when a port is removed  (Israel Rukshin, 1 file changed, -0/+24)
[ Upstream commit fcf73a804c7d6bbf0ea63531c6122aa363852e04 ] When removing a port, all its controllers are being removed, but there are queues on the port that don't belong to any controller (during connection time). This causes a use-after-free bug for any command that dereferences req->port (like in nvmet_alloc_ctrl). Those queues should be destroyed before freeing the port via configfs. Destroying the remaining queues after the RDMA-CM was destroyed guarantees that no new queue will be created. Signed-off-by: Israel Rukshin <israelr@nvidia.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18  nvmet: fix use-after-free when a port is removed  (Israel Rukshin, 1 file changed, -0/+2)
[ Upstream commit e3e19dcc4c416d65f99f13d55be2b787f8d0050e ] When a port is removed through configfs, any connected controllers start their teardown flow asynchronously and can still send commands. This causes a use-after-free bug for any command that dereferences req->port (like in nvmet_parse_io_cmd). To fix this, wait for all the scheduled teardown works to complete (like release_work in the rdma/tcp drivers). This ensures there are no active controllers when the port is eventually removed. Signed-off-by: Israel Rukshin <israelr@nvidia.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18  nvmet-tcp: fix header digest verification  (Amit Engel, 1 file changed, -1/+1)
[ Upstream commit 86aeda32b887cdaeb0f4b7bfc9971e36377181c7 ] Pass the correct length to nvmet_tcp_verify_hdgst, which is the pdu header length. This fixes a wrong behaviour where header digest verification passes although the digest is wrong. Signed-off-by: Amit Engel <amit.engel@dell.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18  nvmet-tcp: fix a memory leak when releasing a queue  (Maurizio Lombardi, 1 file changed, -0/+3)
[ Upstream commit 926245c7d22271307606c88b1fbb2539a8550e94 ] page_frag_free() won't completely release the memory allocated for the commands, the cache page must be explicitly freed by calling __page_frag_cache_drain(). This bug can be easily reproduced by repeatedly executing the following command on the initiator: $echo 1 > /sys/devices/virtual/nvme-fabrics/ctl/nvme0/reset_controller Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: John Meneghini <jmeneghi@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-02  nvmet-tcp: fix data digest pointer calculation  (Varun Prakash, 1 file changed, -1/+1)
commit e790de54e94a7a15fb725b34724d41d41cbaa60c upstream. exp_ddgst is of type __le32, &cmd->exp_ddgst + cmd->offset increases &cmd->exp_ddgst by 4 * cmd->offset, fix this by type casting &cmd->exp_ddgst to u8 *. Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver") Signed-off-by: Varun Prakash <varun@chelsio.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
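A standalone demonstration of the pointer-arithmetic issue described above, using uint32_t as a stand-in for __le32; without the byte-pointer cast the offset is scaled by the element size:

    #include <stdint.h>
    #include <stdio.h>

    struct cmd {
        uint32_t exp_ddgst;   /* stand-in for the __le32 field */
        int offset;
    };

    int main(void)
    {
        struct cmd cmd = { .exp_ddgst = 0, .offset = 1 };

        /* Pointer arithmetic on a uint32_t pointer advances 4 bytes per step... */
        const char *wrong = (const char *)(&cmd.exp_ddgst + cmd.offset);
        /* ...while casting to a byte pointer first advances one byte per step. */
        const char *right = (const char *)&cmd.exp_ddgst + cmd.offset;
        const char *base  = (const char *)&cmd.exp_ddgst;

        printf("without cast: advanced %td bytes\n", wrong - base);  /* 4 */
        printf("with cast:    advanced %td bytes\n", right - base);  /* 1 */
        return 0;
    }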
2021-09-18  nvme: code command_id with a genctr for use-after-free validation  (Sagi Grimberg, 1 file changed, -2/+2)
[ Upstream commit e7006de6c23803799be000a5dcce4d916a36541a ] We cannot detect a (perhaps buggy) controller that is sending us a completion for a request that was already completed (for example, sending a completion twice); this phenomenon was seen in the wild a few times. So to protect against this, we use the upper 4 msbits of the nvme sqe command_id as a 4-bit generation counter and verify it matches the existing request generation that is incremented on every execution. The 16-bit command_id is now constructed as:
  | xxxx | xxxxxxxxxxxx |
    gen     request tag
This means that we are giving up some possible queue depth, as 12 bits allow for a maximum queue depth of 4095 instead of 65536; however, we never create such long queues anyway, so no real harm is done. Suggested-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Daniel Wagner <dwagner@suse.de> Tested-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
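An illustrative sketch of the command_id packing described in the entry above, with hypothetical helper names: the upper 4 bits carry the generation counter and the lower 12 bits the request tag, and a completion whose generation does not match the current one is treated as spurious:

    #include <stdint.h>
    #include <stdio.h>

    #define TAG_BITS 12
    #define TAG_MASK ((1u << TAG_BITS) - 1)

    /* Pack a 4-bit generation counter and a 12-bit tag into a 16-bit command_id. */
    static uint16_t make_command_id(uint8_t gen, uint16_t tag)
    {
        return (uint16_t)(((gen & 0xfu) << TAG_BITS) | (tag & TAG_MASK));
    }

    static uint16_t command_id_to_tag(uint16_t command_id)
    {
        return command_id & TAG_MASK;
    }

    static uint8_t command_id_to_gen(uint16_t command_id)
    {
        return (uint8_t)(command_id >> TAG_BITS);
    }

    int main(void)
    {
        uint16_t id = make_command_id(0x9, 0x2a7);
        uint8_t current_gen = 0xa;   /* pretend the request has since been reused */

        printf("command_id 0x%04x -> gen 0x%x tag 0x%03x\n",
               id, command_id_to_gen(id), command_id_to_tag(id));

        /* A stale completion carrying an old generation is rejected. */
        if (command_id_to_gen(id) != current_gen)
            printf("generation mismatch: dropping spurious completion\n");
        return 0;
    }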
2021-09-15  nvmet: pass back cntlid on successful completion  (Amit Engel, 1 file changed, -4/+5)
[ Upstream commit e804d5abe2d74cfe23f5f83be580d1cdc9307111 ] According to the NVMe specification, the response dword 0 value of the Connect command is based on the status code: return cntlid for successful completion, and return IPO and IATTR for connect invalid parameters. Fix missing error information for a zero-sized queue, and return the cntlid also for I/O queue Connect commands. Signed-off-by: Amit Engel <amit.engel@dell.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-20  nvme-tcp: can't set sk_user_data without write_lock  (Maurizio Lombardi, 1 file changed, -1/+0)
[ Upstream commit 0755d3be2d9bb6ea38598ccd30d6bbaa1a5c3a50 ] The sk_user_data pointer is supposed to be modified only while holding the write_lock "sk_callback_lock", otherwise we could race with other threads and crash the kernel. We can't take the write_lock in nvmet_tcp_state_change() because it would cause a deadlock, but the release_work queue will set the pointer to NULL later, so we can simply remove the assignment. Fixes: b5332a9f3f3d ("nvmet-tcp: fix incorrect locking in state_change sk callback") Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14  nvmet-fc: do not check for invalid target port in nvmet_fc_handle_fcp_rqst()  (Hannes Reinecke, 1 file changed, -8/+2)
[ Upstream commit 2a4a910aa4f0acc428dc8d10227c42e14ed21d10 ] When parsing a request in nvmet_fc_handle_fcp_rqst() we should not check for invalid target ports; if we do the command is aborted from the fcp layer, causing the host to assume a transport error. Rather we should still forward this request to the nvmet layer, which will then correctly fail the command with an appropriate error status. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-18  nvme-loop: do not warn for deleted controllers during reset  (Hannes Reinecke, 1 file changed, -2/+4)
[ Upstream commit 6622f9acd29cd4f6272720e827e6406f5a970cb0 ] During concurrent reset and delete calls the reset workqueue is flushed, causing nvme_loop_reset_ctrl_work() to be executed when the controller is in state DELETING or DELETING_NOIO. But this is expected, so we shouldn't issue a WARN_ON here. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-18  nvme-loop: check for NVME_LOOP_Q_LIVE in nvme_loop_destroy_admin_queue()  (Hannes Reinecke, 1 file changed, -1/+2)
[ Upstream commit 4237de2f73a669e4f89ac0aa2b44fb1a1d9ec583 ] We need to check the NVME_LOOP_Q_LIVE flag in nvme_loop_destroy_admin_queue() to protect against duplicate invocations eg during concurrent reset and remove calls. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-18  nvme-loop: clear NVME_LOOP_Q_LIVE when nvme_loop_configure_admin_queue() fails  (Hannes Reinecke, 1 file changed, -0/+1)
[ Upstream commit 1c5f8e882a05de5c011e8c3fbeceb0d1c590eb53 ] When the call to nvme_enable_ctrl() in nvme_loop_configure_admin_queue() fails the NVME_LOOP_Q_LIVE flag is not cleared. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-18  nvme-loop: reset queue count to 1 in nvme_loop_destroy_io_queues()  (Hannes Reinecke, 1 file changed, -0/+1)
[ Upstream commit a6c144f3d2e230f2b3ac5ed8c51e0f0391556197 ] The queue count is increased in nvme_loop_init_io_queues(), so we need to reset it to 1 at the end of nvme_loop_destroy_io_queues(). Otherwise the function is not re-entrant safe, and a crash will happen during concurrent reset and remove calls. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-16  nvmet: fix false keep-alive timeout when a controller is torn down  (Sagi Grimberg, 2 files changed, -5/+12)
[ Upstream commit aaeadd7075dc9e184bc7876e9dd7b3bada771df2 ] Controller teardown flow may take some time in case it has many I/O queues, and the host may not send us keep-alive during this period. Hence reset the traffic based keep-alive timer so we don't trigger a controller teardown as a result of a keep-alive expiration. Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10  nvmet: fix freeing unallocated p2pmem  (Max Gurtovoy, 1 file changed, -17/+16)
[ Upstream commit bcd9a0797d73eeff659582f23277e7ab6e5f18f3 ] In case p2p device was found but the p2p pool is empty, the nvme target is still trying to free the sgl from the p2p pool instead of the regular sgl pool and causing a crash (BUG() is called). Instead, assign the p2p_dev for the request only if it was allocated from p2p pool. This is the crash that was caused: [Sun May 30 19:13:53 2021] ------------[ cut here ]------------ [Sun May 30 19:13:53 2021] kernel BUG at lib/genalloc.c:518! [Sun May 30 19:13:53 2021] invalid opcode: 0000 [#1] SMP PTI ... [Sun May 30 19:13:53 2021] kernel BUG at lib/genalloc.c:518! ... [Sun May 30 19:13:53 2021] RIP: 0010:gen_pool_free_owner+0xa8/0xb0 ... [Sun May 30 19:13:53 2021] Call Trace: [Sun May 30 19:13:53 2021] ------------[ cut here ]------------ [Sun May 30 19:13:53 2021] pci_free_p2pmem+0x2b/0x70 [Sun May 30 19:13:53 2021] pci_p2pmem_free_sgl+0x4f/0x80 [Sun May 30 19:13:53 2021] nvmet_req_free_sgls+0x1e/0x80 [nvmet] [Sun May 30 19:13:53 2021] kernel BUG at lib/genalloc.c:518! [Sun May 30 19:13:53 2021] nvmet_rdma_release_rsp+0x4e/0x1f0 [nvmet_rdma] [Sun May 30 19:13:53 2021] nvmet_rdma_send_done+0x1c/0x60 [nvmet_rdma] Fixes: c6e3f1339812 ("nvmet: add metadata support for block devices") Reviewed-by: Israel Rukshin <israelr@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03  nvmet-tcp: fix inline data size comparison in nvmet_tcp_queue_response  (Hou Pu, 1 file changed, -1/+1)
commit 25df1acd2d36eb72b14c3d00f6b861b1e00b3aab upstream. Use "<=" instead of "<" to compare the inline data size. Fixes: bdaf13279192 ("nvmet-tcp: fix a segmentation fault during io parsing error") Signed-off-by: Hou Pu <houpu.main@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-26  nvmet: reset ns->file when open fails  (Daniel Wagner, 1 file changed, -3/+5)
[ Upstream commit 85428beac80dbcace5b146b218697c73e367dcf5 ] Reset the ns->file value to NULL also in the error case in nvmet_file_ns_enable(). The ns->file variable points either to a file object or contains the error code after the filp_open() call. This can lead to the following problem: when the user first sets up an invalid file backend and tries to enable the ns, it will fail. Then the user switches over to a bdev backend and successfully enables the ns. The first received I/O will crash the system because the I/O backend is chosen based on the ns->file value:
  static u16 nvmet_parse_io_cmd(struct nvmet_req *req)
  {
          [...]
          if (req->ns->file)
                  return nvmet_file_parse_io_cmd(req);
          return nvmet_bdev_parse_io_cmd(req);
  }
Reported-by: Enzo Matsumiya <ematsumiya@suse.com> Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-26  nvme-loop: fix memory leak in nvme_loop_create_ctrl()  (Wu Bo, 1 file changed, -1/+3)
[ Upstream commit 03504e3b54cc8118cc26c064e60a0b00c2308708 ] When creating loop ctrl in nvme_loop_create_ctrl(), if nvme_init_ctrl() fails, the loop ctrl should be freed before jumping to the "out" label. Fixes: 3a85a5de29ea ("nvme-loop: add a NVMe loopback host driver") Signed-off-by: Wu Bo <wubo40@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-26  nvmet: fix memory leak in nvmet_alloc_ctrl()  (Wu Bo, 1 file changed, -1/+1)
[ Upstream commit fec356a61aa3d3a66416b4321f1279e09e0f256f ] When creating a ctrl in nvmet_alloc_ctrl(), if the cntlid_min is larger than the cntlid_max of the subsystem, the code jumps to the "out_free_changed_ns_list" label, but ctrl->sqs is never freed. Fix this by jumping to the "out_free_sqs" label. Fixes: 94a39d61f80f ("nvmet: make ctrl-id configurable") Signed-off-by: Wu Bo <wubo40@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-26  nvmet: remove unused ctrl->cqs  (Amit, 2 files changed, -14/+2)
[ Upstream commit 6d65aeab7bf6e83e75f53cfdbdb84603e52e1182 ] Remove the unused cqs from the nvmet_ctrl struct; this will reduce the allocated memory. Signed-off-by: Amit <amit.engel@dell.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-22  nvmet: remove unsupported command noise  (Keith Busch, 1 file changed, -3/+3)
[ Upstream commit 4a20342572f66c5b20a1ee680f5ac0a13703748f ] Nothing can stop a host from submitting invalid commands. The target just needs to respond with an appropriate status, but that's not a target error. Demote invalid command messages to the debug level so these events don't spam the kernel logs. Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19  nvmet-rdma: Fix NULL deref when SEND is completed with error  (Michal Kalderon, 1 file changed, -2/+2)
[ Upstream commit 8cc365f9559b86802afc0208389f5c8d46b4ad61 ] When running some traffic and taking down the link on peer, a retry counter exceeded error is received. This leads to nvmet_rdma_error_comp which tried accessing the cq_context to obtain the queue. The cq_context is no longer valid after the fix to use shared CQ mechanism and should be obtained similar to how it is obtained in other functions from the wc->qp. [ 905.786331] nvmet_rdma: SEND for CQE 0x00000000e3337f90 failed with status transport retry counter exceeded (12). [ 905.832048] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 [ 905.839919] PGD 0 P4D 0 [ 905.842464] Oops: 0000 1 SMP NOPTI [ 905.846144] CPU: 13 PID: 1557 Comm: kworker/13:1H Kdump: loaded Tainted: G OE --------- - - 4.18.0-304.el8.x86_64 #1 [ 905.872135] RIP: 0010:nvmet_rdma_error_comp+0x5/0x1b [nvmet_rdma] [ 905.878259] Code: 19 4f c0 e8 89 b3 a5 f6 e9 5b e0 ff ff 0f b7 75 14 4c 89 ea 48 c7 c7 08 1a 4f c0 e8 71 b3 a5 f6 e9 4b e0 ff ff 0f 1f 44 00 00 <48> 8b 47 48 48 85 c0 74 08 48 89 c7 e9 98 bf 49 00 e9 c3 e3 ff ff [ 905.897135] RSP: 0018:ffffab601c45fe28 EFLAGS: 00010246 [ 905.902387] RAX: 0000000000000065 RBX: ffff9e729ea2f800 RCX: 0000000000000000 [ 905.909558] RDX: 0000000000000000 RSI: ffff9e72df9567c8 RDI: 0000000000000000 [ 905.916731] RBP: ffff9e729ea2b400 R08: 000000000000074d R09: 0000000000000074 [ 905.923903] R10: 0000000000000000 R11: ffffab601c45fcc0 R12: 0000000000000010 [ 905.931074] R13: 0000000000000000 R14: 0000000000000010 R15: ffff9e729ea2f400 [ 905.938247] FS: 0000000000000000(0000) GS:ffff9e72df940000(0000) knlGS:0000000000000000 [ 905.938249] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 905.950067] nvmet_rdma: SEND for CQE 0x00000000c7356cca failed with status transport retry counter exceeded (12). [ 905.961855] CR2: 0000000000000048 CR3: 000000678d010004 CR4: 00000000007706e0 [ 905.961855] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 905.961856] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 905.961857] PKRU: 55555554 [ 906.010315] Call Trace: [ 906.012778] __ib_process_cq+0x89/0x170 [ib_core] [ 906.017509] ib_cq_poll_work+0x26/0x80 [ib_core] [ 906.022152] process_one_work+0x1a7/0x360 [ 906.026182] ? create_worker+0x1a0/0x1a0 [ 906.030123] worker_thread+0x30/0x390 [ 906.033802] ? create_worker+0x1a0/0x1a0 [ 906.037744] kthread+0x116/0x130 [ 906.040988] ? kthread_flush_work_fn+0x10/0x10 [ 906.045456] ret_from_fork+0x1f/0x40 Fixes: ca0f1a8055be2 ("nvmet-rdma: use new shared CQ mechanism") Signed-off-by: Shai Malin <smalin@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19  nvmet: fix inline bio check for bdev-ns  (Chaitanya Kulkarni, 2 files changed, -1/+7)
[ Upstream commit 608a969046e6e0567d05a166be66c77d2dd8220b ] When handling rw commands, for inline bio case we only consider transfer size. This works well when req->sg_cnt fits into the req->inline_bvec, but it will result in the warning in __bio_add_page() when req->sg_cnt > NVMET_MAX_INLINE_BVEC. Consider an I/O size 32768 and first page is not aligned to the page boundary, then I/O is split in following manner :- [ 2206.256140] nvmet: sg->length 3440 sg->offset 656 [ 2206.256144] nvmet: sg->length 4096 sg->offset 0 [ 2206.256148] nvmet: sg->length 4096 sg->offset 0 [ 2206.256152] nvmet: sg->length 4096 sg->offset 0 [ 2206.256155] nvmet: sg->length 4096 sg->offset 0 [ 2206.256159] nvmet: sg->length 4096 sg->offset 0 [ 2206.256163] nvmet: sg->length 4096 sg->offset 0 [ 2206.256166] nvmet: sg->length 4096 sg->offset 0 [ 2206.256170] nvmet: sg->length 656 sg->offset 0 Now the req->transfer_size == NVMET_MAX_INLINE_DATA_LEN i.e. 32768, but the req->sg_cnt is (9) > NVMET_MAX_INLINE_BIOVEC which is (8). This will result in the following warning message :- nvmet_bdev_execute_rw() bio_add_page() __bio_add_page() WARN_ON_ONCE(bio_full(bio, len)); This scenario is very hard to reproduce on the nvme-loop transport only with rw commands issued with the passthru IOCTL interface from the host application and the data buffer is allocated with the malloc() and not the posix_memalign(). Fixes: 73383adfad24 ("nvmet: don't split large I/Os unconditionally") Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
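A minimal sketch of the check this fix implies: the inline bvec can only be used when both the transfer size and the SG entry count fit. The limit values and names below are illustrative stand-ins, not the kernel's definitions:

    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative limits; the real values live in the nvmet headers. */
    #define MAX_INLINE_BIOVEC    8
    #define MAX_INLINE_DATA_LEN  (MAX_INLINE_BIOVEC * 4096)

    struct req { unsigned int transfer_len; int sg_cnt; };

    /* Inline bvec is usable only if both the byte count and the SG count fit. */
    static bool use_inline_bvec(const struct req *req)
    {
        return req->transfer_len <= MAX_INLINE_DATA_LEN &&
               req->sg_cnt <= MAX_INLINE_BIOVEC;
    }

    int main(void)
    {
        struct req aligned   = { .transfer_len = 32768, .sg_cnt = 8 };
        struct req unaligned = { .transfer_len = 32768, .sg_cnt = 9 }; /* first page unaligned */

        printf("aligned:   inline=%d\n", use_inline_bvec(&aligned));
        printf("unaligned: inline=%d\n", use_inline_bvec(&unaligned));
        return 0;
    }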
2021-05-19  nvmet: add lba to sect conversion helpers  (Chaitanya Kulkarni, 2 files changed, -5/+13)
[ Upstream commit 193fcf371f9e3705c14a0bf1d4bfc44af0f7c124 ] In this preparation patch, we add helpers to convert lbas to sectors & sectors to lba. This is needed to eliminate code duplication in the ZBD backend. Use these helpers in the block device backend. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
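The conversion described above reduces to a shift by (LBA-size shift - 9), since a Linux block-layer sector is 512 bytes. A small standalone sketch with hypothetical helper names:

    #include <stdint.h>
    #include <stdio.h>

    #define SECTOR_SHIFT 9   /* Linux block-layer sector = 512 bytes */

    /* blksize_shift: log2 of the namespace LBA size (e.g. 12 for 4 KiB LBAs). */
    static uint64_t lba_to_sect(unsigned int blksize_shift, uint64_t lba)
    {
        return lba << (blksize_shift - SECTOR_SHIFT);
    }

    static uint64_t sect_to_lba(unsigned int blksize_shift, uint64_t sect)
    {
        return sect >> (blksize_shift - SECTOR_SHIFT);
    }

    int main(void)
    {
        /* 4 KiB LBAs: one LBA covers eight 512-byte sectors. */
        printf("lba 10    -> sector %llu\n", (unsigned long long)lba_to_sect(12, 10));
        printf("sector 80 -> lba    %llu\n", (unsigned long long)sect_to_lba(12, 80));
        return 0;
    }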
2021-05-14  nvmet-tcp: fix a segmentation fault during io parsing error  (Elad Grupi, 1 file changed, -8/+31)
[ Upstream commit bdaf13279192c60b2b1fc99badef53b494fec055 ] In case there is an io that contains inline data and it goes to parsing error flow, command response will free command and iov before clearing the data on the socket buffer. This will delay the command response until receive flow is completed. Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver") Signed-off-by: Elad Grupi <elad.grupi@dell.com> Signed-off-by: Hou Pu <houpu.main@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14  nvmet-tcp: fix incorrect locking in state_change sk callback  (Sagi Grimberg, 1 file changed, -2/+2)
[ Upstream commit b5332a9f3f3d884a1b646ce155e664cc558c1722 ] We are not changing anything in the TCP connection state so we should not take a write_lock but rather a read lock. This caused a deadlock when running nvmet-tcp and nvme-tcp on the same system, where state_change callbacks on the host and on the controller side have causal relationship and made lockdep report on this with blktests: ================================ WARNING: inconsistent lock state 5.12.0-rc3 #1 Tainted: G I -------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-R} usage. nvme/1324 [HC0[0]:SC0[0]:HE1:SE1] takes: ffff888363151000 (clock-AF_INET){++-?}-{2:2}, at: nvme_tcp_state_change+0x21/0x150 [nvme_tcp] {IN-SOFTIRQ-W} state was registered at: __lock_acquire+0x79b/0x18d0 lock_acquire+0x1ca/0x480 _raw_write_lock_bh+0x39/0x80 nvmet_tcp_state_change+0x21/0x170 [nvmet_tcp] tcp_fin+0x2a8/0x780 tcp_data_queue+0xf94/0x1f20 tcp_rcv_established+0x6ba/0x1f00 tcp_v4_do_rcv+0x502/0x760 tcp_v4_rcv+0x257e/0x3430 ip_protocol_deliver_rcu+0x69/0x6a0 ip_local_deliver_finish+0x1e2/0x2f0 ip_local_deliver+0x1a2/0x420 ip_rcv+0x4fb/0x6b0 __netif_receive_skb_one_core+0x162/0x1b0 process_backlog+0x1ff/0x770 __napi_poll.constprop.0+0xa9/0x5c0 net_rx_action+0x7b3/0xb30 __do_softirq+0x1f0/0x940 do_softirq+0xa1/0xd0 __local_bh_enable_ip+0xd8/0x100 ip_finish_output2+0x6b7/0x18a0 __ip_queue_xmit+0x706/0x1aa0 __tcp_transmit_skb+0x2068/0x2e20 tcp_write_xmit+0xc9e/0x2bb0 __tcp_push_pending_frames+0x92/0x310 inet_shutdown+0x158/0x300 __nvme_tcp_stop_queue+0x36/0x270 [nvme_tcp] nvme_tcp_stop_queue+0x87/0xb0 [nvme_tcp] nvme_tcp_teardown_admin_queue+0x69/0xe0 [nvme_tcp] nvme_do_delete_ctrl+0x100/0x10c [nvme_core] nvme_sysfs_delete.cold+0x8/0xd [nvme_core] kernfs_fop_write_iter+0x2c7/0x460 new_sync_write+0x36c/0x610 vfs_write+0x5c0/0x870 ksys_write+0xf9/0x1d0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae irq event stamp: 10687 hardirqs last enabled at (10687): [<ffffffff9ec376bd>] _raw_spin_unlock_irqrestore+0x2d/0x40 hardirqs last disabled at (10686): [<ffffffff9ec374d8>] _raw_spin_lock_irqsave+0x68/0x90 softirqs last enabled at (10684): [<ffffffff9f000608>] __do_softirq+0x608/0x940 softirqs last disabled at (10649): [<ffffffff9cdedd31>] do_softirq+0xa1/0xd0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(clock-AF_INET); <Interrupt> lock(clock-AF_INET); *** DEADLOCK *** 5 locks held by nvme/1324: #0: ffff8884a01fe470 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xf9/0x1d0 #1: ffff8886e435c090 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x216/0x460 #2: ffff888104d90c38 (kn->active#255){++++}-{0:0}, at: kernfs_remove_self+0x22d/0x330 #3: ffff8884634538d0 (&queue->queue_lock){+.+.}-{3:3}, at: nvme_tcp_stop_queue+0x52/0xb0 [nvme_tcp] #4: ffff888363150d30 (sk_lock-AF_INET){+.+.}-{0:0}, at: inet_shutdown+0x59/0x300 stack backtrace: CPU: 26 PID: 1324 Comm: nvme Tainted: G I 5.12.0-rc3 #1 Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.10.0 11/12/2020 Call Trace: dump_stack+0x93/0xc2 mark_lock_irq.cold+0x2c/0xb3 ? verify_lock_unused+0x390/0x390 ? stack_trace_consume_entry+0x160/0x160 ? lock_downgrade+0x100/0x100 ? save_trace+0x88/0x5e0 ? _raw_spin_unlock_irqrestore+0x2d/0x40 mark_lock+0x530/0x1470 ? mark_lock_irq+0x1d10/0x1d10 ? enqueue_timer+0x660/0x660 mark_usage+0x215/0x2a0 __lock_acquire+0x79b/0x18d0 ? tcp_schedule_loss_probe.part.0+0x38c/0x520 lock_acquire+0x1ca/0x480 ? nvme_tcp_state_change+0x21/0x150 [nvme_tcp] ? rcu_read_unlock+0x40/0x40 ? 
tcp_mtu_probe+0x1ae0/0x1ae0 ? kmalloc_reserve+0xa0/0xa0 ? sysfs_file_ops+0x170/0x170 _raw_read_lock+0x3d/0xa0 ? nvme_tcp_state_change+0x21/0x150 [nvme_tcp] nvme_tcp_state_change+0x21/0x150 [nvme_tcp] ? sysfs_file_ops+0x170/0x170 inet_shutdown+0x189/0x300 __nvme_tcp_stop_queue+0x36/0x270 [nvme_tcp] nvme_tcp_stop_queue+0x87/0xb0 [nvme_tcp] nvme_tcp_teardown_admin_queue+0x69/0xe0 [nvme_tcp] nvme_do_delete_ctrl+0x100/0x10c [nvme_core] nvme_sysfs_delete.cold+0x8/0xd [nvme_core] kernfs_fop_write_iter+0x2c7/0x460 new_sync_write+0x36c/0x610 ? new_sync_read+0x600/0x600 ? lock_acquire+0x1ca/0x480 ? rcu_read_unlock+0x40/0x40 ? lock_is_held_type+0x9a/0x110 vfs_write+0x5c0/0x870 ksys_write+0xf9/0x1d0 ? __ia32_sys_read+0xa0/0xa0 ? lockdep_hardirqs_on_prepare.part.0+0x198/0x340 ? syscall_enter_from_user_mode+0x27/0x70 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-11  nvmet: return proper error code from discovery ctrl  (Hou Pu, 1 file changed, -2/+4)
[ Upstream commit 79695dcd9ad4463a82def7f42960e6d7baa76f0b ] Return NVME_SC_INVALID_FIELD from the discovery controller, like a normal controller does, when executing an identify or get log page command. Signed-off-by: Hou Pu <houpu.main@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-07  nvmet-tcp: fix kmap leak when data digest in use  (Elad Grupi, 1 file changed, -1/+1)
[ Upstream commit bac04454ef9fada009f0572576837548b190bf94 ] When data digest is enabled we should unmap pdu iovec before handling the data digest pdu. Signed-off-by: Elad Grupi <elad.grupi@dell.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30  nvme-rdma: Fix a use after free in nvmet_rdma_write_data_done  (Lv Yunlong, 1 file changed, -3/+2)
[ Upstream commit abec6561fc4e0fbb19591a0b35676d8c783b5493 ] In nvmet_rdma_write_data_done, rsp is recovered from wc->wr_cqe and freed by nvmet_rdma_release_rsp(). But after that, pr_info() used the freed chunk's member object and could leak the freed chunk address with wc->wr_cqe by computing the offset. Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-25  nvmet: don't check iosqes,iocqes for discovery controllers  (Sagi Grimberg, 1 file changed, -3/+14)
commit d218a8a3003e84ab136e69a4e30dd4ec7dab2d22 upstream. From the base spec, Figure 78 ("Controller Configuration"): these fields are defined as parameters to configure an "I/O Controller (IOC)" and not to configure a "Discovery Controller (DC)". ... "If the controller does not support I/O queues, then this field shall be read-only with a value of 0h." Just perform this check for I/O controllers. Fixes: a07b4970f464 ("nvmet: add a generic NVMe target") Reported-by: Belanger, Martin <Martin.Belanger@dell.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04  nvmet: set status to 0 in case of invalid nsid  (Chaitanya Kulkarni, 1 file changed, -1/+1)
[ Upstream commit 40244ad36bcfb796a6bb9e95bdcbf8ddf3134509 ] For an unallocated namespace in nvmet_execute_identify_ns(), don't set the status to NVME_SC_INVALID_NS; set it to zero. Fixes: bffcd507780e ("nvmet: set right status on error in id-ns handler") Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04  nvmet: remove extra variable in identify ns  (Chaitanya Kulkarni, 1 file changed, -16/+15)
[ Upstream commit 3c7b224f1956ed232b24ed2eb2c54e4476c6acb2 ] We remove the extra local variable struct nvmet_ns in nvmet_execute_identify_ns() since req already has ns member that can be reused, this also eliminates the explicit call to nvmet_put_namespace() which is already present in the request completion path. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04  nvmet-tcp: fix potential race of tcp socket closing accept_work  (Sagi Grimberg, 1 file changed, -10/+18)
[ Upstream commit 0fbcfb089a3f2f2a731d01f0aec8f7697a849c28 ] When we accept a TCP connection and allocate an nvmet-tcp queue we should make sure not to fully establish it or reference it as the connection may be already closing, which triggers queue release work, which does not fence against queue establishment. In order to address such a race, we make sure to check the sk_state and contain the queue reference to be done underneath the sk_callback_lock such that the queue release work correctly fences against it. Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver") Reported-by: Elad Grupi <elad.grupi@dell.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04  nvmet-tcp: fix receive data digest calculation for multiple h2cdata PDUs  (Sagi Grimberg, 1 file changed, -7/+24)
[ Upstream commit fda871c0ba5d2eed2cd1c881573168129da70058 ] When a host sends multiple h2cdata PDUs for a single command, we should verify the data digest calculation per PDU and not per command. Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver") Reported-by: Narayan Ayalasomayajula <Narayan.Ayalasomayajula@wdc.com> Tested-by: Narayan Ayalasomayajula <Narayan.Ayalasomayajula@wdc.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-10  nvmet-tcp: fix out-of-bounds access when receiving multiple h2cdata PDUs  (Sagi Grimberg, 1 file changed, -1/+2)
[ Upstream commit cb8563f5c735a042ea2dd7df1ad55ae06d63ffeb ] When the host sends multiple h2cdata PDUs, we keep track of the receive progress and calculate the scatterlist index and offsets. The issue is that sg_offset should only be kept for the first iov entry we map in the iovec, as this is the difference between our cursor and the sg entry offset itself. In addition, the sg index was calculated wrong because we should not round up when dividing the command byte offset by PAGE_SIZE. Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver") Reported-by: Narayan Ayalasomayajula <Narayan.Ayalasomayajula@wdc.com> Tested-by: Narayan Ayalasomayajula <Narayan.Ayalasomayajula@wdc.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
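A rough arithmetic sketch of the index/offset split described in the entry above: the SG index comes from dividing the command byte offset by PAGE_SIZE without rounding up, and the remainder applies only to the first mapped entry. The names and the fixed PAGE_SIZE below are illustrative, not the kernel's:

    #include <stdio.h>

    #define PAGE_SIZE 4096u

    struct cursor { unsigned int sg_idx; unsigned int sg_offset; };

    /* Map a command byte offset to an SG entry index plus in-entry offset. */
    static struct cursor map_offset(unsigned int byte_offset)
    {
        struct cursor c = {
            .sg_idx    = byte_offset / PAGE_SIZE,   /* no rounding up */
            .sg_offset = byte_offset % PAGE_SIZE,   /* applied to the first entry only */
        };
        return c;
    }

    int main(void)
    {
        struct cursor c = map_offset(6144);  /* second h2cdata PDU starts mid-page */

        printf("sg index %u, offset within first mapped entry %u\n",
               c.sg_idx, c.sg_offset);
        return 0;
    }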
2021-02-07  nvmet: set right status on error in id-ns handler  (Chaitanya Kulkarni, 1 file changed, -2/+6)
[ Upstream commit bffcd507780ea614b5543c66f2e37ce0d55cd449 ] The function nvmet_execute_identify_ns() doesn't set the status if call to nvmet_find_namespace() fails. In that case we set the status of the request to the value return by the nvmet_copy_sgl(). Set the status to NVME_SC_INVALID_NS and adjust the code such that request will have the right status on nvmet_find_namespace() failure. Without this patch :- NVME Identify Namespace 3: nsze : 0 ncap : 0 nuse : 0 nsfeat : 0 nlbaf : 0 flbas : 0 mc : 0 dpc : 0 dps : 0 nmic : 0 rescap : 0 fpi : 0 dlfeat : 0 nawun : 0 nawupf : 0 nacwu : 0 nabsn : 0 nabo : 0 nabspf : 0 noiob : 0 nvmcap : 0 mssrl : 0 mcl : 0 msrc : 0 nsattr : 0 nvmsetid: 0 anagrpid: 0 endgid : 0 nguid : 00000000000000000000000000000000 eui64 : 0000000000000000 lbaf 0 : ms:0 lbads:0 rp:0 (in use) With this patch-series :- feb3b88b501e (HEAD -> nvme-5.11) nvmet: remove extra variable in identify ns 6302aa67210a nvmet: remove extra variable in id-desclist ed57951da453 nvmet: remove extra variable in smart log nsid be384b8c24dc nvmet: set right status on error in id-ns handler NVMe status: INVALID_NS: The namespace or the format of that namespace is invalid(0xb) Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-19  nvmet-rdma: Fix NULL deref when setting pi_enable and traddr INADDR_ANY  (Israel Rukshin, 1 file changed, -8/+8)
commit 7a84665619bb5da8c8b6517157875a1fd7632014 upstream. When setting the port traddr to INADDR_ANY, the listening cm_id->device is NULL. The associated IB device is known only when a connect request event arrives, so checking the T10-PI device capability should be done at that stage. Fixes: b09160c3996c ("nvmet-rdma: add metadata/T10-PI support") Signed-off-by: Israel Rukshin <israelr@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>