Age | Commit message (Collapse) | Author | Files | Lines |
|
commit d302c6e3a6895608a5856bc708c47bda1770b24d upstream.
Even if no response from hardware, we should make sure that qp related
resources are released to avoid memory leaks.
Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC")
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Link: https://lore.kernel.org/r/1570584110-3659-1-git-send-email-liweihang@hisilicon.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d5b60e26e86a463ca83bb5ec502dda6ea685159e upstream.
This is not the first attempt to fix building random configurations,
unfortunately the attempt in commit a07fc0bb483e ("RDMA/hns: Fix build
error") caused a new problem when CONFIG_INFINIBAND_HNS_HIP06=m and
CONFIG_INFINIBAND_HNS_HIP08=y:
drivers/infiniband/hw/hns/hns_roce_main.o:(.rodata+0xe60): undefined reference to `__this_module'
Revert commits a07fc0bb483e ("RDMA/hns: Fix build error") and
a3e2d4c7e766 ("RDMA/hns: remove obsolete Kconfig comment") to get back to
the previous state, then fix the issues described there differently, by
adding more specific dependencies: INFINIBAND_HNS can now only be built-in
if at least one of HNS or HNS3 are built-in, and the individual back-ends
are only available if that code is reachable from the main driver.
Fixes: a07fc0bb483e ("RDMA/hns: Fix build error")
Fixes: a3e2d4c7e766 ("RDMA/hns: remove obsolete Kconfig comment")
Fixes: dd74282df573 ("RDMA/hns: Initialize the PCI device for hip08 RoCE")
Fixes: 08805fdbeb2d ("RDMA/hns: Split hw v1 driver from hns roce driver")
Link: https://lore.kernel.org/r/20191007211826.3361202-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit cfd82da4e741c16d71a12123bf0cb585af2b8796 upstream.
The restrack function return EINVAL instead of EMSGSIZE when the driver
operation fails.
Fixes: 4b42d05d0b2c ("RDMA/hns: Remove unnecessary kzalloc")
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Link: https://lore.kernel.org/r/1567566885-23088-5-git-send-email-liweihang@hisilicon.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9f7d7064009c37cb26eee4a83302cf077fe180d6 upstream.
Type of ah->av.vlan is u16, there will be a problem using le16_to_cpu
on it.
Fixes: 82e620d9c3a0 ("RDMA/hns: Modify the data structure of hns_roce_av")
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Link: https://lore.kernel.org/r/1567566885-23088-2-git-send-email-liweihang@hisilicon.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 515f60004ed985d2b2f03659365752e0b6142986 upstream.
The "ucmd->log_sq_bb_count" variable is a user controlled variable in the
0-255 range. If we shift more than then number of bits in an int then
it's undefined behavior (it shift wraps), and potentially the int could
become negative.
Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
Link: https://lore.kernel.org/r/20190608092514.GC28890@mwanda
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ca9033ba69c7e3477f207df69867b2ea969197c8 upstream.
In the iowait structure, two iowait_work entries were included to queue a
given object: one for normal IB operations, and the other for TID RDMA
operations. For non-TID RDMA operations, the iowait_work structure for TID
RDMA is initialized to contain a NULL function (not used). When the QP is
reset, the function iowait_cancel_work will be called to cancel any
pending work. The problem is that this function will call
cancel_work_sync() for both iowait_work entries, even though the one for
TID RDMA is not used at all. Eventually, the call cascades to
__flush_work(), wherein a WARN_ON will be triggered due to the fact that
work->func is NULL.
The WARN_ON was introduced in commit 4d43d395fed1 ("workqueue: Try to
catch flush_work() without INIT_WORK().")
This patch fixes the issue by making sure that a work function is present
for TID RDMA before calling cancel_work_sync in iowait_cancel_work.
Fixes: 4d43d395fed1 ("workqueue: Try to catch flush_work() without INIT_WORK().")
Fixes: 5da0fc9dbf89 ("IB/hfi1: Prepare resource waits for dual leg")
Link: https://lore.kernel.org/r/20191219211941.58387.39883.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c5275723580922e5f3264f96751337661a153c7d upstream.
Some adapters need a fence Work Entry to handle retransmission. Currently
the driver checks for this condition, only if the Send queue entry is
signalled. Implement the condition check, irrespective of the signalled
state of the Work queue entries
Failure to add the fence can result in access to memory that is already
marked as completed, triggering data corruption, transmission failure,
IOMMU failures, etc.
Fixes: 9152e0b722b2 ("RDMA/bnxt_re: HW workarounds for handling specific conditions")
Link: https://lore.kernel.org/r/1574671174-5064-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9a4467a6b282a299b932608ac2c9034f8415359f upstream.
The driver returns an error code for MR dereg, but frees the MR structure.
When the MR dereg is retried due to previous error, the system crashes as
the structure is already freed.
BUG: unable to handle kernel NULL pointer dereference at 00000000000001b8
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 7 PID: 12178 Comm: ib_send_bw Kdump: loaded Not tainted 4.18.0-124.el8.x86_64 #1
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.1.10 03/10/2015
RIP: 0010:__dev_printk+0x2a/0x70
Code: 0f 1f 44 00 00 49 89 d1 48 85 f6 0f 84 f6 2b 00 00 4c 8b 46 70 4d 85 c0 75 04 4c 8b
46 10 48 8b 86 a8 00 00 00 48 85 c0 74 16 <48> 8b 08 0f be 7f 01 48 c7 c2 13 ac ac 83 83 ef 30 e9 10 fe ff ff
RSP: 0018:ffffaf7c04607a60 EFLAGS: 00010006
RAX: 00000000000001b8 RBX: ffffa0010c91c488 RCX: 0000000000000246
RDX: ffffaf7c04607a68 RSI: ffffa0010c91caa8 RDI: ffffffff83a788eb
RBP: ffffaf7c04607ac8 R08: 0000000000000000 R09: ffffaf7c04607a68
R10: 0000000000000000 R11: 0000000000000001 R12: ffffaf7c04607b90
R13: 000000000000000e R14: 0000000000000000 R15: 00000000ffffa001
FS: 0000146fa1f1cdc0(0000) GS:ffffa0012fac0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000001b8 CR3: 000000007680a003 CR4: 00000000001606e0
Call Trace:
dev_err+0x6c/0x90
? dev_printk_emit+0x4e/0x70
bnxt_qplib_rcfw_send_message+0x594/0x660 [bnxt_re]
? dev_err+0x6c/0x90
bnxt_qplib_free_mrw+0x80/0xe0 [bnxt_re]
bnxt_re_dereg_mr+0x2e/0xd0 [bnxt_re]
ib_dereg_mr+0x2f/0x50 [ib_core]
destroy_hw_idr_uobject+0x20/0x70 [ib_uverbs]
uverbs_destroy_uobject+0x2e/0x170 [ib_uverbs]
__uverbs_cleanup_ufile+0x6e/0x90 [ib_uverbs]
uverbs_destroy_ufile_hw+0x61/0x130 [ib_uverbs]
ib_uverbs_close+0x1f/0x80 [ib_uverbs]
__fput+0xb7/0x230
task_work_run+0x8a/0xb0
do_exit+0x2da/0xb40
...
RIP: 0033:0x146fa113a387
Code: Bad RIP value.
RSP: 002b:00007fff945d1478 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff02
RAX: 0000000000000000 RBX: 000055a248908d70 RCX: 0000000000000000
RDX: 0000146fa1f2b000 RSI: 0000000000000001 RDI: 000055a248906488
RBP: 000055a248909630 R08: 0000000000010000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 000055a248906488
R13: 0000000000000001 R14: 0000000000000000 R15: 000055a2489095f0
Do not free the MR structures, when driver returns error to the stack.
Fixes: 872f3578241d ("RDMA/bnxt_re: Add support for MRs with Huge pages")
Link: https://lore.kernel.org/r/1574671174-5064-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b2ff0d510182eb5cc05a65d1b2371af62c4b170c upstream.
When a TID RDMA ACK to RESYNC request is received, the flow PSNs for
pending TID RDMA WRITE segments will be adjusted with the next flow
generation number, based on the resync_psn value extracted from the flow
PSN of the TID RDMA ACK packet. The resync_psn value indicates the last
flow PSN for which a TID RDMA WRITE DATA packet has been received by the
responder and the requester should resend TID RDMA WRITE DATA packets,
starting from the next flow PSN.
However, if resync_psn points to the last flow PSN for a segment and the
next segment flow PSN starts with a new generation number, use of the old
resync_psn to adjust the flow PSN for the next segment will lead to
miscalculation, resulting in WARN_ON and sge rewinding errors:
WARNING: CPU: 4 PID: 146961 at /nfs/site/home/phcvs2/gitrepo/ifs-all/components/Drivers/tmp/rpmbuild/BUILD/ifs-kernel-updates-3.10.0_957.el7.x86_64/hfi1/tid_rdma.c:4764 hfi1_rc_rcv_tid_rdma_ack+0x8f6/0xa90 [hfi1]
Modules linked in: ib_ipoib(OE) hfi1(OE) rdmavt(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfsv3 nfs_acl nfs lockd grace fscache iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel ib_isert iscsi_target_mod target_core_mod aesni_intel lrw gf128mul glue_helper ablk_helper cryptd rpcrdma sunrpc opa_vnic ast ttm ib_iser libiscsi drm_kms_helper scsi_transport_iscsi ipmi_ssif syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev ipmi_si pcspkr sg drm_panel_orientation_quirks ipmi_devintf lpc_ich i2c_i801 ipmi_msghandler wmi rdma_ucm ib_ucm ib_uverbs acpi_cpufreq acpi_power_meter ib_umad rdma_cm ib_cm iw_cm ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul i2c_algo_bit crct10dif_common
crc32c_intel e1000e ib_core ahci libahci ptp libata pps_core nfit libnvdimm [last unloaded: rdmavt]
CPU: 4 PID: 146961 Comm: kworker/4:0H Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1
Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.0X.02.0117.040420182310 04/04/2018
Workqueue: hfi0_0 _hfi1_do_tid_send [hfi1]
Call Trace:
<IRQ> [<ffffffff9e361dc1>] dump_stack+0x19/0x1b
[<ffffffff9dc97648>] __warn+0xd8/0x100
[<ffffffff9dc9778d>] warn_slowpath_null+0x1d/0x20
[<ffffffffc05d28c6>] hfi1_rc_rcv_tid_rdma_ack+0x8f6/0xa90 [hfi1]
[<ffffffffc05c21cc>] hfi1_kdeth_eager_rcv+0x1dc/0x210 [hfi1]
[<ffffffffc05c23ef>] ? hfi1_kdeth_expected_rcv+0x1ef/0x210 [hfi1]
[<ffffffffc0574f15>] kdeth_process_eager+0x35/0x90 [hfi1]
[<ffffffffc0575b5a>] handle_receive_interrupt_nodma_rtail+0x17a/0x2b0 [hfi1]
[<ffffffffc056a623>] receive_context_interrupt+0x23/0x40 [hfi1]
[<ffffffff9dd4a294>] __handle_irq_event_percpu+0x44/0x1c0
[<ffffffff9dd4a442>] handle_irq_event_percpu+0x32/0x80
[<ffffffff9dd4a4cc>] handle_irq_event+0x3c/0x60
[<ffffffff9dd4d27f>] handle_edge_irq+0x7f/0x150
[<ffffffff9dc2e554>] handle_irq+0xe4/0x1a0
[<ffffffff9e3795dd>] do_IRQ+0x4d/0xf0
[<ffffffff9e36b362>] common_interrupt+0x162/0x162
<EOI> [<ffffffff9dfa0f79>] ? swiotlb_map_page+0x49/0x150
[<ffffffffc05c2ed1>] hfi1_verbs_send_dma+0x291/0xb70 [hfi1]
[<ffffffffc05c2c40>] ? hfi1_wait_kmem+0xf0/0xf0 [hfi1]
[<ffffffffc05c3f26>] hfi1_verbs_send+0x126/0x2b0 [hfi1]
[<ffffffffc05ce683>] _hfi1_do_tid_send+0x1d3/0x320 [hfi1]
[<ffffffff9dcb9d4f>] process_one_work+0x17f/0x440
[<ffffffff9dcbade6>] worker_thread+0x126/0x3c0
[<ffffffff9dcbacc0>] ? manage_workers.isra.25+0x2a0/0x2a0
[<ffffffff9dcc1c31>] kthread+0xd1/0xe0
[<ffffffff9dcc1b60>] ? insert_kthread_work+0x40/0x40
[<ffffffff9e374c1d>] ret_from_fork_nospec_begin+0x7/0x21
[<ffffffff9dcc1b60>] ? insert_kthread_work+0x40/0x40
This patch fixes the issue by adjusting the resync_psn first if the flow
generation has been advanced for a pending segment.
Fixes: 9e93e967f7b4 ("IB/hfi1: Add a function to receive TID RDMA ACK packet")
Link: https://lore.kernel.org/r/20191219231920.51069.37147.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ed9085fed9d95d5921582e3c8474f3736c5d2782 ]
There are two flow rule destinations: QP and packet. While users are
setting DROP packet rule, the QP should not be set as a destination.
Fixes: 3b3233fbf02e ("IB/mlx5: Add flow counters binding support")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191212091214.315005-4-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 89f988d93c62384758b19323c886db917a80c371 ]
Current code device add sequence is:
ib_register_device()
ib_mad_init()
init_sriov_init()
register_netdev_notifier()
Therefore, the remove sequence should be,
unregister_netdev_notifier()
close_sriov()
mad_cleanup()
ib_unregister_device()
However it is not above.
Hence, make do above remove sequence.
Fixes: fa417f7b520ee ("IB/mlx4: Add support for IBoE")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191212091214.315005-3-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e284b159c6881c8bec9713daba2653268f4c4948 ]
In the first version of Gen P5 ASIC, chip-id was always set to 0x1750 for
all adaptor port configurations. This has been fixed in the new chip rev.
Due to this missing fix users are not able to use adaptors based on latest
chip rev of Broadcom's Gen P5 adaptors.
Fixes: ae8637e13185 ("RDMA/bnxt_re: Add chip context to identify 57500 series")
Link: https://lore.kernel.org/r/1574317343-23300-2-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Luke Starrett <luke.starrett@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 98998ffe5216c7fa2c0225bb5b049ca5cdf8d195 ]
Due to recent advances in the firmware for Broadcom's gen p5 series of
adaptors the driver code to report hardware counters has been broken
w.r.t. roce devices.
The new firmware command expects dma length to be specified during stat
dma buffer allocation.
Fixes: 2792b5b95ed5 ("bnxt_en: Update firmware interface spec. to 1.10.0.89.")
Link: https://lore.kernel.org/r/1574317343-23300-3-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit fca5b9dc0986aa49b3f0a7cfe24b6c82422ac1d7 ]
From sparse:
drivers/infiniband/hw/bnxt_re/main.c:1274:18: warning: cast from restricted __le16
drivers/infiniband/hw/bnxt_re/main.c:1275:18: warning: cast from restricted __le16
drivers/infiniband/hw/bnxt_re/main.c:1276:18: warning: cast from restricted __le16
drivers/infiniband/hw/bnxt_re/main.c:1277:21: warning: restricted __le16 degrades to integer
Fixes: 2b827ea1926b ("RDMA/bnxt_re: Query HWRM Interface version from FW")
Link: https://lore.kernel.org/r/1574317343-23300-4-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 64c264872b8879e2ab9017eefe9514d4c045c60e ]
We cannot rely on the entry memcpy as we only copy the actual size of the
command, the rest of the bytes must be memset to zero.
Currently providing non-zero memory will not have any user visible impact.
However, since admin commands are extendable (in a backwards compatible
way) everything beyond the size of the command must be cleared to prevent
issues in the future.
Fixes: 0420e542569b ("RDMA/efa: Implement functions that submit and complete admin commands")
Link: https://lore.kernel.org/r/20191112092608.46964-1-galpress@amazon.com
Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com>
Reviewed-by: Firas JahJah <firasj@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 73ab512f720298aabe23b34110e3f6a8545b0ba5 ]
There was a missing initialization for the srqs xarray.
SRQs xarray can also be called from irq context when searching
for an element and uses the xa_XXX_irq apis, therefore should
be initialized with IRQ flags.
Fixes: 9fd15987ed27 ("qedr: Convert srqidr to XArray")
Link: https://lore.kernel.org/r/20191027200451.28187-2-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 994195e1537074f56df216a9309f6e366cb35b67 ]
Currently, the error return path when the call to function
dev->dfx->query_cqc_info fails will leak object 'context'. Fix this by
making the error return path via 'err' return return codes rather than
-EMSGSIZE, set ret appropriately for all error return paths and for the
memory leak now return via 'err' rather than just returning without
freeing context.
Link: https://lore.kernel.org/r/20191024131034.19989-1-colin.king@canonical.com
Addresses-Coverity: ("Resource leak")
Fixes: e1c9a0dc2939 ("RDMA/hns: Dump detailed driver-specific CQ")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 24e412c1e00ebfe73619e6b88cbc26c2c7d41b85 ]
User QPs pbl's weren't freed properly.
MR pbls weren't freed properly.
Fixes: e0290cce6ac0 ("qedr: Add support for memory registeration verbs")
Link: https://lore.kernel.org/r/20191027200451.28187-5-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e8a07de57ea4ca7c2d604871c52826e66899fc70 ]
The parameters npages used to initial mtt of srq->idx_que shouldn't be
same with srq's. And page_shift should be calculated from idx_buf_pg_sz.
This patch fixes above issues and use field named npage and page_shift
in hns_roce_buf instead of two temporary variables to let us use them
anywhere.
Fixes: 18df508c7970 ("RDMA/hns: Remove if-else judgment statements for creating srq")
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Link: https://lore.kernel.org/r/1567566885-23088-3-git-send-email-liweihang@hisilicon.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 7ee23491b39259ae83899dd93b2a29ef0f22f0a7 upstream.
The permissions of the read-only or write-only sysfs files can be
changed (as root) and the user can then try to read a write-only file or
write to a read-only file which will lead to kernel crash here.
Protect against that by always validating the show/store callbacks.
Link: https://lore.kernel.org/r/d45cc26361a174ae12dbb86c994ef334d257924b.1573096807.git.viresh.kumar@linaro.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
srq_desc_size should be rounded up to pow of two before used, or related
calculation may cause allocating wrong size of memory for srq buffer.
Fixes: c7bcb13442e1 ("RDMA/hns: Add SRQ support for hip08 kernel mode")
Link: https://lore.kernel.org/r/1572575610-52530-3-git-send-email-liweihang@hisilicon.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Size of pointer to buf field of struct hns_roce_hem_chunk should be
considered when calculating HNS_ROCE_HEM_CHUNK_LEN, or sg table size will
be larger than expected when allocating hem.
Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
Link: https://lore.kernel.org/r/1572575610-52530-2-git-send-email-liweihang@hisilicon.com
Signed-off-by: Sirong Wang <wangsirong@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Normal RDMA WRITE request never returns IB_WC_RNR_RETRY_EXC_ERR to ULPs
because it does not need post receive buffer on the responder side.
Consequently, as an enhancement to normal RDMA WRITE request inside the
hfi1 driver, TID RDMA WRITE request should not return such an error status
to ULPs, although it does receive RNR NAKs from the responder when TID
resources are not available. This behavior is violated when
qp->s_rnr_retry_cnt is set in current hfi1 implementation.
This patch enforces these semantics by avoiding any reaction to the updates
of the RNR QP attributes.
Fixes: 3c6cb20a0d17 ("IB/hfi1: Add TID RDMA WRITE functionality into RDMA verbs")
Link: https://lore.kernel.org/r/20191025195842.106825.71532.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
For a TID RDMA WRITE request, a QP on the responder side could be put into
a queue when a hardware flow is not available. A RNR NAK will be returned
to the requester with a RNR timeout value based on the position of the QP
in the queue. The tid_rdma_flow_wt variable is used to calculate the
timeout value and is determined by using a MTU of 4096 at the module
loading time. This could reduce the timeout value by half from the desired
value, leading to excessive RNR retries.
This patch fixes the issue by calculating the flow weight with the real
MTU assigned to the QP.
Fixes: 07b923701e38 ("IB/hfi1: Add functions to receive TID RDMA WRITE request")
Link: https://lore.kernel.org/r/20191025195836.106825.77769.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The index r_tid_ack is used to indicate the next TID RDMA WRITE request to
acknowledge in the ring s_ack_queue[] on the responder side and should be
set to a valid index other than its initial value before r_tid_tail is
advanced to the next TID RDMA WRITE request and particularly before a TID
RDMA ACK is built. Otherwise, a NULL pointer dereference may result:
BUG: unable to handle kernel paging request at ffff9a32d27abff8
IP: [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1]
PGD 2749032067 PUD 0
Oops: 0000 1 SMP
Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) ib_ipoib(OE) hfi1(OE) rdmavt(OE) nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_mod target_core_mod ib_ucm dm_mirror dm_region_hash dm_log mlx5_ib dm_mod zfs(POE) rpcrdma sunrpc rdma_ucm ib_uverbs opa_vnic ib_iser zunicode(POE) ib_umad zavl(POE) icp(POE) sb_edac intel_powerclamp coretemp rdma_cm intel_rapl iosf_mbi iw_cm libiscsi scsi_transport_iscsi kvm ib_cm iTCO_wdt mxm_wmi iTCO_vendor_support irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd zcommon(POE) znvpair(POE) pcspkr spl(OE) mei_me
sg mei ioatdma lpc_ich joydev i2c_i801 shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 mlx5_core drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ixgbe ahci ttm mlxfw ib_core libahci devlink mdio crct10dif_pclmul crct10dif_common drm ptp libata megaraid_sas crc32c_intel i2c_algo_bit pps_core i2c_core dca [last unloaded: rdmavt]
CPU: 15 PID: 68691 Comm: kworker/15:2H Kdump: loaded Tainted: P W OE ------------ 3.10.0-862.2.3.el7_lustre.x86_64 #1
Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
Workqueue: hfi0_0 _hfi1_do_tid_send [hfi1]
task: ffff9a01f47faf70 ti: ffff9a11776a8000 task.ti: ffff9a11776a8000
RIP: 0010:[<ffffffffc0d87ea6>] [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1]
RSP: 0018:ffff9a11776abd08 EFLAGS: 00010002
RAX: ffff9a32d27abfc0 RBX: ffff99f2d27aa000 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: 0000000000000220 RDI: ffff99f2ffc05300
RBP: ffff9a11776abd88 R08: 000000000001c310 R09: ffffffffc0d87ad4
R10: 0000000000000000 R11: 0000000000000000 R12: ffff9a117a423c00
R13: ffff9a117a423c00 R14: ffff9a03500c0000 R15: ffff9a117a423cb8
FS: 0000000000000000(0000) GS:ffff9a117e9c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff9a32d27abff8 CR3: 0000002748a0e000 CR4: 00000000001607e0
Call Trace:
[<ffffffffc0d88874>] _hfi1_do_tid_send+0x194/0x320 [hfi1]
[<ffffffffaf0b2dff>] process_one_work+0x17f/0x440
[<ffffffffaf0b3ac6>] worker_thread+0x126/0x3c0
[<ffffffffaf0b39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
[<ffffffffaf0bae31>] kthread+0xd1/0xe0
[<ffffffffaf0bad60>] ? insert_kthread_work+0x40/0x40
[<ffffffffaf71f5f7>] ret_from_fork_nospec_begin+0x21/0x21
[<ffffffffaf0bad60>] ? insert_kthread_work+0x40/0x40
hfi1 0000:05:00.0: hfi1_0: reserved_op: opcode 0xf2, slot 2, rsv_used 1, rsv_ops 1
Code: 00 00 41 8b 8d d8 02 00 00 89 c8 48 89 45 b0 48 c1 65 b0 06 48 8b 83 a0 01 00 00 48 01 45 b0 48 8b 45 b0 41 80 bd 10 03 00 00 00 <48> 8b 50 38 4c 8d 7a 50 74 45 8b b2 d0 00 00 00 85 f6 0f 85 72
RIP [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1]
RSP <ffff9a11776abd08>
CR2: ffff9a32d27abff8
This problem can happen if a RESYNC request is received before r_tid_ack
is modified.
This patch fixes the issue by making sure that r_tid_ack is set to a valid
value before a TID RDMA ACK is built. Functions are defined to simplify
the code.
Fixes: 07b923701e38 ("IB/hfi1: Add functions to receive TID RDMA WRITE request")
Fixes: 7cf0ad679de4 ("IB/hfi1: Add a function to receive TID RDMA RESYNC packet")
Link: https://lore.kernel.org/r/20191025195830.106825.44022.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
If an hfi1 card is inserted in a Gen4 systems, the driver will avoid the
gen3 speed bump and the card will operate at half speed.
This is because the driver avoids the gen3 speed bump when the parent bus
speed isn't identical to gen3, 8.0GT/s. This is not compatible with gen4
and newer speeds.
Fix by relaxing the test to explicitly look for the lower capability
speeds which inherently allows for gen4 and all future speeds.
Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Link: https://lore.kernel.org/r/20191101192059.106248.1699.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: James Erwin <james.erwin@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
eq->buf_list->buf and eq->buf_list should also be freed when eqe_hop_num
is set to 0, or there will be memory leaks.
Fixes: a5073d6054f7 ("RDMA/hns: Add eq support of hip08")
Link: https://lore.kernel.org/r/1572072995-11277-3-git-send-email-liweihang@hisilicon.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
_put_ep_safe() and _put_pass_ep_safe() free the skb before it is freed by
process_work(). fix double free by freeing the skb only in process_work().
Fixes: 1dad0ebeea1c ("iw_cxgb4: Avoid touch after free error in ARP failure handlers")
Link: https://lore.kernel.org/r/1572006880-5800-1-git-send-email-bharat@chelsio.com
Signed-off-by: Dakshaja Uppalapati <dakshaja@chelsio.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The mkey_table xarray is touched by the reg_mr_callback() function which
is called from a hard irq. Thus all other uses of xa_lock must use the
_irq variants.
WARNING: inconsistent lock state
5.4.0-rc1 #12 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
python3/343 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffff888182be1d40 (&(&xa->xa_lock)->rlock#3){?.-.}, at: xa_erase+0x12/0x30
{IN-HARDIRQ-W} state was registered at:
lock_acquire+0xe1/0x200
_raw_spin_lock_irqsave+0x35/0x50
reg_mr_callback+0x2dd/0x450 [mlx5_ib]
mlx5_cmd_exec_cb_handler+0x2c/0x70 [mlx5_core]
mlx5_cmd_comp_handler+0x355/0x840 [mlx5_core]
[..]
Possible unsafe locking scenario:
CPU0
----
lock(&(&xa->xa_lock)->rlock#3);
<Interrupt>
lock(&(&xa->xa_lock)->rlock#3);
*** DEADLOCK ***
2 locks held by python3/343:
#0: ffff88818eb4bd38 (&uverbs_dev->disassociate_srcu){....}, at: ib_uverbs_ioctl+0xe5/0x1e0 [ib_uverbs]
#1: ffff888176c76d38 (&file->hw_destroy_rwsem){++++}, at: uobj_destroy+0x2d/0x90 [ib_uverbs]
stack backtrace:
CPU: 3 PID: 343 Comm: python3 Not tainted 5.4.0-rc1 #12
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack+0x86/0xca
print_usage_bug.cold.50+0x2e5/0x355
mark_lock+0x871/0xb50
? match_held_lock+0x20/0x250
? check_usage_forwards+0x240/0x240
__lock_acquire+0x7de/0x23a0
? __kasan_check_read+0x11/0x20
? mark_lock+0xae/0xb50
? mark_held_locks+0xb0/0xb0
? find_held_lock+0xca/0xf0
lock_acquire+0xe1/0x200
? xa_erase+0x12/0x30
_raw_spin_lock+0x2a/0x40
? xa_erase+0x12/0x30
xa_erase+0x12/0x30
mlx5_ib_dealloc_mw+0x55/0xa0 [mlx5_ib]
uverbs_dealloc_mw+0x3c/0x70 [ib_uverbs]
uverbs_free_mw+0x1a/0x20 [ib_uverbs]
destroy_hw_idr_uobject+0x49/0xa0 [ib_uverbs]
[..]
Fixes: 0417791536ae ("RDMA/mlx5: Add missing synchronize_srcu() for MW cases")
Link: https://lore.kernel.org/r/20191024234910.GA9038@ziepe.ca
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Remove spaces from the reported firmware version string.
Actual value:
$ cat /sys/class/infiniband/qedr0/fw_ver
8. 37. 7. 0
Expected value:
$ cat /sys/class/infiniband/qedr0/fw_ver
8.37.7.0
Fixes: ec72fce401c6 ("qedr: Add support for RoCE HW init")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Link: https://lore.kernel.org/r/20191007210730.7173-1-kamalheib1@gmail.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
pass_accept_req() is using the same skb for handling accept request and
sending accept reply to HW. Here req and rpl structures are pointing to
same skb->data which is over written by INIT_TP_WR() and leads to
accessing corrupt req fields in accept_cr() while checking for ECN flags.
Reordered code in accept_cr() to fetch correct req fields.
Fixes: 92e7ae7172 ("iw_cxgb4: Choose appropriate hw mtu index and ISS for iWARP connections")
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Link: https://lore.kernel.org/r/20191003104353.11590-1-bharat@chelsio.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
There is no reason for a different pad buffer for the two
packet types.
Expand the current buffer allocation to allow for both
packet types.
Fixes: f8195f3b14a0 ("IB/hfi1: Eliminate allocation while atomic")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Link: https://lore.kernel.org/r/20191004204934.26838.13099.stgit@awfm-01.aw.intel.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
A TID RDMA READ request could be retried under one of the following
conditions:
- The RC retry timer expires;
- A later TID RDMA READ RESP packet is received before the next
expected one.
For the latter, under normal conditions, the PSN in IB space is used
for comparison. More specifically, the IB PSN in the incoming TID RDMA
READ RESP packet is compared with the last IB PSN of a given TID RDMA
READ request to determine if the request should be retried. This is
similar to the retry logic for noraml RDMA READ request.
However, if a TID RDMA READ RESP packet is lost due to congestion,
header suppresion will be disabled and each incoming packet will raise
an interrupt until the hardware flow is reloaded. Under this condition,
each packet KDETH PSN will be checked by software against r_next_psn
and a retry will be requested if the packet KDETH PSN is later than
r_next_psn. Since each TID RDMA READ segment could have up to 64
packets and each TID RDMA READ request could have many segments, we
could make far more retries under such conditions, and thus leading to
RETRY_EXC_ERR status.
This patch fixes the issue by removing the retry when the incoming
packet KDETH PSN is later than r_next_psn. Instead, it resorts to
RC timer and normal IB PSN comparison for any request retry.
Fixes: 9905bf06e890 ("IB/hfi1: Add functions to receive TID RDMA READ response")
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Link: https://lore.kernel.org/r/20191004204035.26542.41684.stgit@awfm-01.aw.intel.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Before QP is closed it changes to ERROR state, when this happens
the QP was left with old rate limit that was already removed from
the table.
Fixes: 7d29f349a4b9 ("IB/mlx5: Properly adjust rate limit on QP state transitions")
Signed-off-by: Rafi Wiener <rafiw@mellanox.com>
Signed-off-by: Oleg Kuporosov <olegk@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191002120243.16971-1-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
While MR uses live as the SRCU 'update', the MW case uses the xarray
directly, xa_erase() causes the MW to become inaccessible to the pagefault
thread.
Thus whenever a MW is removed from the xarray we must synchronize_srcu()
before freeing it.
This must be done before freeing the mkey as re-use of the mkey while the
pagefault thread is using the stale mkey is undesirable.
Add the missing synchronizes to MW and DEVX indirect mkey and delete the
bogus protection against double destroy in mlx5_core_destroy_mkey()
Fixes: 534fd7aac56a ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP")
Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-7-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
live is used to signal to the pagefault thread that the MR is initialized
and ready for use. It should be after the umem is assigned and all other
setup is completed. This prevents races (at least) of the form:
CPU0 CPU1
mlx5_ib_alloc_implicit_mr()
implicit_mr_alloc()
live = 1
imr->umem = umem
num_pending_prefetch_inc()
if (live)
atomic_inc(num_pending_prefetch)
atomic_set(num_pending_prefetch,0) // Overwrites other thread's store
Further, live is being used with SRCU as the 'update' in an
acquire/release fashion, so it can not be read and written raw.
Move all live = 1's to after MR initialization is completed and use
smp_store_release/smp_load_acquire() for manipulating it.
Add a missing live = 0 when an implicit MR child is deleted, before
queuing work to do synchronize_srcu().
The barriers in update_odp_mr() were some broken attempt to create a
acquire/release, but were not even applied consistently and missed the
point, delete it as well.
Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-6-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
During destroy setting live = 0 and then synchronize_srcu() prevents
num_pending_prefetch from incrementing, and also, ensures that all work
holding that count is queued on the WQ. Testing before causes races of the
form:
CPU0 CPU1
dereg_mr()
mlx5_ib_advise_mr_prefetch()
srcu_read_lock()
num_pending_prefetch_inc()
if (!live)
live = 0
atomic_read() == 0
// skip flush_workqueue()
atomic_inc()
queue_work();
srcu_read_unlock()
WARN_ON(atomic_read()) // Fails
Swap the order so that the synchronize_srcu() prevents this.
Fixes: a6bc3875f176 ("IB/mlx5: Protect against prefetch of invalid MR")
Link: https://lore.kernel.org/r/20191001153821.23621-5-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
This fixes a race of the form:
CPU0 CPU1
mlx5_ib_invalidate_range() mlx5_ib_invalidate_range()
// This one actually makes npages == 0
ib_umem_odp_unmap_dma_pages()
if (npages == 0 && !dying)
// This one does nothing
ib_umem_odp_unmap_dma_pages()
if (npages == 0 && !dying)
dying = 1;
dying = 1;
schedule_work(&umem_odp->work);
// Double schedule of the same work
schedule_work(&umem_odp->work); // BOOM
npages and dying must be read and written under the umem_mutex lock.
Since whenever ib_umem_odp_unmap_dma_pages() is called mlx5 must also call
mlx5_ib_update_xlt, and both need to be done in the same locking region,
hoist the lock out of unmap.
This avoids an expensive double critical section in
mlx5_ib_invalidate_range().
Fixes: 81713d3788d2 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191001153821.23621-4-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
mlx5_ib_update_xlt() must be protected against parallel free of the MR it
is accessing, also it must be called single threaded while updating the
HW. Otherwise we can have races of the form:
CPU0 CPU1
mlx5_ib_update_xlt()
mlx5_odp_populate_klm()
odp_lookup() == NULL
pklm = ZAP
implicit_mr_get_data()
implicit_mr_alloc()
<update interval tree>
mlx5_ib_update_xlt
mlx5_odp_populate_klm()
odp_lookup() != NULL
pklm = VALID
mlx5_ib_post_send_wait()
mlx5_ib_post_send_wait() // Replaces VALID with ZAP
This can be solved by putting both the SRCU and the umem_mutex lock around
every call to mlx5_ib_update_xlt(). This ensures that the content of the
interval tree relavent to mlx5_odp_populate_klm() (ie mr->parent == mr)
will not change while it is running, and thus the posted WRs to update the
KLM will always reflect the correct information.
The race above will resolve by either having CPU1 wait till CPU0 completes
the ZAP or CPU0 will run after the add and instead store VALID.
The pagefault path adding children already holds the umem_mutex and SRCU,
so the only missed lock is during MR destruction.
Fixes: 81713d3788d2 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191001153821.23621-3-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
This code is completely broken, the umem of a ODP MR simply cannot be
discarded without a lot more locking, nor can an ODP mkey be blithely
destroyed via destroy_mkey().
Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-2-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Nicolas pointed out that the cxgb4 driver is doing dma off of the stack,
which is generally considered a very bad thing. On some architectures it
could be a security problem, but odds are none of them actually run this
driver, so it's just a "normal" bug.
Resolve this by allocating the memory for a message off of the heap
instead of the stack. kmalloc() always will give us a proper memory
location that DMA will work correctly from.
Link: https://lore.kernel.org/r/20191001165611.GA3542072@kroah.com
Reported-by: Nicolas Waisman <nico@semmle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
i40iw IB device registration fails with ENODEV.
ib_register_device
setup_device/setup_port_data
i40iw_port_immutable
ib_query_port
iw_query_port
ib_device_get_netdev(ENODEV)
ib_device_get_netdev() does not have a netdev associated
with the ibdev and thus fails.
Use ib_device_set_netdev() to associate netdev to ibdev
in i40iw before IB device registration.
Fixes: 4929116bdf72 ("RDMA/core: Add common iWARP query port")
Link: https://lore.kernel.org/r/20190925164524.856-1-shiraz.saleem@intel.com
Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Reviewed-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
dump_qp() is wrongly trying to dump SRQ structures as QP when SRQ is used
by the application. This patch matches the QPID before dumping them. Also
removes unwanted SRQ id addition to QP id xarray.
Fixes: 2f43129127e6 ("cxgb4: Convert qpidr to XArray")
Link: https://lore.kernel.org/r/20190930074119.20046-1-bharat@chelsio.com
Signed-off-by: Rahul Kundu <rahul.kundu@chelsio.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
In sdma_init if rhashtable_init fails the allocated memory for
tmp_sdma_rht should be released.
Fixes: 5a52a7acf7e2 ("IB/hfi1: NULL pointer dereference when freeing rhashtable")
Link: https://lore.kernel.org/r/20190925144543.10141-1-navid.emamdoost@gmail.com
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
An extra kfree cleanup was missed since these are now deallocated by core.
Link: https://lore.kernel.org/r/1568848066-12449-1-git-send-email-aditr@vmware.com
Cc: <stable@vger.kernel.org>
Fixes: 68e326dea1db ("RDMA: Handle SRQ allocations by IB/core")
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Vishnu Dasa <vdasa@vmware.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
"unlikely(IS_ERR_OR_NULL(x))" is excessive. IS_ERR_OR_NULL() already uses
unlikely() internally.
Link: http://lkml.kernel.org/r/20190829165025.15750-8-efremov@linux.com
Signed-off-by: Denis Efremov <efremov@linux.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Joe Perches <joe@perches.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
[11~From: John Hubbard <jhubbard@nvidia.com>
Subject: mm/gup: add make_dirty arg to put_user_pages_dirty_lock()
Patch series "mm/gup: add make_dirty arg to put_user_pages_dirty_lock()",
v3.
There are about 50+ patches in my tree [2], and I'll be sending out the
remaining ones in a few more groups:
* The block/bio related changes (Jerome mostly wrote those, but I've had
to move stuff around extensively, and add a little code)
* mm/ changes
* other subsystem patches
* an RFC that shows the current state of the tracking patch set. That
can only be applied after all call sites are converted, but it's good to
get an early look at it.
This is part a tree-wide conversion, as described in fc1d8e7cca2d ("mm:
introduce put_user_page*(), placeholder versions").
This patch (of 3):
Provide more capable variation of put_user_pages_dirty_lock(), and delete
put_user_pages_dirty(). This is based on the following:
1. Lots of call sites become simpler if a bool is passed into
put_user_page*(), instead of making the call site choose which
put_user_page*() variant to call.
2. Christoph Hellwig's observation that set_page_dirty_lock() is
usually correct, and set_page_dirty() is usually a bug, or at least
questionable, within a put_user_page*() calling chain.
This leads to the following API choices:
* put_user_pages_dirty_lock(page, npages, make_dirty)
* There is no put_user_pages_dirty(). You have to
hand code that, in the rare case that it's
required.
[jhubbard@nvidia.com: remove unused variable in siw_free_plist()]
Link: http://lkml.kernel.org/r/20190729074306.10368-1-jhubbard@nvidia.com
Link: http://lkml.kernel.org/r/20190724044537.10458-2-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull RDMA subsystem updates from Jason Gunthorpe:
"This cycle mainly saw lots of bug fixes and clean up code across the
core code and several drivers, few new functional changes were made.
- Many cleanup and bug fixes for hns
- Various small bug fixes and cleanups in hfi1, mlx5, usnic, qed,
bnxt_re, efa
- Share the query_port code between all the iWarp drivers
- General rework and cleanup of the ODP MR umem code to fit better
with the mmu notifier get/put scheme
- Support rdma netlink in non init_net name spaces
- mlx5 support for XRC devx and DC ODP"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits)
RDMA: Fix double-free in srq creation error flow
RDMA/efa: Fix incorrect error print
IB/mlx5: Free mpi in mp_slave mode
IB/mlx5: Use the original address for the page during free_pages
RDMA/bnxt_re: Fix spelling mistake "missin_resp" -> "missing_resp"
RDMA/hns: Package operations of rq inline buffer into separate functions
RDMA/hns: Optimize cmd init and mode selection for hip08
IB/hfi1: Define variables as unsigned long to fix KASAN warning
IB/{rdmavt, hfi1, qib}: Add a counter for credit waits
IB/hfi1: Add traces for TID RDMA READ
RDMA/siw: Relax from kmap_atomic() use in TX path
IB/iser: Support up to 16MB data transfer in a single command
RDMA/siw: Fix page address mapping in TX path
RDMA: Fix goto target to release the allocated memory
RDMA/usnic: Avoid overly large buffers on stack
RDMA/odp: Add missing cast for 32 bit
RDMA/hns: Use devm_platform_ioremap_resource() to simplify code
Documentation/infiniband: update name of some functions
RDMA/cma: Fix false error message
RDMA/hns: Fix wrong assignment of qp_access_flags
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull hmm updates from Jason Gunthorpe:
"This is more cleanup and consolidation of the hmm APIs and the very
strongly related mmu_notifier interfaces. Many places across the tree
using these interfaces are touched in the process. Beyond that a
cleanup to the page walker API and a few memremap related changes
round out the series:
- General improvement of hmm_range_fault() and related APIs, more
documentation, bug fixes from testing, API simplification &
consolidation, and unused API removal
- Simplify the hmm related kconfigs to HMM_MIRROR and DEVICE_PRIVATE,
and make them internal kconfig selects
- Hoist a lot of code related to mmu notifier attachment out of
drivers by using a refcount get/put attachment idiom and remove the
convoluted mmu_notifier_unregister_no_release() and related APIs.
- General API improvement for the migrate_vma API and revision of its
only user in nouveau
- Annotate mmu_notifiers with lockdep and sleeping region debugging
Two series unrelated to HMM or mmu_notifiers came along due to
dependencies:
- Allow pagemap's memremap_pages family of APIs to work without
providing a struct device
- Make walk_page_range() and related use a constant structure for
function pointers"
* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (75 commits)
libnvdimm: Enable unit test infrastructure compile checks
mm, notifier: Catch sleeping/blocking for !blockable
kernel.h: Add non_block_start/end()
drm/radeon: guard against calling an unpaired radeon_mn_unregister()
csky: add missing brackets in a macro for tlb.h
pagewalk: use lockdep_assert_held for locking validation
pagewalk: separate function pointers from iterator data
mm: split out a new pagewalk.h header from mm.h
mm/mmu_notifiers: annotate with might_sleep()
mm/mmu_notifiers: prime lockdep
mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end
mm/mmu_notifiers: remove the __mmu_notifier_invalidate_range_start/end exports
mm/hmm: hmm_range_fault() infinite loop
mm/hmm: hmm_range_fault() NULL pointer bug
mm/hmm: fix hmm_range_fault()'s handling of swapped out pages
mm/mmu_notifiers: remove unregister_no_release
RDMA/odp: remove ib_ucontext from ib_umem
RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm'
RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr
RDMA/mlx5: Use ib_umem_start instead of umem.address
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs namei updates from Al Viro:
"Pathwalk-related stuff"
[ Audit-related cleanups, misc simplifications, and easier to follow
nd->root refcounts - Linus ]
* 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
devpts_pty_kill(): don't bother with d_delete()
infiniband: don't bother with d_delete()
hypfs: don't bother with d_delete()
fs/namei.c: keep track of nd->root refcount status
fs/namei.c: new helper - legitimize_root()
kill the last users of user_{path,lpath,path_dir}()
namei.h: get the comments on LOOKUP_... in sync with reality
kill LOOKUP_NO_EVAL, don't bother including namei.h from audit.h
audit_inode(): switch to passing AUDIT_INODE_...
filename_mountpoint(): make LOOKUP_NO_EVAL unconditional there
filename_lookup(): audit_inode() argument is always 0
|