Age | Commit message (Collapse) | Author | Files | Lines |
|
Use the correct kmap()/kunmap() flow to determine page address used for
CRC computation. Using page_address() is wrong, since page might be in
highmem.
Fixes: b9be6f18cf9e ("rdma/siw: transmit path")
Link: https://lore.kernel.org/r/20190909132427.30264-1-bmt@zurich.ibm.com
Reported-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
In bnxt_re_create_srq(), when ib_copy_to_udata() fails allocated memory
should be released by goto fail.
Fixes: 37cb11acf1f7 ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters")
Link: https://lore.kernel.org/r/20190910222120.16517-1-navid.emamdoost@gmail.com
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
It's never a good idea to put a 1000-byte buffer on the kernel stack. The
compiler warns about this instance when usnic_ib_log_vf() gets inlined
into usnic_ib_pci_probe():
drivers/infiniband/hw/usnic/usnic_ib_main.c:543:12: error: stack frame size of 1044 bytes in function 'usnic_ib_pci_probe' [-Werror,-Wframe-larger-than=]
As this is only called for debugging purposes in the setup path, it's
trivial to convert to a dynamic allocation.
Link: https://lore.kernel.org/r/20190906155730.2750200-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
length is a size_t which is unsigned int on 32 bit:
../drivers/infiniband/core/umem_odp.c: In function 'ib_init_umem_odp':
../include/linux/overflow.h:59:15: warning: comparison of distinct pointer types lacks a cast
59 | (void) (&__a == &__b); \
| ^~
../drivers/infiniband/core/umem_odp.c:220:7: note: in expansion of macro 'check_add_overflow'
Fixes: 204e3e5630c5 ("RDMA/odp: Check for overflow when computing the umem_odp end")
Link: https://lore.kernel.org/r/20190908080726.30017-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Use devm_platform_ioremap_resource() to simplify the code a bit. This is
detected by coccinelle.
Link: https://lore.kernel.org/r/20190906141727.26552-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
In addr_handler(), assuming status == 0 and the device already has been
acquired (id_priv->cma_dev != NULL), we get the following incorrect
"error" message:
RDMA CM: ADDR_ERROR: failed to resolve IP. status 0
Fixes: 498683c6a7ee ("IB/cma: Add debug messages to error flows")
Link: https://lore.kernel.org/r/20190902092731.1055757-1-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2019-09-01 (Software steering support)
Abstract:
--------
Mellanox ConnetX devices supports packet matching, packet modification and
redirection. These functionalities are also referred to as flow-steering.
To configure a steering rule, the rule is written to the device owned
memory, this memory is accessed and cached by the device when processing
a packet.
Steering rules are constructed from multiple steering entries (STE).
Rules are configured using the Firmware command interface. The Firmware
processes the given driver command and translates them to STEs, then
writes them to the device memory in the current steering tables.
This process is slow due to the architecture of the command interface and
the processing complexity of each rule.
The highlight of this patchset is to cut the middle man (The firmware) and
do steering rules programming into device directly from the driver, with
no firmware intervention whatsoever.
Motivation:
-----------
Software (driver managed) steering allows for high rule insertion rates
compared to the FW steering described above, this is achieved by using
internal RDMA writes to the device owned memory instead of the slow
command interface to program steering rules.
Software (driver managed) steering, doesn't depend on new FW
for new steering functionality, new implementations can be done in the
driver skipping the FW layer.
Performance:
------------
The insertion rate on a single core using the new approach allows
programming ~300K rules per sec. (Done via direct raw test to the new mlx5
sw steering layer, without any kernel layer involved).
Test: TC L2 rules
33K/s with Software steering (this patchset).
5K/s with FW and current driver.
This will improve OVS based solution performance.
Architecture and implementation details:
----------------------------------------
Software steering will be dynamically selected via devlink device
parameter. Example:
$ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
pci/0000:06:00.0:
name flow_steering_mode type driver-specific
values:
cmode runtime value smfs
mlx5 software steering module a.k.a (DR - Direct Rule) is implemented
and contained in mlx5/core/steering directory and controlled by
MLX5_SW_STEERING kconfig flag.
mlx5 core steering layer (fs_core) already provides a shim layer for
implementing different steering mechanisms, software steering will
leverage that as seen at the end of this series.
When Software Steering for a specific steering domain
(NIC/RDMA/Vport/ESwitch, etc ..) is supported, it will cause rules
targeting this domain to be created using SW steering instead of FW.
The implementation includes:
Domain - The steering domain is the object that all other object resides
in. It holds the memory allocator, send engine, locks and other shared
data needed by lower objects such as table, matcher, rule, action.
Each domain can contain multiple tables. Domain is equivalent to
namespaces e.g (NIC/RDMA/Vport/ESwitch, etc ..) as implemented
currently in mlx5_core fs_core (flow steering core).
Table - Table objects are used for holding multiple matchers, each table
has a level used to prevent processing loops. Packets are being
directed to this table once it is set as the root table, this is done
by fs_core using a FW command. A packet is being processed inside the
table matcher by matcher until a successful hit, otherwise the packet
will perform the default action.
Matcher - Matchers objects are used to specify the fields mask for
matching when processing a packet. A matcher belongs to a table, each
matcher can hold multiple rules, each rule with different matching
values corresponding to the matcher mask. Each matcher has a priority
used for rule processing order inside the table.
Action - Action objects are created to specify different steering actions
such as count, reformat (encapsulate, decapsulate, ...), modify
header, forward to table and many other actions. When creating a rule
a sequence of actions can be provided to be executed on a successful
match.
Rule - Rule objects are used to specify a specific match on packets as
well as the actions that should be executed. A rule belongs to a
matcher.
STE - This layer is used to hold the specific STE format for the device
and to convert the requested rule to STEs. Each rule is constructed of
an STE chain, Multiple rules construct a steering graph. Each node in
the graph is a hash table containing multiple STEs. The index of each
STE in the hash table is being calculated using a CRC32 hash function.
Memory pool - Used for managing and caching device owned memory for rule
insertion. The memory is being allocated using DM (device memory) API.
Communication with device - layer for standard RDMA operation using RC QP
to configure the device steering.
Command utility - This module holds all of the FW commands that are
required for SW steering to function.
Patch planning and files:
-------------------------
1) First patch, adds the support to Add flow steering actions to fs_cmd
shim layer.
2) Next 12 patch will add a file per each Software steering
functionality/module as described above. (See patches with title: DR, *)
3) Add CONFIG_MLX5_SW_STEERING for software steering support and enable
build with the new files
4) Next two patches will add the support for software steering in mlx5
steering shim layer
net/mlx5: Add API to set the namespace steering mode
net/mlx5: Add direct rule fs_cmd implementation
5) Last two patches will add the new devlink parameter to select mlx5
steering mode, will be valid only for switchdev mode for now.
Two modes are supported:
1. DMFS - Device managed flow steering
2. SMFS - Software/Driver managed flow steering.
In the DMFS mode, the HW steering entities are created through the
FW. In the SMFS mode this entities are created though the driver
directly.
The driver will use the devlink steering mode only if the steering
domain supports it, for now SMFS will manages only the switchdev
eswitch steering domain.
User command examples:
- Set SMFS flow steering mode::
$ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime
- Read device flow steering mode::
$ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
pci/0000:06:00.0:
name flow_steering_mode type driver-specific
values:
cmode runtime value smfs
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add flow steering actions: modify header and packet reformat
to the fs_cmd shim layer. This allows each namespace to define
possibly different functionality for alloc/dealloc action commands.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Dentries are never retained there; d_delete() + dput() is no
different from d_drop() + dput().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
r8152 conflicts are the NAPI fixes in 'net' overlapping with
some tasklet stuff in net-next
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Merge mlx5-next patches needed for upcoming mlx5 software steering.
1) Alex adds HW bits and definitions required for SW steering
2) Ariel moves device memory management to mlx5_core (From mlx5_ib)
3) Maor, Cleanups and fixups for eswitch mode and RoCE
4) Mark, Set only stag for match untagged packets
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Move the device memory allocation and deallocation commands
SW ICM memory to mlx5_core to expose this API for all
mlx5_core users.
This comes as preparation for supporting SW steering in kernel
where it will be required to allocate and register device
memory for direct rule insertion.
In addition, an API to register this device memory for future
remote access operations is introduced using the create_mkey
commands.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
mlx5 HW spec and bits updates:
1) Aya exposes IP-in-IP capability in mlx5_core.
2) Maxim exposes lag tx port affinity capabilities.
3) Moshe adds VNIC_ENV internal rq counter bits.
4) ODP capabilities for DC transport
Misc updates:
5) Saeed, two compiler warnings cleanups
6) Add XRQ legacy commands opcodes
7) Use refcount_t for refcount
8) fix a -Wstringop-truncation warning
|
|
We used wrong shifts when set qp_attr->qp_access_flag,
this patch exchange V2_QP_RRE_S and V2_QP_RWE_S to fix it.
Fixes: 2a3d923f8730 ("RDMA/hns: Replace magic numbers with #defines")
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Link: https://lore.kernel.org/r/1566393276-42555-10-git-send-email-oulijun@huawei.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Delete the assignment of srq->ibsrq.ext.xrc.srq_num, beacause this
value is not used.
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Link: https://lore.kernel.org/r/1566393276-42555-9-git-send-email-oulijun@huawei.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Because if the value of srqwqe_buf_pg_sz is zero, npages and
ib_umem_page_count are equivalent, page_shif and PAGE_SHIFT
are equivalent in hns_roce_create_srq. Here remove it.
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Link: https://lore.kernel.org/r/1566393276-42555-8-git-send-email-oulijun@huawei.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
If the hardware is resetting, the driver should not perform
the mailbox operation.Function-clear needs to add relevant judgment.
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Link: https://lore.kernel.org/r/1566393276-42555-7-git-send-email-oulijun@huawei.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Sparse is whining about the u32 and __le32 mixed usage in the driver.
The roce_set_field() is used to __le32 data of hardware only.
If a variable is not delivered to the hardware, the __le32 type and
related operations are not required.
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Link: https://lore.kernel.org/r/1566393276-42555-6-git-send-email-oulijun@huawei.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Here uses the meaningful macro instead of the magic number
for readability.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Lang Chen <chenglang@huawei.com>
Link: https://lore.kernel.org/r/1566393276-42555-5-git-send-email-oulijun@huawei.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
we change type of some members to u32/u8 from __le32 as well as
split sl_tclass_flowlabel into three variables in hns_roce_av.
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Link: https://lore.kernel.org/r/1566393276-42555-4-git-send-email-oulijun@huawei.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Walking the address list of an inet6_dev requires
appropriate locking. Since the called function
siw_listen_address() may sleep, we have to use
rtnl_lock() instead of read_lock_bh().
Also introduces sanity checks if we got a device
from in_dev_get() or in6_dev_get().
Reported-by: Bart Van Assche <bvanassche@acm.org>
Fixes: 6c52fdc244b5 ("rdma/siw: connection management")
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20190828130355.22830-1-bmt@zurich.ibm.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Michael Guralnik says:
====================
The series adds support for on-demand paging for DC transport.
As DC is a mlx-only transport, the capabilities are exposed to the user
using DEVX objects and later on through mlx5dv_query_device.
====================
Based on the mlx5-next branch from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for
dependencies
* branch 'mlx5-odp-dc':
IB/mlx5: Add page fault handler for DC initiator WQE
IB/mlx5: Remove check of FW capabilities in ODP page fault handling
net/mlx5: Set ODP capabilities for DC transport to max
|
|
Parsing DC initiator WQEs upon page fault requires skipping an address
vector segment, as in UD WQEs.
Link: https://lore.kernel.org/r/20190819120815.21225-4-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
As page fault handling is initiated by FW, there is no need to check that
the ODP supports the operation and transport.
Link: https://lore.kernel.org/r/20190819120815.21225-3-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Minor conflict in r8169, bug fix had two versions in net
and net-next, take the net-next hunks.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The dev_kfree_skb() function performs also input parameter validation.
Thus the test around the shown calls is not needed.
This issue was detected by using the Coccinelle software.
Link: https://lore.kernel.org/r/16df4c50-1f61-d7c4-3fc8-3073666d281d@web.de
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Use FIELD_SIZEOF macro instead of hard coding it in field_avail macro.
Link: https://lore.kernel.org/r/20190826115350.21718-3-galpress@amazon.com
Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com>
Reviewed-by: Firas JahJah <firasj@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
EFA driver is not a kverbs provider, the check for MR umem is redundant.
Link: https://lore.kernel.org/r/20190826115350.21718-2-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Currently user applications can only steer TCP/IP(NIC RX/RX) traffic.
This patch adds RDMA_RX as a new flow type to allow the user to insert
steering rules to control RDMA traffic.
Two destinations are supported(but not set at the same time): devx
flow table object and QP.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190819113626.20284-4-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Fixes improper casting between addresses and unsigned types.
Changes siw_pbl_get_buffer() function to return appropriate
dma_addr_t, and not u64.
Also fixes debug prints. Now any potentially kernel private
pointers are printed formatted as '%pK', to allow keeping that
information secret.
Fixes: d941bfe500be ("RDMA/siw: Change CQ flags from 64->32 bits")
Fixes: b0fff7317bb4 ("rdma/siw: completion queue methods")
Fixes: 8b6a361b8c48 ("rdma/siw: receive path")
Fixes: b9be6f18cf9e ("rdma/siw: transmit path")
Fixes: f29dd55b0236 ("rdma/siw: queue pair methods")
Fixes: 2251334dcac9 ("rdma/siw: application buffer management")
Fixes: 303ae1cdfdf7 ("rdma/siw: application interface")
Fixes: 6c52fdc244b5 ("rdma/siw: connection management")
Fixes: a531975279f3 ("rdma/siw: main include file")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: Jason Gunthorpe <jgg@ziepe.ca>
Reported-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20190822173738.26817-1-bmt@zurich.ibm.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
All user level and most in-kernel applications submit WQEs
where the SG list entries are all of a single type.
iSER in particular, however, will send us WQEs with mixed SG
types: sge[0] = kernel buffer, sge[1] = PBL region.
Check and set is_kva on each SG entry individually instead of
assuming the first SGE type carries through to the last.
This fixes iSER over siw.
Fixes: b9be6f18cf9e ("rdma/siw: transmit path")
Reported-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Tested-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20190822150741.21871-1-bmt@zurich.ibm.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Driver copies FW commands to the HW queue as units of 16 bytes. Some
of the command structures are not exact multiple of 16. So while copying
the data from those structures, the stack out of bounds messages are
reported by KASAN. The following error is reported.
[ 1337.530155] ==================================================================
[ 1337.530277] BUG: KASAN: stack-out-of-bounds in bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re]
[ 1337.530413] Read of size 16 at addr ffff888725477a48 by task rmmod/2785
[ 1337.530540] CPU: 5 PID: 2785 Comm: rmmod Tainted: G OE 5.2.0-rc6+ #75
[ 1337.530541] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.0.4 08/28/2014
[ 1337.530542] Call Trace:
[ 1337.530548] dump_stack+0x5b/0x90
[ 1337.530556] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re]
[ 1337.530560] print_address_description+0x65/0x22e
[ 1337.530568] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re]
[ 1337.530575] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re]
[ 1337.530577] __kasan_report.cold.3+0x37/0x77
[ 1337.530581] ? _raw_write_trylock+0x10/0xe0
[ 1337.530588] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re]
[ 1337.530590] kasan_report+0xe/0x20
[ 1337.530592] memcpy+0x1f/0x50
[ 1337.530600] bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re]
[ 1337.530608] ? bnxt_qplib_creq_irq+0xa0/0xa0 [bnxt_re]
[ 1337.530611] ? xas_create+0x3aa/0x5f0
[ 1337.530613] ? xas_start+0x77/0x110
[ 1337.530615] ? xas_clear_mark+0x34/0xd0
[ 1337.530623] bnxt_qplib_free_mrw+0x104/0x1a0 [bnxt_re]
[ 1337.530631] ? bnxt_qplib_destroy_ah+0x110/0x110 [bnxt_re]
[ 1337.530633] ? bit_wait_io_timeout+0xc0/0xc0
[ 1337.530641] bnxt_re_dealloc_mw+0x2c/0x60 [bnxt_re]
[ 1337.530648] bnxt_re_destroy_fence_mr+0x77/0x1d0 [bnxt_re]
[ 1337.530655] bnxt_re_dealloc_pd+0x25/0x60 [bnxt_re]
[ 1337.530677] ib_dealloc_pd_user+0xbe/0xe0 [ib_core]
[ 1337.530683] srpt_remove_one+0x5de/0x690 [ib_srpt]
[ 1337.530689] ? __srpt_close_all_ch+0xc0/0xc0 [ib_srpt]
[ 1337.530692] ? xa_load+0x87/0xe0
...
[ 1337.530840] do_syscall_64+0x6d/0x1f0
[ 1337.530843] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1337.530845] RIP: 0033:0x7ff5b389035b
[ 1337.530848] Code: 73 01 c3 48 8b 0d 2d 0b 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fd 0a 2c 00 f7 d8 64 89 01 48
[ 1337.530849] RSP: 002b:00007fff83425c28 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 1337.530852] RAX: ffffffffffffffda RBX: 00005596443e6750 RCX: 00007ff5b389035b
[ 1337.530853] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005596443e67b8
[ 1337.530854] RBP: 0000000000000000 R08: 00007fff83424ba1 R09: 0000000000000000
[ 1337.530856] R10: 00007ff5b3902960 R11: 0000000000000206 R12: 00007fff83425e50
[ 1337.530857] R13: 00007fff8342673c R14: 00005596443e6260 R15: 00005596443e6750
[ 1337.530885] The buggy address belongs to the page:
[ 1337.530962] page:ffffea001c951dc0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
[ 1337.530964] flags: 0x57ffffc0000000()
[ 1337.530967] raw: 0057ffffc0000000 0000000000000000 ffffffff1c950101 0000000000000000
[ 1337.530970] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 1337.530970] page dumped because: kasan: bad access detected
[ 1337.530996] Memory state around the buggy address:
[ 1337.531072] ffff888725477900: 00 00 00 00 f1 f1 f1 f1 00 00 00 00 00 f2 f2 f2
[ 1337.531180] ffff888725477980: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00
[ 1337.531288] >ffff888725477a00: 00 f2 f2 f2 f2 f2 f2 00 00 00 f2 00 00 00 00 00
[ 1337.531393] ^
[ 1337.531478] ffff888725477a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1337.531585] ffff888725477b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1337.531691] ==================================================================
Fix this by passing the exact size of each FW command to
bnxt_qplib_rcfw_send_message as req->cmd_size. Before sending
the command to HW, modify the req->cmd_size to number of 16 byte units.
Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1566468170-489-1-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
At this point the ucontext is only being stored to access the ib_device,
so just store the ib_device directly instead. This is more natural and
logical as the umem has nothing to do with the ucontext.
Link: https://lore.kernel.org/r/20190806231548.25242-8-jgg@ziepe.ca
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
This is a significant simplification, no extra list is kept per FD, and
the interval tree is now shared between all the ucontexts, reducing
overhead if there are multiple ucontexts active.
Link: https://lore.kernel.org/r/20190806231548.25242-7-jgg@ziepe.ca
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Jason Gunthorpe says:
====================
This is a collection of general cleanups for ODP to clarify some of the
flows around umem creation and use of the interval tree.
====================
The branch is based on v5.3-rc5 due to dependencies
* odp_fixes:
RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr
RDMA/mlx5: Use ib_umem_start instead of umem.address
RDMA/core: Make invalidate_range a device operation
RDMA/odp: Use kvcalloc for the dma_list and page_list
RDMA/odp: Check for overflow when computing the umem_odp end
RDMA/odp: Provide ib_umem_odp_release() to undo the allocs
RDMA/odp: Split creating a umem_odp from ib_umem_get
RDMA/odp: Make the three ways to create a umem_odp clear
RMDA/odp: Consolidate umem_odp initialization
RDMA/odp: Make it clearer when a umem is an implicit ODP umem
RDMA/odp: Iterate over the whole rbtree directly
RDMA/odp: Use the common interval tree library instead of generic
RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLB
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
These are the same thing since mr always comes from odp->private. It is
confusing to reference the same memory via two names.
Link: https://lore.kernel.org/r/20190819111710.18440-13-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
These are subtly different, the address is the original VA requested
during umem_get, while ib_umem_start() is the version that is rounded to
the proper page size, ie is the true start of the umem's dma map.
Link: https://lore.kernel.org/r/20190819111710.18440-12-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The callback function 'invalidate_range' is implemented in a driver so the
place for it is in the ib_device_ops structure and not in ib_ucontext.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Link: https://lore.kernel.org/r/20190819111710.18440-11-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
There is no specific need for these to be in the valloc space, let the
system decide automatically how to do the allocation.
Link: https://lore.kernel.org/r/20190819111710.18440-10-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Since the page size can be extended in the ODP case by IB_ACCESS_HUGETLB
the existing overflow checks done by ib_umem_get() are not
sufficient. Check for overflow again.
Further, remove the unchecked math from the inlines and just use the
precomputed value stored in the interval_tree_node.
Link: https://lore.kernel.org/r/20190819111710.18440-9-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Now that there are allocator APIs that return the ib_umem_odp directly
it should be freed through a umem_odp free'er as well.
Link: https://lore.kernel.org/r/20190819111710.18440-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
This is the last creation API that is overloaded for both, there is very
little code sharing and a driver has to be specifically ready for a
umem_odp to be created to use the odp version.
Link: https://lore.kernel.org/r/20190819111710.18440-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The three paths to build the umem_odps are kind of muddled, they are:
- As a normal ib_mr umem
- As a child in an implicit ODP umem tree
- As the root of an implicit ODP umem tree
Only the first two are actually umem's, the last is an abuse.
The implicit case can only be triggered by explicit driver request, it
should never be co-mingled with the normal case. While we are here, make
sensible function names and add some comments to make this clearer.
Link: https://lore.kernel.org/r/20190819111710.18440-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
This is done in two different places, consolidate all the post-allocation
initialization into a single function.
Link: https://lore.kernel.org/r/20190819111710.18440-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Implicit ODP umems are special, they don't have any page lists, they don't
exist in the interval tree and they are never DMA mapped.
Instead of trying to guess this based on a zero length use an explicit
flag.
Further, do not allow non-implicit umems to be 0 size.
Link: https://lore.kernel.org/r/20190819111710.18440-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Instead of intersecting a full interval, just iterate over every element
directly. This is faster and clearer.
Link: https://lore.kernel.org/r/20190819111710.18440-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
ODP is working with userspace VA's in the interval tree which always fit
into an unsigned long, so we can use the common code.
This comes at a cost of a 16 byte increase in ib_umem_odp struct size due
to storing the interval tree start/last in addition to the umem
addr/length. However these values were computed and are performance
critical for the interval lookup, so this seems like a worthwhile trade
off.
Removes 2k of .text from the kernel.
Link: https://lore.kernel.org/r/20190819111710.18440-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
In fault_opcodes_write(), 'data' is allocated through kcalloc(). However,
it is not deallocated in the following execution if an error occurs,
leading to memory leaks. To fix this issue, introduce the 'free_data' label
to free 'data' before returning the error.
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Link: https://lore.kernel.org/r/1566154486-3713-1-git-send-email-wenwen@cs.uga.edu
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
In fault_opcodes_read(), 'data' is not deallocated if debugfs_file_get()
fails, leading to a memory leak. To fix this bug, introduce the 'free_data'
label to free 'data' before returning the error.
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Link: https://lore.kernel.org/r/1566156571-4335-1-git-send-email-wenwen@cs.uga.edu
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
In mlx4_ib_alloc_pv_bufs(), 'tun_qp->tx_ring' is allocated through
kcalloc(). However, it is not always deallocated in the following execution
if an error occurs, leading to memory leaks. To fix this issue, free
'tun_qp->tx_ring' whenever an error occurs.
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/1566159781-4642-1-git-send-email-wenwen@cs.uga.edu
Signed-off-by: Doug Ledford <dledford@redhat.com>
|