summaryrefslogtreecommitdiff
path: root/include/rdma/ib_verbs.h
AgeCommit message (Collapse)AuthorFilesLines
2019-02-09RDMA/devices: Use xarray to store the client_dataJason Gunthorpe1-7/+16
Now that we have a small ID for each client we can use xarray instead of linearly searching linked lists for client data. This will give much faster and scalable client data lookup, and will lets us revise the locking scheme. Since xarray can store 'going_down' using a mark just entirely eliminate the struct ib_client_data and directly store the client_data value in the xarray. However this does require a special iterator as we must still iterate over any NULL client_data values. Also eliminate the client_data_lock in favour of internal xarray locking. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-09RDMA/devices: Use xarray to store the clientsJason Gunthorpe1-1/+2
This gives each client a unique ID and will let us move client_data to use xarray, and revise the locking scheme. clients have to be add/removed in strict FIFO/LIFO order as they interdepend. To support this the client_ids are assigned to increase in FIFO order. The existing linked list is kept to support reverse iteration until xarray can get a reverse iteration API. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com>
2019-02-09RDMA/device: Get rid of reg_stateJason Gunthorpe1-6/+0
This really has no purpose anymore, refcount can be used to tell if the device is still registered. Keeping it around just invites mis-use. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com>
2019-02-09RDMA: Handle PD allocations by IB/coreLeon Romanovsky1-4/+5
The PD allocations in IB/core allows us to simplify drivers and their error flows in their .alloc_pd() paths. The changes in .alloc_pd() go hand in had with relevant update in .dealloc_pd(). We will use this opportunity and convert .dealloc_pd() to don't fail, as it was suggested a long time ago, failures are not happening as we have never seen a WARN_ON print. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-09RDMA/core: Share driver structure size with coreLeon Romanovsky1-0/+13
Add new macros to be used in drivers while registering ops structure and IB/core while calling allocation routines, so drivers won't need to perform kzalloc/kfree in their paths. The change in allocation stage allows us to initialize common fields prior to calling to drivers (e.g. restrack). Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-05Merge tag 'v5.0-rc5' into rdma.git for-nextJason Gunthorpe1-2/+22
Linux 5.0-rc5 Needed to merge the include/uapi changes so we have an up to date single-tree for these files. Patches already posted are also expected to need this for dependencies.
2019-02-05IB/core: Remove ib_sg_dma_address() and ib_sg_dma_len()Bart Van Assche1-27/+0
Keeping single line wrapper functions is not useful. Hence remove the ib_sg_dma_address() and ib_sg_dma_len() functions. This patch does not change any functionality. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-05IB/uverbs: Expose XRC ODP device capabilitiesMoni Shoua1-0/+1
Expose XRC ODP capabilities as part of the extended device capabilities. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-05IB/core: Allocate a bit for SRQ ODP supportMoni Shoua1-0/+1
The ODP support matrix is per operation and per transport. The support for each transport (RC, UD, etc.) is described with a bit field. ODP for SRQ WQEs is considered a different kind of support from ODP for RQ WQs and therefore needs a different capability bit. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-31RDMA/core: Use the ops infrastructure to keep all callbacks in one placeLeon Romanovsky1-0/+5
As preparation to hide rdma_restrack_root, refactor the code to use the ops structure instead of a special callback which is hidden in rdma_restrack_root. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-31RDMA: Add indication for in kernel API support to IB deviceGal Pressman1-0/+5
Drivers that do not provide kernel verbs support should not be used by ib kernel clients at all. In case a device does not implement all mandatory verbs for kverbs usage mark it as a non kverbs provider and prevent its usage for all clients except for uverbs. The device is marked as a non kverbs provider using the 'kverbs_provider' flag which should only be set by the core code. The clients can choose whether kverbs are requested for its usage using the 'no_kverbs_req' flag which is currently set for uverbs only. This patch allows drivers to remove mandatory verbs stubs and simply set the callbacks to NULL. The IB device will be registered as a non-kverbs provider. Note that verbs that are required for the device registration process must be implemented. Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-31RDMA: Provide safe ib_alloc_device() functionLeon Romanovsky1-1/+7
All callers to ib_alloc_device() provide a larger size than struct ib_device and rely on the fact that struct ib_device is embedded in their driver specific structure as the first member. Provide a safer variant of ib_alloc_device() that checks and enforces this approach to make sure the drivers are using it right. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-31RDMA/srp: Increase max_segment_sizeBart Van Assche1-0/+13
The default behavior of the SCSI core is to set the block layer request queue parameter max_segment_size to 64 KB. That means that elements of scatterlists are limited to 64 KB. Since RDMA adapters support larger sizes, increase max_segment_size for the SRP initiator. Notes: - The SCSI max_segment_size parameter was introduced in kernel v5.0. See also commit 50c2e9107f17 ("scsi: introduce a max_segment_size host_template parameters"). - Some other block drivers already set max_segment_size to UINT_MAX, e.g. nbd and rbd. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-22RDMA/device: Expose ib_device_try_get(()Jason Gunthorpe1-2/+22
It turns out future patches need this capability quite widely now, not just for netlink, so provide two global functions to manage the registration lock refcount. This also moves the point the lock becomes 1 to within ib_register_device() so that the semantics of the public API are very sane and clear. Calling ib_device_try_get() will fail on devices that are only allocated but not yet registered. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Parav Pandit <parav@mellanox.com>
2019-01-14RDMA: Introduce and use rdma_device_to_ibdev()Parav Pandit1-0/+23
Introduce and use rdma_device_to_ibdev() API for those drivers which are registering one sysfs group and also use in ib_core. In subsequent patch, device->provider_ibdev one-to-one mapping is no longer holds true during accessing sysfs entries. Therefore, introduce an API rdma_device_to_ibdev() that provides such information. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-14RDMA: Rename port_callback to init_portParav Pandit1-3/+7
Most provider routines are callback routines which ib core invokes. _callback suffix doesn't convey information about when such callback is invoked. Therefore, rename port_callback to init_port. Additionally, store the init_port function pointer in ib_device_ops, so that it can be accessed in subsequent patches when binding rdma device to net namespace. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-11IB/{core,hw}: Have ib_umem_get extract the ib_ucontext from ib_udataJason Gunthorpe1-0/+1
ib_umem_get() can only be called in a method callback, which always has a udata parameter. This allows ib_umem_get() to derive the ucontext pointer directly from the udata without requiring the drivers to find it in some way or another. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
2019-01-09RDMA: Clean structures from CONFIG_INFINIBAND_ON_DEMAND_PAGINGLeon Romanovsky1-2/+0
CONFIG_INFINIBAND_ON_DEMAND_PAGING is used in general structures to micro-optimize the memory footprint. Remove it, so it will allow us to simplify various ODP device flows. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20RDMA: Mark if destroy address handle is in a sleepable contextGal Pressman1-2/+8
Introduce a 'flags' field to destroy address handle callback and add a flag that marks whether the callback is executed in an atomic context or not. This will allow drivers to wait for completion instead of polling for it when it is allowed. Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20RDMA: Mark if create address handle is in a sleepable contextGal Pressman1-2/+9
Introduce a 'flags' field to create address handle callback and add a flag that marks whether the callback is executed in an atomic context or not. This will allow drivers to wait for completion instead of polling for it when it is allowed. Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-19IB/uverbs: Add support to advise_mrMoni Shoua1-0/+4
Add new ioctl method for the MR object - ADVISE_MR. This command can be used by users to give an advice or directions to the kernel about an address range that belongs to memory regions. A new ib_device callback, advise_mr(), is introduced here to suupport the new command. This command takes the following arguments: - pd: The protection domain to which all memory regions belong - advice: The type of the advice * IB_UVERBS_ADVISE_MR_ADVICE_PREFETCH - Pre-fetch a range of an on-demand paging MR * IB_UVERBS_ADVISE_MR_ADVICE_PREFETCH_WRITE - Pre-fetch a range of an on-demand paging MR with write intention - flags: The properties of the advice * IB_UVERBS_ADVISE_MR_FLAG_FLUSH - Operation must end before return to the caller - sg_list: The list of memory ranges - num_sge: The number of memory ranges in the list - attrs: More attributes to be parsed by the provider Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-12RDMA: Start use ib_device_opsKamal Heib1-290/+13
Make all the required change to start use the ib_device_ops structure. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-12RDMA/core: Introduce ib_device_opsKamal Heib1-0/+242
This change introduces the ib_device_ops structure that defines all the InfiniBand device operations in one place, so the code will be more readable and clean, unlike today when the ops are mixed with ib_device data members. The providers will need to define the supported operations and assign them using ib_set_device_ops(), that will also make the providers code more readable and clean. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11IB/core: Add new IB ratesMichael Guralnik1-1/+5
Add the new rates that were added to Infiniband spec as part of HDR and 2x support. Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11IB/core: Add 2X port widthMichael Guralnik1-0/+2
Add the new 2X port width that is part of IB spec 1.3 Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11IB/core: Add CapabilityMask2 to port attributesMichael Guralnik1-0/+1
CapabilityMask2 was added in IB Spec 1.3 under PortInfo attribute. The new Capapbility mask is needed in order to expose the new 2X width and HDR speed. Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-03RDMA/restrack: Track ucontextLeon Romanovsky1-0/+4
Add ability to track allocated ib_ucontext, which are limited resource and worth to be visible by users. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-11-27RDMA/uverbs: Do not pass ib_uverbs_file to ioctl methodsJason Gunthorpe1-2/+2
The uverbs_attr_bundle already contains this pointer, and most methods don't actually need it. Get rid of the redundant function argument. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-27RDMA/uverbs: Replace ib_uverbs_file with uverbs_attr_bundle for writeJason Gunthorpe1-1/+1
Now that we can add meta-data to the description of write() methods we need to pass the uverbs_attr_bundle into all write based handlers so future patches can use it as a container for any new data transferred out of the core. This is the first step to bringing the write() and ioctl() methods to a common interface signature. This is a simple search/replace, and we push the attr down into the uobj and other APIs to keep changes minimal. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-22RDMA/core: Sync unregistration with netlink commandsParav Pandit1-1/+7
When the rdma device is getting removed, get resource info can race with device removal, as below: CPU-0 CPU-1 -------- -------- rdma_nl_rcv_msg() nldev_res_get_cq_dumpit() mutex_lock(device_lock); get device reference mutex_unlock(device_lock); [..] ib_unregister_device() /* Valid reference to * device->dev exists. */ ib_dealloc_device() [..] provider->fill_res_entry(); Even though device object is not freed, fill_res_entry() can get called on device which doesn't have a driver anymore. Kernel core device reference count is not sufficient, as this only keeps the structure valid, and doesn't guarantee the driver is still loaded. Similar race can occur with device renaming and device removal, where device_rename() tries to rename a unregistered device. While this is fine for devices of a class which are not net namespace aware, but it is incorrect for net namespace aware class coming in subsequent series. If a class is net namespace aware, then the below [1] call trace is observed in above situation. Therefore, to avoid the race, keep a reference count and let device unregistration wait until all netlink users drop the reference. [1] Call trace: kernfs: ns required in 'infiniband' for 'mlx5_0' WARNING: CPU: 18 PID: 44270 at fs/kernfs/dir.c:842 kernfs_find_ns+0x104/0x120 libahci i2c_core mlxfw libata dca [last unloaded: devlink] RIP: 0010:kernfs_find_ns+0x104/0x120 Call Trace: kernfs_find_and_get_ns+0x2e/0x50 sysfs_rename_link_ns+0x40/0xb0 device_rename+0xb2/0xf0 ib_device_rename+0xb3/0x100 [ib_core] nldev_set_doit+0x165/0x190 [ib_core] rdma_nl_rcv_msg+0x249/0x250 [ib_core] ? netlink_deliver_tap+0x8f/0x3e0 rdma_nl_rcv+0xd6/0x120 [ib_core] netlink_unicast+0x17c/0x230 netlink_sendmsg+0x2f0/0x3e0 sock_sendmsg+0x30/0x40 __sys_sendto+0xdc/0x160 Fixes: da5c85078215 ("RDMA/nldev: add driver-specific resource tracking") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-22RDMA/uverbs: Use a linear list to describe the compiled-in uapiJason Gunthorpe1-1/+1
The 'tree' data structure is very hard to build at compile time, and this makes it very limited. The new radix tree based compiler can handle a more complex input language that does not require the compiler to perfectly group everything into a neat tree structure. Instead use a simple list to describe to input, where the list elements can be of various different 'opcodes' instructing the radix compiler what to do. Start out with opcodes chaining to other definition lists and chaining to the existing 'tree' definition. Replace the very top level of the 'object tree' with this list type and get rid of struct uverbs_object_tree_def and DECLARE_UVERBS_OBJECT_TREE. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-22RDMA/core: Remove unused header files mm.h, socket.h, scatterlist.hParav Pandit1-3/+0
Structures of ib_verbs.h don't use fields/structures of mm.h, socket.h or scatterlist.h. So remove such header files inclusion. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-07IB/uverbs: fix a typoRami Rosen1-1/+1
This patch fixes a typo in include/rdma/ib_verbs.h. See: https://www.merriam-webster.com/dictionary/lieu Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-26Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-56/+93
Pull rdma updates from Jason Gunthorpe: "This has been a smaller cycle with many of the commits being smallish code fixes and improvements across the drivers. - Driver updates for bnxt_re, cxgb4, hfi1, hns, mlx5, nes, qedr, and rxe - Memory window support in hns - mlx5 user API 'flow mutate/steering' allows accessing the full packet mangling and matching machinery from user space - Support inter-working with verbs API calls in the 'devx' mlx5 user API, and provide options to use devx with less privilege - Modernize the use of syfs and the device interface to use attribute groups and cdev properly for uverbs, and clean up some of the core code's device list management - More progress on net namespaces for RDMA devices - Consolidate driver BAR mmapping support into core code helpers and rework how RDMA holds poitners to mm_struct for get_user_pages cases - First pass to use 'dev_name' instead of ib_device->name - Device renaming for RDMA devices" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (242 commits) IB/mlx5: Add support for extended atomic operations RDMA/core: Fix comment for hw stats init for port == 0 RDMA/core: Refactor ib_register_device() function RDMA/core: Fix unwinding flow in case of error to register device ib_srp: Remove WARN_ON in srp_terminate_io() IB/mlx5: Allow scatter to CQE without global signaled WRs IB/mlx5: Verify that driver supports user flags IB/mlx5: Support scatter to CQE for DC transport type RDMA/drivers: Use core provided API for registering device attributes RDMA/core: Allow existing drivers to set one sysfs group per device IB/rxe: Remove unnecessary enum values RDMA/umad: Use kernel API to allocate umad indexes RDMA/uverbs: Use kernel API to allocate uverbs indexes RDMA/core: Increase total number of RDMA ports across all devices IB/mlx4: Add port and TID to MAD debug print IB/mlx4: Enable debug print of SMPs RDMA/core: Rename ports_parent to ports_kobj RDMA/core: Do not expose unsupported counters IB/mlx4: Refer to the device kobject instead of ports_parent RDMA/nldev: Allow IB device rename through RDMA netlink ...
2018-10-17RDMA/core: Allow existing drivers to set one sysfs group per deviceParav Pandit1-2/+28
Currently many rdma drivers are creating device attribute files using device_create_file() with device specific attributes. Device specific attributes should be exposed via well defined netlink device attributes in future. Introduce an API rdma_set_device_sysfs_group() for existing drivers to set a group for sysfs attributes for legacy. This API is only for exposing legacy attributes which existed for sometime now. New drivers should not be using this API and rather follow netlink path. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16RDMA/core: Rename ports_parent to ports_kobjParav Pandit1-1/+1
Normally kobj objects have kobj suffix to reflect it. Rename ports_parent to ports_kobj. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-11RDMA/netdev: Fix netlink support in IPoIBDenis Drozdov1-0/+7
IPoIB netlink support was broken by the below commit since integrating the rdma_netdev support relies on an allocation flow for netdevs that was controlled by the ipoib driver while netdev's rtnl_newlink implementation assumes that the netdev will be allocated by netlink. Such situation leads to crash in __ipoib_device_add, once trying to reuse netlink device. This patch fixes the kernel oops for both mlx4 and mlx5 devices triggered by the following command: Fixes: cd565b4b51e5 ("IB/IPoIB: Support acceleration options callbacks") Signed-off-by: Denis Drozdov <denisd@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-11RDMA/netdev: Hoist alloc_netdev_mqs out of the driverDenis Drozdov1-2/+21
netdev has several interfaces that expect to call alloc_netdev_mqs from the core code, with the driver only providing the arguments. This is incompatible with the rdma_netdev interface that returns the netdev directly. Thus re-organize the API used by ipoib so that the verbs core code calls alloc_netdev_mqs for the driver. This is done by allowing the drivers to provide the allocation parameters via a 'get_params' callback and then initializing an allocated netdev as a second step. Fixes: cd565b4b51e5 ("IB/IPoIB: Support acceleration options callbacks") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Denis Drozdov <denisd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-04IB/mlx4: Avoid implicit enumerated type conversionNathan Chancellor1-1/+1
Clang warns when one enumerated type is implicitly converted to another. drivers/infiniband/hw/mlx4/mad.c:1811:41: warning: implicit conversion from enumeration type 'enum mlx4_ib_qp_flags' to different enumeration type 'enum ib_qp_create_flags' [-Wenum-conversion] qp_init_attr.init_attr.create_flags = MLX4_IB_SRIOV_TUNNEL_QP; ~ ^~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/mlx4/mad.c:1819:41: warning: implicit conversion from enumeration type 'enum mlx4_ib_qp_flags' to different enumeration type 'enum ib_qp_create_flags' [-Wenum-conversion] qp_init_attr.init_attr.create_flags = MLX4_IB_SRIOV_SQP; ~ ^~~~~~~~~~~~~~~~~ The type mlx4_ib_qp_flags explicitly provides supplemental values to the type ib_qp_create_flags. Make that clear to Clang by changing the create_flags type to u32. Reported-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-04RDMA: Remove unused parameter from ib_modify_qp_is_ok()Kamal Heib1-3/+1
The ll parameter is not used in ib_modify_qp_is_ok(), so remove it. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-26RDMA: Fully setup the device name in ib_register_deviceJason Gunthorpe1-3/+3
The current code has two copies of the device name, ibdev->dev and dev_name(&ibdev->dev), and they are setup at different times, which is very confusing. Set them both up at the same time and make dev_name() the lead name, which is the proper use of the driver core APIs. To make it very clear that the name is not valid until registration pass it in to the ib_register_device() call rather than messing with ibdev->name directly. Also the reorganization now checks that dev_name is unique even if it does not contain a %. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Adit Ranadive <aditr@vmware.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Acked-by: Devesh Sharma <devesh.sharma@broadcom.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
2018-09-21RDMA/uverbs: Get rid of ucontext->tgidJason Gunthorpe1-1/+0
Nothing uses this now, just delete it. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21RDMA/umem: Use umem->owning_mm inside ODPJason Gunthorpe1-20/+2
Since ODP had a single struct mmu_notifier located in the ucontext it could only handle a single MM at a time, and this prevented it from using the new owning_mm system. With the prior rework it is now simple to let ODP track multiple MMs per ucontext, finish the job so that the per_mm is allocated on a mm by mm basis, and freed when the last umem is dropped from the ucontext. As a side effect the new saner locking removes the lockdep splat about nesting the umem_rwsem between mmu_notifier_unregister and ib_umem_odp_release. It also makes ODP work with multiple processes, across, fork, etc. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21RDMA/umem: Move all the ODP related stuff out of ucontext and into per_mmJason Gunthorpe1-12/+20
This is the first step to make ODP use the owning_mm that is now part of struct ib_umem. Each ODP umem is linked to a single per_mm structure, which in turn, is linked to a single mm, via the embedded mmu_notifier. This first patch introduces the structure and reworks eveything to use it. This also needs to introduce tgid into the ib_ucontext_per_mm, as get_user_pages_remote() requires the originating task for statistics tracking. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21RDMA/umem: Use ib_umem_odp in all function signatures connected to ODPJason Gunthorpe1-1/+3
All of these functions already require the ODP version of the umem struct, make this very clear by having the signature require it. This paves the way to using the container_of() pattern to link umem_odp and umem together. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-20RDMA/ucontext: Add a core API for mmaping driver IO memoryJason Gunthorpe1-0/+22
To support disassociation and PCI hot unplug, we have to track all the VMAs that refer to the device IO memory. When disassociation occurs the VMAs have to be revised to point to the zero page, not the IO memory, to allow the physical HW to be unplugged. The three drivers supporting this implemented three different versions of this algorithm, all leaving something to be desired. This new common implementation has a few differences from the driver versions: - Track all VMAs, including splitting/truncating/etc. Tie the lifetime of the private data allocation to the lifetime of the vma. This avoids any tricks with setting vm_ops which Linus didn't like. (see link) - Support multiple mms, and support properly tracking mmaps triggered by processes other than the one first opening the uverbs fd. This makes fork behavior of disassociation enabled drivers the same as fork support in normal drivers. - Don't use crazy get_task stuff. - Simplify the approach for to racing between vm_ops close and disassociation, fixing the related bugs most of the driver implementations had. Since we are in core code the tracking list can be placed in struct ib_uverbs_ufile, which has a lifetime strictly longer than any VMAs created by mmap on the uverbs FD. Link: https://www.spinics.net/lists/stable/msg248747.html Link: https://lkml.kernel.org/r/CA+55aFxJTV_g46AQPoPXen-UPiqR1HGMZictt7VpC-SMFbm3Cw@mail.gmail.com Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-18IB/rxe: Revise the ib_wr_opcode enumJason Gunthorpe1-14/+20
This enum has become part of the uABI, as both RXE and the ib_uverbs_post_send() command expect userspace to supply values from this enum. So it should be properly placed in include/uapi/rdma. In userspace this enum is called 'enum ibv_wr_opcode' as part of libibverbs.h. That enum defines different values for IB_WR_LOCAL_INV, IB_WR_SEND_WITH_INV, and IB_WR_LSO. These were introduced (incorrectly, it turns out) into libiberbs in 2015. The kernel has changed its mind on the numbering for several of the IB_WC values over the years, but has remained stable on IB_WR_LOCAL_INV and below. Based on this we can conclude that there is no real user space user of the values beyond IB_WR_ATOMIC_FETCH_AND_ADD, as they have never worked via rdma-core. This is confirmed by inspection, only rxe uses the kernel enum and implements the latter operations. rxe has clearly never worked with these attributes from userspace. Other drivers that support these opcodes implement the functionality without calling out to the kernel. To make IB_WR_SEND_WITH_INV and related work for RXE in userspace we choose to renumber the IB_WR enum in the kernel to match the uABI that userspace has bee using since before Soft RoCE was merged. This is an overall simpler configuration for the whole software stack, and obviously can't break anything existing. Reported-by: Seth Howell <seth.howell@intel.com> Tested-by: Seth Howell <seth.howell@intel.com> Fixes: 8700e3e7c485 ("Soft RoCE driver") Cc: <stable@vger.kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11RDMA/uverbs: Move flow resources initializationMark Bloch1-14/+0
Use ib_set_flow() when initializing flow related resources. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11RDMA/core: Document QP @event_handler functionChuck Lever1-0/+2
Add helpful warning for RDMA consumer implementers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-06RDMA/core: Define client_data_lock as rwlock instead of spinlockParav Pandit1-2/+3
Even though device registration/unregistration and client registration/unregistration is not a performance path, define the client_data_lock as rwlock for code clarity. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>