Age | Commit message (Collapse) | Author | Files | Lines |
|
Pull block updates from Jens Axboe:
- Series of merge handling cleanups (Baolin, Christoph)
- Series of blk-throttle fixes and cleanups (Baolin)
- Series cleaning up BDI, seperating the block device from the
backing_dev_info (Christoph)
- Removal of bdget() as a generic API (Christoph)
- Removal of blkdev_get() as a generic API (Christoph)
- Cleanup of is-partition checks (Christoph)
- Series reworking disk revalidation (Christoph)
- Series cleaning up bio flags (Christoph)
- bio crypt fixes (Eric)
- IO stats inflight tweak (Gabriel)
- blk-mq tags fixes (Hannes)
- Buffer invalidation fixes (Jan)
- Allow soft limits for zone append (Johannes)
- Shared tag set improvements (John, Kashyap)
- Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)
- DM no-wait support (Mike, Konstantin)
- Request allocation improvements (Ming)
- Allow md/dm/bcache to use IO stat helpers (Song)
- Series improving blk-iocost (Tejun)
- Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
Xianting, Yang, Yufen, yangerkun)
* tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
block: fix uapi blkzoned.h comments
blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
blk-mq: get rid of the dead flush handle code path
block: get rid of unnecessary local variable
block: fix comment and add lockdep assert
blk-mq: use helper function to test hw stopped
block: use helper function to test queue register
block: remove redundant mq check
block: invoke blk_mq_exit_sched no matter whether have .exit_sched
percpu_ref: don't refer to ref->data if it isn't allocated
block: ratelimit handle_bad_sector() message
blk-throttle: Re-use the throtl_set_slice_end()
blk-throttle: Open code __throtl_de/enqueue_tg()
blk-throttle: Move service tree validation out of the throtl_rb_first()
blk-throttle: Move the list operation after list validation
blk-throttle: Fix IO hang for a corner case
blk-throttle: Avoid tracking latency if low limit is invalid
blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
block: Remove redundant 'return' statement
...
|
|
Was missed during the initial review of the below patch
Fixes: 227a0e142e37 ("IB/mlx4: Add support for REJ due to timeout")
Link: https://lore.kernel.org/r/1602253482-6718-1-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Fix a bug in rxe_rcv() that causes all multicast packets to be
dropped. Currently rxe_match_dgid() is called for each packet to verify
that the destination IP address matches one of the entries in the port
source GID table. This is incorrect for IP multicast addresses since they
do not appear in the GID table.
Add code to detect multicast addresses.
Change function name to rxe_chk_dgid() which is clearer.
Link: https://lore.kernel.org/r/20201008212753.265249-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The changes referenced below replaced sbk_clone)_ by taking additional
references, passing the skb along and then freeing the skb. This
deleted the packets before they could be processed and additionally
passed bad data in each packet. Since pkt is stored in skb->cb
changing pkt->qp changed it for all the packets.
Replace skb_get() by sbk_clone() in rxe_rcv_mcast_pkt() for cases where
multiple QPs are receiving multicast packets on the same address.
Delete kfree_skb() because the packets need to live until they have been
processed by each QP. They are freed later.
Fixes: 86af61764151 ("IB/rxe: remove unnecessary skb_clone")
Fixes: fe896ceb5772 ("IB/rxe: replace refcount_inc with skb_get")
Link: https://lore.kernel.org/r/20201008203651.256958-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Struct rxe_mem had pd, lkey and rkey values both in itself and in the
struct ib_mr which is also included in rxe_mem.
Delete these entries and replace references with the ones in ibmr.Add
mr_pd, mr_lkey and mr_rkey macros which extract these values from mr.
Link: https://lore.kernel.org/r/20201008212818.265303-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
An incorrect sizeof is being used, struct rvt_ibport ** is not correct, it
should be struct rvt_ibport *. Note that since ** is the same size as
* this is not causing any issues. Improve this fix by using
sizeof(*rdi->ports) as this allows us to not even reference the type
of the pointer. Also remove line breaks as the entire statement can
fit on one line.
Link: https://lore.kernel.org/r/20201008095204.82683-1-colin.king@canonical.com
Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)")
Fixes: ff6acd69518e ("IB/rdmavt: Add device structure allocation")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
An incorrect sizeof is being used, u64 * is not correct, it should be just
u64 for a table of umem_pgs number of u64 items in the pbl_tbl. Use the
idiom sizeof(*pbl_tbl) to get the object type without the need to
explicitly use u64.
Link: https://lore.kernel.org/r/20201006114700.537916-1-colin.king@canonical.com
Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)")
Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
This driver is taking the SGL out of the umem and passing it through a
struct bnxt_qplib_sg_info. Instead of passing the SGL pass the umem and
then use rdma_umem_for_each_dma_block() directly.
Move the calls of ib_umem_num_dma_blocks() closer to their actual point of
use, npages is only set for non-umem pbl flows.
Link: https://lore.kernel.org/r/0-v1-b37437a73f35+49c-bnxt_re_dma_block_jgg@nvidia.com
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Tested-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
'struct percpu_ref' is often embedded into one user structure, and the
instance is usually referenced in fast path, however actually only
'percpu_count_ptr' is needed in fast path.
So move other fields into one new structure of 'percpu_ref_data', and
allocate it dynamically via kzalloc(), then memory footprint of
'percpu_ref' in fast path is reduced a lot and becomes suitable to put
into hot cacheline of user structure.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Veronika Kabatova <vkabatov@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Remove the implementation of ib_umem_add_sg_table and instead
call to __sg_alloc_table_from_pages which already has the logic to
merge contiguous pages.
Besides that it removes duplicated functionality, it reduces the
memory consumption of the SG table significantly. Prior to this
patch, the SG table was allocated in advance regardless consideration
of contiguous pages.
In huge pages system of 2MB page size, without this change, the SG table
would contain x512 SG entries.
E.g. for 100GB memory registration:
Number of entries Size
Before 26214400 600.0MB
After 51200 1.2MB
Link: https://lore.kernel.org/r/20201004154340.1080481-5-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Pull networking fixes from David Miller:
1) Make sure SKB control block is in the proper state during IPSEC
ESP-in-TCP encapsulation. From Sabrina Dubroca.
2) Various kinds of attributes were not being cloned properly when we
build new xfrm_state objects from existing ones. Fix from Antony
Antony.
3) Make sure to keep BTF sections, from Tony Ambardar.
4) TX DMA channels need proper locking in lantiq driver, from Hauke
Mehrtens.
5) Honour route MTU during forwarding, always. From Maciej
Żenczykowski.
6) Fix races in kTLS which can result in crashes, from Rohit
Maheshwari.
7) Skip TCP DSACKs with rediculous sequence ranges, from Priyaranjan
Jha.
8) Use correct address family in xfrm state lookups, from Herbert Xu.
9) A bridge FDB flush should not clear out user managed fdb entries
with the ext_learn flag set, from Nikolay Aleksandrov.
10) Fix nested locking of netdev address lists, from Taehee Yoo.
11) Fix handling of 32-bit DATA_FIN values in mptcp, from Mat Martineau.
12) Fix r8169 data corruptions on RTL8402 chips, from Heiner Kallweit.
13) Don't free command entries in mlx5 while comp handler could still be
running, from Eran Ben Elisha.
14) Error flow of request_irq() in mlx5 is busted, due to an off by one
we try to free and IRQ never allocated. From Maor Gottlieb.
15) Fix leak when dumping netlink policies, from Johannes Berg.
16) Sendpage cannot be performed when a page is a slab page, or the page
count is < 1. Some subsystems such as nvme were doing so. Create a
"sendpage_ok()" helper and use it as needed, from Coly Li.
17) Don't leak request socket when using syncookes with mptcp, from
Paolo Abeni.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits)
net/core: check length before updating Ethertype in skb_mpls_{push,pop}
net: mvneta: fix double free of txq->buf
net_sched: check error pointer in tcf_dump_walker()
net: team: fix memory leak in __team_options_register
net: typhoon: Fix a typo Typoon --> Typhoon
net: hinic: fix DEVLINK build errors
net: stmmac: Modify configuration method of EEE timers
tcp: fix syn cookied MPTCP request socket leak
libceph: use sendpage_ok() in ceph_tcp_sendpage()
scsi: libiscsi: use sendpage_ok() in iscsi_tcp_segment_map()
drbd: code cleanup by using sendpage_ok() to check page for kernel_sendpage()
tcp: use sendpage_ok() to detect misused .sendpage
nvme-tcp: check page by sendpage_ok() before calling kernel_sendpage()
net: add WARN_ONCE in kernel_sendpage() for improper zero-copy send
net: introduce helper sendpage_ok() in include/linux/net.h
net: usb: pegasus: Proper error handing when setting pegasus' MAC address
net: core: document two new elements of struct net_device
netlink: fix policy dump leak
net/mlx5e: Fix race condition on nhe->n pointer in neigh update
net/mlx5e: Fix VLAN create flow
...
|
|
Report the "ipoib pkey", "mode" and "umcast" netlink attributes for every
IPoiB interface type, not just children created with 'ip link add'.
After setting the rtnl_link_ops for the parent interface, implement the
dellink() callback to block users from trying to remove it.
Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks")
Link: https://lore.kernel.org/r/20201004132948.26669-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Expose the query GID table and entry API to user space by adding two new
methods and method handlers to the device object.
This API provides a faster way to query a GID table using single call and
will be used in libibverbs to improve current approach that requires
multiple calls to open, close and read multiple sysfs files for a single
GID table entry.
Link: https://lore.kernel.org/r/20200923165015.2491894-5-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Introduce rdma_query_gid_table which enables querying all the GID tables
of a given device and copying the attributes of all valid GID entries to a
provided buffer.
This API provides a faster way to query a GID table using single call and
will be used in libibverbs to improve current approach that requires
multiple calls to open, close and read multiple sysfs files for a single
GID table entry.
Link: https://lore.kernel.org/r/20200923165015.2491894-4-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Separate IB_GID_TYPE_IB and IB_GID_TYPE_ROCE to two different values, so
enum ib_gid_type will match the gid types of the new query GID table API
which will be introduced in the following patches.
This change in enum ib_gid_type requires to change also enum
rdma_network_type by separating RDMA_NETWORK_IB and RDMA_NETWORK_ROCE_V1
values.
Link: https://lore.kernel.org/r/20200923165015.2491894-3-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Change the error code returned from rdma_get_gid_attr when the GID entry
is invalid but the GID index is in the gid table size range to -ENODATA
instead of -EINVAL.
This change is done in order to provide a more accurate error reporting to
be used by the new GID query API in user space. Nevertheless, -EINVAL is
still returned from sysfs in the aforementioned case to maintain
compatibility with user space that expects -EINVAL.
Link: https://lore.kernel.org/r/20200923165015.2491894-2-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Making a change to fix following sparse warnings reported by kbuild bot.
CHECK drivers/infiniband/hw/qedr/verbs.c
drivers/infiniband/hw/qedr/verbs.c:3872:59: warning: incorrect type in assignment (different base types)
drivers/infiniband/hw/qedr/verbs.c:3872:59: expected restricted __le32 [usertype] sge_prod
drivers/infiniband/hw/qedr/verbs.c:3872:59: got unsigned int [usertype] sge_prod
drivers/infiniband/hw/qedr/verbs.c:3875:59: warning: incorrect type in assignment (different base types)
drivers/infiniband/hw/qedr/verbs.c:3875:59: expected restricted __le32 [usertype] wqe_prod
drivers/infiniband/hw/qedr/verbs.c:3875:59: got unsigned int [usertype] wqe_prod
Link: https://lore.kernel.org/r/20201001100959.19940-1-palok@marvell.com
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: acca72e2b031 ("RDMA/qedr: SRQ's bug fixes")
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The only usage of these is to pass their address to sysfs_create_group()
and sysfs_remove_group(), both which takes const pointers. Make it const
to allow the compiler to put them in read-only memory.
Link: https://lore.kernel.org/r/20200930224004.24279-3-rikard.falkeborn@gmail.com
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The only usage of the pma_table field in the ib_port struct is to pass its
address to sysfs_create_group() and sysfs_remove_group(). Make it const to
make it possible to constify a couple of static struct
attribute_group. This allows the compiler to put them in read-only memory.
Link: https://lore.kernel.org/r/20200930224004.24279-2-rikard.falkeborn@gmail.com
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Sync device with CPU pages upon ODP MR registration. mlx5 already has to
zero the HW's version of the PAS list, may as well deliver a PAS list that
matches the current CPU page tables configuration.
Link: https://lore.kernel.org/r/20200930163828.1336747-5-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Extend advice MR to support non faulting mode, this can improve
performance by increasing the populated page tables in the device.
Link: https://lore.kernel.org/r/20200930163828.1336747-4-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Enable ODP sync without faulting, this improves performance by reducing
the number of page faults in the system.
The gain from this option is that the device page table can be aligned
with the presented pages in the CPU page table without causing page
faults.
As of that, the overhead on data path from hardware point of view to
trigger a fault which end-up by calling the driver to bring the pages
will be dropped.
Link: https://lore.kernel.org/r/20200930163828.1336747-3-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Move to use hmm_range_fault() instead of get_user_pags_remote() to improve
performance in a few aspects:
This includes:
- Dropping the need to allocate and free memory to hold its output
- No need any more to use put_page() to unpin the pages
- The logic to detect contiguous pages is done based on the returned
order, no need to run per page and evaluate.
In addition, moving to use hmm_range_fault() enables to reduce page faults
in the system with it's snapshot mode, this will be introduced in next
patches from this series.
As part of this, cleanup some flows and use the required data structures
to work with hmm_range_fault().
Link: https://lore.kernel.org/r/20200930163828.1336747-2-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
This three thread race can result in the work being run once the callback
becomes NULL:
CPU1 CPU2 CPU3
netevent_callback()
process_one_req() rdma_addr_cancel()
[..]
spin_lock_bh()
set_timeout()
spin_unlock_bh()
spin_lock_bh()
list_del_init(&req->list);
spin_unlock_bh()
req->callback = NULL
spin_lock_bh()
if (!list_empty(&req->list))
// Skipped!
// cancel_delayed_work(&req->work);
spin_unlock_bh()
process_one_req() // again
req->callback() // BOOM
cancel_delayed_work_sync()
The solution is to always cancel the work once it is completed so any
in between set_timeout() does not result in it running again.
Cc: stable@vger.kernel.org
Fixes: 44e75052bc2a ("RDMA/rdma_cm: Make rdma_addr_cancel into a fence")
Link: https://lore.kernel.org/r/20200930072007.1009692-1-leon@kernel.org
Reported-by: Dan Aloni <dan@kernelim.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Nothing reads this any more, and the reason for its existence has passed
due to the deferred fput() scheme.
Fixes: 8ea1f989aa07 ("drivers/IB,usnic: reduce scope of mmap_sem")
Link: https://lore.kernel.org/r/0-v1-df64ff042436+42-uctx_closing_jgg@nvidia.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
list field is not used anywhere
Link: https://lore.kernel.org/r/20200930131407.6438-1-gi-oh.kim@clous.ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Some code was removed but the variables were still there, and some
parameters have been changed to be queried from firmware. So the
definitions of them are no longer needed.
Fixes: 2a3d923f8730 ("RDMA/hns: Replace magic numbers with #defines")
Fixes: 82e620d9c3a0 ("RDMA/hns: Modify the data structure of hns_roce_av")
Fixes: 82547469782a ("IB/hns: Implement the add_gid/del_gid and optimize the GIDs management")
Fixes: 21b97f538765 ("RDMA/hns: Fixup qp release bug")
Link: https://lore.kernel.org/r/1601371934-40003-1-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
There is no real need to have an intermediate pointer for the same struct,
remove it, and use struct directly.
Link: https://lore.kernel.org/r/20200926102450.2966017-11-leon@kernel.org
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
As preparation for the removal of QP allocation logic, we need to ensure
that ib_core allocates the right amount of memory before a call to the
driver create_qp(). It requires from driver to have the same structs for
all types of QPs.
Link: https://lore.kernel.org/r/20200926102450.2966017-10-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
GSI QP can't be created from the user space, hence the udata check is
always false (udata == NULL). Remove that check and simplify the flow.
Link: https://lore.kernel.org/r/20200926102450.2966017-9-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The ioctl flow checks that the user provides only a supported list of QP
types, while write flow didn't do it and relied on the driver to check
it. Align those flows to fail as early as possible.
Link: https://lore.kernel.org/r/20200926102450.2966017-8-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Since all mlx4 QP have same storage type, move the QP allocation to be in
one place. This change is preparation to removal of such allocation from
the driver.
Link: https://lore.kernel.org/r/20200926102450.2966017-7-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Refactor the storage struct of mlx4 GSI QP to be embedded in mlx4_ib
QP. This allows to remove internal memory allocation of QP struct which is
hidden inside the mlx4_ib_create_qp() flow.
Link: https://lore.kernel.org/r/20200926102450.2966017-6-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
GSI QP doesn't need signal QP type because it is initialized statically to
zero, which is IB_SIGNAL_ALL_WR also wr->send_flags isn't set too. This
means that the GSI QP signal QP type can be removed.
Link: https://lore.kernel.org/r/20200926102450.2966017-5-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
There is no reason to have separate create flow for the GSI QP, while
general create_qp routine has all needed checks and ability to allocate
and free the proper struct mlx5_ib_qp.
Link: https://lore.kernel.org/r/20200926102450.2966017-4-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Remove duplication of mlx5_ib_qp and mlx5_ib_gsi_qp fields. This change
returns the memory footprint of mlx5_ib QP to be as it was before
embedding GSI QP.
Link: https://lore.kernel.org/r/20200926102450.2966017-3-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The GSI QPs have different create flow from the regular QPs, but it is not
really needed. Update the code to use mlx5_ib_qp as a storage class for
all outside of GSI calls.
Link: https://lore.kernel.org/r/20200926102450.2966017-2-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
infrastructure
Functions related to nested interface infrastructure such as
netdev_walk_all_{ upper | lower }_dev() pass both private functions
and "data" pointer to handle their own things.
At this point, the data pointer type is void *.
In order to make it easier to expand common variables and functions,
this new netdev_nested_priv structure is added.
In the following patch, a new member variable will be added into this
struct to fix the lockdep issue.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sizeof() when applied to a pointer typed expression should give the size
of the pointed data, even if the data is a pointer.
Fixes: e1f24a79f424 ("IB/mlx5: Support congestion related counters")
Link: https://lore.kernel.org/r/20200917081354.2083293-1-liushixin2@huawei.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
HIP08 supports RC inline up to size of 32 Bytes, and all data should be
put into SQWQE. For HIP09, this capability is extended to 1024 Bytes, if
length of data is longer than 32 Bytes, they will be filled into extended
sge space.
Link: https://lore.kernel.org/r/1599744069-9968-1-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The sq_sig_type field should be filled when querying QP, or the users may
get a wrong value.
Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC")
Link: https://lore.kernel.org/r/1600509802-44382-9-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The hardware will add AckReq flag in BTH header according to the value of
ack_req_freq to request ACK from responder for the packets with this flag.
It should be greater than or equal to lp_pktn_ini instead of using a fixed
value.
Fixes: 7b9bd73ed13d ("RDMA/hns: Fix wrong assignment of lp_pktn_ini in QPC")
Link: https://lore.kernel.org/r/1600509802-44382-8-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The rnr_retry returned to the user is not correct, it should be got from
another fields in QPC.
Fixes: bfe860351e31 ("RDMA/hns: Fix cast from or to restricted __le32 for driver")
Link: https://lore.kernel.org/r/1600509802-44382-7-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
calc_pg_sz() may gets a data calculation overflow if the PAGE_SIZE is 64 KB
and hop_num is 2. It is because that all variables involved in calculation
are defined in type of int. So change the type of bt_chunk_size,
buf_chunk_size and obj_per_chunk_default to u64.
Fixes: ba6bb7e97421 ("RDMA/hns: Add interfaces to get pf capabilities from firmware")
Link: https://lore.kernel.org/r/1600509802-44382-6-git-send-email-liweihang@huawei.com
Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
According to the RoCE v1 specification, the sl (service level) 0-7 are
mapped directly to priorities 0-7 respectively, sl 8-15 are reserved. The
driver should verify whether the the value of sl is larger than 7, if so,
an exception should be returned.
Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC")
Link: https://lore.kernel.org/r/1600509802-44382-5-git-send-email-liweihang@huawei.com
Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Change "initialze" to "initialize".
Fixes: 8f3e9f3ea087 ("IB/hns: Add code for refreshing CQ CI using TPTR")
Link: https://lore.kernel.org/r/1600509802-44382-4-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
HIP08 doesn't support modifying the maximum number of outstanding WR in an
SRQ. However, the driver does not return a failure message, and users may
mistakenly think that the resizing is executed successfully. So the driver
needs to intercept this operation.
Fixes: ffb1308b88b6 ("RDMA/hns: Move SRQ code to the reasonable place")
Link: https://lore.kernel.org/r/1600509802-44382-3-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
According to the IB specifications, the verbs should return an immediate
error when the users set an unsupported opcode. Furthermore, refactor codes
about opcode in process of post_send to make the difference between opcodes
clearer.
Link: https://lore.kernel.org/r/1600509802-44382-2-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
For HIP09, size of SCCC (Soft Congestion Control Context) is increased to
64 Bytes from 32 Bytes. The hardware will get the configuration of SCCC
from driver instead of using a fixed value.
Link: https://lore.kernel.org/r/1600245806-56321-5-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The new version of RoCEE supports using QPC in size of 256B or 512B, so
that HIP09 can supports new congestion control algorithms by using QPC in
larger size.
Link: https://lore.kernel.org/r/1600245806-56321-4-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|