Age | Commit message (Collapse) | Author | Files | Lines |
|
In order to account HB for sq_wr_avail properly, move sq_wr_avail from
rtrs_srv_con to rtrs_con.
Although rtrs-clt do not care sq_wr_avail, but still init it to
max_send_wr.
Fixes: b38041d50add ("RDMA/rtrs: Do not signal for heatbeat")
Link: https://lore.kernel.org/r/20210712060750.16494-7-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
flags is not used, so remove it from rtrs_post_rdma_write_imm_empty.
Link: https://lore.kernel.org/r/20210712060750.16494-6-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
It's only used in rtrs.c, so no need to export.
Link: https://lore.kernel.org/r/20210712060750.16494-5-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
On idle session, because we do not do signal for heartbeat, it will
overflow the send queue after sometime.
To avoid that, we need to enable the signal for heartbeat. To do that, add
a new member signal_interval in rtrs_path, which will set min of
queue_depth and SERVICE_CON_QUEUE_DEPTH, and track it for both heartbeat
and IO, so the sq queue full accounting is correct.
Fixes: b38041d50add ("RDMA/rtrs: Do not signal for heatbeat")
Link: https://lore.kernel.org/r/20210712060750.16494-4-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
It could help debugging in case of error happens.
Link: https://lore.kernel.org/r/20210712060750.16494-2-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Introduce tail wr, we can send as the last wr, we want to send the local
invalidate wr after rdma wr in later patch.
While at it, also fix coding style issue.
Link: https://lore.kernel.org/r/20210621055340.11789-2-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Those variables are passed to create_cq, create_qp, rtrs_iu_alloc and
rtrs_iu_free, so these *_size means the num of unit. And cq_size also
means number of cq element.
Also move the setting of cq_num to common path.
Link: https://lore.kernel.org/r/20210614090337.29557-5-jinpu.wang@ionos.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
When re-connecting, it resets hb_missed_max to 0.
Before the first re-connecting, client will trigger re-connection
when it gets hb-ack more than 5 times. But after the first
re-connecting, clients will do re-connection whenever it does
not get hb-ack because hb_missed_max is 0.
There is no need to reset hb_missed_max when re-connecting.
hb_missed_max should be kept until closing the session.
Fixes: c0894b3ea69d3 ("RDMA/rtrs: core: lib functions shared between client and server modules")
Link: https://lore.kernel.org/r/20210528113018.52290-16-jinpu.wang@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
This patch adds new multipath policy: min-latency. Client checks the
latency of each path when it sends the heart-beat. And it sends IO to the
path with the minimum latency.
Link: https://lore.kernel.org/r/20210407113444.150961-2-gi-oh.kim@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
There is common code converting addresses of source machine and
destination machine to a string. We already have a struct rtrs_addr to
store two addresses. This patch introduces a new function that converts
two addresses into one string with struct rtrs_addr.
Link: https://lore.kernel.org/r/20210325153308.1214057-14-gi-oh.kim@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Have the driver use shared CQs which provids a ~10%-20% improvement during
test.
Instead of opening a CQ for each QP per connection, a CQ for each QP will
be provided by the RDMA core driver that will be shared between the QPs on
that core reducing interrupt overhead.
Link: https://lore.kernel.org/r/20210222141551.54345-1-jinpu.wang@cloud.ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
When KASAN is enabled, we notice warning below:
[ 483.436975] ==================================================================
[ 483.437234] BUG: KASAN: stack-out-of-bounds in _mlx5_ib_post_send+0x188a/0x2560 [mlx5_ib]
[ 483.437430] Read of size 4 at addr ffff88a195fd7d30 by task kworker/1:3/6954
[ 483.437731] CPU: 1 PID: 6954 Comm: kworker/1:3 Kdump: loaded Tainted: G O 5.4.82-pserver #5.4.82-1+feature+linux+5.4.y+dbg+20201210.1532+987e7a6~deb10
[ 483.437976] Hardware name: Supermicro Super Server/X11DDW-L, BIOS 3.3 02/21/2020
[ 483.438168] Workqueue: rtrs_server_wq hb_work [rtrs_core]
[ 483.438323] Call Trace:
[ 483.438486] dump_stack+0x96/0xe0
[ 483.438646] ? _mlx5_ib_post_send+0x188a/0x2560 [mlx5_ib]
[ 483.438802] print_address_description.constprop.6+0x1b/0x220
[ 483.438966] ? _mlx5_ib_post_send+0x188a/0x2560 [mlx5_ib]
[ 483.439133] ? _mlx5_ib_post_send+0x188a/0x2560 [mlx5_ib]
[ 483.439285] __kasan_report.cold.9+0x1a/0x32
[ 483.439444] ? _mlx5_ib_post_send+0x188a/0x2560 [mlx5_ib]
[ 483.439597] kasan_report+0x10/0x20
[ 483.439752] _mlx5_ib_post_send+0x188a/0x2560 [mlx5_ib]
[ 483.439910] ? update_sd_lb_stats+0xfb1/0xfc0
[ 483.440073] ? set_reg_wr+0x520/0x520 [mlx5_ib]
[ 483.440222] ? update_group_capacity+0x340/0x340
[ 483.440377] ? find_busiest_group+0x314/0x870
[ 483.440526] ? update_sd_lb_stats+0xfc0/0xfc0
[ 483.440683] ? __bitmap_and+0x6f/0x100
[ 483.440832] ? __lock_acquire+0xa2/0x2150
[ 483.440979] ? __lock_acquire+0xa2/0x2150
[ 483.441128] ? __lock_acquire+0xa2/0x2150
[ 483.441279] ? debug_lockdep_rcu_enabled+0x23/0x60
[ 483.441430] ? lock_downgrade+0x390/0x390
[ 483.441582] ? __lock_acquire+0xa2/0x2150
[ 483.441729] ? __lock_acquire+0xa2/0x2150
[ 483.441876] ? newidle_balance+0x425/0x8f0
[ 483.442024] ? __lock_acquire+0xa2/0x2150
[ 483.442172] ? debug_lockdep_rcu_enabled+0x23/0x60
[ 483.442330] hb_work+0x15d/0x1d0 [rtrs_core]
[ 483.442479] ? schedule_hb+0x50/0x50 [rtrs_core]
[ 483.442627] ? lock_downgrade+0x390/0x390
[ 483.442781] ? process_one_work+0x40d/0xa50
[ 483.442931] process_one_work+0x4ee/0xa50
[ 483.443082] ? pwq_dec_nr_in_flight+0x110/0x110
[ 483.443231] ? do_raw_spin_lock+0x119/0x1d0
[ 483.443383] worker_thread+0x65/0x5c0
[ 483.443532] ? process_one_work+0xa50/0xa50
[ 483.451839] kthread+0x1e2/0x200
[ 483.451983] ? kthread_create_on_node+0xc0/0xc0
[ 483.452139] ret_from_fork+0x3a/0x50
The problem is we use wrong type when send wr, hw driver expect the type
of IB_WR_RDMA_WRITE_WITH_IMM wr should be ib_rdma_wr, and doing
container_of to access member. The fix is simple use ib_rdma_wr instread
of ib_send_wr.
Fixes: c0894b3ea69d ("RDMA/rtrs: core: lib functions shared between client and server modules")
Link: https://lore.kernel.org/r/20201217141915.56989-20-jinpu.wang@cloud.ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
For HB, there is no need to generate signal for completion.
Also remove a comment accordingly.
Fixes: c0894b3ea69d ("RDMA/rtrs: core: lib functions shared between client and server modules")
Link: https://lore.kernel.org/r/20201217141915.56989-16-jinpu.wang@cloud.ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reported-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
rtrs does not have same limit for both max_send_wr and max_recv_wr,
To allow client and server set different values, export in a separate
parameter for rtrs_cq_qp_create.
Also fix the type accordingly, u32 should be used instead of u16.
Fixes: c0894b3ea69d ("RDMA/rtrs: core: lib functions shared between client and server modules")
Link: https://lore.kernel.org/r/20201217141915.56989-2-jinpu.wang@cloud.ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Since the three functions share the similar logic, let's introduce one
common function for it.
Link: https://lore.kernel.org/r/20201023074353.21946-12-jinpu.wang@cloud.ionos.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The direction of DMA operation is already in the rtrs_iu
Link: https://lore.kernel.org/r/20201023074353.21946-8-jinpu.wang@cloud.ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
This is a set of library functions existing as a rtrs-core module, used by
client and server modules.
Mainly these functions wrap IB and RDMA calls and provide a bit higher
abstraction for implementing of RTRS protocol on client or server sides.
Link: https://lore.kernel.org/r/20200511135131.27580-5-danil.kipnis@cloud.ionos.com
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|