Age | Commit message (Collapse) | Author | Files | Lines |
|
commit 828f6fa65ce7e80f77f5ab12942e44eb3d9d174e upstream.
1. Release pid before enter odp flow
2. Release pid when fail to allocate memory
Fixes: 87773dd56d54 ("IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get")
Fixes: 8ada2c1c0c1d ("IB/core: Add support for on demand paging regions")
Signed-off-by: Kenneth Lee <liguozhu@hisilicon.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Reviewed-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b4cfe3971f6eab542dd7ecc398bfa1aeec889934 upstream.
If IPV6 has not been enabled in the underlying kernel, we must avoid
calling IPV6 procedures in rdma_cm.ko.
This requires using "IS_ENABLED(CONFIG_IPV6)" in "if" statements
surrounding any code which calls external IPV6 procedures.
In the instance fixed here, procedure cma_bind_addr() called
ipv6_addr_type() -- which resulted in calling external procedure
__ipv6_addr_type().
Fixes: 6c26a77124ff ("RDMA/cma: fix IPv6 address resolution")
Cc: Spencer Baugh <sbaugh@catern.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d3a2418ee36a59bc02e9d454723f3175dcf4bfd9 upstream.
This patch avoids that Coverity complains about not checking the
ib_find_pkey() return value.
Fixes: commit 547af76521b3 ("IB/multicast: Report errors on multicast groups if P_key changes")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2fe2f378dd45847d2643638c07a7658822087836 upstream.
The array ib_mad_mgmt_class_table.method_table has MAX_MGMT_CLASS
(80) elements. Hence compare the array index with that value instead
of with IB_MGMT_MAX_METHODS (128). This patch avoids that Coverity
reports the following:
Overrunning array class->method_table of 80 8-byte elements at element index 127 (byte offset 1016) using index convert_mgmt_class(mad_hdr->mgmt_class) (which evaluates to 127).
Fixes: commit b7ab0b19a85f ("IB/mad: Verify mgmt class in received MADs")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9db0ff53cb9b43ed75bacd42a89c1a0ab048b2b0 upstream.
When there is a CM id object that has port assigned to it, it means that
the cm-id asked for the specific port that it should go by it, but if
that port was removed (hot-unplug event) the cm-id was not updated.
In order to fix that the port keeps a list of all the cm-id's that are
planning to go by it, whenever the port is removed it marks all of them
as invalid.
This commit fixes a kernel panic which happens when running traffic between
guests and we force reboot a guest mid traffic, it triggers a kernel panic:
Call Trace:
[<ffffffff815271fa>] ? panic+0xa7/0x16f
[<ffffffff8152b534>] ? oops_end+0xe4/0x100
[<ffffffff8104a00b>] ? no_context+0xfb/0x260
[<ffffffff81084db2>] ? del_timer_sync+0x22/0x30
[<ffffffff8104a295>] ? __bad_area_nosemaphore+0x125/0x1e0
[<ffffffff81084240>] ? process_timeout+0x0/0x10
[<ffffffff8104a363>] ? bad_area_nosemaphore+0x13/0x20
[<ffffffff8104aabf>] ? __do_page_fault+0x31f/0x480
[<ffffffff81065df0>] ? default_wake_function+0x0/0x20
[<ffffffffa0752675>] ? free_msg+0x55/0x70 [mlx5_core]
[<ffffffffa0753434>] ? cmd_exec+0x124/0x840 [mlx5_core]
[<ffffffff8105a924>] ? find_busiest_group+0x244/0x9f0
[<ffffffff8152d45e>] ? do_page_fault+0x3e/0xa0
[<ffffffff8152a815>] ? page_fault+0x25/0x30
[<ffffffffa024da25>] ? cm_alloc_msg+0x35/0xc0 [ib_cm]
[<ffffffffa024e821>] ? ib_send_cm_dreq+0xb1/0x1e0 [ib_cm]
[<ffffffffa024f836>] ? cm_destroy_id+0x176/0x320 [ib_cm]
[<ffffffffa024fb00>] ? ib_destroy_cm_id+0x10/0x20 [ib_cm]
[<ffffffffa034f527>] ? ipoib_cm_free_rx_reap_list+0xa7/0x110 [ib_ipoib]
[<ffffffffa034f590>] ? ipoib_cm_rx_reap+0x0/0x20 [ib_ipoib]
[<ffffffffa034f5a5>] ? ipoib_cm_rx_reap+0x15/0x20 [ib_ipoib]
[<ffffffff81094d20>] ? worker_thread+0x170/0x2a0
[<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff81094bb0>] ? worker_thread+0x0/0x2a0
[<ffffffff8109aef6>] ? kthread+0x96/0xa0
[<ffffffff8100c20a>] ? child_rip+0xa/0x20
[<ffffffff8109ae60>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
Fixes: a977049dacde ("[PATCH] IB: Add the kernel CM implementation")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5b810a242c28e1d8d64d718cebe75b79d86a0b2d upstream.
The real QP is destroyed in case of the ref count reaches zero, but
for XRC target QPs this call was missed and caused to QP leaks.
Let's call to destroy for all flows.
Fixes: 0e0ec7e0638e ('RDMA/core: Export ib_open_qp() to share XRC...')
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3c7ba5760ab8eedec01159b267bb9bfcffe522ac upstream.
sg_alloc_table gets unsigned int as parameter while the driver
returns it as size_t. Check npages isn't greater than maximum
unsigned int.
Fixes: eeb8461e36c9 ("IB: Refactor umem to use linear SG table")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 68c6bcdd8bd00394c234b915ab9b97c74104130c upstream.
The function send_leave sets the member: group->query_id
(group->query_id = ret) after calling the sa_query, but leave_handler
can be executed before the setting and it might delete the group object,
and will get a memory corruption.
Additionally, this patch gets rid of group->query_id variable which is
not used.
Fixes: faec2f7b96b5 ('IB/sa: Track multicast join/leave requests')
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d1e09f304a1d9651c5059ebfeb696dc2effc9b32 upstream.
Fixes an oops that might happen if uverbs_close races with
remove_one.
Both contexts may run ib_uverbs_cleanup_ucontext, it depends
on the flow.
Currently, there is no protection for a case that remove_one
didn't make the cleanup it runs to its end, the underlying
ib_device was freed then uverbs_close will call
ib_uverbs_cleanup_ucontext and OOPs.
Above might happen if uverbs_close deleted the file from the list
then remove_one didn't find it and runs to its end.
Fixes to protect against that case by a new cleanup lock so that
ib_uverbs_cleanup_ucontext will be called always before that
remove_one is ended.
Fixes: 35d4a0b63dc0 ("IB/uverbs: Fix race between ib_uverbs_open and remove_one")
Reported-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5ed935e861a4cbf2158ad3386d6d26edd60d2658 upstream.
In case ibnl_put_msg fails in send_nlmsg_done,
the function returns with -ENOMEM without freeing.
This patch fixes this behavior.
Fixes: 30dc5e63d6a5 ("RDMA/core: Add support for iWARP Port Mapper user space service")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0f377d86252d11bfea941852785e3094b93601a7 upstream.
Fixes a direct call to kfree_skb when nlmsg_free should be used.
Fixes: 2ca546b92a02 ('IB/sa: Route SA pathrecord query through netlink')
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 943f44d94aa26bfdcaafc40d3701e24eeb58edce upstream.
ib_cm_notify() can be called from interrupt context. Hence do not
reenable interrupts unconditionally in cm_establish().
This patch avoids that lockdep reports the following warning:
WARNING: CPU: 0 PID: 23317 at kernel/locking/lockdep.c:2624 trace _hardirqs_on_caller+0x112/0x1b0
DEBUG_LOCKS_WARN_ON(current->hardirq_context)
Call Trace:
<IRQ> [<ffffffff812bd0e5>] dump_stack+0x67/0x92
[<ffffffff81056f21>] __warn+0xc1/0xe0
[<ffffffff81056f8a>] warn_slowpath_fmt+0x4a/0x50
[<ffffffff810a5932>] trace_hardirqs_on_caller+0x112/0x1b0
[<ffffffff810a59dd>] trace_hardirqs_on+0xd/0x10
[<ffffffff815992c7>] _raw_spin_unlock_irq+0x27/0x40
[<ffffffffa0382e9c>] ib_cm_notify+0x25c/0x290 [ib_cm]
[<ffffffffa068fbc1>] srpt_qp_event+0xa1/0xf0 [ib_srpt]
[<ffffffffa04efb97>] mlx4_ib_qp_event+0x67/0xd0 [mlx4_ib]
[<ffffffffa034ec0a>] mlx4_qp_event+0x5a/0xc0 [mlx4_core]
[<ffffffffa03365f8>] mlx4_eq_int+0x3d8/0xcf0 [mlx4_core]
[<ffffffffa0336f9c>] mlx4_msi_x_interrupt+0xc/0x20 [mlx4_core]
[<ffffffff810b0914>] handle_irq_event_percpu+0x64/0x100
[<ffffffff810b09e4>] handle_irq_event+0x34/0x60
[<ffffffff810b3a6a>] handle_edge_irq+0x6a/0x150
[<ffffffff8101ad05>] handle_irq+0x15/0x20
[<ffffffff8101a66c>] do_IRQ+0x5c/0x110
[<ffffffff8159a2c9>] common_interrupt+0x89/0x89
[<ffffffff81297a17>] blk_run_queue_async+0x37/0x40
[<ffffffffa0163e53>] rq_completed+0x43/0x70 [dm_mod]
[<ffffffffa0164896>] dm_softirq_done+0x176/0x280 [dm_mod]
[<ffffffff812a26c2>] blk_done_softirq+0x52/0x90
[<ffffffff8105bc1f>] __do_softirq+0x10f/0x230
[<ffffffff8105bec8>] irq_exit+0xa8/0xb0
[<ffffffff8103653e>] smp_trace_call_function_single_interrupt+0x2e/0x30
[<ffffffff81036549>] smp_call_function_single_interrupt+0x9/0x10
[<ffffffff8159a959>] call_function_single_interrupt+0x89/0x90
<EOI>
Fixes: commit be4b499323bf (IB/cm: Do not queue work to a device that's going away)
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Nikolay Borisov <kernel@kyup.com>
Acked-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e6bd18f57aad1a2d1ef40e646d03ed0f2515c9e3 upstream.
The drivers/infiniband stack uses write() as a replacement for
bi-directional ioctl(). This is not safe. There are ways to
trigger write calls that result in the return structure that
is normally written to user space being shunted off to user
specified kernel memory instead.
For the immediate repair, detect and deny suspicious accesses to
the write API.
For long term, update the user space libraries and the kernel API
to something that doesn't present the same security vulnerabilities
(likely a structured ioctl() interface).
The impacted uAPI interfaces are generally only available if
hardware from drivers/infiniband is installed in the system.
Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
[ Expanded check to all known write() entry points ]
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 649367735ee5dedb128d9fac0b86ba7e0fe7ae3b upstream.
cma_validate_port wrongly assumed that Ethernet devices are RoCE
devices and thus their ndev should be matched in the GID table.
This broke the iWarp support. Fixing that matching the ndev only if
we work on a RoCE port.
Cc: <stable@vger.kernel.org> # 4.4.x-
Fixes: abae1b71dd37 ('IB/cma: cma_validate_port should verify the port
and netdevice')
Reported-by: Hariprasad Shenai <hariprasad@chelsio.com>
Tested-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4bfdf635c668869c69fd18ece37ec66fb6f38fcf upstream.
ib_send_cm_drep() calls cm_enter_timewait() while holding a spinlock
that can be locked from inside an interrupt handler. Hence do not
enable interrupts inside cm_enter_timewait() if called with interrupts
disabled.
This patch fixes e.g. the following deadlock:
Acked-by: Erez Shitrit <erezsh@mellanox.com>
=================================
[ INFO: inconsistent lock state ]
4.4.0-rc7+ #1 Tainted: G E
---------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
(&(&cm_id_priv->lock)->rlock){?.+...}, at: [<ffffffffa036eec4>] cm_establish+0x
74/0x1b0 [ib_cm]
{HARDIRQ-ON-W} state was registered at:
[<ffffffff810a3c11>] mark_held_locks+0x71/0x90
[<ffffffff810a3e87>] trace_hardirqs_on_caller+0xa7/0x1c0
[<ffffffff810a3fad>] trace_hardirqs_on+0xd/0x10
[<ffffffff8151c40b>] _raw_spin_unlock_irq+0x2b/0x40
[<ffffffffa036ea8e>] cm_enter_timewait+0xae/0x100 [ib_cm]
[<ffffffffa036ff76>] ib_send_cm_drep+0xb6/0x190 [ib_cm]
[<ffffffffa052ed08>] srp_cm_handler+0x128/0x1a0 [ib_srp]
[<ffffffffa0370340>] cm_process_work+0x20/0xf0 [ib_cm]
[<ffffffffa0371335>] cm_dreq_handler+0x135/0x2c0 [ib_cm]
[<ffffffffa03733c5>] cm_work_handler+0x75/0xd0 [ib_cm]
[<ffffffff8107184d>] process_one_work+0x1bd/0x460
[<ffffffff81073148>] worker_thread+0x118/0x420
[<ffffffff81078454>] kthread+0xe4/0x100
[<ffffffff8151cbbf>] ret_from_fork+0x3f/0x70
irq event stamp: 1672286
hardirqs last enabled at (1672283): [<ffffffff81408ec0>] poll_idle+0x10/0x80
hardirqs last disabled at (1672284): [<ffffffff8151d304>] common_interrupt+0x84/0x89
softirqs last enabled at (1672286): [<ffffffff8105b4dc>] _local_bh_enable+0x1c/0x50
softirqs last disabled at (1672285): [<ffffffff8105b697>] irq_enter+0x47/0x70
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(&cm_id_priv->lock)->rlock);
<Interrupt>
lock(&(&cm_id_priv->lock)->rlock);
*** DEADLOCK ***
no locks held by swapper/8/0.
stack backtrace:
CPU: 8 PID: 0 Comm: swapper/8 Tainted: G E 4.4.0-rc7+ #1
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
ffff88045af5e950 ffff88046e503a88 ffffffff81251c1b 0000000000000007
0000000000000006 0000000000000003 ffff88045af5ddc0 ffff88046e503ad8
ffffffff810a32f4 0000000000000000 0000000000000000 0000000000000001
Call Trace:
<IRQ> [<ffffffff81251c1b>] dump_stack+0x4f/0x74
[<ffffffff810a32f4>] print_usage_bug+0x184/0x190
[<ffffffff810a36e2>] mark_lock_irq+0xf2/0x290
[<ffffffff810a3995>] mark_lock+0x115/0x1b0
[<ffffffff810a3b8c>] mark_irqflags+0x15c/0x170
[<ffffffff810a4fef>] __lock_acquire+0x1ef/0x560
[<ffffffff810a53c2>] lock_acquire+0x62/0x80
[<ffffffff8151bd33>] _raw_spin_lock_irqsave+0x43/0x60
[<ffffffffa036eec4>] cm_establish+0x74/0x1b0 [ib_cm]
[<ffffffffa036f031>] ib_cm_notify+0x31/0x100 [ib_cm]
[<ffffffffa0637f24>] srpt_qp_event+0x54/0xd0 [ib_srpt]
[<ffffffffa0196052>] mlx4_ib_qp_event+0x72/0xc0 [mlx4_ib]
[<ffffffffa00775b9>] mlx4_qp_event+0x69/0xd0 [mlx4_core]
[<ffffffffa006000e>] mlx4_eq_int+0x51e/0xd50 [mlx4_core]
[<ffffffffa006084f>] mlx4_msi_x_interrupt+0xf/0x20 [mlx4_core]
[<ffffffff810b67b0>] handle_irq_event_percpu+0x40/0x110
[<ffffffff810b68bf>] handle_irq_event+0x3f/0x70
[<ffffffff810ba7f9>] handle_edge_irq+0x79/0x120
[<ffffffff81007f3d>] handle_irq+0x5d/0x130
[<ffffffff810071fd>] do_IRQ+0x6d/0x130
[<ffffffff8151d309>] common_interrupt+0x89/0x89
<EOI> [<ffffffff8140895f>] cpuidle_enter_state+0xcf/0x200
[<ffffffff81408aa2>] cpuidle_enter+0x12/0x20
[<ffffffff810990d6>] call_cpuidle+0x36/0x60
[<ffffffff81099163>] cpuidle_idle_call+0x63/0x110
[<ffffffff8109930a>] cpu_idle_loop+0xfa/0x130
[<ffffffff8109934e>] cpu_startup_entry+0xe/0x10
[<ffffffff8103c443>] start_secondary+0x83/0x90
Fixes: commit be4b499323bf ("IB/cm: Do not queue work to a device that's going away")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Previously, cma_match_net_dev called cma_protocol_roce which
tried to verify that the IB device uses RoCE protocol. However,
if rdma_id wasn't bound to a port, then the check would occur
against the first port of the device without regard to whether
that port was even of the same type as the type of port the
incoming packet was received on.
Fix this by passing the port of the request and only checking
against the same port of the device.
Reported-by: Or Gerlitz <gerlitz.or@gmail.com>
Fixes: b8cab5dab15f ('IB/cma: Accept connection without a valid netdev on RoCE')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Receipt of CM MAD with other than the Send method for an attribute
other than the ClassPortInfo attribute is invalid.
CM attributes other than ClassPortInfo only use the send method.
The SRP initiator does not maintain a timeout policy for CM connect
requests relies on the CM layer to do that. The result was that
the SRP initiator hung as the connect request never completed.
A new SRP target has been observed to respond to Send CM REQ
with GetResp of CM REQ with bad status. This is non conformant
with IBA spec but exposes a vulnerability in the current MAD/CM
code which will respond to the incoming GetResp of CM REQ as if
it was a valid incoming Send of CM REQ rather than tossing
this on the floor. It also causes the MAD layer not to
retransmit the original REQ even though it has not received a REP.
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Ensure that validate_ipv4_net_dev() calls rcu_read_unlock() if
fib_lookup() fails. Detected by sparse. Compile-tested only.
Fixes: "IB/cma: Validate routing of incoming requests" (commit f887f2ac87c2).
Cc: Haggai Eran <haggaie@mellanox.com>
Cc: stable <stable@vger.kernel.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
On 12/03/2015 01:18 AM, Christoph Hellwig wrote:
> The patch looks good to me, but while we touch this area, how about
> throwing in a few cosmetic fixes as well?
How about the patch below ? In that version of the ib_sg_to_pages() fix
these concerns have been addressed and additionally to more bugs have been fixed.
------------
[PATCH] IB core: Fix ib_sg_to_pages()
Fix the code for detecting gaps. A gap occurs not only if the
second or later scatterlist element is not aligned but also if
any scatterlist element other than the last does not end at a
page boundary.
In the code for coalescing contiguous elements, ensure that
mr->length is correct and that last_page_addr is up-to-date.
Ensure that this function returns a negative
error code instead of zero if the first set_page() call fails.
Fixes: commit 4c67e2bfc8b7 ("IB/core: Introduce new fast registration API")
Reported-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
It was found by Saurabh Sengar that the netlink code tried to allocate
memory with GFP_KERNEL while holding a spinlock. While it is possible
to fix the issue by replacing GFP_KERNEL with GFP_ATOMIC, it is better
to get rid of the spinlock while sending the packet. However, in order
to protect against a race condition that a quick response may be received
before the request is put on the request list, we need to put the request
on the list first.
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reported-by: Saurabh Sengar <saurabh.truth@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The current implementation gets a spin_lock, and at any scale with
qib and hfi1 post send, the lock contention grows exponentially
with the number of QPs.
idr_find() is RCU compatibile, so read doesn't need the lock.
Change to use rcu_read_lock() and rcu_read_unlock() in
__idr_get_uobj().
kfree_rcu() is used to insure a grace period between the
idr removal and actual free.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Commit e622f2f4ad21 ("IB: split struct ib_send_wr")
introduced a regression for HCAs whose user mode post
sends go through ib_uverbs_post_send().
The code didn't account for the fact that the first sge is
offset by an operation dependent length. The allocation did,
but the pointer to the destination sge list is computed without
that knowledge. The sge list copy_from_user() then corrupts
fields in the work request
Store the operation dependent length in a local variable and
compute the sge list copy_from_user() destination using that length.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Merge second patch-bomb from Andrew Morton:
- most of the rest of MM
- procfs
- lib/ updates
- printk updates
- bitops infrastructure tweaks
- checkpatch updates
- nilfs2 update
- signals
- various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
dma-debug, dma-mapping, ...
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
ipc,msg: drop dst nil validation in copy_msg
include/linux/zutil.h: fix usage example of zlib_adler32()
panic: release stale console lock to always get the logbuf printed out
dma-debug: check nents in dma_sync_sg*
dma-mapping: tidy up dma_parms default handling
pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
kexec: use file name as the output message prefix
fs, seqfile: always allow oom killer
seq_file: reuse string_escape_str()
fs/seq_file: use seq_* helpers in seq_hex_dump()
coredump: change zap_threads() and zap_process() to use for_each_thread()
coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
signals: kill block_all_signals() and unblock_all_signals()
nilfs2: fix gcc uninitialized-variable warnings in powerpc build
nilfs2: fix gcc unused-but-set-variable warnings
MAINTAINERS: nilfs2: add header file for tracing
nilfs2: add tracepoints for analyzing reading and writing metadata files
...
|
|
sleep and avoiding waking kswapd
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".
Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.
This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.
This patch then converts a number of sites
o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.
o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.
o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.
o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.
The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.
The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Move the __attribute_const__ declarations such that sparse understands
that these apply to the function itself and not to the return type.
This avoids that sparse reports error messages like the following:
drivers/infiniband/core/verbs.c:73:12: error: symbol 'ib_event_msg' redeclared with different type (originally declared at include/rdma/ib_verbs.h:470) - different modifiers
Fixes: 2b1b5b601230 ("IB/core, cma: Nice log-friendly string helpers")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
No callers and no providers left, go ahead and remove it.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The new fast registration verb ib_map_mr_sg receives a scatterlist
and converts it to a page list under the verbs API thus hiding
the specific HW mapping details away from the consumer.
The provider drivers are provided with a generic helper ib_sg_to_pages
that converts a scatterlist into a vector of page addresses. The
drivers can still perform any HW specific page address setting
by passing a set_page function pointer which will be invoked for
each page address. This allows drivers to avoid keeping a shadow
page vectors and convert them to HW specific translations by doing
extra copies.
This API will allow ULPs to remove the duplicated code of constructing
a page vector from a given sg list.
The send work request ib_reg_wr also shrinks as it will contain only
mr, key and access flags in addition.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
|
|
Add support for network namespaces from user space. This is done by passing
the network namespace of the process instead of init_net.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Add support for network namespaces in the ib_cma module. This is
accomplished by:
1. Adding network namespace parameter for rdma_create_id. This parameter is
used to populate the network namespace field in rdma_id_private.
rdma_create_id keeps a reference on the network namespace.
2. Using the network namespace from the rdma_id instead of init_net inside
of ib_cma, when listening on an ID and when looking for an ID for an
incoming request.
3. Decrementing the reference count for the appropriate network namespace
when calling rdma_destroy_id.
In order to preserve the current behavior init_net is passed when calling
from other modules.
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Keep a struct for each network namespace containing the IDRs for the RDMA
CM port spaces. The struct is created dynamically using the generic_net
mechanism.
This patch is internal infrastructure work for the following patches. In
this patch, init_net is statically used as the network namespace for
the new port-space API.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Add network namespace support to the ib_addr module. For that, all the
address resolution and matching should be done using the appropriate
namespace instead of init_net.
This is achieved by:
1. Adding an explicit network namespace argument to exported function that
require a namespace.
2. Saving the namespace in the rdma_addr_client structure.
3. Using it when calling networking functions.
In order to preserve the behavior of calling modules, &init_net is
passed as the parameter in calls from other modules. This is modified as
namespace support is added on more levels.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The GID cache accompanies every GID with attributes.
The GID attributes link the GID with its netdevice, which could be
resolved to smac and vlan id easily. Since we've added the netdevice
(ifindex and net) to the path record, storing the L2 attributes is
duplicated data and hence these attributes are removed.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Smac and vlan id could be resolved from the GID attribute, and thus
these attributes aren't needed anymore. Removing them.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The cm and cma don't need to explicitly handle vlan and smac,
as they are resolved from the GID index now. Removing this
portion of code.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Previously, vlan id and source MAC were used from QP attributes. Since
the net device is now stored in the GID attributes, they could be used
instead of getting this information from the QP attributes.
IB_QP_SMAC, IB_QP_ALT_SMAC, IB_QP_VID and IB_QP_ALT_VID were removed
because there is no known libibverbs that uses them.
This commit also modifies the vendors (mlx4, ocrdma) drivers in order
to use the new approach.
ocrdma driver changes were done by Somnath Kotur <Somnath.Kotur@Avagotech.Com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
GID cache API users might want to search for GIDs with specific
attributes rather than just specifying GID, net device and port.
This is used in a later patch, where we find the sgid index by
L2 Ethernet attributes.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Previously, cma_validate_port searched for GIDs in IB cache and then
tried to verify the found port. This could fail when there are
identical GIDs on both ports. In addition, netdevice should be taken
into account when searching the GID table.
Fixing cma_validate_port to search only the relevant port's cache
and netdevice.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Previously, the CM has searched the cache for any sgid_index whose
GID matches the path's GID. Since the path record stores the net
device, the CM should now search only for GIDs which originated from
this net device.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
In order to find the sgid_index, one could just query the IB cache
with the correct GID and netdevice. Therefore, instead of storing
the L2 attributes directly in the path, we only store the
ifindex and net and use them later to get the sgid_index.
The vlan_id and smac L2 attributes are removed in a later patch.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Sometime consumers might want to search for a GID in a specific port.
For example, when a WC arrives and we want to search the GID
that matches that port - it's better to search only the relevant
port.
Exposing and renaming ib_cache_gid_find_by_port in order to match
the naming convention of the module.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Adding an ability to query the IB cache by a netdev and get the
attributes of a GID. These parameters are necessary in order to
successfully resolve the required GID (when the netdevice is known)
and get the Ethernet L2 attributes from a GID.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Allow setting IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK at create_flags in
ib_uverbs_create_qp_ex.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
ib_uverbs_ex_create_qp follows the extension verbs
mechanism. New features (for example, QP creation flags
field which is added in a downstream patch) could used
via user-space libraries without breaking the ABI.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The INIT_UDATA() macro requires a pointer or unsigned long argument for
both input and output buffer, and all callers had a cast from when
the code was merged until a recent restructuring, so now we get
core/uverbs_cmd.c: In function 'ib_uverbs_create_cq':
core/uverbs_cmd.c:1481:66: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
This makes the code behave as before by adding back the cast to
unsigned long.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 565197dd8fb1 ("IB/core: Extend ib_uverbs_create_cq")
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
ib_send_cm_sidr_rep could sometimes erase the node from the sidr
(depending on errors in the process). Since ib_send_cm_sidr_rep is
called both from cm_sidr_req_handler and cm_destroy_id, cm_id_priv
could be either erased from the rb_tree twice or not erased at all.
Fixing that by making sure it's erased only once before freeing
cm_id_priv.
Fixes: a977049dacde ('[PATCH] IB: Add the kernel CM implementation')
Signed-off-by: Doron Tsur <doront@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
When discussing the patches to demux ids in rdma_cm instead of ib_cm, it
was decided that it is best to use the P_Key value in the packet headers.
However, the mlx5 and ipath drivers are currently unable to send correct
P_Key values in GMP headers. They always send using a single P_Key that is
set during the GSI QP initialization.
Change the rdma_cm code to look at the P_Key value that is part of the
packet payload as a workaround. Once the drivers are fixed this patch can
be reverted.
Fixes: 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to
RDMA CM")
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Allocating a workqueue might fail, which wasn't checked so far and would
lead to NULL ptr derefs when an attempt to use it was made.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
If the lookup of a listening ID failed for an AF_IB request, the code
would try to call dev_put() on a NULL net_dev.
Fixes: be688195bd08 ("IB/cma: Fix net_dev reference leak with failed
requests")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
When using ifup/ifdown while executing enum_netdev_ipv4_ips,
ifa could become invalid and cause use after free error.
Fixing it by protecting with RCU lock.
Fixes: 03db3a2d81e6 ('IB/core: Add RoCE GID table management')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|