summaryrefslogtreecommitdiff
path: root/drivers/infiniband/core
AgeCommit message (Collapse)AuthorFilesLines
2017-02-01IB/umem: Release pid in error and ODP flowKenneth Lee1-0/+2
commit 828f6fa65ce7e80f77f5ab12942e44eb3d9d174e upstream. 1. Release pid before enter odp flow 2. Release pid when fail to allocate memory Fixes: 87773dd56d54 ("IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get") Fixes: 8ada2c1c0c1d ("IB/core: Add support for on demand paging regions") Signed-off-by: Kenneth Lee <liguozhu@hisilicon.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Reviewed-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01RDMA/cma: Fix unknown symbol when CONFIG_IPV6 is not enabledJack Morgenstein1-1/+2
commit b4cfe3971f6eab542dd7ecc398bfa1aeec889934 upstream. If IPV6 has not been enabled in the underlying kernel, we must avoid calling IPV6 procedures in rdma_cm.ko. This requires using "IS_ENABLED(CONFIG_IPV6)" in "if" statements surrounding any code which calls external IPV6 procedures. In the instance fixed here, procedure cma_bind_addr() called ipv6_addr_type() -- which resulted in calling external procedure __ipv6_addr_type(). Fixes: 6c26a77124ff ("RDMA/cma: fix IPv6 address resolution") Cc: Spencer Baugh <sbaugh@catern.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09IB/multicast: Check ib_find_pkey() return valueBart Van Assche1-2/+5
commit d3a2418ee36a59bc02e9d454723f3175dcf4bfd9 upstream. This patch avoids that Coverity complains about not checking the ib_find_pkey() return value. Fixes: commit 547af76521b3 ("IB/multicast: Report errors on multicast groups if P_key changes") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09IB/mad: Fix an array index checkBart Van Assche1-1/+1
commit 2fe2f378dd45847d2643638c07a7658822087836 upstream. The array ib_mad_mgmt_class_table.method_table has MAX_MGMT_CLASS (80) elements. Hence compare the array index with that value instead of with IB_MGMT_MAX_METHODS (128). This patch avoids that Coverity reports the following: Overrunning array class->method_table of 80 8-byte elements at element index 127 (byte offset 1016) using index convert_mgmt_class(mad_hdr->mgmt_class) (which evaluates to 127). Fixes: commit b7ab0b19a85f ("IB/mad: Verify mgmt class in received MADs") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-26IB/cm: Mark stale CM id's whenever the mad agent was unregisteredMark Bloch1-16/+110
commit 9db0ff53cb9b43ed75bacd42a89c1a0ab048b2b0 upstream. When there is a CM id object that has port assigned to it, it means that the cm-id asked for the specific port that it should go by it, but if that port was removed (hot-unplug event) the cm-id was not updated. In order to fix that the port keeps a list of all the cm-id's that are planning to go by it, whenever the port is removed it marks all of them as invalid. This commit fixes a kernel panic which happens when running traffic between guests and we force reboot a guest mid traffic, it triggers a kernel panic: Call Trace: [<ffffffff815271fa>] ? panic+0xa7/0x16f [<ffffffff8152b534>] ? oops_end+0xe4/0x100 [<ffffffff8104a00b>] ? no_context+0xfb/0x260 [<ffffffff81084db2>] ? del_timer_sync+0x22/0x30 [<ffffffff8104a295>] ? __bad_area_nosemaphore+0x125/0x1e0 [<ffffffff81084240>] ? process_timeout+0x0/0x10 [<ffffffff8104a363>] ? bad_area_nosemaphore+0x13/0x20 [<ffffffff8104aabf>] ? __do_page_fault+0x31f/0x480 [<ffffffff81065df0>] ? default_wake_function+0x0/0x20 [<ffffffffa0752675>] ? free_msg+0x55/0x70 [mlx5_core] [<ffffffffa0753434>] ? cmd_exec+0x124/0x840 [mlx5_core] [<ffffffff8105a924>] ? find_busiest_group+0x244/0x9f0 [<ffffffff8152d45e>] ? do_page_fault+0x3e/0xa0 [<ffffffff8152a815>] ? page_fault+0x25/0x30 [<ffffffffa024da25>] ? cm_alloc_msg+0x35/0xc0 [ib_cm] [<ffffffffa024e821>] ? ib_send_cm_dreq+0xb1/0x1e0 [ib_cm] [<ffffffffa024f836>] ? cm_destroy_id+0x176/0x320 [ib_cm] [<ffffffffa024fb00>] ? ib_destroy_cm_id+0x10/0x20 [ib_cm] [<ffffffffa034f527>] ? ipoib_cm_free_rx_reap_list+0xa7/0x110 [ib_ipoib] [<ffffffffa034f590>] ? ipoib_cm_rx_reap+0x0/0x20 [ib_ipoib] [<ffffffffa034f5a5>] ? ipoib_cm_rx_reap+0x15/0x20 [ib_ipoib] [<ffffffff81094d20>] ? worker_thread+0x170/0x2a0 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40 [<ffffffff81094bb0>] ? worker_thread+0x0/0x2a0 [<ffffffff8109aef6>] ? kthread+0x96/0xa0 [<ffffffff8100c20a>] ? child_rip+0xa/0x20 [<ffffffff8109ae60>] ? kthread+0x0/0xa0 [<ffffffff8100c200>] ? child_rip+0x0/0x20 Fixes: a977049dacde ("[PATCH] IB: Add the kernel CM implementation") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-26IB/uverbs: Fix leak of XRC target QPsTariq Toukan1-5/+2
commit 5b810a242c28e1d8d64d718cebe75b79d86a0b2d upstream. The real QP is destroyed in case of the ref count reaches zero, but for XRC target QPs this call was missed and caused to QP leaks. Let's call to destroy for all flows. Fixes: 0e0ec7e0638e ('RDMA/core: Export ib_open_qp() to share XRC...') Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-26IB/core: Avoid unsigned int overflow in sg_alloc_tableMark Bloch1-1/+1
commit 3c7ba5760ab8eedec01159b267bb9bfcffe522ac upstream. sg_alloc_table gets unsigned int as parameter while the driver returns it as size_t. Check npages isn't greater than maximum unsigned int. Fixes: eeb8461e36c9 ("IB: Refactor umem to use linear SG table") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07IB/core: Fix use after free in send_leave functionErez Shitrit1-11/+2
commit 68c6bcdd8bd00394c234b915ab9b97c74104130c upstream. The function send_leave sets the member: group->query_id (group->query_id = ret) after calling the sa_query, but leave_handler can be executed before the setting and it might delete the group object, and will get a memory corruption. Additionally, this patch gets rid of group->query_id variable which is not used. Fixes: faec2f7b96b5 ('IB/sa: Track multicast join/leave requests') Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-24IB/uverbs: Fix race between uverbs_close and remove_oneJason Gunthorpe2-13/+25
commit d1e09f304a1d9651c5059ebfeb696dc2effc9b32 upstream. Fixes an oops that might happen if uverbs_close races with remove_one. Both contexts may run ib_uverbs_cleanup_ucontext, it depends on the flow. Currently, there is no protection for a case that remove_one didn't make the cleanup it runs to its end, the underlying ib_device was freed then uverbs_close will call ib_uverbs_cleanup_ucontext and OOPs. Above might happen if uverbs_close deleted the file from the list then remove_one didn't find it and runs to its end. Fixes to protect against that case by a new cleanup lock so that ib_uverbs_cleanup_ucontext will be called always before that remove_one is ended. Fixes: 35d4a0b63dc0 ("IB/uverbs: Fix race between ib_uverbs_open and remove_one") Reported-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-20IB/IWPM: Fix a potential skb leakMark Bloch1-0/+1
commit 5ed935e861a4cbf2158ad3386d6d26edd60d2658 upstream. In case ibnl_put_msg fails in send_nlmsg_done, the function returns with -ENOMEM without freeing. This patch fixes this behavior. Fixes: 30dc5e63d6a5 ("RDMA/core: Add support for iWARP Port Mapper user space service") Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-20IB/SA: Use correct free functionMark Bloch1-1/+1
commit 0f377d86252d11bfea941852785e3094b93601a7 upstream. Fixes a direct call to kfree_skb when nlmsg_free should be used. Fixes: 2ca546b92a02 ('IB/sa: Route SA pathrecord query through netlink') Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27IB/cm: Fix a recently introduced locking bugBart Van Assche1-2/+2
commit 943f44d94aa26bfdcaafc40d3701e24eeb58edce upstream. ib_cm_notify() can be called from interrupt context. Hence do not reenable interrupts unconditionally in cm_establish(). This patch avoids that lockdep reports the following warning: WARNING: CPU: 0 PID: 23317 at kernel/locking/lockdep.c:2624 trace _hardirqs_on_caller+0x112/0x1b0 DEBUG_LOCKS_WARN_ON(current->hardirq_context) Call Trace: <IRQ> [<ffffffff812bd0e5>] dump_stack+0x67/0x92 [<ffffffff81056f21>] __warn+0xc1/0xe0 [<ffffffff81056f8a>] warn_slowpath_fmt+0x4a/0x50 [<ffffffff810a5932>] trace_hardirqs_on_caller+0x112/0x1b0 [<ffffffff810a59dd>] trace_hardirqs_on+0xd/0x10 [<ffffffff815992c7>] _raw_spin_unlock_irq+0x27/0x40 [<ffffffffa0382e9c>] ib_cm_notify+0x25c/0x290 [ib_cm] [<ffffffffa068fbc1>] srpt_qp_event+0xa1/0xf0 [ib_srpt] [<ffffffffa04efb97>] mlx4_ib_qp_event+0x67/0xd0 [mlx4_ib] [<ffffffffa034ec0a>] mlx4_qp_event+0x5a/0xc0 [mlx4_core] [<ffffffffa03365f8>] mlx4_eq_int+0x3d8/0xcf0 [mlx4_core] [<ffffffffa0336f9c>] mlx4_msi_x_interrupt+0xc/0x20 [mlx4_core] [<ffffffff810b0914>] handle_irq_event_percpu+0x64/0x100 [<ffffffff810b09e4>] handle_irq_event+0x34/0x60 [<ffffffff810b3a6a>] handle_edge_irq+0x6a/0x150 [<ffffffff8101ad05>] handle_irq+0x15/0x20 [<ffffffff8101a66c>] do_IRQ+0x5c/0x110 [<ffffffff8159a2c9>] common_interrupt+0x89/0x89 [<ffffffff81297a17>] blk_run_queue_async+0x37/0x40 [<ffffffffa0163e53>] rq_completed+0x43/0x70 [dm_mod] [<ffffffffa0164896>] dm_softirq_done+0x176/0x280 [dm_mod] [<ffffffff812a26c2>] blk_done_softirq+0x52/0x90 [<ffffffff8105bc1f>] __do_softirq+0x10f/0x230 [<ffffffff8105bec8>] irq_exit+0xa8/0xb0 [<ffffffff8103653e>] smp_trace_call_function_single_interrupt+0x2e/0x30 [<ffffffff81036549>] smp_call_function_single_interrupt+0x9/0x10 [<ffffffff8159a959>] call_function_single_interrupt+0x89/0x90 <EOI> Fixes: commit be4b499323bf (IB/cm: Do not queue work to a device that's going away) Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Erez Shitrit <erezsh@mellanox.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Nikolay Borisov <kernel@kyup.com> Acked-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-05IB/security: Restrict use of the write() interfaceJason Gunthorpe3-0/+12
commit e6bd18f57aad1a2d1ef40e646d03ed0f2515c9e3 upstream. The drivers/infiniband stack uses write() as a replacement for bi-directional ioctl(). This is not safe. There are ways to trigger write calls that result in the return structure that is normally written to user space being shunted off to user specified kernel memory instead. For the immediate repair, detect and deny suspicious accesses to the write API. For long term, update the user space libraries and the kernel API to something that doesn't present the same security vulnerabilities (likely a structured ioctl() interface). The impacted uAPI interfaces are generally only available if hardware from drivers/infiniband is installed in the system. Reported-by: Jann Horn <jann@thejh.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [ Expanded check to all known write() entry points ] Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-04IB/cma: Fix RDMA port validation for iWarpMatan Barak1-1/+1
commit 649367735ee5dedb128d9fac0b86ba7e0fe7ae3b upstream. cma_validate_port wrongly assumed that Ethernet devices are RoCE devices and thus their ndev should be matched in the GID table. This broke the iWarp support. Fixing that matching the ndev only if we work on a RoCE port. Cc: <stable@vger.kernel.org> # 4.4.x- Fixes: abae1b71dd37 ('IB/cma: cma_validate_port should verify the port and netdevice') Reported-by: Hariprasad Shenai <hariprasad@chelsio.com> Tested-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-04IB/cm: Fix a recently introduced deadlockBart Van Assche1-2/+2
commit 4bfdf635c668869c69fd18ece37ec66fb6f38fcf upstream. ib_send_cm_drep() calls cm_enter_timewait() while holding a spinlock that can be locked from inside an interrupt handler. Hence do not enable interrupts inside cm_enter_timewait() if called with interrupts disabled. This patch fixes e.g. the following deadlock: Acked-by: Erez Shitrit <erezsh@mellanox.com> ================================= [ INFO: inconsistent lock state ] 4.4.0-rc7+ #1 Tainted: G E --------------------------------- inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes: (&(&cm_id_priv->lock)->rlock){?.+...}, at: [<ffffffffa036eec4>] cm_establish+0x 74/0x1b0 [ib_cm] {HARDIRQ-ON-W} state was registered at: [<ffffffff810a3c11>] mark_held_locks+0x71/0x90 [<ffffffff810a3e87>] trace_hardirqs_on_caller+0xa7/0x1c0 [<ffffffff810a3fad>] trace_hardirqs_on+0xd/0x10 [<ffffffff8151c40b>] _raw_spin_unlock_irq+0x2b/0x40 [<ffffffffa036ea8e>] cm_enter_timewait+0xae/0x100 [ib_cm] [<ffffffffa036ff76>] ib_send_cm_drep+0xb6/0x190 [ib_cm] [<ffffffffa052ed08>] srp_cm_handler+0x128/0x1a0 [ib_srp] [<ffffffffa0370340>] cm_process_work+0x20/0xf0 [ib_cm] [<ffffffffa0371335>] cm_dreq_handler+0x135/0x2c0 [ib_cm] [<ffffffffa03733c5>] cm_work_handler+0x75/0xd0 [ib_cm] [<ffffffff8107184d>] process_one_work+0x1bd/0x460 [<ffffffff81073148>] worker_thread+0x118/0x420 [<ffffffff81078454>] kthread+0xe4/0x100 [<ffffffff8151cbbf>] ret_from_fork+0x3f/0x70 irq event stamp: 1672286 hardirqs last enabled at (1672283): [<ffffffff81408ec0>] poll_idle+0x10/0x80 hardirqs last disabled at (1672284): [<ffffffff8151d304>] common_interrupt+0x84/0x89 softirqs last enabled at (1672286): [<ffffffff8105b4dc>] _local_bh_enable+0x1c/0x50 softirqs last disabled at (1672285): [<ffffffff8105b697>] irq_enter+0x47/0x70 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&cm_id_priv->lock)->rlock); <Interrupt> lock(&(&cm_id_priv->lock)->rlock); *** DEADLOCK *** no locks held by swapper/8/0. stack backtrace: CPU: 8 PID: 0 Comm: swapper/8 Tainted: G E 4.4.0-rc7+ #1 Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014 ffff88045af5e950 ffff88046e503a88 ffffffff81251c1b 0000000000000007 0000000000000006 0000000000000003 ffff88045af5ddc0 ffff88046e503ad8 ffffffff810a32f4 0000000000000000 0000000000000000 0000000000000001 Call Trace: <IRQ> [<ffffffff81251c1b>] dump_stack+0x4f/0x74 [<ffffffff810a32f4>] print_usage_bug+0x184/0x190 [<ffffffff810a36e2>] mark_lock_irq+0xf2/0x290 [<ffffffff810a3995>] mark_lock+0x115/0x1b0 [<ffffffff810a3b8c>] mark_irqflags+0x15c/0x170 [<ffffffff810a4fef>] __lock_acquire+0x1ef/0x560 [<ffffffff810a53c2>] lock_acquire+0x62/0x80 [<ffffffff8151bd33>] _raw_spin_lock_irqsave+0x43/0x60 [<ffffffffa036eec4>] cm_establish+0x74/0x1b0 [ib_cm] [<ffffffffa036f031>] ib_cm_notify+0x31/0x100 [ib_cm] [<ffffffffa0637f24>] srpt_qp_event+0x54/0xd0 [ib_srpt] [<ffffffffa0196052>] mlx4_ib_qp_event+0x72/0xc0 [mlx4_ib] [<ffffffffa00775b9>] mlx4_qp_event+0x69/0xd0 [mlx4_core] [<ffffffffa006000e>] mlx4_eq_int+0x51e/0xd50 [mlx4_core] [<ffffffffa006084f>] mlx4_msi_x_interrupt+0xf/0x20 [mlx4_core] [<ffffffff810b67b0>] handle_irq_event_percpu+0x40/0x110 [<ffffffff810b68bf>] handle_irq_event+0x3f/0x70 [<ffffffff810ba7f9>] handle_edge_irq+0x79/0x120 [<ffffffff81007f3d>] handle_irq+0x5d/0x130 [<ffffffff810071fd>] do_IRQ+0x6d/0x130 [<ffffffff8151d309>] common_interrupt+0x89/0x89 <EOI> [<ffffffff8140895f>] cpuidle_enter_state+0xcf/0x200 [<ffffffff81408aa2>] cpuidle_enter+0x12/0x20 [<ffffffff810990d6>] call_cpuidle+0x36/0x60 [<ffffffff81099163>] cpuidle_idle_call+0x63/0x110 [<ffffffff8109930a>] cpu_idle_loop+0xfa/0x130 [<ffffffff8109934e>] cpu_startup_entry+0xe/0x10 [<ffffffff8103c443>] start_secondary+0x83/0x90 Fixes: commit be4b499323bf ("IB/cm: Do not queue work to a device that's going away") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-23IB/cma: cma_match_net_dev needs to take into account port_numMatan Barak1-7/+9
Previously, cma_match_net_dev called cma_protocol_roce which tried to verify that the IB device uses RoCE protocol. However, if rdma_id wasn't bound to a port, then the check would occur against the first port of the device without regard to whether that port was even of the same type as the type of port the incoming packet was received on. Fix this by passing the port of the request and only checking against the same port of the device. Reported-by: Or Gerlitz <gerlitz.or@gmail.com> Fixes: b8cab5dab15f ('IB/cma: Accept connection without a valid netdev on RoCE') Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-08IB/mad: Require CM send method for everything except ClassPortInfoHal Rosenstock1-0/+5
Receipt of CM MAD with other than the Send method for an attribute other than the ClassPortInfo attribute is invalid. CM attributes other than ClassPortInfo only use the send method. The SRP initiator does not maintain a timeout policy for CM connect requests relies on the CM layer to do that. The result was that the SRP initiator hung as the connect request never completed. A new SRP target has been observed to respond to Send CM REQ with GetResp of CM REQ with bad status. This is non conformant with IBA spec but exposes a vulnerability in the current MAD/CM code which will respond to the incoming GetResp of CM REQ as if it was a valid incoming Send of CM REQ rather than tossing this on the floor. It also causes the MAD layer not to retransmit the original REQ even though it has not received a REP. Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Hal Rosenstock <hal@mellanox.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-08IB/cma: Add a missing rcu_read_unlock()Bart Van Assche1-4/+1
Ensure that validate_ipv4_net_dev() calls rcu_read_unlock() if fib_lookup() fails. Detected by sparse. Compile-tested only. Fixes: "IB/cma: Validate routing of incoming requests" (commit f887f2ac87c2). Cc: Haggai Eran <haggaie@mellanox.com> Cc: stable <stable@vger.kernel.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-08IB core: Fix ib_sg_to_pages()Bart Van Assche1-21/+22
On 12/03/2015 01:18 AM, Christoph Hellwig wrote: > The patch looks good to me, but while we touch this area, how about > throwing in a few cosmetic fixes as well? How about the patch below ? In that version of the ib_sg_to_pages() fix these concerns have been addressed and additionally to more bugs have been fixed. ------------ [PATCH] IB core: Fix ib_sg_to_pages() Fix the code for detecting gaps. A gap occurs not only if the second or later scatterlist element is not aligned but also if any scatterlist element other than the last does not end at a page boundary. In the code for coalescing contiguous elements, ensure that mr->length is correct and that last_page_addr is up-to-date. Ensure that this function returns a negative error code instead of zero if the first set_page() call fails. Fixes: commit 4c67e2bfc8b7 ("IB/core: Introduce new fast registration API") Reported-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-08IB/sa: Put netlink request into the request list before sendingKaike Wan1-15/+17
It was found by Saurabh Sengar that the netlink code tried to allocate memory with GFP_KERNEL while holding a spinlock. While it is possible to fix the issue by replacing GFP_KERNEL with GFP_ATOMIC, it is better to get rid of the spinlock while sending the packet. However, in order to protect against a race condition that a quick response may be received before the request is put on the request list, we need to put the request on the list first. Signed-off-by: Kaike Wan <kaike.wan@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reported-by: Saurabh Sengar <saurabh.truth@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-08IB/core: use RCU for uverbs id lookupMike Marciniszyn1-5/+7
The current implementation gets a spin_lock, and at any scale with qib and hfi1 post send, the lock contention grows exponentially with the number of QPs. idr_find() is RCU compatibile, so read doesn't need the lock. Change to use rcu_read_lock() and rcu_read_unlock() in __idr_get_uobj(). kfree_rcu() is used to insure a grace period between the idr removal and actual free. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-08IB/core: Fix user mode post wr corruptionMike Marciniszyn1-5/+10
Commit e622f2f4ad21 ("IB: split struct ib_send_wr") introduced a regression for HCAs whose user mode post sends go through ib_uverbs_post_send(). The code didn't account for the fact that the first sge is offset by an operation dependent length. The allocation did, but the pointer to the destination sge list is computed without that knowledge. The sge list copy_from_user() then corrupts fields in the work request Store the operation dependent length in a local variable and compute the sge list copy_from_user() destination using that length. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-11-08Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-1/+1
Merge second patch-bomb from Andrew Morton: - most of the rest of MM - procfs - lib/ updates - printk updates - bitops infrastructure tweaks - checkpatch updates - nilfs2 update - signals - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc, dma-debug, dma-mapping, ... * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits) ipc,msg: drop dst nil validation in copy_msg include/linux/zutil.h: fix usage example of zlib_adler32() panic: release stale console lock to always get the logbuf printed out dma-debug: check nents in dma_sync_sg* dma-mapping: tidy up dma_parms default handling pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode kexec: use file name as the output message prefix fs, seqfile: always allow oom killer seq_file: reuse string_escape_str() fs/seq_file: use seq_* helpers in seq_hex_dump() coredump: change zap_threads() and zap_process() to use for_each_thread() coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT) signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread() signal: turn dequeue_signal_lock() into kernel_dequeue_signal() signals: kill block_all_signals() and unblock_all_signals() nilfs2: fix gcc uninitialized-variable warnings in powerpc build nilfs2: fix gcc unused-but-set-variable warnings MAINTAINERS: nilfs2: add header file for tracing nilfs2: add tracepoints for analyzing reading and writing metadata files ...
2015-11-07mm, page_alloc: distinguish between being unable to sleep, unwilling to ↵Mel Gorman1-1/+1
sleep and avoiding waking kswapd __GFP_WAIT has been used to identify atomic context in callers that hold spinlocks or are in interrupts. They are expected to be high priority and have access one of two watermarks lower than "min" which can be referred to as the "atomic reserve". __GFP_HIGH users get access to the first lower watermark and can be called the "high priority reserve". Over time, callers had a requirement to not block when fallback options were available. Some have abused __GFP_WAIT leading to a situation where an optimisitic allocation with a fallback option can access atomic reserves. This patch uses __GFP_ATOMIC to identify callers that are truely atomic, cannot sleep and have no alternative. High priority users continue to use __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify callers that want to wake kswapd for background reclaim. __GFP_WAIT is redefined as a caller that is willing to enter direct reclaim and wake kswapd for background reclaim. This patch then converts a number of sites o __GFP_ATOMIC is used by callers that are high priority and have memory pools for those requests. GFP_ATOMIC uses this flag. o Callers that have a limited mempool to guarantee forward progress clear __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall into this category where kswapd will still be woken but atomic reserves are not used as there is a one-entry mempool to guarantee progress. o Callers that are checking if they are non-blocking should use the helper gfpflags_allow_blocking() where possible. This is because checking for __GFP_WAIT as was done historically now can trigger false positives. Some exceptions like dm-crypt.c exist where the code intent is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to flag manipulations. o Callers that built their own GFP flags instead of starting with GFP_KERNEL and friends now also need to specify __GFP_KSWAPD_RECLAIM. The first key hazard to watch out for is callers that removed __GFP_WAIT and was depending on access to atomic reserves for inconspicuous reasons. In some cases it may be appropriate for them to use __GFP_HIGH. The second key hazard is callers that assembled their own combination of GFP flags instead of starting with something like GFP_KERNEL. They may now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless if it's missed in most cases as other activity will wake kswapd. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-31IB/core, cma: Make __attribute_const__ declarations sparse-friendlyBart Van Assche2-3/+3
Move the __attribute_const__ declarations such that sparse understands that these apply to the function itself and not to the return type. This avoids that sparse reports error messages like the following: drivers/infiniband/core/verbs.c:73:12: error: symbol 'ib_event_msg' redeclared with different type (originally declared at include/rdma/ib_verbs.h:470) - different modifiers Fixes: 2b1b5b601230 ("IB/core, cma: Nice log-friendly string helpers") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-29IB/core: Remove old fast registration APISagi Grimberg1-25/+0
No callers and no providers left, go ahead and remove it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-29IB/core: Introduce new fast registration APISagi Grimberg1-0/+107
The new fast registration verb ib_map_mr_sg receives a scatterlist and converts it to a page list under the verbs API thus hiding the specific HW mapping details away from the consumer. The provider drivers are provided with a generic helper ib_sg_to_pages that converts a scatterlist into a vector of page addresses. The drivers can still perform any HW specific page address setting by passing a set_page function pointer which will be invoked for each page address. This allows drivers to avoid keeping a shadow page vectors and convert them to HW specific translations by doing extra copies. This API will allow ULPs to remove the duplicated code of constructing a page vector from a given sg list. The send work request ib_reg_wr also shrinks as it will contain only mr, key and access flags in addition. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-29Merge branch 'wr-cleanup' into k.o/for-4.4Doug Ledford4-83/+104
2015-10-28IB/ucma: Take the network namespace from the processGuy Shapiro1-2/+3
Add support for network namespaces from user space. This is done by passing the network namespace of the process instead of init_net. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yotam Kenneth <yotamke@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Guy Shapiro <guysh@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28IB/cma: Add support for network namespacesGuy Shapiro2-19/+30
Add support for network namespaces in the ib_cma module. This is accomplished by: 1. Adding network namespace parameter for rdma_create_id. This parameter is used to populate the network namespace field in rdma_id_private. rdma_create_id keeps a reference on the network namespace. 2. Using the network namespace from the rdma_id instead of init_net inside of ib_cma, when listening on an ID and when looking for an ID for an incoming request. 3. Decrementing the reference count for the appropriate network namespace when calling rdma_destroy_id. In order to preserve the current behavior init_net is passed when calling from other modules. Signed-off-by: Guy Shapiro <guysh@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yotam Kenneth <yotamke@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28IB/cma: Separate port allocation to network namespacesHaggai Eran1-24/+70
Keep a struct for each network namespace containing the IDRs for the RDMA CM port spaces. The struct is created dynamically using the generic_net mechanism. This patch is internal infrastructure work for the following patches. In this patch, init_net is statically used as the network namespace for the new port-space API. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yotam Kenneth <yotamke@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Guy Shapiro <guysh@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28IB/addr: Pass network namespace as a parameterGuy Shapiro2-8/+10
Add network namespace support to the ib_addr module. For that, all the address resolution and matching should be done using the appropriate namespace instead of init_net. This is achieved by: 1. Adding an explicit network namespace argument to exported function that require a namespace. 2. Saving the namespace in the rdma_addr_client structure. 3. Using it when calling networking functions. In order to preserve the behavior of calling modules, &init_net is passed as the parameter in calls from other modules. This is modified as namespace support is added on more levels. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yotam Kenneth <yotamke@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Guy Shapiro <guysh@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-22IB/core: Remove smac and vlan id from path recordMatan Barak3-6/+0
The GID cache accompanies every GID with attributes. The GID attributes link the GID with its netdevice, which could be resolved to smac and vlan id easily. Since we've added the netdevice (ifindex and net) to the path record, storing the L2 attributes is duplicated data and hence these attributes are removed. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-22IB/core: Remove smac and vlan id from qp_attr and ah_attrMatan Barak3-6/+0
Smac and vlan id could be resolved from the GID attribute, and thus these attributes aren't needed anymore. Removing them. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-22IB/cm: Remove the usage of smac and vid of qp_attr and cm_avMatan Barak2-36/+0
The cm and cma don't need to explicitly handle vlan and smac, as they are resolved from the GID index now. Removing this portion of code. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-22IB/core: Use GID table in AH creation and dmac resolutionMatan Barak4-71/+98
Previously, vlan id and source MAC were used from QP attributes. Since the net device is now stored in the GID attributes, they could be used instead of getting this information from the QP attributes. IB_QP_SMAC, IB_QP_ALT_SMAC, IB_QP_VID and IB_QP_ALT_VID were removed because there is no known libibverbs that uses them. This commit also modifies the vendors (mlx4, ocrdma) drivers in order to use the new approach. ocrdma driver changes were done by Somnath Kotur <Somnath.Kotur@Avagotech.Com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-22IB/cache: Add ib_find_gid_by_filter cache APIMatan Barak1-0/+93
GID cache API users might want to search for GIDs with specific attributes rather than just specifying GID, net device and port. This is used in a later patch, where we find the sgid index by L2 Ethernet attributes. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-22IB/cma: cma_validate_port should verify the port and netdeviceMatan Barak1-8/+18
Previously, cma_validate_port searched for GIDs in IB cache and then tried to verify the found port. This could fail when there are identical GIDs on both ports. In addition, netdevice should be taken into account when searching the GID table. Fixing cma_validate_port to search only the relevant port's cache and netdevice. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-22IB/cm: cm_init_av_by_path should find a GID by its netdeviceMatan Barak1-2/+5
Previously, the CM has searched the cache for any sgid_index whose GID matches the path's GID. Since the path record stores the net device, the CM should now search only for GIDs which originated from this net device. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-22IB/core: Add netdev to path recordMatan Barak2-2/+13
In order to find the sgid_index, one could just query the IB cache with the correct GID and netdevice. Therefore, instead of storing the L2 attributes directly in the path, we only store the ifindex and net and use them later to get the sgid_index. The vlan_id and smac L2 attributes are removed in a later patch. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-22IB/core: Expose and rename ib_find_cached_gid_by_port cache APIMatan Barak3-11/+7
Sometime consumers might want to search for a GID in a specific port. For example, when a WC arrives and we want to search the GID that matches that port - it's better to search only the relevant port. Exposing and renaming ib_cache_gid_find_by_port in order to match the naming convention of the module. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-22IB/core: Add netdev and gid attributes paramteres to cacheMatan Barak9-22/+36
Adding an ability to query the IB cache by a netdev and get the attributes of a GID. These parameters are necessary in order to successfully resolve the required GID (when the netdevice is known) and get the Ethernet L2 attributes from a GID. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-22IB/core: Allow setting create flags in QP init attributeEran Ben Elisha1-1/+1
Allow setting IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK at create_flags in ib_uverbs_create_qp_ex. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-22IB/core: Extend ib_uverbs_create_qpEran Ben Elisha3-65/+191
ib_uverbs_ex_create_qp follows the extension verbs mechanism. New features (for example, QP creation flags field which is added in a downstream patch) could used via user-space libraries without breaking the ABI. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21IB/core: avoid 32-bit warningArnd Bergmann1-1/+1
The INIT_UDATA() macro requires a pointer or unsigned long argument for both input and output buffer, and all callers had a cast from when the code was merged until a recent restructuring, so now we get core/uverbs_cmd.c: In function 'ib_uverbs_create_cq': core/uverbs_cmd.c:1481:66: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] This makes the code behave as before by adding back the cast to unsigned long. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 565197dd8fb1 ("IB/core: Extend ib_uverbs_create_cq") Reviewed-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21IB/cm: Fix rb-tree duplicate free and use-after-freeDoron Tsur1-1/+9
ib_send_cm_sidr_rep could sometimes erase the node from the sidr (depending on errors in the process). Since ib_send_cm_sidr_rep is called both from cm_sidr_req_handler and cm_destroy_id, cm_id_priv could be either erased from the rb_tree twice or not erased at all. Fixing that by making sure it's erased only once before freeing cm_id_priv. Fixes: a977049dacde ('[PATCH] IB: Add the kernel CM implementation') Signed-off-by: Doron Tsur <doront@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-20IB/cma: Use inner P_Key to determine netdevHaggai Eran1-2/+2
When discussing the patches to demux ids in rdma_cm instead of ib_cm, it was decided that it is best to use the P_Key value in the packet headers. However, the mlx5 and ipath drivers are currently unable to send correct P_Key values in GMP headers. They always send using a single P_Key that is set during the GSI QP initialization. Change the rdma_cm code to look at the P_Key value that is part of the packet payload as a workaround. Once the drivers are fixed this patch can be reverted. Fixes: 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to RDMA CM") Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-20IB/ucma: check workqueue allocation before usageSasha Levin1-1/+6
Allocating a workqueue might fail, which wasn't checked so far and would lead to NULL ptr derefs when an attempt to use it was made. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-20IB/cma: Potential NULL dereference in cma_id_from_eventHaggai Eran1-1/+1
If the lookup of a listening ID failed for an AF_IB request, the code would try to call dev_put() on a NULL net_dev. Fixes: be688195bd08 ("IB/cma: Fix net_dev reference leak with failed requests") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-20IB/core: Fix use after free of ifaMatan Barak1-8/+27
When using ifup/ifdown while executing enum_netdev_ipv4_ips, ifa could become invalid and cause use after free error. Fixing it by protecting with RCU lock. Fixes: 03db3a2d81e6 ('IB/core: Add RoCE GID table management') Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>