kernel/linux.git/drivers/infiniband, branch v4.11.5

RDMA/qib,hfi1: Fix MR reference count leak on write with immediate

2017-06-07T10:10:12+00:00

commit 1feb40067cf04ae48d65f728d62ca255c9449178 upstream. The handling of IB_RDMA_WRITE_ONLY_WITH_IMMEDIATE will leak a memory reference when a buffer cannot be allocated for returning the immediate data. The issue is that the rkey validation has already occurred and the RNR nak fails to release the reference that was fruitlessly gotten. The the peer will send the identical single packet request when its RNR timer pops. The fix is to release the held reference prior to the rnr nak exit. This is the only sequence the requires both rkey validation and the buffer allocation on the same packet. Tested-by: Tadeusz Struk Reviewed-by: Dennis Dalessandro Signed-off-by: Mike Marciniszyn Signed-off-by: Dennis Dalessandro Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman

RDMA/srp: Fix NULL deref at srp_destroy_qp()

2017-06-07T10:10:12+00:00

commit 95c2ef50c726a51d580c35ae8dccd383abaa8701 upstream. If srp_init_qp() fails at srp_create_ch_ib() then ch->send_cq may be NULL. Calling directly to ib_destroy_qp() is sufficient because no work requests were posted on the created qp. Fixes: 9294000d6d89 ("IB/srp: Drain the send queue before destroying a QP") Signed-off-by: Israel Rukshin Reviewed-by: Max Gurtovoy Reviewed-by: Bart van Assche -- Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman

IB/hfi1: Protect the global dev_cntr_names and port_cntr_names

2017-05-25T13:46:30+00:00

commit 62eed66e98b4c2286fef2ce5911d8d75b7515f7b upstream. Protect the global dev_cntr_names and port_cntr_names with the global mutex as they are allocated and freed in a function called per device. Otherwise there is a danger of double free and memory leaks. Fixes: Commit b7481944b06e ("IB/hfi1: Show statistics counters under IB stats interface") Reviewed-by: Dennis Dalessandro Reviewed-by: Easwar Hariharan Signed-off-by: Tadeusz Struk Signed-off-by: Dennis Dalessandro Signed-off-by: Mike Marciniszyn Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman

IB/hfi1: Fix a subcontext memory leak

2017-05-25T13:46:17+00:00

commit 224d71f910102c966cdcd782c97e096d5e26e4da upstream. The only context that frees user_exp_rcv data structures is the last context closed (from a sub-context set). This leaks the allocations from the other sub-contexts. Separate the common frees from the specific frees and call them at the appropriate time. Using KEDR to check for memory leaks we get: Before test: [leak_check] Possible leaks: 25 After test: [leak_check] Possible leaks: 31 (6 leaked data structures) After patch applied (before and after test have the same value) [leak_check] Possible leaks: 25 Each leak is 192 + 13440 + 6720 = 20352 bytes per sub-context. Reviewed-by: Mike Marciniszyn Signed-off-by: Michael J. Ruhl Signed-off-by: Dennis Dalessandro Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman

IB/hfi1: Return an error on memory allocation failure

2017-05-25T13:46:17+00:00

commit 94679061dcdddbafcf24e3bfb526e54dedcc2f2f upstream. If the eager buffer allocation fails, it is necessary to return an error code. Reviewed-by: Mike Marciniszyn Signed-off-by: Michael J. Ruhl Signed-off-by: Dennis Dalessandro Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman

infiniband: call ipv6 route lookup via the stub interface

2017-05-25T13:46:11+00:00

commit eea40b8f624f25cbc02d55f2d93203f60cee9341 upstream. The infiniband address handle can be triggered to resolve an ipv6 address in response to MAD packets, regardless of the ipv6 module being disabled via the kernel command line argument. That will cause a call into the ipv6 routing code, which is not initialized, and a conseguent oops. This commit addresses the above issue replacing the direct lookup call with an indirect one via the ipv6 stub, which is properly initialized according to the ipv6 status (e.g. if ipv6 is disabled, the routing lookup fails gracefully) Signed-off-by: Paolo Abeni Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman

mlx5: Fix mlx5_ib_map_mr_sg mr length

2017-05-25T13:46:11+00:00

commit 0a49f2c31c3efbeb0de3e4b5598764887f629be2 upstream. In case we got an initial sg_offset, we need to account for it in the mr length. Fixes: ff2ba9936591 ("IB/core: Add passing an offset into the SG to ib_map_mr_sg") Signed-off-by: Sagi Grimberg Tested-by: Israel Rukshin Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman

IB/hfi1: Prevent kernel QP post send hard lockups

2017-05-20T12:49:45+00:00

commit b6eac931b9bb2bce4db7032c35b41e5e34ec22a5 upstream. The driver progress routines can call cond_resched() when a timeslice is exhausted and irqs are enabled. If the ULP had been holding a spin lock without disabling irqs and the post send directly called the progress routine, the cond_resched() could yield allowing another thread from the same ULP to deadlock on that same lock. Correct by replacing the current hfi1_do_send() calldown with a unique one for post send and adding an argument to hfi1_do_send() to indicate that the send engine is running in a thread. If the routine is not running in a thread, avoid calling cond_resched(). Fixes: Commit 831464ce4b74 ("IB/hfi1: Don't call cond_resched in atomic mode when sending packets") Reviewed-by: Dennis Dalessandro Signed-off-by: Mike Marciniszyn Signed-off-by: Dennis Dalessandro Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman

IB/mlx4: Reduce SRIOV multicast cleanup warning message to debug level

2017-05-20T12:49:44+00:00

commit fb7a91746af18b2ebf596778b38a709cdbc488d3 upstream. A warning message during SRIOV multicast cleanup should have actually been a debug level message. The condition generating the warning does no harm and can fill the message log. In some cases, during testing, some tests were so intense as to swamp the message log with these warning messages, causing a stall in the console message log output task. This stall caused an NMI to be sent to all CPUs (so that they all dumped their stacks into the message log). Aside from the message flood causing an NMI, the tests all passed. Once the message flood which caused the NMI is removed (by reducing the warning message to debug level), the NMI no longer occurs. Sample message log (console log) output illustrating the flood and resultant NMI (snippets with comments and modified with ... instead of hex digits, to satisfy checkpatch.pl): _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!... *** About 4000 almost identical lines in less than one second *** _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!... INFO: rcu_sched detected stalls on CPUs/tasks: { 17} (...) *** { 17} above indicates that CPU 17 was the one that stalled *** sending NMI to all CPUs: ... NMI backtrace for cpu 17 CPU: 17 PID: 45909 Comm: kworker/17:2 Hardware name: HP ProLiant DL360p Gen8, BIOS P71 09/08/2013 Workqueue: events fb_flashcursor task: ffff880478...... ti: ffff88064e...... task.ti: ffff88064e...... RIP: 0010:[ffffffff81......] [ffffffff81......] io_serial_in+0x15/0x20 RSP: 0018:ffff88064e257cb0 EFLAGS: 00000002 RAX: 0000000000...... RBX: ffffffff81...... RCX: 0000000000...... RDX: 0000000000...... RSI: 0000000000...... RDI: ffffffff81...... RBP: ffff88064e...... R08: ffffffff81...... R09: 0000000000...... R10: 0000000000...... R11: ffff88064e...... R12: 0000000000...... R13: 0000000000...... R14: ffffffff81...... R15: 0000000000...... FS: 0000000000......(0000) GS:ffff8804af......(0000) knlGS:000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080...... CR2: 00007f2a2f...... CR3: 0000000001...... CR4: 0000000000...... DR0: 0000000000...... DR1: 0000000000...... DR2: 0000000000...... DR3: 0000000000...... DR6: 00000000ff...... DR7: 0000000000...... Stack: ffff88064e...... ffffffff81...... ffffffff81...... 0000000000...... ffffffff81...... ffff88064e...... ffffffff81...... ffffffff81...... ffffffff81...... ffff88064e...... ffffffff81...... 0000000000...... Call Trace: [] wait_for_xmitr+0x3b/0xa0 [] serial8250_console_putchar+0x1c/0x30 [] ? serial8250_console_write+0x140/0x140 [] uart_console_write+0x3a/0x80 [] serial8250_console_write+0xae/0x140 [] call_console_drivers.constprop.15+0x91/0xf0 [] console_unlock+0x3bf/0x400 [] fb_flashcursor+0x5d/0x140 [] ? bit_clear+0x120/0x120 [] process_one_work+0x17b/0x470 [] worker_thread+0x11b/0x400 [] ? rescuer_thread+0x400/0x400 [] kthread+0xcf/0xe0 [] ? kthread_create_on_node+0x140/0x140 [] ret_from_fork+0x58/0x90 [] ? kthread_create_on_node+0x140/0x140 Code: 48 89 e5 d3 e6 48 63 f6 48 03 77 10 8b 06 5d c3 66 0f 1f 44 00 00 66 66 66 6 As indicated in the stack trace above, the console output task got swamped. Fixes: b9c5d6a64358 ("IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV") Signed-off-by: Jack Morgenstein Signed-off-by: Leon Romanovsky Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman

IB/mlx4: Fix ib device initialization error flow

2017-05-20T12:49:44+00:00

commit 99e68909d5aba1861897fe7afc3306c3c81b6de0 upstream. In mlx4_ib_add, procedure mlx4_ib_alloc_eqs is called to allocate EQs. However, in the mlx4_ib_add error flow, procedure mlx4_ib_free_eqs is not called to free the allocated EQs. Fixes: e605b743f33d ("IB/mlx4: Increase the number of vectors (EQs) available for ULPs") Signed-off-by: Jack Morgenstein Signed-off-by: Leon Romanovsky Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman