summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-07-09bpf: net: Avoid incorrect bpf_sk_reuseport_detach callMartin KaFai Lau2-3/+5
bpf_sk_reuseport_detach is currently called when sk->sk_user_data is not NULL. It is incorrect because sk->sk_user_data may not be managed by the bpf's reuseport_array. It has been reported in [1] that, the bpf_sk_reuseport_detach() which is called from udp_lib_unhash() has corrupted the sk_user_data managed by l2tp. This patch solves it by using another bit (defined as SK_USER_DATA_BPF) of the sk_user_data pointer value. It marks that a sk_user_data is managed/owned by BPF. The patch depends on a PTRMASK introduced in commit f1ff5ce2cd5e ("net, sk_msg: Clear sk_user_data pointer on clone if tagged"). [ Note: sk->sk_user_data is used by bpf's reuseport_array only when a sk is added to the bpf's reuseport_array. i.e. doing setsockopt(SO_REUSEPORT) and having "sk->sk_reuseport == 1" alone will not stop sk->sk_user_data being used by other means. ] [1]: https://lore.kernel.org/netdev/20200706121259.GA20199@katalix.com/ Fixes: 5dc4c4b7d4e8 ("bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY") Reported-by: James Chapman <jchapman@katalix.com> Reported-by: syzbot+9f092552ba9a5efca5df@syzkaller.appspotmail.com Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: James Chapman <jchapman@katalix.com> Acked-by: James Chapman <jchapman@katalix.com> Link: https://lore.kernel.org/bpf/20200709061110.4019316-1-kafai@fb.com
2020-07-09bpf: net: Avoid copying sk_user_data of reuseport_array during sk_cloneMartin KaFai Lau1-4/+9
It makes little sense for copying sk_user_data of reuseport_array during sk_clone_lock(). This patch reuses the SK_USER_DATA_NOCOPY bit introduced in commit f1ff5ce2cd5e ("net, sk_msg: Clear sk_user_data pointer on clone if tagged"). It is used to mark the sk_user_data is not supposed to be copied to its clone. Although the cloned sk's sk_user_data will not be used/freed in bpf_sk_reuseport_detach(), this change can still allow the cloned sk's sk_user_data to be used by some other means. Freeing the reuseport_array's sk_user_data does not require a rcu grace period. Thus, the existing rcu_assign_sk_user_data_nocopy() is not used. Fixes: 5dc4c4b7d4e8 ("bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20200709061104.4018798-1-kafai@fb.com
2020-07-09Merge branch 'mptcp-introduce-msk-diag-interface'David S. Miller12-23/+444
Paolo Abeni says: ==================== mptcp: introduce msk diag interface This series implements the diag interface for the MPTCP sockets. Since the MPTCP protocol value can't be represented with the current diag uAPI, the first patch introduces an extended attribute allowing user-space to specify lager protocol values. The token APIs are then extended to allow traversing the whole token container. Patch 3 carries the actual diag interface implementation, and later patch bring-in some functional self-tests. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09selftests/mptcp: add diag interface testsPaolo Abeni3-5/+140
basic functional test, triggering the msk diag interface code. Require appropriate iproute2 support, skip elsewhere. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09mptcp: add MPTCP socket diag interfacePaolo Abeni4-0/+192
exposes basic inet socket attribute, plus some MPTCP socket fields comprising PM status and MPTCP-level sequence numbers. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09mptcp: add msk interations helperPaolo Abeni2-1/+62
mptcp_token_iter_next() allow traversing all the MPTCP sockets inside the token container belonging to the given network namespace with a quite standard iterator semantic. That will be used by the next patch, but keep the API generic, as we plan to use this later for PM's sake. Additionally export mptcp_token_get_sock(), as it also will be used by the diag module. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09inet_diag: support for wider protocol numbersPaolo Abeni3-17/+50
After commit bf9765145b85 ("sock: Make sk_protocol a 16-bit value") the current size of 'sdiag_protocol' is not sufficient to represent the possible protocol values. This change introduces a new inet diag request attribute to let user space specify the relevant protocol number using u32 values. The attribute is parsed by inet diag core on get/dump command and the extended protocol value, if available, is preferred to 'sdiag_protocol' to lookup the diag handler. The parse attributed are exposed to all the diag handlers via the cb->data. Note that inet_diag_dump_one_icsk() is left unmodified, as it will not be used by protocol using the extended attribute. Suggested-by: David S. Miller <davem@davemloft.net> Co-developed-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09ethtool: fix genlmsg_put() failure handling in ethnl_default_dumpit()Michal Kubecek1-14/+13
If the genlmsg_put() call in ethnl_default_dumpit() fails, we bail out without checking if we already have some messages in current skb like we do with ethnl_default_dump_one() failure later. Therefore if existing messages almost fill up the buffer so that there is not enough space even for netlink and genetlink header, we lose all prepared messages and return and error. Rather than duplicating the skb->len check, move the genlmsg_put(), genlmsg_cancel() and genlmsg_end() calls into ethnl_default_dump_one(). This is also more logical as all message composition will be in ethnl_default_dump_one() and only iteration logic will be left in ethnl_default_dumpit(). Fixes: 728480f12442 ("ethtool: default handlers for GET requests") Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09net: enetc: use eth_broadcast_addr() to assign broadcastXu Wang1-1/+1
This patch is to use eth_broadcast_addr() to assign broadcast address insetad of memset(). Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09net_sched: fix a memory leak in atm_tc_init()Cong Wang1-4/+4
When tcf_block_get() fails inside atm_tc_init(), atm_tc_put() is called to release the qdisc p->link.q. But the flow->ref prevents it to do so, as the flow->ref is still zero. Fix this by moving the p->link.ref initialization before tcf_block_get(). Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure") Reported-and-tested-by: syzbot+d411cff6ab29cc2c311b@syzkaller.appspotmail.com Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09qed: Populate nvm-file attributes while reading nvm config partition.Sudarsana Reddy Kalluru4-9/+21
NVM config file address will be modified when the MBI image is upgraded. Driver would return stale config values if user reads the nvm-config (via ethtool -d) in this state. The fix is to re-populate nvm attribute info while reading the nvm config values/partition. Changes from previous version: ------------------------------- v3: Corrected the formatting in 'Fixes' tag. v2: Added 'Fixes' tag. Fixes: 1ac4329a1cff ("qed: Add configuration information to register dump and debug data") Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09drm/amdgpu: don't do soft recovery if gpu_recovery=0Marek Olšák1-1/+2
It's impossible to debug shader hangs with soft recovery. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2020-07-09drm/radeon: fix double freeTom Rix1-4/+3
clang static analysis flags this error drivers/gpu/drm/radeon/ci_dpm.c:5652:9: warning: Use of memory after it is freed [unix.Malloc] kfree(rdev->pm.dpm.ps[i].ps_priv); ^~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/radeon/ci_dpm.c:5654:2: warning: Attempt to free released memory [unix.Malloc] kfree(rdev->pm.dpm.ps); ^~~~~~~~~~~~~~~~~~~~~~ problem is reported in ci_dpm_fini, with these code blocks. for (i = 0; i < rdev->pm.dpm.num_ps; i++) { kfree(rdev->pm.dpm.ps[i].ps_priv); } kfree(rdev->pm.dpm.ps); The first free happens in ci_parse_power_table where it cleans up locally on a failure. ci_dpm_fini also does a cleanup. ret = ci_parse_power_table(rdev); if (ret) { ci_dpm_fini(rdev); return ret; } So remove the cleanup in ci_parse_power_table and move the num_ps calculation to inside the loop so ci_dpm_fini will know how many array elements to free. Fixes: cc8dbbb4f62a ("drm/radeon: add dpm support for CI dGPUs (v2)") Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2020-07-09drm/amd/display: add dmcub check on RENOIRAaron Ma1-1/+1
RENOIR loads dmub fw not dmcu, check dmcu only will prevent loading iram, it breaks backlight control. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=208277 Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2020-07-09drm/amdgpu: add TMR destory function for pspHuang Rui1-4/+53
TMR is required to be destoried with GFX_CMD_ID_DESTROY_TMR while the system goes to suspend. Otherwise, PSP may return the failure state (0xFFFF007) on Gfx-2-PSP command GFX_CMD_ID_SETUP_TMR after do multiple times suspend/resume. Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2020-07-09drm/amdgpu: asd function needs to be unloaded in suspend phaseHuang Rui1-0/+6
Unload ASD function in suspend phase. Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2020-07-09cifs: update internal module version numberSteve French1-1/+1
To 2.28 Signed-off-by: Steve French <stfrench@microsoft.com>
2020-07-09cifs: fix reference leak for tlinkRonnie Sahlberg1-1/+8
Don't leak a reference to tlink during the NOTIFY ioctl Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> CC: Stable <stable@vger.kernel.org> # v5.6+
2020-07-09arm64/alternatives: don't patch up internal branchesArd Biesheuvel1-14/+2
Commit f7b93d42945c ("arm64/alternatives: use subsections for replacement sequences") moved the alternatives replacement sequences into subsections, in order to keep the as close as possible to the code that they replace. Unfortunately, this broke the logic in branch_insn_requires_update, which assumed that any branch into kernel executable code was a branch that required updating, which is no longer the case now that the code sequences that are patched in are in the same section as the patch site itself. So the only way to discriminate branches that require updating and ones that don't is to check whether the branch targets the replacement sequence itself, and so we can drop the call to kernel_text_address() entirely. Fixes: f7b93d42945c ("arm64/alternatives: use subsections for replacement sequences") Reported-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20200709125953.30918-1-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2020-07-09s390/mm: fix huge pte soft dirty copyingJanosch Frank1-1/+1
If the pmd is soft dirty we must mark the pte as soft dirty (and not dirty). This fixes some cases for guest migration with huge page backings. Cc: <stable@vger.kernel.org> # 4.8 Fixes: bc29b7ac1d9f ("s390/mm: clean up pte/pmd encoding") Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-07-09arm64: Add missing sentinel to erratum_1463225Florian Fainelli1-0/+1
When the erratum_1463225 array was introduced a sentinel at the end was missing thus causing a KASAN: global-out-of-bounds in is_affected_midr_range_list on arm64 error. Fixes: a9e821b89daa ("arm64: Add KRYO4XX gold CPU cores to erratum list 1463225 and 1418040") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> Link: https://lore.kernel.org/linux-arm-kernel/CA+G9fYs3EavpU89-rTQfqQ9GgxAMgMAk7jiiVrfP0yxj5s+Q6g@mail.gmail.com/ Link: https://lore.kernel.org/r/20200709051345.14544-1-f.fainelli@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2020-07-09io_uring: fix memleak in __io_sqe_files_update()Yang Yingliang1-1/+3
I got a memleak report when doing some fuzz test: BUG: memory leak unreferenced object 0xffff888113e02300 (size 488): comm "syz-executor401", pid 356, jiffies 4294809529 (age 11.954s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ a0 a4 ce 19 81 88 ff ff 60 ce 09 0d 81 88 ff ff ........`....... backtrace: [<00000000129a84ec>] kmem_cache_zalloc include/linux/slab.h:659 [inline] [<00000000129a84ec>] __alloc_file+0x25/0x310 fs/file_table.c:101 [<000000003050ad84>] alloc_empty_file+0x4f/0x120 fs/file_table.c:151 [<000000004d0a41a3>] alloc_file+0x5e/0x550 fs/file_table.c:193 [<000000002cb242f0>] alloc_file_pseudo+0x16a/0x240 fs/file_table.c:233 [<00000000046a4baa>] anon_inode_getfile fs/anon_inodes.c:91 [inline] [<00000000046a4baa>] anon_inode_getfile+0xac/0x1c0 fs/anon_inodes.c:74 [<0000000035beb745>] __do_sys_perf_event_open+0xd4a/0x2680 kernel/events/core.c:11720 [<0000000049009dc7>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359 [<00000000353731ca>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 BUG: memory leak unreferenced object 0xffff8881152dd5e0 (size 16): comm "syz-executor401", pid 356, jiffies 4294809529 (age 11.954s) hex dump (first 16 bytes): 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000074caa794>] kmem_cache_zalloc include/linux/slab.h:659 [inline] [<0000000074caa794>] lsm_file_alloc security/security.c:567 [inline] [<0000000074caa794>] security_file_alloc+0x32/0x160 security/security.c:1440 [<00000000c6745ea3>] __alloc_file+0xba/0x310 fs/file_table.c:106 [<000000003050ad84>] alloc_empty_file+0x4f/0x120 fs/file_table.c:151 [<000000004d0a41a3>] alloc_file+0x5e/0x550 fs/file_table.c:193 [<000000002cb242f0>] alloc_file_pseudo+0x16a/0x240 fs/file_table.c:233 [<00000000046a4baa>] anon_inode_getfile fs/anon_inodes.c:91 [inline] [<00000000046a4baa>] anon_inode_getfile+0xac/0x1c0 fs/anon_inodes.c:74 [<0000000035beb745>] __do_sys_perf_event_open+0xd4a/0x2680 kernel/events/core.c:11720 [<0000000049009dc7>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359 [<00000000353731ca>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 If io_sqe_file_register() failed, we need put the file that get by fget() to avoid the memleak. Fixes: c3a31e605620 ("io_uring: add support for IORING_REGISTER_FILES_UPDATE") Cc: stable@vger.kernel.org Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-09io_uring: export cq overflow status to userspaceXiaoguang Wang2-2/+10
For those applications which are not willing to use io_uring_enter() to reap and handle cqes, they may completely rely on liburing's io_uring_peek_cqe(), but if cq ring has overflowed, currently because io_uring_peek_cqe() is not aware of this overflow, it won't enter kernel to flush cqes, below test program can reveal this bug: static void test_cq_overflow(struct io_uring *ring) { struct io_uring_cqe *cqe; struct io_uring_sqe *sqe; int issued = 0; int ret = 0; do { sqe = io_uring_get_sqe(ring); if (!sqe) { fprintf(stderr, "get sqe failed\n"); break;; } ret = io_uring_submit(ring); if (ret <= 0) { if (ret != -EBUSY) fprintf(stderr, "sqe submit failed: %d\n", ret); break; } issued++; } while (ret > 0); assert(ret == -EBUSY); printf("issued requests: %d\n", issued); while (issued) { ret = io_uring_peek_cqe(ring, &cqe); if (ret) { if (ret != -EAGAIN) { fprintf(stderr, "peek completion failed: %s\n", strerror(ret)); break; } printf("left requets: %d\n", issued); continue; } io_uring_cqe_seen(ring, cqe); issued--; printf("left requets: %d\n", issued); } } int main(int argc, char *argv[]) { int ret; struct io_uring ring; ret = io_uring_queue_init(16, &ring, 0); if (ret) { fprintf(stderr, "ring setup failed: %d\n", ret); return 1; } test_cq_overflow(&ring); return 0; } To fix this issue, export cq overflow status to userspace by adding new IORING_SQ_CQ_OVERFLOW flag, then helper functions() in liburing, such as io_uring_peek_cqe, can be aware of this cq overflow and do flush accordingly. Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-09libnvdimm/security: Fix key lookup permissionsDan Williams1-1/+1
As of commit 8c0637e950d6 ("keys: Make the KEY_NEED_* perms an enum rather than a mask") lookup_user_key() needs an explicit declaration of what it wants to do with the key. Add KEY_NEED_SEARCH to fix a warning with the below signature, and fixes the inability to retrieve a key. WARNING: CPU: 15 PID: 6276 at security/keys/permission.c:35 key_task_permission+0xd3/0x140 [..] RIP: 0010:key_task_permission+0xd3/0x140 [..] Call Trace: lookup_user_key+0xeb/0x6b0 ? vsscanf+0x3df/0x840 ? key_validate+0x50/0x50 ? key_default_cmp+0x20/0x20 nvdimm_get_user_key_payload.part.0+0x21/0x110 [libnvdimm] nvdimm_security_store+0x67d/0xb20 [libnvdimm] security_store+0x67/0x1a0 [libnvdimm] kernfs_fop_write+0xcf/0x1c0 vfs_write+0xde/0x1d0 ksys_write+0x68/0xe0 do_syscall_64+0x5c/0xa0 entry_SYSCALL_64_after_hwframe+0x49/0xb3 Fixes: 8c0637e950d6 ("keys: Make the KEY_NEED_* perms an enum rather than a mask") Suggested-by: David Howells <dhowells@redhat.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/159297332630.1304143.237026690015653759.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-07-09RDMA/mlx5: Set PD pointers for the error flow unwindLeon Romanovsky1-1/+2
ib_pd is accessed internally during destroy of the TIR/TIS, but PD can be not set yet. This leading to the following kernel panic. BUG: kernel NULL pointer dereference, address: 0000000000000074 PGD 8000000079eaa067 P4D 8000000079eaa067 PUD 7ae81067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 709 Comm: syz-executor.0 Not tainted 5.8.0-rc3 #41 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 RIP: 0010:destroy_raw_packet_qp_tis drivers/infiniband/hw/mlx5/qp.c:1189 [inline] RIP: 0010:destroy_raw_packet_qp drivers/infiniband/hw/mlx5/qp.c:1527 [inline] RIP: 0010:destroy_qp_common+0x2ca/0x4f0 drivers/infiniband/hw/mlx5/qp.c:2397 Code: 00 85 c0 74 2e e8 56 18 55 ff 48 8d b3 28 01 00 00 48 89 ef e8 d7 d3 ff ff 48 8b 43 08 8b b3 c0 01 00 00 48 8b bd a8 0a 00 00 <0f> b7 50 74 e8 0d 6a fe ff e8 28 18 55 ff 49 8d 55 50 4c 89 f1 48 RSP: 0018:ffffc900007bbac8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff88807949e800 RCX: 0000000000000998 RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88807c180140 RBP: ffff88807b50c000 R08: 000000000002d379 R09: ffffc900007bba00 R10: 0000000000000001 R11: 000000000002d358 R12: ffff888076f37000 R13: ffff88807949e9c8 R14: ffffc900007bbe08 R15: ffff888076f37000 FS: 00000000019bf940(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000074 CR3: 0000000076d68004 CR4: 0000000000360ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: mlx5_ib_create_qp+0xf36/0xf90 drivers/infiniband/hw/mlx5/qp.c:3014 _ib_create_qp drivers/infiniband/core/core_priv.h:333 [inline] create_qp+0x57f/0xd20 drivers/infiniband/core/uverbs_cmd.c:1443 ib_uverbs_create_qp+0xcf/0x100 drivers/infiniband/core/uverbs_cmd.c:1564 ib_uverbs_write+0x5fa/0x780 drivers/infiniband/core/uverbs_main.c:664 __vfs_write+0x3f/0x90 fs/read_write.c:495 vfs_write+0xc7/0x1f0 fs/read_write.c:559 ksys_write+0x5e/0x110 fs/read_write.c:612 do_syscall_64+0x3e/0x70 arch/x86/entry/common.c:359 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x466479 Code: Bad RIP value. RSP: 002b:00007ffd057b62b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000466479 RDX: 0000000000000070 RSI: 0000000020000240 RDI: 0000000000000003 RBP: 00000000019bf8fc R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 0000000000000bf6 R14: 00000000004cb859 R15: 00000000006fefc0 Fixes: 6c41965d647a ("RDMA/mlx5: Don't access ib_qp fields in internal destroy QP path") Link: https://lore.kernel.org/r/20200707110612.882962-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-09IB/mlx5: Fix 50G per lane indicationAya Levin1-1/+1
Some released FW versions mistakenly don't set the capability that 50G per lane link-modes are supported for VFs (ptys_extended_ethernet capability bit). Use PTYS.ext_eth_proto_capability instead, as this indication is always accurate. If PTYS.ext_eth_proto_capability is valid (has a non-zero value) conclude that the HCA supports 50G per lane. Otherwise, conclude that the HCA doesn't support 50G per lane. Fixes: 08e8676f1607 ("IB/mlx5: Add support for 50Gbps per lane link modes") Link: https://lore.kernel.org/r/20200707110612.882962-3-leon@kernel.org Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-09bonding: don't need RTNL for ipsec helpersJarod Wilson1-3/+3
The bond_ipsec_* helpers don't need RTNL, and can potentially get called without it being held, so switch from rtnl_dereference() to rcu_dereference() to access bond struct data. Lightly tested with xfrm bonding, no problems found, should address the syzkaller bug referenced below. Reported-by: syzbot+582c98032903dcc04816@syzkaller.appspotmail.com CC: Huy Nguyen <huyn@mellanox.com> CC: Saeed Mahameed <saeedm@mellanox.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jakub Kicinski <kuba@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: netdev@vger.kernel.org CC: intel-wired-lan@lists.osuosl.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09selftests: kmod: Add module address visibility testKees Cook1-0/+36
Make sure we don't regress the CAP_SYSLOG behavior of the module address visibility via /proc/modules nor /sys/module/*/sections/*. Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-09bpf: Check correct cred for CAP_SYSLOG in bpf_dump_raw_ok()Kees Cook3-19/+24
When evaluating access control over kallsyms visibility, credentials at open() time need to be used, not the "current" creds (though in BPF's case, this has likely always been the same). Plumb access to associated file->f_cred down through bpf_dump_raw_ok() and its callers now that kallsysm_show_value() has been refactored to take struct cred. Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: bpf@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 7105e828c087 ("bpf: allow for correlation of maps and helpers in dump") Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-09kprobes: Do not expose probe addresses to non-CAP_SYSLOGKees Cook1-2/+2
The kprobe show() functions were using "current"'s creds instead of the file opener's creds for kallsyms visibility. Fix to use seq_file->file->f_cred. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: stable@vger.kernel.org Fixes: 81365a947de4 ("kprobes: Show address of kprobes if kallsyms does") Fixes: ffb9bd68ebdb ("kprobes: Show blacklist addresses as same as kallsyms does") Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-09module: Do not expose section addresses to non-CAP_SYSLOGKees Cook1-3/+3
The printing of section addresses in /sys/module/*/sections/* was not using the correct credentials to evaluate visibility. Before: # cat /sys/module/*/sections/.*text 0xffffffffc0458000 ... # capsh --drop=CAP_SYSLOG -- -c "cat /sys/module/*/sections/.*text" 0xffffffffc0458000 ... After: # cat /sys/module/*/sections/*.text 0xffffffffc0458000 ... # capsh --drop=CAP_SYSLOG -- -c "cat /sys/module/*/sections/.*text" 0x0000000000000000 ... Additionally replaces the existing (safe) /proc/modules check with file->f_cred for consistency. Reported-by: Dominik Czarnota <dominik.czarnota@trailofbits.com> Fixes: be71eda5383f ("module: Fix display of wrong module .text address") Cc: stable@vger.kernel.org Tested-by: Jessica Yu <jeyu@kernel.org> Acked-by: Jessica Yu <jeyu@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-09module: Refactor section attr into bin attributeKees Cook1-21/+24
In order to gain access to the open file's f_cred for kallsym visibility permission checks, refactor the module section attributes to use the bin_attribute instead of attribute interface. Additionally removes the redundant "name" struct member. Cc: stable@vger.kernel.org Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Tested-by: Jessica Yu <jeyu@kernel.org> Acked-by: Jessica Yu <jeyu@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-09kallsyms: Refactor kallsyms_show_value() to take credKees Cook5-12/+18
In order to perform future tests against the cred saved during open(), switch kallsyms_show_value() to operate on a cred, and have all current callers pass current_cred(). This makes it very obvious where callers are checking the wrong credential in their "read" contexts. These will be fixed in the coming patches. Additionally switch return value to bool, since it is always used as a direct permission check, not a 0-on-success, negative-on-error style function return. Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-09cxgb4: fix all-mask IP address comparisonRahul Lakkireddy1-5/+5
Convert all-mask IP address to Big Endian, instead, for comparison. Fixes: f286dd8eaad5 ("cxgb4: use correct type for all-mask IP address comparison") Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09dt-bindings: dp83869: Fix the type of deviceFabio Estevam1-1/+1
DP83869 is an Ethernet PHY, not a charger, so fix the documentation accordingly. Fixes: 4d66c56f7efe ("dt-bindings: net: dp83869: Add TI dp83869 phy") Signed-off-by: Fabio Estevam <festevam@gmail.com> Acked-by: Dan Murphy <dmurphy@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09dt-bindings: dp83867: Fix the type of deviceFabio Estevam1-1/+1
DP83867 is an Ethernet PHY, not a charger, so fix the documentation accordingly. Fixes: 74ac28f16486 ("dt-bindings: dp83867: Convert DP83867 to yaml") Signed-off-by: Fabio Estevam <festevam@gmail.com> Acked-by: Dan Murphy <dmurphy@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09tipc: fix retransmission on unicast linksHamish Martin1-8/+18
A scenario has been observed where a 'bc_init' message for a link is not retransmitted if it fails to be received by the peer. This leads to the peer never establishing the link fully and it discarding all other data received on the link. In this scenario the message is lost in transit to the peer. The issue is traced to the 'nxt_retr' field of the skb not being initialised for links that aren't a bc_sndlink. This leads to the comparison in tipc_link_advance_transmq() that gates whether to attempt retransmission of a message performing in an undesirable way. Depending on the relative value of 'jiffies', this comparison: time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr) may return true or false given that 'nxt_retr' remains at the uninitialised value of 0 for non bc_sndlinks. This is most noticeable shortly after boot when jiffies is initialised to a high value (to flush out rollover bugs) and we compare a jiffies of, say, 4294940189 to zero. In that case time_before returns 'true' leading to the skb not being retransmitted. The fix is to ensure that all skbs have a valid 'nxt_retr' time set for them and this is achieved by refactoring the setting of this value into a central function. With this fix, transmission losses of 'bc_init' messages do not stall the link establishment forever because the 'bc_init' message is retransmitted and the link eventually establishes correctly. Fixes: 382f598fb66b ("tipc: reduce duplicate packets for unicast traffic") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09bonding: deal with xfrm state in all modes and add more error-checkingJarod Wilson1-14/+25
It's possible that device removal happens when the bond is in non-AB mode, and addition happens in AB mode, so bond_ipsec_del_sa() never gets called, which leaves security associations in an odd state if bond_ipsec_add_sa() then gets called after switching the bond into AB. Just call add and delete universally for all modes to keep things consistent. However, it's also possible that this code gets called when the system is shutting down, and the xfrm subsystem has already been disconnected from the bond device, so we need to do some error-checking and bail, lest we hit a null ptr deref. Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load") CC: Huy Nguyen <huyn@mellanox.com> CC: Saeed Mahameed <saeedm@mellanox.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jakub Kicinski <kuba@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: netdev@vger.kernel.org CC: intel-wired-lan@lists.osuosl.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09Merge branch 'RTL8366RB-tagging-support'David S. Miller6-23/+149
Linus Walleij says: ==================== RTL8366RB tagging support This patch set adds DSA tagging support to the RTL8366RB DSA driver. There is a minor performance improvement in the tag parser compared to the previous patch set and the review tags have been collected. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09net: dsa: rtl8366rb: Support the CPU DSA tagLinus Walleij2-23/+9
This activates the support to use the CPU tag to properly direct ingress traffic to the right port. Bit 15 in register RTL8368RB_CPU_CTRL_REG can be set to 1 to disable the insertion of the CPU tag which is what the code currently does. The bit 15 define calls this setting RTL8368RB_CPU_INSTAG which is confusing since the inverse meaning is implied: programmers may think that setting this bit to 1 will *enable* inserting the tag rather than disabling it, so rename this setting in bit 15 to RTL8368RB_CPU_NO_TAG which is more to the point. After this e.g. ping works out-of-the-box with the RTL8366RB. Cc: DENG Qingfang <dqfext@gmail.com> Cc: Mauri Sandberg <sandberg@mailfence.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09net: dsa: tag_rtl4_a: Implement Realtek 4 byte A tagLinus Walleij4-0/+140
This implements the known parts of the Realtek 4 byte tag protocol version 0xA, as found in the RTL8366RB DSA switch. It is designated as protocol version 0xA as a different Realtek 4 byte tag format with protocol version 0x9 is known to exist in the Realtek RTL8306 chips. The tag and switch chip lacks public documentation, so the tag format has been reverse-engineered from packet dumps. As only ingress traffic has been available for analysis an egress tag has not been possible to develop (even using educated guesses about bit fields) so this is as far as it gets. It is not known if the switch even supports egress tagging. Excessive attempts to figure out the egress tag format was made. When nothing else worked, I just tried all bit combinations with 0xannp where a is protocol and p is port. I looped through all values several times trying to get a response from ping, without any positive result. Using just these ingress tags however, the switch functionality is vastly improved and the packets find their way into the destination port without any tricky VLAN configuration. On the D-Link DIR-685 the LAN ports now come up and respond to ping without any command line configuration so this is a real improvement for users. Egress packets need to be restricted to the proper target ports using VLAN, which the RTL8366RB DSA switch driver already sets up. Cc: DENG Qingfang <dqfext@gmail.com> Cc: Mauri Sandberg <sandberg@mailfence.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09net/mlx5: Added support for 100Gbps per lane link modesMeir Lichtinger3-1/+26
This patch exposes new link modes using 100Gbps per lane, including 100G, 200G and 400G modes. Signed-off-by: Meir Lichtinger <meirl@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09ethtool: Add support for 100Gbps per lane link modesMeir Lichtinger4-1/+61
Define 100G, 200G and 400G link modes using 100Gbps per lane LR, ER and FR are defined as a single link mode because they are using same technology and by design are fully interoperable. EEPROM content indicates if the module is LR, ER, or FR, and the user space ethtool decoder is planned to support decoding these modes in the EEPROM. Signed-off-by: Meir Lichtinger <meirl@mellanox.com> CC: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09l2tp: remove skb_dst_set() from l2tp_xmit_skb()Xin Long1-4/+1
In the tx path of l2tp, l2tp_xmit_skb() calls skb_dst_set() to set skb's dst. However, it will eventually call inet6_csk_xmit() or ip_queue_xmit() where skb's dst will be overwritten by: skb_dst_set_noref(skb, dst); without releasing the old dst in skb. Then it causes dst/dev refcnt leak: unregister_netdevice: waiting for eth0 to become free. Usage count = 1 This can be reproduced by simply running: # modprobe l2tp_eth && modprobe l2tp_ip # sh ./tools/testing/selftests/net/l2tp.sh So before going to inet6_csk_xmit() or ip_queue_xmit(), skb's dst should be dropped. This patch is to fix it by removing skb_dst_set() from l2tp_xmit_skb() and moving skb_dst_drop() into l2tp_xmit_core(). Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core") Reported-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: James Chapman <jchapman@katalix.com> Tested-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09Merge branch 'bnxt_en-Driver-update-for-net-next'David S. Miller4-69/+246
Michael Chan says: ==================== bnxt_en: Driver update for net-next. This patchset implements ethtool -X to setup user-defined RSS indirection table. The new infrastructure also allows the proper logical ring index to be used to populate the RSS indirection when queried by ethtool -x. Prior to these patches, we were incorrectly populating the output of ethtool -x with internal ring IDs which would make no sense to the user. The last 2 patches add some cleanups to the VLAN acceleration logic and check the firmware capabilities before allowing VLAN acceleration offloads. v4: Move bnxt_get_rxfh_indir_size() fix to a new patch #2. Modify patch #7 to revert RSS map to default only when necessary. v3: Use ALIGN() in patch 5. Add warning messages in patch 6. v2: Some RSS indirection table changes requested by Jakub Kicinski. ==================== Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09bnxt_en: allow firmware to disable VLAN offloadsEdwin Peer2-3/+22
Bare-metal use cases require giving firmware and the embedded application processor control over VLAN offloads. The driver should not attempt to override or utilize this feature in such scenarios since it will not work as expected. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09bnxt_en: clean up VLAN feature bit handlingEdwin Peer2-21/+18
The hardware VLAN offload feature on our NIC does not have separate knobs for handling customer and service tags on RX. Either offloading of both must be enabled or both must be disabled. Introduce definitions for the combined feature set in order to clean up the code and make this constraint more clear. Technically these features can be separately enabled on TX, however, since the default is to turn both on, the combined TX feature set is also introduced for code consistency. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09bnxt_en: Implement ethtool -X to set indirection table.Michael Chan2-1/+67
With the new infrastructure in place, we can now support the setting of the indirection table from ethtool. When changing channels, in a rare case that firmware cannot reserve the rings that were promised, we will still try to keep the RSS map and only revert to default when absolutely necessary. v4: Revert RSS map to default during ring change only when absolutely necessary. v3: Add warning messages when firmware cannot reserve the requested RX rings, and when the RSS table entries have to change to default. v2: When changing channels, if the RSS table size changes and RSS map is non-default, return error. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09bnxt_en: Return correct RSS indirection table entries to ethtool -x.Michael Chan1-4/+5
Now that we have the logical indirection table, we can return these proper logical indices directly to ethtool -x instead of the physical IDs. Reported-by: Jakub Kicinski <kicinski@fb.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09bnxt_en: Fill HW RSS table from the RSS logical indirection table.Michael Chan1-36/+52
Now that we have the logical table, we can fill the HW RSS table using the logical table's entries and converting them to the HW specific format. Re-initialize the logical table to standard distribution if the number of RX rings changes during ring reservation. v4: Use bnxt_get_rxfh_indir_size() to get the RSS table size. v2: Use ALIGN() to roundup the RSS table size. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>