kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2018-12-04	iomap: partially revert 4721a601099 (simulated directio short read on EFAULT)	Darrick J. Wong	1	-9/+0
	In commit 4721a601099, we tried to fix a problem wherein directio reads into a splice pipe will bounce EFAULT/EAGAIN all the way out to userspace by simulating a zero-byte short read. This happens because some directio read implementations (xfs) will call bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous reads, but as soon as we run out of pipe buffers that _get_pages call returns EFAULT, which the splice code translates to EAGAIN and bounces out to userspace. In that commit, the iomap code catches the EFAULT and simulates a zero-byte read, but that causes assertion errors on regular splice reads because xfs doesn't allow short directio reads. This causes infinite splice() loops and assertion failures on generic/095 on overlayfs because xfs only permit total success or total failure of a directio operation. The underlying issue in the pipe splice code has now been fixed by changing the pipe splice loop to avoid avoid reading more data than there is space in the pipe. Therefore, it's no longer necessary to simulate the short directio, so remove the hack from iomap. Fixes: 4721a601099 ("iomap: dio data corruption and spurious errors when pipes fill") Reported-by: Murphy Zhou <jencce.kernel@gmail.com> Ranted-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-12-04	Merge branch 'parisc-4.20-4' of ↵	Linus Torvalds	1	-0/+7
	git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fix from Helge Deller: "On parisc, use -ffunction-sections compiler option when building 32-bit kernel modules to avoid sysfs-warnings when loading such modules. This got broken with kernel v4.18" * 'parisc-4.20-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Enable -ffunction-sections for modules on 32-bit kernel
2018-12-04	splice: don't read more than available pipe space	Darrick J. Wong	1	-1/+6
	In commit 4721a601099, we tried to fix a problem wherein directio reads into a splice pipe will bounce EFAULT/EAGAIN all the way out to userspace by simulating a zero-byte short read. This happens because some directio read implementations (xfs) will call bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous reads, but as soon as we run out of pipe buffers that _get_pages call returns EFAULT, which the splice code translates to EAGAIN and bounces out to userspace. In that commit, the iomap code catches the EFAULT and simulates a zero-byte read, but that causes assertion errors on regular splice reads because xfs doesn't allow short directio reads. The brokenness is compounded by splice_direct_to_actor immediately bailing on do_splice_to returning <= 0 without ever calling ->actor (which empties out the pipe), so if userspace calls back we'll EFAULT again on the full pipe, and nothing ever gets copied. Therefore, teach splice_direct_to_actor to clamp its requests to the amount of free space in the pipe and remove the simulated short read from the iomap directio code. Fixes: 4721a601099 ("iomap: dio data corruption and spurious errors when pipes fill") Reported-by: Murphy Zhou <jencce.kernel@gmail.com> Ranted-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-12-04	vfs: allow some remap flags to be passed to vfs_clone_file_range	Darrick J. Wong	1	-1/+1
	In overlayfs, ovl_remap_file_range calls vfs_clone_file_range on the lower filesystem's inode, passing through whatever remap flags it got from its caller. Since vfs_copy_file_range first tries a filesystem's remap function with REMAP_FILE_CAN_SHORTEN, this can get passed through to the second vfs_copy_file_range call, and this isn't an issue. Change the WARN_ON to look only for the DEDUP flag. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-12-04	xfs: fix inverted return from xfs_btree_sblock_verify_crc	Eric Sandeen	1	-1/+1
	xfs_btree_sblock_verify_crc is a bool so should not be returning a failaddr_t; worse, if xfs_log_check_lsn fails it returns __this_address which looks like a boolean true (i.e. success) to the caller. (interestingly xfs_btree_lblock_verify_crc doesn't have the issue) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-12-04	xfs: fix PAGE_MASK usage in xfs_free_file_space	Darrick J. Wong	1	-2/+2
	In commit e53c4b598, I tried to teach xfs to force writeback when we fzero/fpunch right up to EOF so that if EOF is in the middle of a page, the post-EOF part of the page gets zeroed before we return to userspace. Unfortunately, I missed the part where PAGE_MASK is ~(PAGE_SIZE - 1), which means that we totally fail to zero if we're fpunching and EOF is within the first page. Worse yet, the same PAGE_MASK thinko plagues the filemap_write_and_wait_range call, so we'd initiate writeback of the entire file, which (mostly) masked the thinko. Drop the tricky PAGE_MASK and replace it with correct usage of PAGE_SIZE and the proper rounding macros. Fixes: e53c4b598 ("xfs: ensure post-EOF zeroing happens after zeroing part of a file") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-12-04	phy: Revert toggling reset changes.	David S. Miller	2	-14/+5
	This reverts: ef1b5bf506b1 ("net: phy: Fix not to call phy_resume() if PHY is not attached") 8c85f4b81296 ("net: phy: micrel: add toggling phy reset if PHY is not attached") Andrew Lunn informs me that there are alternative efforts underway to fix this more properly. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04	Merge branch 'for-linus' of ↵	Linus Torvalds	12	-52/+48
	git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: "Mostly new IDs for Elan/Synaptics touchpads, plus a few small fixups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: omap-keypad - fix keyboard debounce configuration Input: xpad - quirk all PDP Xbox One gamepads Input: synaptics - enable SMBus for HP 15-ay000 Input: synaptics - add PNP ID for ThinkPad P50 to SMBus Input: elan_i2c - add ACPI ID for Lenovo IdeaPad 330-15ARR Input: elan_i2c - add support for ELAN0621 touchpad Input: hyper-v - fix wakeup from suspend-to-idle Input: atkbd - clean up indentation issue Input: st1232 - convert to SPDX identifiers Input: migor_ts - convert to SPDX identifiers Input: dt-bindings - fix a typo in file input-reset.txt Input: cros_ec_keyb - fix button/switch capability reports Input: elan_i2c - add ELAN0620 to the ACPI table Input: matrix_keypad - check for errors from of_get_named_gpio()
2018-12-04	Merge branch 'mlxsw-Add-one-armed-router-support'	David S. Miller	8	-9/+287
	Ido Schimmel says: ==================== mlxsw: Add one-armed router support Up until now, when a packet was routed by the ASIC through the same router interface (RIF) from which it ingressed from, the ASIC passed the sole copy of the packet to the kernel. This allowed the kernel to route the packet and also potentially generate an ICMP redirect. There are scenarios (e.g., "one-armed router") where packets are intentionally routed this way and are therefore not deemed as exceptions. In such scenarios the current method of trapping packets to the CPU is problematic, as it results in major packet loss. This patchset solves the problem by having the ASIC forward the packet, but also send a copy to the CPU, which gives the kernel the opportunity to generate required exceptions. To prevent the kernel from forwarding such packets again, the driver marks them with 'offload_l3_fwd_mark', which causes the kernel to consume them in ip{,6}_forward_finish(). Patch #1 renames 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark'. When set, the field indicates that a packet was already forwarded in L3 (unicast / multicast) by a capable device. Patch #2 teaches the kernel to consume unicast packets that have 'offload_l3_fwd_mark' set. Patch #3 changes mlxsw to mirror loopbacked (iRIF == eRIF) packets, instead of trapping them. Patch #4 adds a test case for above mentioned scenario. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04	selftests: mlxsw: Add one-armed router test	Ido Schimmel	1	-0/+259
	Construct a "one-armed router" topology and test that packets are forwarded by the ASIC and that a copy of the packet is sent to the kernel, which does not forward the packet again. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04	mlxsw: spectrum: Mirror loopbacked packets instead of trapping them	Ido Schimmel	2	-1/+4
	When the ASIC detects that a unicast packet is routed through the same router interface (RIF) from which it ingressed (iRIF == eRIF), it raises a trap called loopback error (LBERROR). Thus far, this trap was configured to send a sole copy of the packet to the CPU so that ICMP redirect packets could be potentially generated by the kernel. This is problematic as the CPU cannot forward packets at 3.2Tb/s and there are scenarios (e.g., "one-armed router") where iRIF == eRIF is not an exception. Solve this by changing the trap to send a copy of the packet to the CPU. To prevent the kernel from forwarding the packet again, it is marked with 'offload_l3_fwd_mark'. The trap is configured in a trap group of its own with a dedicated policer in order not to prevent packets trapped by other traps from reaching the CPU. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04	net: Do not route unicast IP packets twice	Ido Schimmel	2	-0/+14
	Packets marked with 'offload_l3_fwd_mark' were already forwarded by a capable device and should not be forwarded again by the kernel. Therefore, have the kernel consume them. The check is performed in ip{,6}_forward_finish() in order to allow the kernel to process such packets in ip{,6}_forward() and generate required exceptions. For example, ICMP redirects. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04	skbuff: Rename 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark'	Ido Schimmel	4	-8/+10
	Commit abf4bb6b63d0 ("skbuff: Add the offload_mr_fwd_mark field") added the 'offload_mr_fwd_mark' field to indicate that a packet has already undergone L3 multicast routing by a capable device. The field is used to prevent the kernel from forwarding a packet through a netdev through which the device has already forwarded the packet. Currently, no unicast packet is routed by both the device and the kernel, but this is about to change by subsequent patches and we need to be able to mark such packets, so that they will no be forwarded twice. Instead of adding yet another field to 'struct sk_buff', we can just rename 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark', as a packet either has a multicast or a unicast destination IP. While at it, add a comment about both 'offload_fwd_mark' and 'offload_l3_fwd_mark'. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04	Merge branch 'bpf-verifier-resilience'	Daniel Borkmann	2	-16/+91
	Alexei Starovoitov says: ==================== Three patches to improve verifier ability to handle pathological bpf programs with a lot of branches: - make sure prog_load syscall can be aborted - improve branch taken analysis - introduce per-insn complexity limit for unprivileged programs ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-04	bpf: add per-insn complexity limit	Alexei Starovoitov	1	-1/+6
	malicious bpf program may try to force the verifier to remember a lot of distinct verifier states. Put a limit to number of per-insn 'struct bpf_verifier_state'. Note that hitting the limit doesn't reject the program. It potentially makes the verifier do more steps to analyze the program. It means that malicious programs will hit BPF_COMPLEXITY_LIMIT_INSNS sooner instead of spending cpu time walking long link list. The limit of BPF_COMPLEXITY_LIMIT_STATES==64 affects cilium progs with slight increase in number of "steps" it takes to successfully verify the programs: before after bpf_lb-DLB_L3.o 1940 1940 bpf_lb-DLB_L4.o 3089 3089 bpf_lb-DUNKNOWN.o 1065 1065 bpf_lxc-DDROP_ALL.o 28052 \| 28162 bpf_lxc-DUNKNOWN.o 35487 \| 35541 bpf_netdev.o 10864 10864 bpf_overlay.o 6643 6643 bpf_lcx_jit.o 38437 38437 But it also makes malicious program to be rejected in 0.4 seconds vs 6.5 Hence apply this limit to unprivileged programs only. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-04	bpf: improve verifier branch analysis	Alexei Starovoitov	2	-15/+82
	pathological bpf programs may try to force verifier to explode in the number of branch states: 20: (d5) if r1 s<= 0x24000028 goto pc+0 21: (b5) if r0 <= 0xe1fa20 goto pc+2 22: (d5) if r1 s<= 0x7e goto pc+0 23: (b5) if r0 <= 0xe880e000 goto pc+0 24: (c5) if r0 s< 0x2100ecf4 goto pc+0 25: (d5) if r1 s<= 0xe880e000 goto pc+1 26: (c5) if r0 s< 0xf4041810 goto pc+0 27: (d5) if r1 s<= 0x1e007e goto pc+0 28: (b5) if r0 <= 0xe86be000 goto pc+0 29: (07) r0 += 16614 30: (c5) if r0 s< 0x6d0020da goto pc+0 31: (35) if r0 >= 0x2100ecf4 goto pc+0 Teach verifier to recognize always taken and always not taken branches. This analysis is already done for == and != comparison. Expand it to all other branches. It also helps real bpf programs to be verified faster: before after bpf_lb-DLB_L3.o 2003 1940 bpf_lb-DLB_L4.o 3173 3089 bpf_lb-DUNKNOWN.o 1080 1065 bpf_lxc-DDROP_ALL.o 29584 28052 bpf_lxc-DUNKNOWN.o 36916 35487 bpf_netdev.o 11188 10864 bpf_overlay.o 6679 6643 bpf_lcx_jit.o 39555 38437 Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-04	bpf: check pending signals while verifying programs	Alexei Starovoitov	1	-0/+3
	Malicious user space may try to force the verifier to use as much cpu time and memory as possible. Hence check for pending signals while verifying the program. Note that suspend of sys_bpf(PROG_LOAD) syscall will lead to EAGAIN, since the kernel has to release the resources used for program verification. Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-04	Merge branch 'prog_test_run-improvement'	Alexei Starovoitov	7	-7/+120
	Lorenz Bauer says: ==================== Right now, there is no safe way to use BPF_PROG_TEST_RUN with data_out. This is because bpf_test_finish copies the output buffer to user space without checking its size. This can lead to the kernel overwriting data in user space after the buffer if xdp_adjust_head and friends are in play. Thanks to everyone for their advice and patience with this patch set! Changes in v5: * Fix up libbpf.map Changes in v4: * Document bpf_prog_test_run and bpf_prog_test_run_xattr * Use struct bpf_prog_test_run_attr for return values Changes in v3: * Introduce bpf_prog_test_run_xattr instead of modifying the existing function Changes in v2: * Make the syscall return ENOSPC if data_size_out is too small * Make bpf_prog_test_run return EINVAL if size_out is missing * Document the new behaviour of data_size_out ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-04	selftests: add a test for bpf_prog_test_run_xattr	Lorenz Bauer	1	-1/+54
	Make sure that bpf_prog_test_run_xattr returns the correct length and that the kernel respects the output size hint. Also check that errno indicates ENOSPC if there is a short output buffer given. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-04	libbpf: add bpf_prog_test_run_xattr	Lorenz Bauer	3	-0/+43
	Add a new function, which encourages safe usage of the test interface. bpf_prog_test_run continues to work as before, but should be considered unsafe. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-04	tools: sync uapi/linux/bpf.h	Lorenz Bauer	1	-2/+5
	Pull changes from "bpf: respect size hint to BPF_PROG_TEST_RUN if present". Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-04	bpf: respect size hint to BPF_PROG_TEST_RUN if present	Lorenz Bauer	2	-4/+18
	Use data_size_out as a size hint when copying test output to user space. ENOSPC is returned if the output buffer is too small. Callers which so far did not set data_size_out are not affected. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-04	Revert "exec: make de_thread() freezable"	Rafael J. Wysocki	1	-3/+2
	Revert commit c22397888f1e "exec: make de_thread() freezable" as requested by Ingo Molnar: "So there's a new regression in v4.20-rc4, my desktop produces this lockdep splat: [ 1772.588771] WARNING: pkexec/4633 still has locks held! [ 1772.588773] 4.20.0-rc4-custom-00213-g93a49841322b #1 Not tainted [ 1772.588775] ------------------------------------ [ 1772.588776] 1 lock held by pkexec/4633: [ 1772.588778] #0: 00000000ed85fbf8 (&sig->cred_guard_mutex){+.+.}, at: prepare_bprm_creds+0x2a/0x70 [ 1772.588786] stack backtrace: [ 1772.588789] CPU: 7 PID: 4633 Comm: pkexec Not tainted 4.20.0-rc4-custom-00213-g93a49841322b #1 [ 1772.588792] Call Trace: [ 1772.588800] dump_stack+0x85/0xcb [ 1772.588803] flush_old_exec+0x116/0x890 [ 1772.588807] ? load_elf_phdrs+0x72/0xb0 [ 1772.588809] load_elf_binary+0x291/0x1620 [ 1772.588815] ? sched_clock+0x5/0x10 [ 1772.588817] ? search_binary_handler+0x6d/0x240 [ 1772.588820] search_binary_handler+0x80/0x240 [ 1772.588823] load_script+0x201/0x220 [ 1772.588825] search_binary_handler+0x80/0x240 [ 1772.588828] __do_execve_file.isra.32+0x7d2/0xa60 [ 1772.588832] ? strncpy_from_user+0x40/0x180 [ 1772.588835] __x64_sys_execve+0x34/0x40 [ 1772.588838] do_syscall_64+0x60/0x1c0 The warning gets triggered by an ancient lockdep check in the freezer: (gdb) list 0xffffffff812ece06 0xffffffff812ece06 is in flush_old_exec (./include/linux/freezer.h:57). 52 DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION 53 * If try_to_freeze causes a lockdep warning it means the caller may deadlock 54 */ 55 static inline bool try_to_freeze_unsafe(void) 56 { 57 might_sleep(); 58 if (likely(!freezing(current))) 59 return false; 60 return __refrigerator(false); 61 } I reviewed the ->cred_guard_mutex code, and the mutex is held across all of exec() - and we always did this. But there's this recent -rc4 commit: > Chanho Min (1): > exec: make de_thread() freezable c22397888f1e: exec: make de_thread() freezable I believe this commit is bogus, you cannot call try_to_freeze() from de_thread(), because it's holding the ->cred_guard_mutex." Reported-by: Ingo Molnar <mingo@kernel.org> Tested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-12-04	btrfs: tree-checker: Don't check max block group size as current max chunk ↵	Qu Wenruo	1	-5/+3
	size limit is unreliable [BUG] A completely valid btrfs will refuse to mount, with error message like: BTRFS critical (device sdb2): corrupt leaf: root=2 block=239681536 slot=172 \ bg_start=12018974720 bg_len=10888413184, invalid block group size, \ have 10888413184 expect (0, 10737418240] This has been reported several times as the 4.19 kernel is now being used. The filesystem refuses to mount, but is otherwise ok and booting 4.18 is a workaround. Btrfs check returns no error, and all kernels used on this fs is later than 2011, which should all have the 10G size limit commit. [CAUSE] For a 12 devices btrfs, we could allocate a chunk larger than 10G due to stripe stripe bump up. __btrfs_alloc_chunk() \|- max_stripe_size = 1G \|- max_chunk_size = 10G \|- data_stripe = 11 \|- if (1G * 11 > 10G) { stripe_size = 976128930; stripe_size = round_up(976128930, SZ_16M) = 989855744 However the final stripe_size (989855744) * 11 = 10888413184, which is still larger than 10G. [FIX] For the comprehensive check, we need to do the full check at chunk read time, and rely on bg <-> chunk mapping to do the check. We could just skip the length check for now. Fixes: fce466eab7ac ("btrfs: tree-checker: Verify block_group_item") Cc: stable@vger.kernel.org # v4.19+ Reported-by: Wang Yugui <wangyugui@e16-tech.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-04	drm/fb-helper: Fix typo in parameter description	Wei Yongjun	1	-1/+1
	Fix typo in parameter description. Fixes: 4be9bd10e22d ("drm/fb_helper: Allow leaking fbdev smem_start") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Link: https://patchwork.freedesktop.org/patch/msgid/1543905135-35293-1-git-send-email-weiyongjun1@huawei.com
2018-12-04	kprobes/x86: Fix instruction patching corruption when copying more than one ↵	Masami Hiramatsu	1	-1/+1
	RIP-relative instruction After copy_optimized_instructions() copies several instructions to the working buffer it tries to fix up the real RIP address, but it adjusts the RIP-relative instruction with an incorrect RIP address for the 2nd and subsequent instructions due to a bug in the logic. This will break the kernel pretty badly (with likely outcomes such as a kernel freeze, a crash, or worse) because probed instructions can refer to the wrong data. For example putting kprobes on cpumask_next() typically hits this bug. cpumask_next() is normally like below if CONFIG_CPUMASK_OFFSTACK=y (in this case nr_cpumask_bits is an alias of nr_cpu_ids): <cpumask_next>: 48 89 f0 mov %rsi,%rax 8b 35 7b fb e2 00 mov 0xe2fb7b(%rip),%esi # ffffffff82db9e64 <nr_cpu_ids> 55 push %rbp ... If we put a kprobe on it and it gets jump-optimized, it gets patched by the kprobes code like this: <cpumask_next>: e9 95 7d 07 1e jmpq 0xffffffffa000207a 7b fb jnp 0xffffffff81f8a2e2 <cpumask_next+2> e2 00 loop 0xffffffff81f8a2e9 <cpumask_next+9> 55 push %rbp This shows that the first two MOV instructions were copied to a trampoline buffer at 0xffffffffa000207a. Here is the disassembled result of the trampoline, skipping the optprobe template instructions: # Dump of assembly code from 0xffffffffa000207a to 0xffffffffa00020ea: 54 push %rsp ... 48 83 c4 08 add $0x8,%rsp 9d popfq 48 89 f0 mov %rsi,%rax 8b 35 82 7d db e2 mov -0x1d24827e(%rip),%esi # 0xffffffff82db9e67 <nr_cpu_ids+3> This dump shows that the second MOV accesses (nr_cpu_ids+3) instead of the original nr_cpu_ids. This leads to a kernel freeze because cpumask_next() always returns 0 and for_each_cpu() never ends. Fix this by adding 'len' correctly to the real RIP address while copying. [ mingo: Improved the changelog. ] Reported-by: Michael Rodin <michael@rodin.online> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org # v4.15+ Fixes: 63fef14fc98a ("kprobes/x86: Make insn buffer always ROX and use text_poke()") Link: http://lkml.kernel.org/r/153504457253.22602.1314289671019919596.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-04	net/mlx5: Update mlx5_ifc with DEVX UCTX capabilities bits	Yishai Hadas	1	-2/+14
	Expose device capabilities for DEVX user context, it includes which caps the device is supported and a matching bit to set as part of user context creation. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04	RDMA/mlx5: Unfold modify RMP function	Leon Romanovsky	1	-28/+34
	There is no need to perform modify_rmp in two separate function, while one of them uses stack as a placeholder for data while other allocates it dynamically. Combine those two functions to one call instead of two. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04	RDMA/mlx5: Unfold create RMP function	Leon Romanovsky	1	-19/+16
	There is no need to perform create_rmp in two separate function, while one of them uses stack as a placeholder for data while other allocates it dynamically. Combine those two functions to one instead of two. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04	RDMA/mlx5: Initialize SRQ tables on mlx5_ib	Leon Romanovsky	11	-122/+101
	Transfer initialization and cleanup from mlx5_priv struct of mlx5_core_dev to be part of mlx5_ib_dev. This completes removal of SRQ from mlx5_core. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04	RDMA/mlx5: Update SRQ functions signatures to mlx5_ib format	Leon Romanovsky	4	-77/+83
	Reflect the change of moving SRQ code from mlx5_core to mlx5_ib by updating function signatures do not require mlx5_core_dev as an input, because all operations in mlx5_ib are supposed to use mlx5_ib_dev. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04	RDMA/mlx5: Use stages for callback to setup and release DEVX	Leon Romanovsky	2	-7/+20
	Reuse existing infrastructure to initialize and release DEVX uid. The DevX interface is intended for user space access, so it is supposed to be initialized before ib_register_device(). Also it isn't supported in switchdev mode and don't need to initialize it in that mode. Fixes: 76dc5a8406bf ("IB/mlx5: Manage device uid for DEVX white list commands") Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04	RDMA/mlx5: Remove SRQ signature global flag	Leon Romanovsky	1	-4/+1
	SRQ signature is not supported, hence no need for special static global variable to announce it. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04	net/mlx5: Move SRQ functions to RDMA part	Leon Romanovsky	10	-724/+717
	There is no need to keep SRQ which is RDMA object in mlx5_core. In this patch, we partially move the execution code, while next patches will move table initialization/release logic too. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04	net/mlx5: Remove references to local mlx5_core functions	Leon Romanovsky	1	-19/+3
	As a preparation to move SRQ functionality to RDMA, drop all references to mlx5_core logic and make SRQ be dependent on shared code only. Most of the time, we are interested to know if events are working/not working and it is possible with previous commit ("net/mlx5: Debug print for forwarded async events"). Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04	net/mlx5: Remove not-used lib/eq.h header file	Leon Romanovsky	1	-1/+0
	lib/eq.h is needed for EQ manipulation which are not performed in SRQ. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04	net/mlx5: Remove dead transobj code	Leon Romanovsky	2	-71/+0
	Delete functions which are not called and not needed. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04	net/mlx5: Align SRQ licenses and copyright information	Leon Romanovsky	3	-87/+6
	Ensure that both RDMA and netdev parts of SRQ implementation has same copyright and license information annotated by SPDX tags. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04	net: marvell: convert to DEFINE_SHOW_ATTRIBUTE	Yangtao Li	2	-26/+2
	Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04	net: qca_spi: convert to DEFINE_SHOW_ATTRIBUTE	Yangtao Li	1	-14/+2
	Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Acked-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04	net: stmmac: convert to DEFINE_SHOW_ATTRIBUTE	Yangtao Li	1	-30/+4
	Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04	nfp: convert to DEFINE_SHOW_ATTRIBUTE	Yangtao Li	1	-34/+8
	Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04	Merge branch 'mlx4_core-cleanups'	David S. Miller	1	-9/+8
	Tariq Toukan says: ==================== mlx4_core cleanups This patchset by Erez contains cleanups to the mlx4_core driver. Patch 1 replaces -EINVAL with -EOPNOTSUPP for unsupported operations. Patch 2 fixes some coding style issues. Series generated against net-next commit: 97e6c858a26e net: usb: aqc111: Initialize wol_cfg with memset in aqc111_suspend ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04	net/mlx4_core: Fix several coding style errors	Erez Alfasi	1	-3/+3
	Fix 3 checkpatch errors within mlx4/main.c: - Unnecessary mlx4_debug_level global variable initialization to 0. - Prohibited space before comma. - Whitespaces instead of tab. Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04	net/mlx4_core: Fix return codes of unsupported operations	Erez Alfasi	1	-6/+5
	Functions __set_port_type and mlx4_check_port_params returned -EINVAL while the proper return code is -EOPNOTSUPP as a result of an unsupported operation. All drivers should generate this and all users should check for it when detecting an unsupported functionality. Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04	net: phy: Also request modules for C45 IDs	Jose Abreu	1	-1/+15
	Logic of phy_device_create() requests PHY modules according to PHY ID but for C45 PHYs we use different field for the IDs. Let's also request the modules for these IDs. Changes from v1: - Only request C22 modules if C45 are not present (Andrew) Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Joao Pinto <joao.pinto@synopsys.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04	Merge branch 'octeontx2-next'	David S. Miller	10	-184/+1095
	Jerin Jacob says: ==================== octeontx2-af: NIX and NPC enhancements This patchset is a continuation to earlier submitted four patch series to add a new driver for Marvell's OcteonTX2 SOC's Resource virtualization unit (RVU) admin function driver. 1. octeontx2-af: Add RVU Admin Function driver https://www.spinics.net/lists/netdev/msg528272.html 2. octeontx2-af: NPA and NIX blocks initialization https://www.spinics.net/lists/netdev/msg529163.html 3. octeontx2-af: NPC parser and NIX blocks initialization https://www.spinics.net/lists/netdev/msg530252.html 4. octeontx2-af: NPC MCAM support and FLR handling https://www.spinics.net/lists/netdev/msg534392.html This patch series adds support for below NPC block: - Add NPC(mkex) profile support for various Key extraction configurations NIX block: - Enable dynamic RSS flow key algorithm configuration - Enhancements on Rx checksum and error checks - Add support for Tx packet marking support - TL1 schedule queue allocation enhancements - Add LSO format configuration mbox - VLAN TPID configuration - Skip multicast entry init for broadcast tables v2: - Rename FLOW_* to NIX_FLOW_* to avoid serious global namespace collisions, as we have various FLOW_* definitions coming from include/uapi/linux/pkt_cls.h for example.(David Miller) - Pack the arguments of rvu_get_tl1_schqs() function as 80 columns allows.(David Miller) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04	octeontx2-af: Enable mkex profile	Vamsi Attunuru	8	-2/+213
	The following set of NPC registers allow the driver to configure NPC to generate different key value schemes to compare against packet payload in MCAM search. NPC_AF_INTF(0..1)_KEX_CFG NPC_AF_KEX_LDATA(0..1)_FLAGS_CFG NPC_AF_INTF(0..1)_LID(0..7)_LT(0..15)_LD(0..1)_CFG NPC_AF_INTF(0..1)_LDATA(0..1)_FLAGS(0..15)_CFG Currently, the AF driver populates these registers to configure the default values to address the most common use cases such as key generation for channel number + DMAC. The secure firmware stores different configuration value of these registers to enable different NPC use case along with the name for the lookup. Patch loads profile binary from secure firmware over the exiting CGX mailbox interface and apply the profile. AF driver shall fall back to the default configuration in case of any errors. The AF consumer driver can know the selected profile on response to NPC_GET_KEX_CFG mailbox by introducing mkex_pfl_name in the struct npc_get_kex_cfg_rsp. Signed-off-by: Vamsi Attunuru <vamsi.attunuru@marvell.com> Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04	octeontx2-af: Add LSO format configuration mailbox	Nithin Dabilpuram	3	-7/+91
	NIX_AF_LSO_FORMAT(0..31)_FIELD(0..7) register enables an SW defined means to define LSO packet modification formats. 0..31 works as an index to choose the algorithm, On success, the mailbox returns the index to the client of chosen LSO algorithm selection. This index will be used in configuring the transmit descriptors. Add mailbox interface to dynamically reserve and configure LSO format. This commit also fixes 'sizem1' for NIX_LSOALG_TCP_FLAGS to '1' i.e 2 Bytes. Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04	octeontx2-af: Add L3 and L4 packet verification mailbox	Vidhya Raman	3	-0/+55
	Adds mailbox support for L4 checksum verification and L3 and L4 length verification configuration. Signed-off-by: Vidhya Raman <vraman@marvell.com> Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>