BMC/Intel-BMC/linux.git - Intel OpenBMC Linux kernel source tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2018-06-05	overflow.h: Add allocation size calculation helpers	Kees Cook	2	-5/+78
	In preparation for replacing unchecked overflows for memory allocations, this creates helpers for the 3 most common calculations: array_size(a, b): 2-dimensional array array3_size(a, b, c): 3-dimensional array struct_size(ptr, member, n): struct followed by n-many trailing members Each of these return SIZE_MAX on overflow instead of wrapping around. (Additionally renames a variable named "array_size" to avoid future collision.) Co-developed-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-05	test_overflow: Report test failures	Kees Cook	1	-15/+39
	This adjusts the overflow test to report failures, and prepares to add allocation tests. Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-05	test_overflow: macrofy some more, do more tests for free	Rasmus Villemoes	1	-26/+22
	Obviously a+b==b+a and ab==ba, but the implementation of the fallback checks are not entirely symmetric in how they treat a and b. So we might as well check the (b,a,r,of) tuple as well as the (a,b,r,of) one for + and *. Rather than more copy-paste, factor out the common part to check_one_op. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-05	lib: add runtime test of check_*_overflow functions	Rasmus Villemoes	3	-0/+291
	This adds a small module for testing that the check_*_overflow functions work as expected, whether implemented in C or using gcc builtins. Example output: test_overflow: u8 : 18 tests test_overflow: s8 : 19 tests test_overflow: u16: 17 tests test_overflow: s16: 17 tests test_overflow: u32: 17 tests test_overflow: s32: 17 tests test_overflow: u64: 17 tests test_overflow: s64: 21 tests Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> [kees: add output to commit log, drop u64 tests on 32-bit] Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-05	netdev-FAQ: clarify DaveM's position for stable backports	Cong Wang	1	-0/+9
	Per discussion with David at netconf 2018, let's clarify DaveM's position of handling stable backports in netdev-FAQ. This is important for people relying on upstream -stable releases. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05	Merge branch 'for-linus' of ↵	Linus Torvalds	25	-166/+435
	git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: - A rework for the s390 arch random code, the TRNG instruction is rather slow and should not be used on the interrupt path - A fix for a memory leak in the zcrypt driver - Changes to the early boot code to add a compile time check for code that may not use the .bss section, with the goal to avoid initrd corruptions - Add an interface to get the physical network ID (pnetid), this is useful to group network devices that are attached to the same network - Some cleanup for the linker script - Some code improvement for the dasd driver - Two fixes for the perf sampling support * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/zcrypt: Fix CCA and EP11 CPRB processing failure memory leak. s390/archrandom: Rework arch random implementation. s390/net: add pnetid support s390/dasd: simplify locking in dasd_times_out s390/cio: add test for ccwgroup device s390/cio: add helper to query utility strings per given ccw device s390: remove no-op macro VMLINUX_SYMBOL() s390: remove closung punctuation from spectre messages s390: introduce compile time check for empty .bss section s390/early: move functions which may not access bss section to extra file s390/early: get rid of #ifdef CONFIG_BLK_DEV_INITRD s390/early: get rid of memmove_early s390/cpum_sf: Add data entry sizes to sampling trailer entry perf: fix invalid bit in diagnostic entry
2018-06-05	Merge branch 'for-next' of ↵	Linus Torvalds	14	-371/+230
	git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu Pull m68knommu updates from Greg Ungerer: "These changes all relate to converting the IO access functions for the ColdFire (and all other non-MMU m68k) platforms to use asm-generic IO instead. This makes the IO support the same on all ColdFire (regardless of MMU enabled or not) and means we can now support PCI in non-MMU mode. As a bonus these changes remove more code than they add" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: m68k: fix ColdFire PCI config reads and writes m68k: introduce iomem() macro for __iomem conversions m68k: allow ColdFire PCI bus on MMU and non-MMU configuration m68k: fix ioremapping for internal ColdFire peripherals m68k: fix read/write multi-byte IO for PCI on ColdFire m68k: don't redefine access functions if we have PCI m68k: remove old ColdFire IO access support code m68k: use io_no.h for MMU and non-MMU enabled ColdFire m68k: setup PCI support code in io_no.h m68k: group io mapping definitions and functions m68k: rework raw access macros for the non-MMU case m68k: use asm-generic/io.h for non-MMU io access functions m68k: put definition guards around virt_to_phys and phys_to_virt m68k: move *_relaxed macros into io_no.h and io_mm.h
2018-06-05	Merge tag 'rslib-v4.18-rc1' of ↵	Linus Torvalds	7	-196/+243
	git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull reed-salomon library updates from Kees Cook: "Refactors rslib and callers to provide a per-instance allocation area instead of performing VLAs on the stack" * tag 'rslib-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: rslib: Allocate decoder buffers to avoid VLAs mtd: rawnand: diskonchip: Allocate rs control per instance rslib: Split rs control struct rslib: Simplify error path rslib: Remove GPL boilerplate rslib: Add SPDX identifiers rslib: Cleanup top level comments rslib: Cleanup whitespace damage dm/verity_fec: Use GFP aware reed solomon init rslib: Add GFP aware init function
2018-06-05	Merge tag 'dp-4.18-rc1' of ↵	Linus Torvalds	3	-47/+117
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull device properties framework update from Rafael Wysocki: "Modify the device properties framework to remove union aliasing from it (Andy Shevchenko)" * tag 'dp-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: device property: Get rid of union aliasing
2018-06-05	Merge tag 'acpi-4.18-rc1' of ↵	Linus Torvalds	36	-194/+751
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These update the ACPICA code in the kernel to the 20180508 upstream revision and make it support the RT patch, add CPPC v3 support to the ACPI CPPC library, add a WDAT-based watchdog quirk to prevent clashes with the RTC, add quirks to the ACPI AC and battery drivers, and update the ACPI SoC drivers. Specifics: - Update the ACPICA code in the kernel to the 20180508 upstream revision including: * iASL -tc option enhancement (Bob Moore). * Debugger improvements (Bob Moore). * Support for tables larger than 1 MB in acpidump/acpixtract (Bob Moore). * Minor fixes and cleanups (Colin Ian King, Toomas Soome). - Make the ACPICA code in the kernel support the RT patch (Sebastian Andrzej Siewior, Steven Rostedt). - Add a kmemleak annotation to the ACPICA code (Larry Finger). - Add CPPC v3 support to the ACPI CPPC library and fix two issues related to CPPC (Prashanth Prakash, Al Stone). - Add an ACPI WDAT-based watchdog quirk to prefer iTCO_wdt on systems where WDAT clashes with the RTC SRAM (Mika Westerberg). - Add some quirks to the ACPI AC and battery drivers (Carlo Caione, Hans de Goede). - Update the ACPI SoC drivers for Intel (LPSS) and AMD (APD) platforms (Akshu Agrawal, Hans de Goede). - Fix up some assorted minor issues (Al Stone, Laszlo Toth, Mathieu Malaterre)" * tag 'acpi-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (32 commits) ACPICA: Mark acpi_ut_create_internal_object_dbg() memory allocations as non-leaks ACPI / watchdog: Prefer iTCO_wdt always when WDAT table uses RTC SRAM mailbox: PCC: erroneous error message when parsing ACPI PCCT ACPICA: Update version to 20180508 ACPICA: acpidump/acpixtract: Support for tables larger than 1MB ACPI: APD: Add AMD misc clock handler support clk: x86: Add ST oscout platform clock ACPICA: Update version to 20180427 ACPICA: Debugger: Removed direct support for EC address space in "Test Objects" ACPICA: Debugger: Add Package support for "test objects" command ACPICA: Improve error messages for the namespace root node ACPICA: Fix potential infinite loop in acpi_rs_dump_byte_list ACPICA: vsnprintf: this statement may fall through ACPICA: Tables: Fix spelling mistake in comment ACPICA: iASL: Enhance the -tc option (create AML hex file in C) ACPI: Add missing prototype_for arch_post_acpi_subsys_init() ACPI / tables: improve comments regarding acpi_parse_entries_array() ACPICA: Convert acpi_gbl_hardware lock back to an acpi_raw_spinlock ACPICA: provide abstraction for raw_spinlock_t ACPI / CPPC: Fix invalid PCC channel status errors ...
2018-06-05	rtnetlink: validate attributes in do_setlink()	Eric Dumazet	1	-4/+4
	It seems that rtnl_group_changelink() can call do_setlink while a prior call to validate_linkmsg(dev = NULL, ...) could not validate IFLA_ADDRESS / IFLA_BROADCAST Make sure do_setlink() calls validate_linkmsg() instead of letting its callers having this responsibility. With help from Dmitry Vyukov, thanks a lot ! BUG: KMSAN: uninit-value in is_valid_ether_addr include/linux/etherdevice.h:199 [inline] BUG: KMSAN: uninit-value in eth_prepare_mac_addr_change net/ethernet/eth.c:275 [inline] BUG: KMSAN: uninit-value in eth_mac_addr+0x203/0x2b0 net/ethernet/eth.c:308 CPU: 1 PID: 8695 Comm: syz-executor3 Not tainted 4.17.0-rc5+ #103 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:113 kmsan_report+0x149/0x260 mm/kmsan/kmsan.c:1084 __msan_warning_32+0x6e/0xc0 mm/kmsan/kmsan_instr.c:686 is_valid_ether_addr include/linux/etherdevice.h:199 [inline] eth_prepare_mac_addr_change net/ethernet/eth.c:275 [inline] eth_mac_addr+0x203/0x2b0 net/ethernet/eth.c:308 dev_set_mac_address+0x261/0x530 net/core/dev.c:7157 do_setlink+0xbc3/0x5fc0 net/core/rtnetlink.c:2317 rtnl_group_changelink net/core/rtnetlink.c:2824 [inline] rtnl_newlink+0x1fe9/0x37a0 net/core/rtnetlink.c:2976 rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646 netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x455a09 RSP: 002b:00007fc07480ec68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007fc07480f6d4 RCX: 0000000000455a09 RDX: 0000000000000000 RSI: 00000000200003c0 RDI: 0000000000000014 RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000005d0 R14: 00000000006fdc20 R15: 0000000000000000 Uninit was stored to memory at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline] kmsan_save_stack mm/kmsan/kmsan.c:294 [inline] kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:685 kmsan_memcpy_origins+0x11d/0x170 mm/kmsan/kmsan.c:527 __msan_memcpy+0x109/0x160 mm/kmsan/kmsan_instr.c:478 do_setlink+0xb84/0x5fc0 net/core/rtnetlink.c:2315 rtnl_group_changelink net/core/rtnetlink.c:2824 [inline] rtnl_newlink+0x1fe9/0x37a0 net/core/rtnetlink.c:2976 rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646 netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:189 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:315 kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan.c:322 slab_post_alloc_hook mm/slab.h:446 [inline] slab_alloc_node mm/slub.c:2753 [inline] __kmalloc_node_track_caller+0xb32/0x11b0 mm/slub.c:4395 __kmalloc_reserve net/core/skbuff.c:138 [inline] __alloc_skb+0x2cb/0x9e0 net/core/skbuff.c:206 alloc_skb include/linux/skbuff.h:988 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline] netlink_sendmsg+0x76e/0x1350 net/netlink/af_netlink.c:1876 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: e7ed828f10bd ("netlink: support setting devgroup parameters") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next	David S. Miller	76	-730/+3841
	Daniel Borkmann says: ==================== pull-request: bpf-next 2018-06-05 The following pull-request contains BPF updates for your net-next tree. The main changes are: 1) Add a new BPF hook for sendmsg similar to existing hooks for bind and connect: "This allows to override source IP (including the case when it's set via cmsg(3)) and destination IP:port for unconnected UDP (slow path). TCP and connected UDP (fast path) are not affected. This makes UDP support complete, that is, connected UDP is handled by connect hooks, unconnected by sendmsg ones.", from Andrey. 2) Rework of the AF_XDP API to allow extending it in future for type writer model if necessary. In this mode a memory window is passed to hardware and multiple frames might be filled into that window instead of just one that is the case in the current fixed frame-size model. With the new changes made this can be supported without having to add a new descriptor format. Also, core bits for the zero-copy support for AF_XDP have been merged as agreed upon, where i40e bits will be routed via Jeff later on. Various improvements to documentation and sample programs included as well, all from Björn and Magnus. 3) Given BPF's flexibility, a new program type has been added to implement infrared decoders. Quote: "The kernel IR decoders support the most widely used IR protocols, but there are many protocols which are not supported. [...] There is a 'long tail' of unsupported IR protocols, for which lircd is need to decode the IR. IR encoding is done in such a way that some simple circuit can decode it; therefore, BPF is ideal. [...] user-space can define a decoder in BPF, attach it to the rc device through the lirc chardev.", from Sean. 4) Several improvements and fixes to BPF core, among others, dumping map and prog IDs into fdinfo which is a straight forward way to correlate BPF objects used by applications, removing an indirect call and therefore retpoline in all map lookup/update/delete calls by invoking the callback directly for 64 bit archs, adding a new bpf_skb_cgroup_id() BPF helper for tc BPF programs to have an efficient way of looking up cgroup v2 id for policy or other use cases. Fixes to make sure we zero tunnel/xfrm state that hasn't been filled, to allow context access wrt pt_regs in 32 bit archs for tracing, and last but not least various test cases for fixes that landed in bpf earlier, from Daniel. 5) Get rid of the ndo_xdp_flush API and extend the ndo_xdp_xmit with a XDP_XMIT_FLUSH flag instead which allows to avoid one indirect call as flushing is now merged directly into ndo_xdp_xmit(), from Jesper. 6) Add a new bpf_get_current_cgroup_id() helper that can be used in tracing to retrieve the cgroup id from the current process in order to allow for e.g. aggregation of container-level events, from Yonghong. 7) Two follow-up fixes for BTF to reject invalid input values and related to that also two test cases for BPF kselftests, from Martin. 8) Various API improvements to the bpf_fib_lookup() helper, that is, dropping MPLS bits which are not fully hashed out yet, rejecting invalid helper flags, returning error for unsupported address families as well as renaming flowlabel to flowinfo, from David. 9) Various fixes and improvements to sockmap BPF kselftests in particular in proper error detection and data verification, from Prashant. 10) Two arm32 BPF JIT improvements. One is to fix imm range check with regards to whether immediate fits into 24 bits, and a naming cleanup to get functions related to rsh handling consistent to those handling lsh, from Wang. 11) Two compile warning fixes in BPF, one for BTF and a false positive to silent gcc in stack_map_get_build_id_offset(), from Arnd. 12) Add missing seg6.h header into tools include infrastructure in order to fix compilation of BPF kselftests, from Mathieu. 13) Several formatting cleanups in the BPF UAPI helper description that also fix an error during rst2man compilation, from Quentin. 14) Hide an unused variable in sk_msg_convert_ctx_access() when IPv6 is not built into the kernel, from Yue. 15) Remove a useless double assignment in dev_map_enqueue(), from Colin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05	Merge tag 'pm-4.18-rc1' of ↵	Linus Torvalds	78	-1192/+3235
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These include a significant update of the generic power domains (genpd) and Operating Performance Points (OPP) frameworks, mostly related to the introduction of power domain performance levels, cpufreq updates (new driver for Qualcomm Kryo processors, updates of the existing drivers, some core fixes, schedutil governor improvements), PCI power management fixes, ACPI workaround for EC-based wakeup events handling on resume from suspend-to-idle, and major updates of the turbostat and pm-graph utilities. Specifics: - Introduce power domain performance levels into the the generic power domains (genpd) and Operating Performance Points (OPP) frameworks (Viresh Kumar, Rajendra Nayak, Dan Carpenter). - Fix two issues in the runtime PM framework related to the initialization and removal of devices using device links (Ulf Hansson). - Clean up the initialization of drivers for devices in PM domains (Ulf Hansson, Geert Uytterhoeven). - Fix a cpufreq core issue related to the policy sysfs interface causing CPU online to fail for CPUs sharing one cpufreq policy in some situations (Tao Wang). - Make it possible to use platform-specific suspend/resume hooks in the cpufreq-dt driver and make the Armada 37xx DVFS use that feature (Viresh Kumar, Miquel Raynal). - Optimize policy transition notifications in cpufreq (Viresh Kumar). - Improve the iowait boost mechanism in the schedutil cpufreq governor (Patrick Bellasi). - Improve the handling of deferred frequency updates in the schedutil cpufreq governor (Joel Fernandes, Dietmar Eggemann, Rafael Wysocki, Viresh Kumar). - Add a new cpufreq driver for Qualcomm Kryo (Ilia Lin). - Fix and clean up some cpufreq drivers (Colin Ian King, Dmitry Osipenko, Doug Smythies, Luc Van Oostenryck, Simon Horman, Viresh Kumar). - Fix the handling of PCI devices with the DPM_SMART_SUSPEND flag set and update stale comments in the PCI core PM code (Rafael Wysocki). - Work around an issue related to the handling of EC-based wakeup events in the ACPI PM core during resume from suspend-to-idle if the EC has been put into the low-power mode (Rafael Wysocki). - Improve the handling of wakeup source objects in the PM core (Doug Berger, Mahendran Ganesh, Rafael Wysocki). - Update the driver core to prevent deferred probe from breaking suspend/resume ordering (Feng Kan). - Clean up the PM core somewhat (Bjorn Helgaas, Ulf Hansson, Rafael Wysocki). - Make the core suspend/resume code and cpufreq support the RT patch (Sebastian Andrzej Siewior, Thomas Gleixner). - Consolidate the PM QoS handling in cpuidle governors (Rafael Wysocki). - Fix a possible crash in the hibernation core (Tetsuo Handa). - Update the rockchip-io Adaptive Voltage Scaling (AVS) driver (David Wu). - Update the turbostat utility (fixes, cleanups, new CPU IDs, new command line options, built-in "Low Power Idle" counters support, new POLL and POLL% columns) and add an entry for it to MAINTAINERS (Len Brown, Artem Bityutskiy, Chen Yu, Laura Abbott, Matt Turner, Prarit Bhargava, Srinivas Pandruvada). - Update the pm-graph to version 5.1 (Todd Brandt). - Update the intel_pstate_tracer utility (Doug Smythies)" * tag 'pm-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (128 commits) tools/power turbostat: update version number tools/power turbostat: Add Node in output tools/power turbostat: add node information into turbostat calculations tools/power turbostat: remove num_ from cpu_topology struct tools/power turbostat: rename num_cores_per_pkg to num_cores_per_node tools/power turbostat: track thread ID in cpu_topology tools/power turbostat: Calculate additional node information for a package tools/power turbostat: Fix node and siblings lookup data tools/power turbostat: set max_num_cpus equal to the cpumask length tools/power turbostat: if --num_iterations, print for specific number of iterations tools/power turbostat: Add Cannon Lake support tools/power turbostat: delete duplicate #defines x86: msr-index.h: Correct SNB_C1/C3_AUTO_UNDEMOTE defines tools/power turbostat: Correct SNB_C1/C3_AUTO_UNDEMOTE defines tools/power turbostat: add POLL and POLL% column tools/power turbostat: Fix --hide Pk%pc10 tools/power turbostat: Build-in "Low Power Idle" counters support tools/power turbostat: Don't make man pages executable tools/power turbostat: remove blank lines tools/power turbostat: a small C-states dump readability immprovement ...
2018-06-05	Merge tag 'for-linus-20180605' of git://git.kernel.dk/linux-block	Linus Torvalds	8	-8/+8
	Pull block fixes from Jens Axboe: "This just contains the dm kzalloc fix that was discussed, and a fix that I queued up yesterday for a case where blk-mq doesn't honor the stop bit appropriately" * tag 'for-linus-20180605' of git://git.kernel.dk/linux-block: dm: Use kzalloc for all structs with embedded biosets/mempools blk-mq: return when hctx is stopped in blk_mq_run_work_fn
2018-06-05	Merge branch 'devlink-extack'	David S. Miller	9	-32/+60
	David Ahern says: ==================== devlink: Add extack messages for reload and port split/unsplit Patch 1 adds extack arg to reload, port_split and port_unsplit devlink operations. Patch 2 adds extack messages for reload operation in netdevsim. Patch 3 adds extack messages to port split/unsplit in mlxsw driver. v2 - make the extack messages align with existing dev_err ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05	mlxsw: Add extack messages for port_{un, }split failures	David Ahern	3	-9/+25
	Return messages in extack for port split/unsplit errors. e.g., $ devlink port split swp1s1 count 4 Error: mlxsw_spectrum: Port cannot be split further. devlink answers: Invalid argument $ devlink port unsplit swp4 Error: mlxsw_spectrum: Port was not split. devlink answers: Invalid argument Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05	netdevsim: Add extack error message for devlink reload	David Ahern	3	-6/+10
	devlink reset command can fail if a FIB resource limit is set to a value lower than the current occupancy. Return a proper message indicating the reason for the failure. $ devlink resource sh netdevsim/netdevsim0 netdevsim/netdevsim0: name IPv4 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none resources: name fib size unlimited occ 43 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none name fib-rules size unlimited occ 4 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none name IPv6 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none resources: name fib size unlimited occ 54 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none name fib-rules size unlimited occ 3 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none $ devlink resource set netdevsim/netdevsim0 path /IPv4/fib size 40 $ devlink dev reload netdevsim/netdevsim0 Error: netdevsim: New size is less than current occupancy. devlink answers: Invalid argument Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05	devlink: Add extack to reload and port_{un, }split operations	David Ahern	5	-17/+25
	Add extack argument to reload, port_split and port_unsplit operations. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05	net: metrics: add proper netlink validation	Eric Dumazet	2	-0/+4
	Before using nla_get_u32(), better make sure the attribute is of the proper size. Code recently was changed, but bug has been there from beginning of git. BUG: KMSAN: uninit-value in rtnetlink_put_metrics+0x553/0x960 net/core/rtnetlink.c:746 CPU: 1 PID: 14139 Comm: syz-executor6 Not tainted 4.17.0-rc5+ #103 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:113 kmsan_report+0x149/0x260 mm/kmsan/kmsan.c:1084 __msan_warning_32+0x6e/0xc0 mm/kmsan/kmsan_instr.c:686 rtnetlink_put_metrics+0x553/0x960 net/core/rtnetlink.c:746 fib_dump_info+0xc42/0x2190 net/ipv4/fib_semantics.c:1361 rtmsg_fib+0x65f/0x8c0 net/ipv4/fib_semantics.c:419 fib_table_insert+0x2314/0x2b50 net/ipv4/fib_trie.c:1287 inet_rtm_newroute+0x210/0x340 net/ipv4/fib_frontend.c:779 rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646 netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x455a09 RSP: 002b:00007faae5fd8c68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007faae5fd96d4 RCX: 0000000000455a09 RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000013 RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000005d0 R14: 00000000006fdc20 R15: 0000000000000000 Uninit was stored to memory at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline] kmsan_save_stack mm/kmsan/kmsan.c:294 [inline] kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:685 __msan_chain_origin+0x69/0xc0 mm/kmsan/kmsan_instr.c:529 fib_convert_metrics net/ipv4/fib_semantics.c:1056 [inline] fib_create_info+0x2d46/0x9dc0 net/ipv4/fib_semantics.c:1150 fib_table_insert+0x3e4/0x2b50 net/ipv4/fib_trie.c:1146 inet_rtm_newroute+0x210/0x340 net/ipv4/fib_frontend.c:779 rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646 netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:189 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:315 kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan.c:322 slab_post_alloc_hook mm/slab.h:446 [inline] slab_alloc_node mm/slub.c:2753 [inline] __kmalloc_node_track_caller+0xb32/0x11b0 mm/slub.c:4395 __kmalloc_reserve net/core/skbuff.c:138 [inline] __alloc_skb+0x2cb/0x9e0 net/core/skbuff.c:206 alloc_skb include/linux/skbuff.h:988 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline] netlink_sendmsg+0x76e/0x1350 net/netlink/af_netlink.c:1876 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: a919525ad832 ("net: Move fib_convert_metrics to metrics file") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05	ipmr: fix error path when ipmr_new_table fails	Sabrina Dubroca	3	-19/+17
	commit 0bbbf0e7d0e7 ("ipmr, ip6mr: Unite creation of new mr_table") refactored ipmr_new_table, so that it now returns NULL when mr_table_alloc fails. Unfortunately, all callers of ipmr_new_table expect an ERR_PTR. This can result in NULL deref, for example when ipmr_rules_exit calls ipmr_free_table with NULL net->ipv4.mrt in the !CONFIG_IP_MROUTE_MULTIPLE_TABLES version. This patch makes mr_table_alloc return errors, and changes ip6mr_new_table and its callers to return/expect error pointers as well. It also removes the version of mr_table_alloc defined under !CONFIG_IP_MROUTE_COMMON, since it is never used. Fixes: 0bbbf0e7d0e7 ("ipmr, ip6mr: Unite creation of new mr_table") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05	ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds	Sabrina Dubroca	1	-1/+2
	Currently, raw6_sk(sk)->ip6mr_table is set unconditionally during ip6_mroute_setsockopt(MRT6_TABLE). A subsequent attempt at the same setsockopt will fail with -ENOENT, since we haven't actually created that table. A similar fix for ipv4 was included in commit 5e1859fbcc3c ("ipv4: ipmr: various fixes and cleanups"). Fixes: d1db275dd3f6 ("ipv6: ip6mr: support multiple tables") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05	net: hns3: remove unused hclgevf_cfg_func_mta_filter	Arnd Bergmann	1	-11/+0
	The last patch apparently added a complete replacement for this function, but left the old one in place, which now causes a harmless warning: drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c:731:12: 'hclgevf_cfg_func_mta_filter' defined but not used I assume it can be removed. Fixes: 3a678b5806e6 ("net: hns3: Optimize the VF's process of updating multicast MAC") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05	Merge branch 'faddr2line' (patches from Josh)	Linus Torvalds	2	-2/+21
	Merge faddr2line updates from Josh Poimboeuf: - revert faddr2line's default output to its original non-code-listing output, and make the code listing an optional feature - give faddr2line a real maintainer, so get_maintainer.pl will actually CC me on future patches * emailed patches from Josh Poimboeuf <jpoimboe@redhat.com>: MAINTAINERS: add Josh Poimboeuf as faddr2line maintainer scripts/faddr2line: make the new code listing format optional
2018-06-05	MAINTAINERS: add Josh Poimboeuf as faddr2line maintainer	Josh Poimboeuf	1	-0/+5
	... so I finally get credit for my greatest accomplishment. And, less importantly, so get_maintainer.pl will actually CC me on future patches. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-05	scripts/faddr2line: make the new code listing format optional	Peter Zijlstra (Intel)	1	-2/+16
	Commit 6870c0165feaa5 ("scripts/faddr2line: show the code context") radically altered the output format of the faddr2line tool. And while the new list output format might have merit it broke my vim usage and was hard to read. Make the new format optional; using a '--list' argument and attempt to make the output slightly easier to read by adding a little whitespace to separate the different files and explicitly mark the line in question. Cc: Changbin Du <changbin.du@intel.com> Fixes: 6870c0165feaa5 ("scripts/faddr2line: show the code context") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-05	netfilter: provide udp*_lib_lookup for nf_tproxy	Arnd Bergmann	2	-6/+2
	It is now possible to enable the libified nf_tproxy modules without also enabling NETFILTER_XT_TARGET_TPROXY, which throws off the ifdef logic in the udp core code: net/ipv6/netfilter/nf_tproxy_ipv6.o: In function `nf_tproxy_get_sock_v6': nf_tproxy_ipv6.c:(.text+0x1a8): undefined reference to `udp6_lib_lookup' net/ipv4/netfilter/nf_tproxy_ipv4.o: In function `nf_tproxy_get_sock_v4': nf_tproxy_ipv4.c:(.text+0x3d0): undefined reference to `udp4_lib_lookup' We can actually simplify the conditions now to provide the two functions exactly when they are needed. Fixes: 45ca4e0cf273 ("netfilter: Libify xt_TPROXY") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Máté Eckl <ecklm94@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05	Merge tag 'asoc-v4.18' of ↵	Takashi Iwai	256	-3964/+29764
	git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Updates for v4.18 This is a very big update, mainly due to a huge set of new drivers some of which are individually very large. We also have a lot of fixes for the topology stuff, several of the users have stepped up and fixed some the serious issues there, and continued progress on the transition away from CODEC specific drivers to generic component drivers. - Many fixes for the topology code, including fixes for the half done v4 ABI compatibility from Guenter Roeck and other ABI fixes from Kirill Marinushkin. - Lots of cleanup for Intel platforms based on Realtek CODECs from Hans de Goode. - More followups on removing legacy CODEC things and transitioning to components from Morimoto-san. - Conversion of OMAP DMA to the new, more standard SDMA-PCM driver. - A series of fixes and updates to the rather elderly Cirrus Logic SoC drivers from Alexander Sverdlin. - Qualcomm DSP support from Srinivas Kandagatla. - New drivers for Analog SSM2305, Atmel I2S controllers, Mediatek MT6351, MT6797 and MT7622, Qualcomm DSPs, Realtek RT1305, RT1306 and RT5668 and TI TSCS454
2018-06-05	qed*: Utilize FW 8.37.2.0	Michal Kalderon	19	-536/+727
	This FW contains several fixes and features. RDMA - Several modifications and fixes for Memory Windows - drop vlan and tcp timestamp from mss calculation in driver for this FW - Fix SQ completion flow when local ack timeout is infinite - Modifications in t10dif support ETH - Fix aRFS for tunneled traffic without inner IP. - Fix chip configuration which may fail under heavy traffic conditions. - Support receiving any-VNI in VXLAN and GENEVE RX classification. iSCSI / FcoE - Fix iSCSI recovery flow - Drop vlan and tcp timestamp from mss calc for fw 8.37.2.0 Misc - Several registers (split registers) won't read correctly with ethtool -d Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05	dm: Use kzalloc for all structs with embedded biosets/mempools	Kent Overstreet	7	-7/+7
	mempool_init()/bioset_init() require that the mempools/biosets be zeroed first; they probably should not _require_ this, but not allocating those structs with kzalloc is a fairly nonsensical thing to do (calling mempool_exit()/bioset_exit() on an uninitialized mempool/bioset is legal and safe, but only works if said memory was zeroed.) Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-05	net-tcp: remove useless tw_timeout field	Maciej Żenczykowski	3	-3/+0
	Tested: 'git grep tw_timeout' comes up empty and it builds :-) Signed-off-by: Maciej Żenczykowski <maze@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05	net: sched: cls: Fix offloading when ingress dev is vxlan	Paul Blakey	1	-10/+16
	When using a vxlan device as the ingress dev, we count it as a "no offload dev", so when such a rule comes and err stop is true, we fail early and don't try the egdev route which can offload it through the egress device. Fix that by not calling the block offload if one of the devices attached to it is not offload capable, but make sure egress on such case is capable instead. Fixes: caa7260156eb ("net: sched: keep track of offloaded filters [..]") Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05	Merge branch 'asoc-4.17' into asoc-4.18 merge window	Mark Brown	3	-7/+8

2018-06-05	sctp: not allow transport timeout value less than HZ/5 for hb_timer	Xin Long	1	-1/+1
	syzbot reported a rcu_sched self-detected stall on CPU which is caused by too small value set on rto_min with SCTP_RTOINFO sockopt. With this value, hb_timer will get stuck there, as in its timer handler it starts this timer again with this value, then goes to the timer handler again. This problem is there since very beginning, and thanks to Eric for the reproducer shared from a syzbot mail. This patch fixes it by not allowing sctp_transport_timeout to return a smaller value than HZ/5 for hb_timer, which is based on TCP's min rto. Note that it doesn't fix this issue by limiting rto_min, as some users are still using small rto and no proper value was found for it yet. Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05	bpfilter: switch to CC from HOSTCC	Alexei Starovoitov	4	-0/+22
	check that CC can build executables and use that compiler instead of HOSTCC Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05	net/mlx5e: fix error return code in mlx5e_alloc_rq()	Wei Yongjun	1	-1/+3
	Fix to return error code -ENOMEM from the kvzalloc_node() error handling case instead of 0, as done elsewhere in this function. Fixes: 069d11465a80 ("net/mlx5e: RX, Enhance legacy Receive Queue memory scheme") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05	net/mlx5e: Make function mlx5e_change_rep_mtu() static	Wei Yongjun	1	-1/+1
	Fixes the following sparse warning: drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:903:5: warning: symbol 'mlx5e_change_rep_mtu' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05	net: qualcomm: rmnet: Fix use after free while sending command ack	Subash Abhinov Kasiviswanathan	1	-4/+4
	When sending an ack to a command packet, the skb is still referenced after it is sent to the real device. Since the real device could free the skb, the device pointer would be invalid. Also, remove an unnecessary variable. Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05	net: ipv6: Generate random IID for addresses on RAWIP devices	Subash Abhinov Kasiviswanathan	2	-1/+7
	RAWIP devices such as rmnet do not have a hardware address and instead require the kernel to generate a random IID for the IPv6 addresses. Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05	tcp: refactor tcp_ecn_check_ce to remove sk type cast	Yousuk Seung	1	-12/+14
	Refactor tcp_ecn_check_ce and __tcp_ecn_check_ce to accept struct sock* instead of tcp_sock* to clean up type casts. This is a pure refactor patch. Signed-off-by: Yousuk Seung <ysseung@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05	Merge branch 'bpf-af-xdp-zc-api'	Daniel Borkmann	10	-61/+384
	Björn Töpel says: ==================== This patch serie introduces zerocopy (ZC) support for AF_XDP. Programs using AF_XDP sockets will now receive RX packets without any copies and can also transmit packets without incurring any copies. No modifications to the application are needed, but the NIC driver needs to be modified to support ZC. If ZC is not supported by the driver, the modes introduced in the AF_XDP patch will be used. Using ZC in our micro benchmarks results in significantly improved performance as can be seen in the performance section later in this cover letter. Note that for an untrusted application, HW packet steering to a specific queue pair (the one associated with the application) is a requirement when using ZC, as the application would otherwise be able to see other user space processes' packets. If the HW cannot support the required packet steering you need to use the XDP_SKB mode or the XDP_DRV mode without ZC turned on. The XSKMAP introduced in the AF_XDP patch set can be used to do load balancing in that case. For benchmarking, you can use the xdpsock application from the AF_XDP patch set without any modifications. Say that you would like your UDP traffic from port 4242 to end up in queue 16, that we will enable AF_XDP on. Here, we use ethtool for this: ethtool -N p3p2 rx-flow-hash udp4 fn ethtool -N p3p2 flow-type udp4 src-port 4242 dst-port 4242 \ action 16 Running the rxdrop benchmark in XDP_DRV mode with zerocopy can then be done using: samples/bpf/xdpsock -i p3p2 -q 16 -r -N We have run some benchmarks on a dual socket system with two Broadwell E5 2660 @ 2.0 GHz with hyperthreading turned off. Each socket has 14 cores which gives a total of 28, but only two cores are used in these experiments. One for TR/RX and one for the user space application. The memory is DDR4 @ 2133 MT/s (1067 MHz) and the size of each DIMM is 8192MB and with 8 of those DIMMs in the system we have 64 GB of total memory. The compiler used is gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0. The NIC is Intel I40E 40Gbit/s using the i40e driver. Below are the results in Mpps of the I40E NIC benchmark runs for 64 and 1500 byte packets, generated by a commercial packet generator HW outputing packets at full 40 Gbit/s line rate. The results are without retpoline so that we can compare against previous numbers. AF_XDP performance 64 byte packets. Results from the AF_XDP V3 patch set are also reported for ease of reference. The numbers within parantheses are from the RFC V1 ZC patch set. Benchmark XDP_SKB XDP_DRV XDP_DRV with zerocopy rxdrop 2.9* 9.6* 21.1(21.5) txpush 2.6* - 22.0(21.6) l2fwd 1.9* 2.5* 15.3(15.0) AF_XDP performance 1500 byte packets: Benchmark XDP_SKB XDP_DRV XDP_DRV with zerocopy rxdrop 2.1* 3.3* 3.3(3.3) l2fwd 1.4* 1.8* 3.1(3.1) * From AF_XDP V3 patch set and cover letter. So why do we not get higher values for RX similar to the 34 Mpps we had in AF_PACKET V4? We made an experiment running the rxdrop benchmark without using the xdp_do_redirect/flush infrastructure nor using an XDP program (all traffic on a queue goes to one socket). Instead the driver acts directly on the AF_XDP socket. With this we got 36.9 Mpps, a significant improvement without any change to the uapi. So not forcing users to have an XDP program if they do not need it, might be a good idea. This measurement is actually higher than what we got with AF_PACKET V4. XDP performance on our system as a base line: 64 byte packets: XDP stats CPU pps issue-pps XDP-RX CPU 16 32.3M 0 1500 byte packets: XDP stats CPU pps issue-pps XDP-RX CPU 16 3.3M 0 The structure of the patch set is as follows: Patches 1-3: Plumbing for AF_XDP ZC support Patches 4-5: AF_XDP ZC for RX Patches 6-7: AF_XDP ZC for TX Patch 8-10: ZC support for i40e. Patch 11: Use the bind flags in sample application to force TX skb path when -S is providedd on the command line. This patch set is based on the new uapi introduced in "AF_XDP: bug fixes and descriptor changes". You need to apply that patch set first, before applying this one. We based this patch set on bpf-next commit bd3a08aaa9a3 ("bpf: flowlabel in bpf_fib_lookup should be flowinfo") Comments: * Implementing dynamic creation and deletion of queues in the i40e driver would facilitate the coexistence of xdp_redirect and af_xdp. Thanks: Björn and Magnus ==================== Note: as agreed upon, i40e/zc bits will be routed via Jeff's tree. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-05	net/ipv6: prevent use after free in ip6_route_mpath_notify	David Ahern	1	-4/+8
	syzbot reported a use-after-free: BUG: KASAN: use-after-free in ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180 Read of size 4 at addr ffff8801bf789cf0 by task syz-executor756/4555 CPU: 1 PID: 4555 Comm: syz-executor756 Not tainted 4.17.0-rc7+ #78 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432 ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180 ip6_route_multipath_add+0x615/0x1910 net/ipv6/route.c:4303 inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391 ... Allocated by task 4555: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554 dst_alloc+0xbb/0x1d0 net/core/dst.c:104 __ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:361 ip6_dst_alloc+0x29/0xb0 net/ipv6/route.c:376 ip6_route_info_create+0x4d4/0x3a30 net/ipv6/route.c:2834 ip6_route_multipath_add+0xc7e/0x1910 net/ipv6/route.c:4240 inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391 ... Freed by task 4555: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kmem_cache_free+0x86/0x2d0 mm/slab.c:3756 dst_destroy+0x267/0x3c0 net/core/dst.c:140 dst_release_immediate+0x71/0x9e net/core/dst.c:205 fib6_add+0xa40/0x1650 net/ipv6/ip6_fib.c:1305 __ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1011 ip6_route_multipath_add+0x513/0x1910 net/ipv6/route.c:4267 inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391 ... The problem is that rt_last can point to a deleted route if the insert fails. One reproducer is to insert a route and then add a multipath route that has a duplicate nexthop.e.g,: $ ip -6 ro add vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::2 $ ip -6 ro append vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::4 nexthop via 2001:db8:1::2 Fix by not setting rt_last until the it is verified the insert succeeded. Fixes: 3b1137fe7482 ("net: ipv6: Change notifications for multipath add to RTA_MULTIPATH") Cc: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05	samples/bpf: xdpsock: use skb Tx path for XDP_SKB	Björn Töpel	1	-0/+5
	Make sure that XDP_SKB also uses the skb Tx path. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-05	xsk: wire upp Tx zero-copy functions	Magnus Karlsson	5	-11/+137
	Here we add the functionality required to support zero-copy Tx, and also exposes various zero-copy related functions for the netdevs. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-05	net: added netdevice operation for Tx	Magnus Karlsson	1	-0/+2
	Added ndo_xsk_async_xmit. This ndo "kicks" the netdev to start to pull userland AF_XDP Tx frames from a NAPI context. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-05	xsk: add zero-copy support for Rx	Björn Töpel	5	-21/+165
	Extend the xsk_rcv to support the new MEM_TYPE_ZERO_COPY memory, and wireup ndo_bpf call in bind. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-05	xdp: add MEM_TYPE_ZERO_COPY	Björn Töpel	2	-5/+24
	Here, a new type of allocator support is added to the XDP return API. A zero-copy allocated xdp_buff cannot be converted to an xdp_frame. Instead is the buff has to be copied. This is not supported at all in this commit. Also, an opaque "handle" is added to xdp_buff. This can be used as a context for the zero-copy allocator implementation. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-05	net: xdp: added bpf_netdev_command XDP_{QUERY, SETUP}_XSK_UMEM	Björn Töpel	1	-0/+8
	Extend ndo_bpf with two new commands used for query zero-copy support and register an UMEM to a queue_id of a netdev. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-05	xsk: introduce xdp_umem_page	Björn Töpel	3	-4/+21
	The xdp_umem_page holds the address for a page. Trade memory for faster lookup. Later, we'll add DMA address here as well. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-05	xsk: moved struct xdp_umem definition	Björn Töpel	4	-24/+26
	Moved struct xdp_umem to xdp_sock.h, in order to prepare for zero-copy support. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-05	net: phy: broadcom: Enable 125 MHz clock on LED4 pin for BCM54612E by default.	Kun Yi	2	-2/+18
	BCM54612E have 4 multi-functional LED pins that can be configured through register setting; the LED4 pin can be configured to a 125MHz reference clock output by setting the spare register. Since the dedicated CLK125 reference clock pin is not brought out on the 48-Pin MLP, the LED4 pin is the only pin to provide such function in this package, and therefore it is beneficial to just enable the reference clock by default. Signed-off-by: Kun Yi <kunyi@google.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>