summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2019-04-06Merge branch 'akpm' (patches from Andrew)Linus Torvalds4-25/+31
Merge misc fixes from Andrew Morton: "14 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: kernel/sysctl.c: fix out-of-bounds access when setting file-max mm/util.c: fix strndup_user() comment sh: fix multiple function definition build errors MAINTAINERS: add maintainer and replacing reviewer ARM/NUVOTON NPCM MAINTAINERS: fix bad pattern in ARM/NUVOTON NPCM mm: writeback: use exact memcg dirty counts psi: clarify the units used in pressure files mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd() hugetlbfs: fix memory leak for resv_map mm: fix vm_fault_t cast in VM_FAULT_GET_HINDEX() lib/lzo: fix bugs for very short or empty input include/linux/bitrev.h: fix constant bitrev kmemleak: powerpc: skip scanning holes in the .bss section lib/string.c: implement a basic bcmp
2019-04-06mm: writeback: use exact memcg dirty countsGreg Thelen1-1/+4
Since commit a983b5ebee57 ("mm: memcontrol: fix excessive complexity in memory.stat reporting") memcg dirty and writeback counters are managed as: 1) per-memcg per-cpu values in range of [-32..32] 2) per-memcg atomic counter When a per-cpu counter cannot fit in [-32..32] it's flushed to the atomic. Stat readers only check the atomic. Thus readers such as balance_dirty_pages() may see a nontrivial error margin: 32 pages per cpu. Assuming 100 cpus: 4k x86 page_size: 13 MiB error per memcg 64k ppc page_size: 200 MiB error per memcg Considering that dirty+writeback are used together for some decisions the errors double. This inaccuracy can lead to undeserved oom kills. One nasty case is when all per-cpu counters hold positive values offsetting an atomic negative value (i.e. per_cpu[*]=32, atomic=n_cpu*-32). balance_dirty_pages() only consults the atomic and does not consider throttling the next n_cpu*32 dirty pages. If the file_lru is in the 13..200 MiB range then there's absolutely no dirty throttling, which burdens vmscan with only dirty+writeback pages thus resorting to oom kill. It could be argued that tiny containers are not supported, but it's more subtle. It's the amount the space available for file lru that matters. If a container has memory.max-200MiB of non reclaimable memory, then it will also suffer such oom kills on a 100 cpu machine. The following test reliably ooms without this patch. This patch avoids oom kills. $ cat test mount -t cgroup2 none /dev/cgroup cd /dev/cgroup echo +io +memory > cgroup.subtree_control mkdir test cd test echo 10M > memory.max (echo $BASHPID > cgroup.procs && exec /memcg-writeback-stress /foo) (echo $BASHPID > cgroup.procs && exec dd if=/dev/zero of=/foo bs=2M count=100) $ cat memcg-writeback-stress.c /* * Dirty pages from all but one cpu. * Clean pages from the non dirtying cpu. * This is to stress per cpu counter imbalance. * On a 100 cpu machine: * - per memcg per cpu dirty count is 32 pages for each of 99 cpus * - per memcg atomic is -99*32 pages * - thus the complete dirty limit: sum of all counters 0 * - balance_dirty_pages() only sees atomic count -99*32 pages, which * it max()s to 0. * - So a workload can dirty -99*32 pages before balance_dirty_pages() * cares. */ #define _GNU_SOURCE #include <err.h> #include <fcntl.h> #include <sched.h> #include <stdlib.h> #include <stdio.h> #include <sys/stat.h> #include <sys/sysinfo.h> #include <sys/types.h> #include <unistd.h> static char *buf; static int bufSize; static void set_affinity(int cpu) { cpu_set_t affinity; CPU_ZERO(&affinity); CPU_SET(cpu, &affinity); if (sched_setaffinity(0, sizeof(affinity), &affinity)) err(1, "sched_setaffinity"); } static void dirty_on(int output_fd, int cpu) { int i, wrote; set_affinity(cpu); for (i = 0; i < 32; i++) { for (wrote = 0; wrote < bufSize; ) { int ret = write(output_fd, buf+wrote, bufSize-wrote); if (ret == -1) err(1, "write"); wrote += ret; } } } int main(int argc, char **argv) { int cpu, flush_cpu = 1, output_fd; const char *output; if (argc != 2) errx(1, "usage: output_file"); output = argv[1]; bufSize = getpagesize(); buf = malloc(getpagesize()); if (buf == NULL) errx(1, "malloc failed"); output_fd = open(output, O_CREAT|O_RDWR); if (output_fd == -1) err(1, "open(%s)", output); for (cpu = 0; cpu < get_nprocs(); cpu++) { if (cpu != flush_cpu) dirty_on(output_fd, cpu); } set_affinity(flush_cpu); if (fsync(output_fd)) err(1, "fsync(%s)", output); if (close(output_fd)) err(1, "close(%s)", output); free(buf); } Make balance_dirty_pages() and wb_over_bg_thresh() work harder to collect exact per memcg counters. This avoids the aforementioned oom kills. This does not affect the overhead of memory.stat, which still reads the single atomic counter. Why not use percpu_counter? memcg already handles cpus going offline, so no need for that overhead from percpu_counter. And the percpu_counter spinlocks are more heavyweight than is required. It probably also makes sense to use exact dirty and writeback counters in memcg oom reports. But that is saved for later. Link: http://lkml.kernel.org/r/20190329174609.164344-1-gthelen@google.com Signed-off-by: Greg Thelen <gthelen@google.com> Reviewed-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> [4.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-06mm: fix vm_fault_t cast in VM_FAULT_GET_HINDEX()Jann Horn1-1/+1
Symmetrically to VM_FAULT_SET_HINDEX(), we need a force-cast in VM_FAULT_GET_HINDEX() to tell sparse that this is intentional. Sparse complains about the current code when building a kernel with CONFIG_MEMORY_FAILURE: arch/x86/mm/fault.c:1058:53: warning: restricted vm_fault_t degrades to integer Link: http://lkml.kernel.org/r/20190327204117.35215-1-jannh@google.com Fixes: 3d3539018d2c ("mm: create the new vm_fault_t type") Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Souptick Joarder <jrdr.linux@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-06include/linux/bitrev.h: fix constant bitrevArnd Bergmann1-23/+23
clang points out with hundreds of warnings that the bitrev macros have a problem with constant input: drivers/hwmon/sht15.c:187:11: error: variable '__x' is uninitialized when used within its own initialization [-Werror,-Wuninitialized] u8 crc = bitrev8(data->val_status & 0x0F); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/bitrev.h:102:21: note: expanded from macro 'bitrev8' __constant_bitrev8(__x) : \ ~~~~~~~~~~~~~~~~~~~^~~~ include/linux/bitrev.h:67:11: note: expanded from macro '__constant_bitrev8' u8 __x = x; \ ~~~ ^ Both the bitrev and the __constant_bitrev macros use an internal variable named __x, which goes horribly wrong when passing one to the other. The obvious fix is to rename one of the variables, so this adds an extra '_'. It seems we got away with this because - there are only a few drivers using bitrev macros - usually there are no constant arguments to those - when they are constant, they tend to be either 0 or (unsigned)-1 (drivers/isdn/i4l/isdnhdlc.o, drivers/iio/amplifiers/ad8366.c) and give the correct result by pure chance. In fact, the only driver that I could find that gets different results with this is drivers/net/wan/slic_ds26522.c, which in turn is a driver for fairly rare hardware (adding the maintainer to Cc for testing). Link: http://lkml.kernel.org/r/20190322140503.123580-1-arnd@arndb.de Fixes: 556d2f055bf6 ("ARM: 8187/1: add CONFIG_HAVE_ARCH_BITREVERSE to support rbit instruction") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: Zhao Qiang <qiang.zhao@nxp.com> Cc: Yalin Wang <yalin.wang@sonymobile.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-06lib/string.c: implement a basic bcmpNick Desaulniers1-0/+3
A recent optimization in Clang (r355672) lowers comparisons of the return value of memcmp against zero to comparisons of the return value of bcmp against zero. This helps some platforms that implement bcmp more efficiently than memcmp. glibc simply aliases bcmp to memcmp, but an optimized implementation is in the works. This results in linkage failures for all targets with Clang due to the undefined symbol. For now, just implement bcmp as a tailcail to memcmp to unbreak the build. This routine can be further optimized in the future. Other ideas discussed: * A weak alias was discussed, but breaks for architectures that define their own implementations of memcmp since aliases to declarations are not permitted (only definitions). Arch-specific memcmp implementations typically declare memcmp in C headers, but implement them in assembly. * -ffreestanding also is used sporadically throughout the kernel. * -fno-builtin-bcmp doesn't work when doing LTO. Link: https://bugs.llvm.org/show_bug.cgi?id=41035 Link: https://code.woboq.org/userspace/glibc/string/memcmp.c.html#bcmp Link: https://github.com/llvm/llvm-project/commit/8e16d73346f8091461319a7dfc4ddd18eedcff13 Link: https://github.com/ClangBuiltLinux/linux/issues/416 Link: http://lkml.kernel.org/r/20190313211335.165605-1-ndesaulniers@google.com Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reported-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Suggested-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: James Y Knight <jyknight@google.com> Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com> Suggested-by: Nathan Chancellor <natechancellor@gmail.com> Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-06Merge tag 'trace-5.1-rc3' of ↵Linus Torvalds1-3/+8
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull syscall-get-arguments cleanup and fixes from Steven Rostedt: "Andy Lutomirski approached me to tell me that the syscall_get_arguments() implementation in x86 was horrible and gcc certainly gets it wrong. He said that since the tracepoints only pass in 0 and 6 for i and n repectively, it should be optimized for that case. Inspecting the kernel, I discovered that all users pass in 0 for i and only one file passing in something other than 6 for the number of arguments. That code happens to be my own code used for the special syscall tracing. That can easily be converted to just using 0 and 6 as well, and only copying what is needed. Which is probably the faster path anyway for that case. Along the way, a couple of real fixes came from this as the syscall_get_arguments() function was incorrect for csky and riscv. x86 has been optimized to for the new interface that removes the variable number of arguments, but the other architectures could still use some loving and take more advantage of the simpler interface" * tag 'trace-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: syscalls: Remove start and number from syscall_set_arguments() args syscalls: Remove start and number from syscall_get_arguments() args csky: Fix syscall_get_arguments() and syscall_set_arguments() riscv: Fix syscall_get_arguments() and syscall_set_arguments() tracing/syscalls: Pass in hardcoded 6 into syscall_get_arguments() ptrace: Remove maxargs from task_current_syscall()
2019-04-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller11-32/+39
Minor comment merge conflict in mlx5. Staging driver has a fixup due to the skb->xmit_more changes in 'net-next', but was removed in 'net'. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-06net/mlx5: Handle event of power detection in the PCIE slotAya Levin1-0/+1
Handle event of power state change in the PCIE slot. When the event occurs, check if query power state and PCI power fields is supported. If so, read these fields from MPEIN (management PCIE info) register and issue a corresponding message. Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-06Merge branch 'mlx5-next' of ↵Saeed Mahameed4-32/+63
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux This merge commit includes some misc shared code updates from mlx5-next branch needed for net-next. 1) From Maxim, Remove un-used macros and spinlock from mlx5 code. 2) From Aya, Expose Management PCIE info register layout and add rate limit print macros. 3) From Tariq, Compilation warning fix in fs_core.c 4) From Vu, Huy and Saeed, Improve mlx5 initialization flow: The goal is to provide a better logical separation of mlx5 core device initialization flow and will help to seamlessly support creating different mlx5 device types such as PF, VF and SF mlx5 sub-function virtual devices. Mlx5_core driver needs to separate HCA resources from pci resources. Its initialize/load/unload will be broken into stages: 1. Initialize common data structures 2. Setup function which initializes pci resources (for PF/VF) or some other specific resources for virtual device 3. Initialize software objects according to hardware capabilities 4. Load all mlx5_core components It is also necessary to detach mlx5_core mdev name/message from pci device mdev->pdev name/message for a clearer report/debug of different mlx5 device types. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2-1/+3
Pull networking fixes from David Miller: 1) Several hash table refcount fixes in batman-adv, from Sven Eckelmann. 2) Use after free in bpf_evict_inode(), from Daniel Borkmann. 3) Fix mdio bus registration in ixgbe, from Ivan Vecera. 4) Unbounded loop in __skb_try_recv_datagram(), from Paolo Abeni. 5) ila rhashtable corruption fix from Herbert Xu. 6) Don't allow upper-devices to be added to vrf devices, from Sabrina Dubroca. 7) Add qmi_wwan device ID for Olicard 600, from Bjørn Mork. 8) Don't leave skb->next poisoned in __netif_receive_skb_list_ptype, from Alexander Lobakin. 9) Missing IDR checks in mlx5 driver, from Aditya Pakki. 10) Fix false connection termination in ktls, from Jakub Kicinski. 11) Work around some ASPM issues with r8169 by disabling rx interrupt coalescing on certain chips. From Heiner Kallweit. 12) Properly use per-cpu qstat values on NOLOCK qdiscs, from Paolo Abeni. 13) Fully initialize sockaddr_in structures in SCTP, from Xin Long. 14) Various BPF flow dissector fixes from Stanislav Fomichev. 15) Divide by zero in act_sample, from Davide Caratti. 16) Fix bridging multicast regression introduced by rhashtable conversion, from Nikolay Aleksandrov. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits) ibmvnic: Fix completion structure initialization ipv6: sit: reset ip header pointer in ipip6_rcv net: bridge: always clear mcast matching struct on reports and leaves libcxgb: fix incorrect ppmax calculation vlan: conditional inclusion of FCoE hooks to match netdevice.h and bnx2x sch_cake: Make sure we can write the IP header before changing DSCP bits sch_cake: Use tc_skb_protocol() helper for getting packet protocol tcp: Ensure DCTCP reacts to losses net/sched: act_sample: fix divide by zero in the traffic path net: thunderx: fix NULL pointer dereference in nicvf_open/nicvf_stop net: hns: Fix sparse: some warnings in HNS drivers net: hns: Fix WARNING when remove HNS driver with SMMU enabled net: hns: fix ICMP6 neighbor solicitation messages discard problem net: hns: Fix probabilistic memory overwrite when HNS driver initialized net: hns: Use NAPI_POLL_WEIGHT for hns driver net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw() flow_dissector: rst'ify documentation ipv6: Fix dangling pointer when ipv6 fragment net-gro: Fix GRO flush when receiving a GSO packet. flow_dissector: document BPF flow dissector environment ...
2019-04-05net: use correct this_cpu primitive in dev_recursion_levelFlorian Westphal1-1/+1
syzbot reports: BUG: using __this_cpu_read() in preemptible code: caller is dev_recursion_level include/linux/netdevice.h:3052 [inline] __this_cpu_preempt_check+0x246/0x270 lib/smp_processor_id.c:47 dev_recursion_level include/linux/netdevice.h:3052 [inline] ip6_skb_dst_mtu include/net/ip6_route.h:245 [inline] I erronously downgraded a this_cpu_read to __this_cpu_read when moving dev_recursion_level() around. Reported-by: syzbot+51471b4aae195285a4a3@syzkaller.appspotmail.com Fixes: 97cdcf37b57e ("net: place xmit recursion in softnet data") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04ptrace: Remove maxargs from task_current_syscall()Steven Rostedt (Red Hat)1-3/+8
task_current_syscall() has a single user that passes in 6 for maxargs, which is the maximum arguments that can be used to get system calls from syscall_get_arguments(). Instead of passing in a number of arguments to grab, just get 6 arguments. The args argument even specifies that it's an array of 6 items. This will also allow changing syscall_get_arguments() to not get a variable number of arguments, but always grab 6. Linus also suggested not passing in a bunch of arguments to task_current_syscall() but to instead pass in a pointer to a structure, and just fill the structure. struct seccomp_data has almost all the parameters that is needed except for the stack pointer (sp). As seccomp_data is part of uapi, and I'm afraid to change it, a new structure was created "syscall_info", which includes seccomp_data and adds the "sp" field. Link: http://lkml.kernel.org/r/20161107213233.466776454@goodmis.org Cc: Andy Lutomirski <luto@kernel.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-04net: phy: fix autoneg mismatch case in genphy_read_statusHeiner Kallweit1-0/+1
The original patch didn't consider the case that autoneg process finishes successfully but both link partners have no mode in common. In this case there's no link, nevertheless we may be interested in what the link partner advertised. Like phydev->link we set phydev->autoneg_complete in genphy_update_link() and use the stored value in genphy_read_status(). This way we don't have to read register BMSR again. Fixes: b6163f194c69 ("net: phy: improve genphy_read_status") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04bpf: increase complexity limit and maximum program sizeAlexei Starovoitov1-0/+1
Large verifier speed improvements allow to increase verifier complexity limit. Now regardless of the program composition and its size it takes little time for the verifier to hit insn_processed limit. On typical x86 machine non-debug kernel processes 1M instructions in 1/10 of a second. (before these speed improvements specially crafted programs could be hitting multi-second verification times) Full kasan kernel with debug takes ~1 second for the same 1M insns. Hence bump the BPF_COMPLEXITY_LIMIT_INSNS limit to 1M. Also increase the number of instructions per program from 4k to internal BPF_COMPLEXITY_LIMIT_INSNS limit. 4k limit was confusing to users, since small programs with hundreds of insns could be hitting BPF_COMPLEXITY_LIMIT_INSNS limit. Sometimes adding more insns and bpf_trace_printk debug statements would make the verifier accept the program while removing code would make the verifier reject it. Some user space application started to add #define MAX_FOO to their programs and do: MAX_FOO=100; again: compile with MAX_FOO; try to load; if (fails_to_load) { reduce MAX_FOO; goto again; } to be able to fit maximum amount of processing into single program. Other users artificially split their single program into a set of programs and use all 32 iterations of tail_calls to increase compute limits. And the most advanced folks used unlimited tc-bpf filter list to execute many bpf programs. Essentially the users managed to workaround 4k insn limit. This patch removes the limit for root programs from uapi. BPF_COMPLEXITY_LIMIT_INSNS is the kernel internal limit and success to load the program no longer depends on program size, but on 'smartness' of the verifier only. The verifier will continue to get smarter with every kernel release. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-04bpf: improve verification speed by droping statesAlexei Starovoitov1-0/+2
Branch instructions, branch targets and calls in a bpf program are the places where the verifier remembers states that led to successful verification of the program. These states are used to prune brute force program analysis. For unprivileged programs there is a limit of 64 states per such 'branching' instructions (maximum length is tracked by max_states_per_insn counter introduced in the previous patch). Simply reducing this threshold to 32 or lower increases insn_processed metric to the point that small valid programs get rejected. For root programs there is no limit and cilium programs can have max_states_per_insn to be 100 or higher. Walking 100+ states multiplied by number of 'branching' insns during verification consumes significant amount of cpu time. Turned out simple LRU-like mechanism can be used to remove states that unlikely will be helpful in future search pruning. This patch introduces hit_cnt and miss_cnt counters: hit_cnt - this many times this state successfully pruned the search miss_cnt - this many times this state was not equivalent to other states (and that other states were added to state list) The heuristic introduced in this patch is: if (sl->miss_cnt > sl->hit_cnt * 3 + 3) /* drop this state from future considerations */ Higher numbers increase max_states_per_insn (allow more states to be considered for pruning) and slow verification speed, but do not meaningfully reduce insn_processed metric. Lower numbers drop too many states and insn_processed increases too much. Many different formulas were considered. This one is simple and works well enough in practice. (the analysis was done on selftests/progs/* and on cilium programs) The end result is this heuristic improves verification speed by 10 times. Large synthetic programs that used to take a second more now take 1/10 of a second. In cases where max_states_per_insn used to be 100 or more, now it's ~10. There is a slight increase in insn_processed for cilium progs: before after bpf_lb-DLB_L3.o 1831 1838 bpf_lb-DLB_L4.o 3029 3218 bpf_lb-DUNKNOWN.o 1064 1064 bpf_lxc-DDROP_ALL.o 26309 26935 bpf_lxc-DUNKNOWN.o 33517 34439 bpf_netdev.o 9713 9721 bpf_overlay.o 6184 6184 bpf_lcx_jit.o 37335 39389 And 2-3 times improvement in the verification speed. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-04bpf: add verifier stats and log_level bit 2Alexei Starovoitov1-0/+21
In order to understand the verifier bottlenecks add various stats and extend log_level: log_level 1 and 2 are kept as-is: bit 0 - level=1 - print every insn and verifier state at branch points bit 1 - level=2 - print every insn and verifier state at every insn bit 2 - level=4 - print verifier error and stats at the end of verification When verifier rejects the program the libbpf is trying to load the program twice. Once with log_level=0 (no messages, only error code is reported to user space) and second time with log_level=1 to tell the user why the verifier rejected it. With introduction of bit 2 - level=4 the libbpf can choose to always use that level and load programs once, since the verification speed is not affected and in case of error the verbose message will be available. Note that the verifier stats are not part of uapi just like all other verbose messages. They're expected to change in the future. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-03linux/kernel.h: Use parentheses around argument in u64_to_user_ptr()Jann Horn1-2/+2
Use parentheses around uses of the argument in u64_to_user_ptr() to ensure that the cast doesn't apply to part of the argument. There are existing uses of the macro of the form u64_to_user_ptr(A + B) which expands to (void __user *)(uintptr_t)A + B (the cast applies to the first operand of the addition, the addition is a pointer addition). This happens to still work as intended, the semantic difference doesn't cause a difference in behavior. But I want to use u64_to_user_ptr() with a ternary operator in the argument, like so: u64_to_user_ptr(A ? B : C) This currently doesn't work as intended. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Cc: Andrei Vagin <avagin@openvz.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: NeilBrown <neilb@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qiaowei Ren <qiaowei.ren@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190329214652.258477-1-jannh@google.com
2019-04-02net: phy: add genphy_read_abilitiesHeiner Kallweit1-0/+1
Similar to genphy_c45_pma_read_abilities() add a function to dynamically detect the abilities of a Clause 22 PHY. This is mainly copied from genphy_config_init(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-02net/mlx5: Expose MPEIN (Management PCIE INfo) register layoutAya Levin2-1/+51
Expose PRM layout for handling MPEIN (Management PCIE Info). It will be used in the downstream patch for querying MPEIN via the driver. Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-02net/mlx5: Add explicit bar address fieldHuy Nguyen1-0/+1
Add bar_addr field to store bar-0 address to avoid calling pci_resource_start with hard-coded bar-0 as parameter. Also note that different mlx5 device types will have bar_addr on different bars. This patch does not change any functionality. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-02net/mlx5: Move health and page alloc init to mdev_initSaeed Mahameed1-0/+1
Software structure initialization should be in mdev_init stage. This provides a better logical separation of mlx5 core device initialization flow and will help to seamlessly support creating different mlx5 device types such as PF, VF and SF mlx5 sub-function virtual device. This patch does not change any functionality. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-02net/mlx5: Remove spinlock support from mlx5_write64Maxim Mikityanskiy2-23/+10
As there is no user of mlx5_write64 that passes a spinlock to mlx5_write64, remove this functionality and simplify the function. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-02net/mlx5: Remove unused MLX5_*_DOORBELL_LOCK macrosMaxim Mikityanskiy1-8/+0
MLX5_*_DOORBELL_LOCK macros provided a way to avoid locking for mlx5_write64 on 64-bit platforms where it's not necessary. Currently all calls to mlx5_write64 don't use a spinlock, so the macros became unused. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-02drivers: net: aurora: use netdev_xmit_more helperFlorian Westphal1-2/+0
This is the last driver using always-0 skb->xmit_more. Switch it to netdev_xmit_more and remove the now unused xmit_more flag from sk_buff. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-02net: move skb->xmit_more hint to softnet dataFlorian Westphal1-1/+1
There are two reasons for this. First, the xmit_more flag conceptually doesn't fit into the skb, as xmit_more is not a property related to the skb. Its only a hint to the driver that the stack is about to transmit another packet immediately. Second, it was only done this way to not have to pass another argument to ndo_start_xmit(). We can place xmit_more in the softnet data, next to the device recursion. The recursion counter is already written to on each transmit. The "more" indicator is placed right next to it. Drivers can use the netdev_xmit_more() helper instead of skb->xmit_more to check the "more packets coming" hint. skb->xmit_more is retained (but always 0) to not cause build breakage. This change takes care of the simple s/skb->xmit_more/netdev_xmit_more()/ conversions. Remaining drivers are converted in the next patches. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-02net: place xmit recursion in softnet dataFlorian Westphal1-8/+32
This fills a hole in softnet data, so no change in structure size. Also prepares for xmit_more placement in the same spot; skb->xmit_more will be removed in followup patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-31Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds1-11/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Thomas Gleixner: "A small set of core updates: - Make the watchdog respect the selected CPU mask again. That was broken by the rework of the watchdog thread management and caused inconsistent state and NMI watchdog being unstoppable. - Ensure that the objtool build can find the libelf location. - Remove dead kcore stub code" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: watchdog: Respect watchdog cpumask on CPU hotplug objtool: Query pkg-config for libelf location proc/kcore: Remove unused kclist_add_remap()
2019-03-30Merge tag 'gpio-v5.1-2' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: "As you can see [in the git history] I was away on leave and Bartosz kindly stepped in and collected a slew of fixes, I pulled them into my tree in two sets and merged some two more fixes (fixing my own caused bugs) on top. Summary: - Revert the extended use of gpio_set_config() and think about how we can do this properly. - Fix up the SPI CS GPIO handling so it now works properly on the SPI bus children, as intended. - Error paths and driver fixes" * tag 'gpio-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: mockup: use simple_read_from_buffer() in debugfs read callback gpio: of: Fix of_gpiochip_add() error path gpio: of: Check for "spi-cs-high" in child instead of parent node gpio: of: Check propname before applying "cs-gpios" quirks gpio: mockup: fix debugfs read Revert "gpio: use new gpio_set_config() helper in more places" gpio: aspeed: fix a potential NULL pointer dereference gpio: amd-fch: Fix bogus SPDX identifier gpio: adnp: Fix testing wrong value in adnp_gpio_direction_input gpio: exar: add a check for the return value of ida_simple_get fails
2019-03-30Merge branch 'akpm' (patches from Andrew)Linus Torvalds5-12/+28
Merge misc fixes from Andrew Morton: "22 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits) fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links fs: fs_parser: fix printk format warning checkpatch: add %pt as a valid vsprintf extension mm/migrate.c: add missing flush_dcache_page for non-mapped page migrate drivers/block/zram/zram_drv.c: fix idle/writeback string compare mm/page_isolation.c: fix a wrong flag in set_migratetype_isolate() mm/memory_hotplug.c: fix notification in offline error path ptrace: take into account saved_sigmask in PTRACE{GET,SET}SIGMASK fs/proc/kcore.c: make kcore_modules static include/linux/list.h: fix list_is_first() kernel-doc mm/debug.c: fix __dump_page when mapping->host is not set mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified include/linux/hugetlb.h: convert to use vm_fault_t iommu/io-pgtable-arm-v7s: request DMA32 memory, and improve debugging mm: add support for kmem caches in DMA32 zone ocfs2: fix inode bh swapping mixup in ocfs2_reflink_inodes_lock mm/hotplug: fix offline undo_isolate_page_range() fs/open.c: allow opening only regular files during execve() mailmap: add Changbin Du mm/debug.c: add a cast to u64 for atomic64_read() ...
2019-03-30Merge tag 'mlx5-fixes-2019-03-29' of ↵David S. Miller1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2019-03-29 This series introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem. For -stable v4.11 ('net/mlx5: Decrease default mr cache size') For -stable v4.12 ('net/mlx5e: Add a lock on tir list') For -stable v4.13 ('net/mlx5e: Fix error handling when refreshing TIRs') For -stable v4.18 ('net/mlx5e: Update xon formula') For -stable v4.19 ('net: mlx5: Add a missing check on idr_find, free buf') ('net/mlx5e: Update xoff formula') net-next merge Note: When merged with net-next the following simple conflict will appear, drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c ++<<<<<<< HEAD (net) + * max_mtu: netdev's max_mtu ++======= + * @mtu: device's MTU ++>>>>>>> net-next To resolve: just replace the line in net-next * @mtu: device's MTU to * @max_mtu: netdev's max_mtu ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-30Merge tag 'driver-core-5.1-rc3' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fix from Greg KH: "Here is a single driver core patch for 5.1-rc3. After 5.1-rc1, all of the users of BUS_ATTR() are finally removed, so we can now drop this macro from include/linux/device.h so that no more new users will be created. This patch has been in linux-next for a while, with no reported issues" * tag 'driver-core-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: driver core: remove BUS_ATTR()
2019-03-30Merge tag 'char-misc-5.1-rc3' of ↵Linus Torvalds1-5/+7
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some binder, habanalabs, and vboxguest driver fixes for 5.1-rc3. The Binder fixes resolve some reported issues found by testing, first by the selinux developers, and then earlier today by syzbot. The habanalabs fixes are all minor, resolving a number of tiny things. The vboxguest patches are a bit larger. They resolve the fact that virtual box decided to change their api in their latest release in a way that broke the existing kernel code, despite saying that they were never going to do that. So this is a bit of a "new feature", but is good to get merged so that 5.1 will work with the latest release. The changes are not large and of course virtual box "swears" they will not break this again, but no one is holding their breath here. All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: virt: vbox: Implement passing requestor info to the host for VirtualBox 6.0.x binder: fix race between munmap() and direct reclaim binder: fix BUG_ON found by selinux-testsuite habanalabs: cast to expected type habanalabs: prevent host crash during suspend/resume habanalabs: perform accounting for active CS habanalabs: fix mapping with page size bigger than 4KB habanalabs: complete user context cleanup before hard reset habanalabs: fix bug when mapping very large memory area habanalabs: fix MMU number of pages calculation
2019-03-29net/mlx5e: Add a lock on tir listYuval Avnery1-0/+2
Refresh tirs is looping over a global list of tirs while netdevs are adding and removing tirs from that list. That is why a lock is required. Fixes: 724b2aa15126 ("net/mlx5e: TIRs management refactoring") Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-29ipv4: Move IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN to helperDavid Ahern1-0/+14
in_dev lookup followed by IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN check is called in several places, some with the rcu lock and others with the rtnl held. Move the check to a helper similar to what IPv6 has. Since the helper can be invoked from either context use rcu_dereference_rtnl to dereference ip_ptr. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-29ptrace: take into account saved_sigmask in PTRACE{GET,SET}SIGMASKAndrei Vagin1-0/+18
There are a few system calls (pselect, ppoll, etc) which replace a task sigmask while they are running in a kernel-space When a task calls one of these syscalls, the kernel saves a current sigmask in task->saved_sigmask and sets a syscall sigmask. On syscall-exit-stop, ptrace traps a task before restoring the saved_sigmask, so PTRACE_GETSIGMASK returns the syscall sigmask and PTRACE_SETSIGMASK does nothing, because its sigmask is replaced by saved_sigmask, when the task returns to user-space. This patch fixes this problem. PTRACE_GETSIGMASK returns saved_sigmask if it's set. PTRACE_SETSIGMASK drops the TIF_RESTORE_SIGMASK flag. Link: http://lkml.kernel.org/r/20181120060616.6043-1-avagin@gmail.com Fixes: 29000caecbe8 ("ptrace: add ability to get/set signal-blocked mask") Signed-off-by: Andrei Vagin <avagin@gmail.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-29include/linux/list.h: fix list_is_first() kernel-docRandy Dunlap1-1/+1
Fix typo of kernel-doc parameter notation (there should be no space between '@' and the parameter name). Also fixes bogus kernel-doc notation output formatting. Link: http://lkml.kernel.org/r/ddce8b80-9a8a-d52d-3546-87b2211c089a@infradead.org Fixes: 70b44595eafe9 ("mm, compaction: use free lists to quickly locate a migration source") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-29include/linux/hugetlb.h: convert to use vm_fault_tSouptick Joarder1-1/+7
kbuild produces the below warning: tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master head: 5453a3df2a5eb49bc24615d4cf0d66b2aae05e5f commit 3d3539018d2c ("mm: create the new vm_fault_t type") reproduce: # apt-get install sparse git checkout 3d3539018d2cbd12e5af4a132636ee7fd8d43ef0 make ARCH=x86_64 allmodconfig make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' >> mm/memory.c:3968:21: sparse: incorrect type in assignment (different >> base types) @@ expected restricted vm_fault_t [usertype] ret @@ >> got e] ret @@ mm/memory.c:3968:21: expected restricted vm_fault_t [usertype] ret mm/memory.c:3968:21: got int This patch converts to return vm_fault_t type for hugetlb_fault() when CONFIG_HUGETLB_PAGE=n. Regarding the sparse warning, Luc said: : This is the expected behaviour. The constant 0 is magic regarding bitwise : types but ({ ...; 0; }) is not, it is just an ordinary expression of type : 'int'. : : So, IMHO, Souptick's patch is the right thing to do. Link: http://lkml.kernel.org/r/20190318162604.GA31553@jordon-HP-15-Notebook-PC Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-29mm: add support for kmem caches in DMA32 zoneNicolas Boichat1-0/+2
Patch series "iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables", v6. This is a followup to the discussion in [1], [2]. IOMMUs using ARMv7 short-descriptor format require page tables (level 1 and 2) to be allocated within the first 4GB of RAM, even on 64-bit systems. For L1 tables that are bigger than a page, we can just use __get_free_pages with GFP_DMA32 (on arm64 systems only, arm would still use GFP_DMA). For L2 tables that only take 1KB, it would be a waste to allocate a full page, so we considered 3 approaches: 1. This series, adding support for GFP_DMA32 slab caches. 2. genalloc, which requires pre-allocating the maximum number of L2 page tables (4096, so 4MB of memory). 3. page_frag, which is not very memory-efficient as it is unable to reuse freed fragments until the whole page is freed. [3] This series is the most memory-efficient approach. stable@ note: We confirmed that this is a regression, and IOMMU errors happen on 4.19 and linux-next/master on MT8173 (elm, Acer Chromebook R13). The issue most likely starts from commit ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32"), i.e. 4.15, and presumably breaks a number of Mediatek platforms (and maybe others?). [1] https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030876.html [2] https://lists.linuxfoundation.org/pipermail/iommu/2018-December/031696.html [3] https://patchwork.codeaurora.org/patch/671639/ This patch (of 3): IOMMUs using ARMv7 short-descriptor format require page tables to be allocated within the first 4GB of RAM, even on 64-bit systems. On arm64, this is done by passing GFP_DMA32 flag to memory allocation functions. For IOMMU L2 tables that only take 1KB, it would be a waste to allocate a full page using get_free_pages, so we considered 3 approaches: 1. This patch, adding support for GFP_DMA32 slab caches. 2. genalloc, which requires pre-allocating the maximum number of L2 page tables (4096, so 4MB of memory). 3. page_frag, which is not very memory-efficient as it is unable to reuse freed fragments until the whole page is freed. This change makes it possible to create a custom cache in DMA32 zone using kmem_cache_create, then allocate memory using kmem_cache_alloc. We do not create a DMA32 kmalloc cache array, as there are currently no users of kmalloc(..., GFP_DMA32). These calls will continue to trigger a warning, as we keep GFP_DMA32 in GFP_SLAB_BUG_MASK. This implies that calls to kmem_cache_*alloc on a SLAB_CACHE_DMA32 kmem_cache must _not_ use GFP_DMA32 (it is anyway redundant and unnecessary). Link: http://lkml.kernel.org/r/20181210011504.122604-2-drinkcat@chromium.org Signed-off-by: Nicolas Boichat <drinkcat@chromium.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Sasha Levin <Alexander.Levin@microsoft.com> Cc: Huaisheng Ye <yehs1@lenovo.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Yong Wu <yong.wu@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Tomasz Figa <tfiga@google.com> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Hsin-Yi Wang <hsinyi@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-29mm/hotplug: fix offline undo_isolate_page_range()Qian Cai1-10/+0
Commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") introduced move_pfn_range_to_zone() which calls memmap_init_zone() during onlining a memory block. memmap_init_zone() will reset pagetype flags and makes migrate type to be MOVABLE. However, in __offline_pages(), it also call undo_isolate_page_range() after offline_isolated_pages() to do the same thing. Due to commit 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") changed __first_valid_page() to skip offline pages, undo_isolate_page_range() here just waste CPU cycles looping around the offlining PFN range while doing nothing, because __first_valid_page() will return NULL as offline_isolated_pages() has already marked all memory sections within the pfn range as offline via offline_mem_sections(). Also, after calling the "useless" undo_isolate_page_range() here, it reaches the point of no returning by notifying MEM_OFFLINE. Those pages will be marked as MIGRATE_MOVABLE again once onlining. The only thing left to do is to decrease the number of isolated pageblocks zone counter which would make some paths of the page allocation slower that the above commit introduced. Even if alloc_contig_range() can be used to isolate 16GB-hugetlb pages on ppc64, an "int" should still be enough to represent the number of pageblocks there. Fix an incorrect comment along the way. [cai@lca.pw: v4] Link: http://lkml.kernel.org/r/20190314150641.59358-1-cai@lca.pw Link: http://lkml.kernel.org/r/20190313143133.46200-1-cai@lca.pw Fixes: 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> [4.13+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-28net: replace ndo_get_devlink with ndo_get_devlink_portJiri Pirko1-3/+3
Follow-up patch is going to need a devlink port instance according to a netdev. Devlink port instance should be always available when devlink is used. So change the recently introduced ndo_get_devlink to ndo_get_devlink_port. With that, adjust the wrapper for the only user to get devlink pointer. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-28Merge tag 'v5.1-rc2' into core/urgent, to resolve a conflictIngo Molnar10-13/+32
Conflicts: include/linux/kcore.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-03-28net: mii: Fix PAUSE cap advertisement from linkmode_adv_to_lcl_adv_t() helperClaudiu Manoil1-1/+1
With a recent link mode advertisement code update this helper providing local pause capability translation used for flow control link mode negotiation got broken. For eth drivers using this helper, the issue is apparent only if either PAUSE or ASYM_PAUSE is being advertised. Fixes: 3c1bcc8614db ("net: ethernet: Convert phydev advertize and supported from u32 to link mode") Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller24-43/+154
2019-03-28inet: switch IP ID generator to siphashEric Dumazet1-0/+5
According to Amit Klein and Benny Pinkas, IP ID generation is too weak and might be used by attackers. Even with recent net_hash_mix() fix (netns: provide pure entropy for net_hash_mix()) having 64bit key and Jenkins hash is risky. It is time to switch to siphash and its 128bit keys. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Amit Klein <aksecurity@gmail.com> Reported-by: Benny Pinkas <benny@pinkas.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds6-8/+69
Pull networking fixes from David Miller: "Fixes here and there, a couple new device IDs, as usual: 1) Fix BQL race in dpaa2-eth driver, from Ioana Ciornei. 2) Fix 64-bit division in iwlwifi, from Arnd Bergmann. 3) Fix documentation for some eBPF helpers, from Quentin Monnet. 4) Some UAPI bpf header sync with tools, also from Quentin Monnet. 5) Set descriptor ownership bit at the right time for jumbo frames in stmmac driver, from Aaro Koskinen. 6) Set IFF_UP properly in tun driver, from Eric Dumazet. 7) Fix load/store doubleword instruction generation in powerpc eBPF JIT, from Naveen N. Rao. 8) nla_nest_start() return value checks all over, from Kangjie Lu. 9) Fix asoc_id handling in SCTP after the SCTP_*_ASSOC changes this merge window. From Marcelo Ricardo Leitner and Xin Long. 10) Fix memory corruption with large MTUs in stmmac, from Aaro Koskinen. 11) Do not use ipv4 header for ipv6 flows in TCP and DCCP, from Eric Dumazet. 12) Fix topology subscription cancellation in tipc, from Erik Hugne. 13) Memory leak in genetlink error path, from Yue Haibing. 14) Valid control actions properly in packet scheduler, from Davide Caratti. 15) Even if we get EEXIST, we still need to rehash if a shrink was delayed. From Herbert Xu. 16) Fix interrupt mask handling in interrupt handler of r8169, from Heiner Kallweit. 17) Fix leak in ehea driver, from Wen Yang" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (168 commits) dpaa2-eth: fix race condition with bql frame accounting chelsio: use BUG() instead of BUG_ON(1) net: devlink: skip info_get op call if it is not defined in dumpit net: phy: bcm54xx: Encode link speed and activity into LEDs tipc: change to check tipc_own_id to return in tipc_net_stop net: usb: aqc111: Extend HWID table by QNAP device net: sched: Kconfig: update reference link for PIE net: dsa: qca8k: extend slave-bus implementations net: dsa: qca8k: remove leftover phy accessors dt-bindings: net: dsa: qca8k: support internal mdio-bus dt-bindings: net: dsa: qca8k: fix example net: phy: don't clear BMCR in genphy_soft_reset bpf, libbpf: clarify bump in libbpf version info bpf, libbpf: fix version info and add it to shared object rxrpc: avoid clang -Wuninitialized warning tipc: tipc clang warning net: sched: fix cleanup NULL pointer exception in act_mirr r8169: fix cable re-plugging issue net: ethernet: ti: fix possible object reference leak net: ibm: fix possible object reference leak ...
2019-03-27virt: vbox: Implement passing requestor info to the host for VirtualBox 6.0.xHans de Goede1-5/+7
VirtualBox 6.0.x has a new feature where the guest kernel driver passes info about the origin of the request (e.g. userspace or kernelspace) to the hypervisor. If we do not pass this information then when running the 6.0.x userspace guest-additions tools on a 6.0.x host, some requests will get denied with a VERR_VERSION_MISMATCH error, breaking vboxservice.service and the mounting of shared folders marked to be auto-mounted. This commit implements passing the requestor info to the host, fixing this. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-0/+1
Alexei Starovoitov says: ==================== pull-request: bpf-next 2019-03-26 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) introduce bpf_tcp_check_syncookie() helper for XDP and tc, from Lorenz. 2) allow bpf_skb_ecn_set_ce() in tc, from Peter. 3) numerous bpf tc tunneling improvements, from Willem. 4) and other miscellaneous improvements from Adrian, Alan, Daniel, Ivan, Stanislav. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-26net: phy: bcm54xx: Encode link speed and activity into LEDsVladimir Oltean1-0/+16
Previously the green and amber LEDs on this quad PHY were solid, to indicate an encoding of the link speed (10/100/1000). This keeps the LEDs always on just as before, but now they flash on Rx/Tx activity. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-26proc/kcore: Remove unused kclist_add_remap()Bhupesh Sharma1-11/+0
Commit bf904d2762ee ("x86/pti/64: Remove the SYSCALL64 entry trampoline") removed the sole usage of kclist_add_remap() but it did not remove the left-over definition from the include file. Fix the same. Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Anderson <anderson@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Kairui Song <kasong@redhat.com> Cc: kexec@lists.infradead.org Cc: linux-arm-kernel@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Omar Sandoval <osandov@fb.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/1553583028-17804-1-git-send-email-bhsharma@redhat.com
2019-03-26Revert "parport: daisy: use new parport device model"Linus Torvalds1-13/+0
This reverts commit 1aec4211204d9463d1fd209eb50453de16254599. Steven Rostedt reports that it causes a hang at bootup and bisected it to this commit. The troigger is apparently a module alias for "parport_lowlevel" that points to "parport_pc", which causes a hang with modprobe -q -- parport_lowlevel blocking forever with a backtrace like this: wait_for_completion_killable+0x1c/0x28 call_usermodehelper_exec+0xa7/0x108 __request_module+0x351/0x3d8 get_lowlevel_driver+0x28/0x41 [parport] __parport_register_driver+0x39/0x1f4 [parport] daisy_drv_init+0x31/0x4f [parport] parport_bus_init+0x5d/0x7b [parport] parport_default_proc_register+0x26/0x1000 [parport] do_one_initcall+0xc2/0x1e0 do_init_module+0x50/0x1d4 load_module+0x1c2e/0x21b3 sys_init_module+0xef/0x117 Supid says: "Due to the new device model daisy driver will now try to find the parallel ports while trying to register its driver so that it can bind with them. Now, since daisy driver is loaded while parport bus is initialising the list of parport is still empty and it tries to load the lowlevel driver, which has an alias set to parport_pc, now causes a deadlock" But I don't think the daisy driver should be loaded by the parport initialization in the first place, so let's revert the whole change. If the daisy driver can just initialize separately on its own (like a driver should), instead of hooking into the parport init sequence directly, this issue probably would go away. Reported-and-bisected-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reported-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>