summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-12-26ip6_gre: fix device features for ioctl setupAlexey Kodanev1-25/+32
When ip6gre is created using ioctl, its features, such as scatter-gather, GSO and tx-checksumming will be turned off: # ip -f inet6 tunnel add gre6 mode ip6gre remote fd00::1 # ethtool -k gre6 (truncated output) tx-checksumming: off scatter-gather: off tcp-segmentation-offload: off generic-segmentation-offload: off [requested on] But when netlink is used, they will be enabled: # ip link add gre6 type ip6gre remote fd00::1 # ethtool -k gre6 (truncated output) tx-checksumming: on scatter-gather: on tcp-segmentation-offload: on generic-segmentation-offload: on This results in a loss of performance when gre6 is created via ioctl. The issue was found with LTP/gre tests. Fix it by moving the setup of device features to a separate function and invoke it with ndo_init callback because both netlink and ioctl will eventually call it via register_netdevice(): register_netdevice() - ndo_init() callback -> ip6gre_tunnel_init() or ip6gre_tap_init() - ip6gre_tunnel_init_common() - ip6gre_tnl_init_features() The moved code also contains two minor style fixes: * removed needless tab from GRE6_FEATURES on NETIF_F_HIGHDMA line. * fixed the issue reported by checkpatch: "Unnecessary parentheses around 'nt->encap.type == TUNNEL_ENCAP_NONE'" Fixes: ac4eb009e477 ("ip6gre: Add support for basic offloads offloads excluding GSO") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-26phylink: ensure AN is enabledRussell King1-0/+1
Ensure that we mark AN as enabled at boot time, rather than leaving it disabled. This is noticable if your SFP module is fiber, and it supports faster speeds than 1G with 2.5G support in place. Fixes: 9525ae83959b ("phylink: add phylink infrastructure") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-26phylink: ensure the PHY interface mode is appropriately setRussell King1-0/+1
When setting the ethtool settings, ensure that the validated PHY interface mode is propagated to the current link settings, so that 2500BaseX can be selected. Fixes: 9525ae83959b ("phylink: add phylink infrastructure") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds73-492/+1548
Pull networking fixes from David Miller" "What's a holiday weekend without some networking bug fixes? [1] 1) Fix some eBPF JIT bugs wrt. SKB pointers across helper function calls, from Daniel Borkmann. 2) Fix regression from errata limiting change to marvell PHY driver, from Zhao Qiang. 3) Fix u16 overflow in SCTP, from Xin Long. 4) Fix potential memory leak during bridge newlink, from Nikolay Aleksandrov. 5) Fix BPF selftest build on s390, from Hendrik Brueckner. 6) Don't append to cfg80211 automatically generated certs file, always write new ones from scratch. From Thierry Reding. 7) Fix sleep in atomic in mac80211 hwsim, from Jia-Ju Bai. 8) Fix hang on tg3 MTU change with certain chips, from Brian King. 9) Add stall detection to arc emac driver and reset chip when this happens, from Alexander Kochetkov. 10) Fix MTU limitng in GRE tunnel drivers, from Xin Long. 11) Fix stmmac timestamping bug due to mis-shifting of field. From Fredrik Hallenberg. 12) Fix metrics match when deleting an ipv4 route. The kernel sets some internal metrics bits which the user isn't going to set when it makes the delete request. From Phil Sutter. 13) mvneta driver loop over RX queues limits on "txq_number" :-) Fix from Yelena Krivosheev. 14) Fix double free and memory corruption in get_net_ns_by_id, from Eric W. Biederman. 15) Flush ipv4 FIB tables in the reverse order. Some tables can share their actual backing data, in particular this happens for the MAIN and LOCAL tables. We have to kill the LOCAL table first, because it uses MAIN's backing memory. Fix from Ido Schimmel. 16) Several eBPF verifier value tracking fixes, from Edward Cree, Jann Horn, and Alexei Starovoitov. 17) Make changes to ipv6 autoflowlabel sysctl really propagate to sockets, unless the socket has set the per-socket value explicitly. From Shaohua Li. 18) Fix leaks and double callback invocations of zerocopy SKBs, from Willem de Bruijn" [1] Is this a trick question? "Relaxing"? "Quiet"? "Fine"? - Linus. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits) skbuff: skb_copy_ubufs must release uarg even without user frags skbuff: orphan frags before zerocopy clone net: reevalulate autoflowlabel setting after sysctl setting openvswitch: Fix pop_vlan action for double tagged frames ipv6: Honor specified parameters in fibmatch lookup bpf: do not allow root to mangle valid pointers selftests/bpf: add tests for recent bugfixes bpf: fix integer overflows bpf: don't prune branches when a scalar is replaced with a pointer bpf: force strict alignment checks for stack pointers bpf: fix missing error return in check_stack_boundary() bpf: fix 32-bit ALU op verification bpf: fix incorrect tracking of register size truncation bpf: fix incorrect sign extension in check_alu_op() bpf/verifier: fix bounds calculation on BPF_RSH ipv4: Fix use-after-free when flushing FIB tables s390/qeth: fix error handling in checksum cmd callback tipc: remove joining group member from congested list selftests: net: Adding config fragment CONFIG_NUMA=y nfp: bpf: keep track of the offloaded program ...
2017-12-21Merge branch 'net-zerocopy-fixes'David S. Miller1-3/+4
Saeed Mahameed says: =================== Mellanox, mlx5 fixes 2017-12-19 The follwoing series includes some fixes for mlx5 core and etherent driver. Please pull and let me know if there is any problem. This series doesn't introduce any conflict with the ongoing mlx5 for-next submission. For -stable: kernels >= v4.7.y ("net/mlx5e: Fix possible deadlock of VXLAN lock") ("net/mlx5e: Add refcount to VXLAN structure") ("net/mlx5e: Prevent possible races in VXLAN control flow") ("net/mlx5e: Fix features check of IPv6 traffic") kernels >= v4.9.y ("net/mlx5: Fix error flow in CREATE_QP command") ("net/mlx5: Fix rate limit packet pacing naming and struct") kernels >= v4.13.y ("net/mlx5: FPGA, return -EINVAL if size is zero") kernels >= v4.14.y ("Revert "mlx5: move affinity hints assignments to generic code") All above patches apply and compile with no issues on corresponding -stable. =================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21skbuff: skb_copy_ubufs must release uarg even without user fragsWillem de Bruijn1-1/+2
skb_copy_ubufs creates a private copy of frags[] to release its hold on user frags, then calls uarg->callback to notify the owner. Call uarg->callback even when no frags exist. This edge case can happen when zerocopy_sg_from_iter finds enough room in skb_headlen to copy all the data. Fixes: 3ece782693c4 ("sock: skb_copy_ubufs support for compound pages") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21skbuff: orphan frags before zerocopy cloneWillem de Bruijn1-2/+2
Call skb_zerocopy_clone after skb_orphan_frags, to avoid duplicate calls to skb_uarg(skb)->callback for the same data. skb_zerocopy_clone associates skb_shinfo(skb)->uarg from frag_skb with each segment. This is only safe for uargs that do refcounting, which is those that pass skb_orphan_frags without dropping their shared frags. For others, skb_orphan_frags drops the user frags and sets the uarg to NULL, after which sock_zerocopy_clone has no effect. Qemu hangs were reported due to duplicate vhost_net_zerocopy_callback calls for the same data causing the vhost_net_ubuf_ref_>refcount to drop below zero. Link: http://lkml.kernel.org/r/<CAF=yD-LWyCD4Y0aJ9O0e_CHLR+3JOeKicRRTEVCPxgw4XOcqGQ@mail.gmail.com> Fixes: 1f8b977ab32d ("sock: enable MSG_ZEROCOPY") Reported-by: Andreas Hartmann <andihartmann@01019freenet.de> Reported-by: David Hill <dhill@redhat.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds14-62/+94
Pull block fixes from Jens Axboe: "It's been a few weeks, so here's a small collection of fixes that should go into the current series. This contains: - NVMe pull request from Christoph, with a few important fixes. - kyber hang fix from Omar. - A blk-throttl fix from Shaohua, fixing a case where we double charge a bio. - Two call_single_data alignment fixes from me, fixing up some unfortunate changes that went into 4.14 without being properly reviewed on the block side (since nobody was CC'ed on the patch...). - A bounce buffer fix in two parts, one from me and one from Ming. - Revert bdi debug error handling patch. It's causing boot issues for some folks, and a week down the line, we're still no closer to a fix. Revert this patch for now until it's figured out, then we can retry for 4.16" * 'for-linus' of git://git.kernel.dk/linux-block: Revert "bdi: add error handle for bdi_debug_register" null_blk: unalign call_single_data block: unalign call_single_data in struct request block-throttle: avoid double charge block: fix blk_rq_append_bio block: don't let passthrough IO go into .make_request_fn() nvme: setup streams after initializing namespace head nvme: check hw sectors before setting chunk sectors nvme: call blk_integrity_unregister after queue is cleaned up nvme-fc: remove double put reference if admin connect fails nvme: set discard_alignment to zero kyber: fix another domain token wait queue hang
2017-12-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds12-85/+151
Pull KVM fixes from Paolo Bonzini: "ARM fixes: - A bug in handling of SPE state for non-vhe systems - A fix for a crash on system shutdown - Three timer fixes, introduced by the timer optimizations for v4.15 x86 fixes: - fix for a WARN that was introduced in 4.15 - fix for SMM when guest uses PCID - fixes for several bugs found by syzkaller ... and a dozen papercut fixes for the kvm_stat tool" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits) tools/kvm_stat: sort '-f help' output kvm: x86: fix RSM when PCID is non-zero KVM: Fix stack-out-of-bounds read in write_mmio KVM: arm/arm64: Fix timer enable flow KVM: arm/arm64: Properly handle arch-timer IRQs after vtimer_save_state KVM: arm/arm64: timer: Don't set irq as forwarded if no usable GIC KVM: arm/arm64: Fix HYP unmapping going off limits arm64: kvm: Prevent restoring stale PMSCR_EL1 for vcpu KVM/x86: Check input paging mode when cs.l is set tools/kvm_stat: add line for totals tools/kvm_stat: stop ignoring unhandled arguments tools/kvm_stat: suppress usage information on command line errors tools/kvm_stat: handle invalid regular expressions tools/kvm_stat: add hint on '-f help' to man page tools/kvm_stat: fix child trace events accounting tools/kvm_stat: fix extra handling of 'help' with fields filter tools/kvm_stat: fix missing field update after filter change tools/kvm_stat: fix drilldown in events-by-guests mode tools/kvm_stat: fix command line option '-g' kvm: x86: fix WARN due to uninitialized guest FPU state ...
2017-12-21net: reevalulate autoflowlabel setting after sysctl settingShaohua Li4-4/+13
sysctl.ip6.auto_flowlabels is default 1. In our hosts, we set it to 2. If sockopt doesn't set autoflowlabel, outcome packets from the hosts are supposed to not include flowlabel. This is true for normal packet, but not for reset packet. The reason is ipv6_pinfo.autoflowlabel is set in sock creation. Later if we change sysctl.ip6.auto_flowlabels, the ipv6_pinfo.autoflowlabel isn't changed, so the sock will keep the old behavior in terms of auto flowlabel. Reset packet is suffering from this problem, because reset packet is sent from a special control socket, which is created at boot time. Since sysctl.ipv6.auto_flowlabels is 1 by default, the control socket will always have its ipv6_pinfo.autoflowlabel set, even after user set sysctl.ipv6.auto_flowlabels to 1, so reset packset will always have flowlabel. Normal sock created before sysctl setting suffers from the same issue. We can't even turn off autoflowlabel unless we kill all socks in the hosts. To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the autoflowlabel setting from user, otherwise we always call ip6_default_np_autolabel() which has the new settings of sysctl. Note, this changes behavior a little bit. Before commit 42240901f7c4 (ipv6: Implement different admin modes for automatic flow labels), the autoflowlabel behavior of a sock isn't sticky, eg, if sysctl changes, existing connection will change autoflowlabel behavior. After that commit, autoflowlabel behavior is sticky in the whole life of the sock. With this patch, the behavior isn't sticky again. Cc: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Tom Herbert <tom@quantonium.net> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21openvswitch: Fix pop_vlan action for double tagged framesEric Garver1-3/+12
skb_vlan_pop() expects skb->protocol to be a valid TPID for double tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop() shift the true ethertype into position for us. Fixes: 5108bbaddc37 ("openvswitch: add processing of L3 packets") Signed-off-by: Eric Garver <e@erig.me> Reviewed-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21Revert "bdi: add error handle for bdi_debug_register"Jens Axboe1-4/+1
This reverts commit a0747a859ef6d3cc5b6cd50eb694499b78dd0025. It breaks some booting for some users, and more than a week into this, there's still no good fix. Revert this commit for now until a solution has been found. Reported-by: Laura Abbott <labbott@redhat.com> Reported-by: Bruno Wolff III <bruno@wolff.to> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-12-21ipv6: Honor specified parameters in fibmatch lookupIdo Schimmel1-8/+11
Currently, parameters such as oif and source address are not taken into account during fibmatch lookup. Example (IPv4 for reference) before patch: $ ip -4 route show 192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1 198.51.100.0/24 dev dummy1 proto kernel scope link src 198.51.100.1 $ ip -6 route show 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium 2001:db8:2::/64 dev dummy1 proto kernel metric 256 pref medium fe80::/64 dev dummy0 proto kernel metric 256 pref medium fe80::/64 dev dummy1 proto kernel metric 256 pref medium $ ip -4 route get fibmatch 192.0.2.2 oif dummy0 192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1 $ ip -4 route get fibmatch 192.0.2.2 oif dummy1 RTNETLINK answers: No route to host $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium After: $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1 RTNETLINK answers: Network is unreachable The problem stems from the fact that the necessary route lookup flags are not set based on these parameters. Instead of duplicating the same logic for fibmatch, we can simply resolve the original route from its copy and dump it instead. Fixes: 18c3a61c4264 ("net: ipv6: RTM_GETROUTE: return matched fib result when requested") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21tools/kvm_stat: sort '-f help' outputStefan Raspl1-10/+6
Sort the fields returned by specifying '-f help' on the command line. While at it, simplify the code a bit, indent the output and eliminate an extra blank line at the beginning. Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-21kvm: x86: fix RSM when PCID is non-zeroPaolo Bonzini1-7/+25
rsm_load_state_64() and rsm_enter_protected_mode() load CR3, then CR4 & ~PCIDE, then CR0, then CR4. However, setting CR4.PCIDE fails if CR3[11:0] != 0. It's probably easier in the long run to replace rsm_enter_protected_mode() with an emulator callback that sets all the special registers (like KVM_SET_SREGS would do). For now, set the PCID field of CR3 only after CR4.PCIDE is 1. Reported-by: Laszlo Ersek <lersek@redhat.com> Tested-by: Laszlo Ersek <lersek@redhat.com> Fixes: 660a5d517aaab9187f93854425c4c63f4a09195c Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller6-168/+730
Daniel Borkmann says: ==================== pull-request: bpf 2017-12-21 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix multiple security issues in the BPF verifier mostly related to the value and min/max bounds tracking rework in 4.14. Issues range from incorrect bounds calculation in some BPF_RSH cases, to improper sign extension and reg size handling on 32 bit ALU ops, missing strict alignment checks on stack pointers, and several others that got fixed, from Jann, Alexei and Edward. 2) Fix various build failures in BPF selftests on sparc64. More specifically, librt needed to be added to the libs to link against and few format string fixups for sizeof, from David. 3) Fix one last remaining issue from BPF selftest build that was still occuring on s390x from the asm/bpf_perf_event.h include which could not find the asm/ptrace.h copy, from Hendrik. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21bpf: do not allow root to mangle valid pointersAlexei Starovoitov2-95/+63
Do not allow root to convert valid pointers into unknown scalars. In particular disallow: ptr &= reg ptr <<= reg ptr += ptr and explicitly allow: ptr -= ptr since pkt_end - pkt == length 1. This minimizes amount of address leaks root can do. In the future may need to further tighten the leaks with kptr_restrict. 2. If program has such pointer math it's likely a user mistake and when verifier complains about it right away instead of many instructions later on invalid memory access it's easier for users to fix their progs. 3. when register holding a pointer cannot change to scalar it allows JITs to optimize better. Like 32-bit archs could use single register for pointers instead of a pair required to hold 64-bit scalars. 4. reduces architecture dependent behavior. Since code: r1 = r10; r1 &= 0xff; if (r1 ...) will behave differently arm64 vs x64 and offloaded vs native. A significant chunk of ptr mangling was allowed by commit f1174f77b50c ("bpf/verifier: rework value tracking") yet some of it was allowed even earlier. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-21Merge branch 'bpf-verifier-sec-fixes'Daniel Borkmann3-67/+661
Alexei Starovoitov says: ==================== This patch set addresses a set of security vulnerabilities in bpf verifier logic discovered by Jann Horn. All of the patches are candidates for 4.14 stable. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-21selftests/bpf: add tests for recent bugfixesJann Horn1-16/+533
These tests should cover the following cases: - MOV with both zero-extended and sign-extended immediates - implicit truncation of register contents via ALU32/MOV32 - implicit 32-bit truncation of ALU32 output - oversized register source operand for ALU32 shift - right-shift of a number that could be positive or negative - map access where adding the operation size to the offset causes signed 32-bit overflow - direct stack access at a ~4GiB offset Also remove the F_LOAD_WITH_STRICT_ALIGNMENT flag from a bunch of tests that should fail independent of what flags userspace passes. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-21bpf: fix integer overflowsAlexei Starovoitov2-2/+50
There were various issues related to the limited size of integers used in the verifier: - `off + size` overflow in __check_map_access() - `off + reg->off` overflow in check_mem_access() - `off + reg->var_off.value` overflow or 32-bit truncation of `reg->var_off.value` in check_mem_access() - 32-bit truncation in check_stack_boundary() Make sure that any integer math cannot overflow by not allowing pointer math with large values. Also reduce the scope of "scalar op scalar" tracking. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-21bpf: don't prune branches when a scalar is replaced with a pointerJann Horn1-8/+7
This could be made safe by passing through a reference to env and checking for env->allow_ptr_leaks, but it would only work one way and is probably not worth the hassle - not doing it will not directly lead to program rejection. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-21bpf: force strict alignment checks for stack pointersJann Horn1-0/+5
Force strict alignment checks for stack pointers because the tracking of stack spills relies on it; unaligned stack accesses can lead to corruption of spilled registers, which is exploitable. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-21bpf: fix missing error return in check_stack_boundary()Jann Horn1-0/+1
Prevent indirect stack accesses at non-constant addresses, which would permit reading and corrupting spilled pointers. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-21bpf: fix 32-bit ALU op verificationJann Horn1-11/+17
32-bit ALU ops operate on 32-bit values and have 32-bit outputs. Adjust the verifier accordingly. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-21bpf: fix incorrect tracking of register size truncationJann Horn1-17/+27
Properly handle register truncation to a smaller size. The old code first mirrors the clearing of the high 32 bits in the bitwise tristate representation, which is correct. But then, it computes the new arithmetic bounds as the intersection between the old arithmetic bounds and the bounds resulting from the bitwise tristate representation. Therefore, when coerce_reg_to_32() is called on a number with bounds [0xffff'fff8, 0x1'0000'0007], the verifier computes [0xffff'fff8, 0xffff'ffff] as bounds of the truncated number. This is incorrect: The truncated number could also be in the range [0, 7], and no meaningful arithmetic bounds can be computed in that case apart from the obvious [0, 0xffff'ffff]. Starting with v4.14, this is exploitable by unprivileged users as long as the unprivileged_bpf_disabled sysctl isn't set. Debian assigned CVE-2017-16996 for this issue. v2: - flip the mask during arithmetic bounds calculation (Ben Hutchings) v3: - add CVE number (Ben Hutchings) Fixes: b03c9f9fdc37 ("bpf/verifier: track signed and unsigned min/max values") Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-21bpf: fix incorrect sign extension in check_alu_op()Jann Horn1-1/+7
Distinguish between BPF_ALU64|BPF_MOV|BPF_K (load 32-bit immediate, sign-extended to 64-bit) and BPF_ALU|BPF_MOV|BPF_K (load 32-bit immediate, zero-padded to 64-bit); only perform sign extension in the first case. Starting with v4.14, this is exploitable by unprivileged users as long as the unprivileged_bpf_disabled sysctl isn't set. Debian assigned CVE-2017-16995 for this issue. v3: - add CVE number (Ben Hutchings) Fixes: 484611357c19 ("bpf: allow access into map value arrays") Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-21bpf/verifier: fix bounds calculation on BPF_RSHEdward Cree1-14/+16
Incorrect signed bounds were being computed. If the old upper signed bound was positive and the old lower signed bound was negative, this could cause the new upper signed bound to be too low, leading to security issues. Fixes: b03c9f9fdc37 ("bpf/verifier: track signed and unsigned min/max values") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Edward Cree <ecree@solarflare.com> Acked-by: Alexei Starovoitov <ast@kernel.org> [jannh@google.com: changed description to reflect bug impact] Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-21Merge tag 'scsi-fixes' of ↵Linus Torvalds6-18/+21
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two simple fixes: one for sparse warnings that were introduced by the merge window conversion to blist_flags_t and the other to fix dropped I/O during reset in aacraid" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: aacraid: Fix I/O drop during reset scsi: core: Use blist_flags_t consistently
2017-12-21Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds1-0/+4
Pull ARM fix from Russell King: "Just one fix for a problem in the csum_partial_copy_from_user() implementation when software PAN is enabled" * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 8731/1: Fix csum_partial_copy_from_user() stack mismatch
2017-12-21Merge tag 'acpi-4.15-rc5' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix a recently introduced issue in the ACPI CPPC driver and an obscure error hanling bug in the APEI code. Specifics: - Fix an error handling issue in the ACPI APEI implementation of the >read callback in struct pstore_info (Takashi Iwai). - Fix a possible out-of-bounds arrar read in the ACPI CPPC driver (Colin Ian King)" * tag 'acpi-4.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: APEI / ERST: Fix missing error handling in erst_reader() ACPI: CPPC: remove initial assignment of pcc_ss_data
2017-12-21Merge tag 'pm-4.15-rc5' of ↵Linus Torvalds3-9/+28
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix a regression in the ondemand and conservative cpufreq governors that was introduced during the 4.13 cycle, a recent regression in the imx6q cpufreq driver and a regression in the PCI handling of hibernation from the 4.14 cycle. Specifics: - Fix an issue in the PCI handling of the "thaw" transition during hibernation (after creating an image), introduced by a bug fix from the 4.13 cycle and exposed by recent changes in the IRQ subsystem, that caused pci_restore_state() to be called for devices in low-power states in some cases which is incorrect and breaks MSI management on some systems (Rafael Wysocki). - Fix a recent regression in the imx6q cpufreq driver that broke speed grading on i.MX6 QuadPlus by omitting checks causing invalid operating performance points (OPPs) to be disabled on that SoC as appropriate (Lucas Stach). - Fix a regression introduced during the 4.14 cycle in the ondemand and conservative cpufreq governors that causes the sampling interval used by them to be shorter than the tick period in some cases which leads to incorrect decisions (Rafael Wysocki)" * tag 'pm-4.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: governor: Ensure sufficiently large sampling intervals cpufreq: imx6q: fix speed grading regression on i.MX6 QuadPlus PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
2017-12-21Merge tag 'spi-fix-v4.15-rc4' of ↵Linus Torvalds7-11/+36
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A bunch of really small fixes here, all driver specific and mostly in error handling and remove paths. The most important fixes are for the a3700 clock configuration and a fix for a nasty stall which could potentially cause data corruption with the xilinx driver" * tag 'spi-fix-v4.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: atmel: fixed spin_lock usage inside atmel_spi_remove spi: sun4i: disable clocks in the remove function spi: rspi: Do not set SPCR_SPE in qspi_set_config_register() spi: Fix double "when" spi: a3700: Fix clk prescaling for coefficient over 15 spi: xilinx: Detect stall with Unknown commands spi: imx: Update device tree binding documentation
2017-12-21Merge tag 'mfd-fixes-4.15' of ↵Linus Torvalds4-35/+41
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull MDF bugfixes from Lee Jones: - Fix message timing issues and report correct state when an error occurs in cros_ec_spi - Reorder enums used for Power Management in rtsx_pci - Use correct OF helper for obtaining child nodes in twl4030-audio and twl6040 * tag 'mfd-fixes-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: mfd: Fix RTS5227 (and others) powermanagement mfd: cros ec: spi: Fix "in progress" error signaling mfd: twl6040: Fix child-node lookup mfd: twl4030-audio: Fix sibling-node lookup mfd: cros ec: spi: Don't send first message too soon
2017-12-21Merge tag 'sound-4.15-rc5' of ↵Linus Torvalds5-20/+70
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "All stable fixes here: - a regression fix of USB-audio for the previous hardening patch - a potential UAF fix in rawmidi - HD-audio and USB-audio quirks, the missing new ID" * tag 'sound-4.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: usb-audio: Fix the missing ctl name suffix at parsing SU ALSA: hda/realtek - Fix Dell AIO LineOut issue ALSA: rawmidi: Avoid racy info ioctl via ctl device ALSA: hda - Add vendor id for Cannonlake HDMI codec ALSA: usb-audio: Add native DSD support for Esoteric D-05X
2017-12-20null_blk: unalign call_single_dataJens Axboe1-2/+2
Commit 966a967116e6 randomly added alignment to this structure, but it's actually detrimental to performance of null_blk. Test case: Running on both the home and remote node shows a ~5% degradation in performance. While in there, move blk_status_t to the hole after the integer tag in the nullb_cmd structure. After this patch, we shrink the size from 192 to 152 bytes. Fixes: 966a967116e69 ("smp: Avoid using two cache lines for struct call_single_data") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-12-20block: unalign call_single_data in struct requestJens Axboe1-1/+1
A previous change blindly added massive alignment to the call_single_data structure in struct request. This ballooned it in size from 296 to 320 bytes on my setup, for no valid reason at all. Use the unaligned struct __call_single_data variant instead. Fixes: 966a967116e69 ("smp: Avoid using two cache lines for struct call_single_data") Cc: stable@vger.kernel.org # v4.14 Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-12-20ipv4: Fix use-after-free when flushing FIB tablesIdo Schimmel1-2/+7
Since commit 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") the local table uses the same trie allocated for the main table when custom rules are not in use. When a net namespace is dismantled, the main table is flushed and freed (via an RCU callback) before the local table. In case the callback is invoked before the local table is iterated, a use-after-free can occur. Fix this by iterating over the FIB tables in reverse order, so that the main table is always freed after the local table. v3: Reworded comment according to Alex's suggestion. v2: Add a comment to make the fix more explicit per Dave's and Alex's feedback. Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20s390/qeth: fix error handling in checksum cmd callbackJulian Wiedmann1-1/+8
Make sure to check both return code fields before processing the response. Otherwise we risk operating on invalid data. Fixes: c9475369bd2b ("s390/qeth: rework RX/TX checksum offload") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20tipc: remove joining group member from congested listJon Maloy1-4/+2
When we receive a JOIN message from a peer member, the message may contain an advertised window value ADV_IDLE that permits removing the member in question from the tipc_group::congested list. However, since the removal has been made conditional on that the advertised window is *not* ADV_IDLE, we miss this case. This has the effect that a sender sometimes may enter a state of permanent, false, broadcast congestion. We fix this by unconditinally removing the member from the congested list before calling tipc_member_update(), which might potentially sort it into the list again. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20selftests: net: Adding config fragment CONFIG_NUMA=yNaresh Kamboju1-0/+1
kernel config fragement CONFIG_NUMA=y is need for reuseport_bpf_numa. Signed-off-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20Merge tag 'mlx5-fixes-2017-12-19' of ↵David S. Miller17-104/+215
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: =================== Mellanox, mlx5 fixes 2017-12-19 The follwoing series includes some fixes for mlx5 core and etherent driver. Please pull and let me know if there is any problem. This series doesn't introduce any conflict with the ongoing mlx5 for-next submission. For -stable: kernels >= v4.7.y ("net/mlx5e: Fix possible deadlock of VXLAN lock") ("net/mlx5e: Add refcount to VXLAN structure") ("net/mlx5e: Prevent possible races in VXLAN control flow") ("net/mlx5e: Fix features check of IPv6 traffic") kernels >= v4.9.y ("net/mlx5: Fix error flow in CREATE_QP command") ("net/mlx5: Fix rate limit packet pacing naming and struct") kernels >= v4.13.y ("net/mlx5: FPGA, return -EINVAL if size is zero") kernels >= v4.14.y ("Revert "mlx5: move affinity hints assignments to generic code") All above patches apply and compile with no issues on corresponding -stable. =================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20block-throttle: avoid double chargeShaohua Li4-12/+9
If a bio is throttled and split after throttling, the bio could be resubmited and enters the throttling again. This will cause part of the bio to be charged multiple times. If the cgroup has an IO limit, the double charge will significantly harm the performance. The bio split becomes quite common after arbitrary bio size change. To fix this, we always set the BIO_THROTTLED flag if a bio is throttled. If the bio is cloned/split, we copy the flag to new bio too to avoid a double charge. However, cloned bio could be directed to a new disk, keeping the flag be a problem. The observation is we always set new disk for the bio in this case, so we can clear the flag in bio_set_dev(). This issue exists for a long time, arbitrary bio size change just makes it worse, so this should go into stable at least since v4.2. V1-> V2: Not add extra field in bio based on discussion with Tejun Cc: Vivek Goyal <vgoyal@redhat.com> Cc: stable@vger.kernel.org Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-12-20Merge branch 'cls_bpf-fix-offload-state-tracking-with-block-callbacks'David S. Miller4-69/+92
Jakub Kicinski says: =================== cls_bpf: fix offload state tracking with block callbacks After introduction of block callbacks classifiers can no longer track offload state. cls_bpf used to do that in an attempt to move common code from drivers to the core. Remove that functionality and fix drivers. The user-visible bug this is fixing is that trying to offload a second filter would trigger a spurious DESTROY and in turn disable the already installed one. =================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20nfp: bpf: keep track of the offloaded programJakub Kicinski2-4/+51
After TC offloads were converted to callbacks we have no choice but keep track of the offloaded filter in the driver. The check for nn->dp.bpf_offload_xdp was a stop gap solution to make sure failed TC offload won't disable XDP, it's no longer necessary. nfp_net_bpf_offload() will return -EBUSY on TC vs XDP conflicts. Fixes: 3f7889c4c79b ("net: sched: cls_bpf: call block callbacks for offload") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20cls_bpf: fix offload assumptions after callback conversionJakub Kicinski3-67/+43
cls_bpf used to take care of tracking what offload state a filter is in, i.e. it would track if offload request succeeded or not. This information would then be used to issue correct requests to the driver, e.g. requests for statistics only on offloaded filters, removing only filters which were offloaded, using add instead of replace if previous filter was not added etc. This tracking of offload state no longer functions with the new callback infrastructure. There could be multiple entities trying to offload the same filter. Throw out all the tracking and corresponding commands and simply pass to the drivers both old and new bpf program. Drivers will have to deal with offload state tracking by themselves. Fixes: 3f7889c4c79b ("net: sched: cls_bpf: call block callbacks for offload") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20net: Fix double free and memory corruption in get_net_ns_by_id()Eric W. Biederman1-1/+1
(I can trivially verify that that idr_remove in cleanup_net happens after the network namespace count has dropped to zero --EWB) Function get_net_ns_by_id() does not check for net::count after it has found a peer in netns_ids idr. It may dereference a peer, after its count has already been finaly decremented. This leads to double free and memory corruption: put_net(peer) rtnl_lock() atomic_dec_and_test(&peer->count) [count=0] ... __put_net(peer) get_net_ns_by_id(net, id) spin_lock(&cleanup_list_lock) list_add(&net->cleanup_list, &cleanup_list) spin_unlock(&cleanup_list_lock) queue_work() peer = idr_find(&net->netns_ids, id) | get_net(peer) [count=1] | ... | (use after final put) v ... cleanup_net() ... spin_lock(&cleanup_list_lock) ... list_replace_init(&cleanup_list, ..) ... spin_unlock(&cleanup_list_lock) ... ... ... ... put_net(peer) ... atomic_dec_and_test(&peer->count) [count=0] ... spin_lock(&cleanup_list_lock) ... list_add(&net->cleanup_list, &cleanup_list) ... spin_unlock(&cleanup_list_lock) ... queue_work() ... rtnl_unlock() rtnl_lock() ... for_each_net(tmp) { ... id = __peernet2id(tmp, peer) ... spin_lock_irq(&tmp->nsid_lock) ... idr_remove(&tmp->netns_ids, id) ... ... ... net_drop_ns() ... net_free(peer) ... } ... | v cleanup_net() ... (Second free of peer) Also, put_net() on the right cpu may reorder with left's cpu list_replace_init(&cleanup_list, ..), and then cleanup_list will be corrupted. Since cleanup_net() is executed in worker thread, while put_net(peer) can happen everywhere, there should be enough time for concurrent get_net_ns_by_id() to pick the peer up, and the race does not seem to be unlikely. The patch fixes the problem in standard way. (Also, there is possible problem in peernet2id_alloc(), which requires check for net::count under nsid_lock and maybe_get_net(peer), but in current stable kernel it's used under rtnl_lock() and it has to be safe. Openswitch begun to use peernet2id_alloc(), and possibly it should be fixed too. While this is not in stable kernel yet, so I'll send a separate message to netdev@ later). Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Fixes: 0c7aecd4bde4 "netns: add rtnl cmd to add and get peer netns ids" Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20Merge branch 'mvneta-fixes'David S. Miller1-2/+6
Gregory CLEMENT says: ==================== Few mvneta fixes here it is a small series of fixes found on the mvneta driver. They had been already used in the vendor kernel and are now ported to mainline. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20net: mvneta: eliminate wrong call to handle rx descriptor errorYelena Krivosheev1-1/+1
There are few reasons in mvneta_rx_swbm() function when received packet is dropped. mvneta_rx_error() should be called only if error bit [16] is set in rx descriptor. [gregory.clement@free-electrons.com: add fixes tag] Cc: stable@vger.kernel.org Fixes: dc35a10f68d3 ("net: mvneta: bm: add support for hardware buffer management") Signed-off-by: Yelena Krivosheev <yelena@marvell.com> Tested-by: Dmitri Epshtein <dima@marvell.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20net: mvneta: use proper rxq_number in loop on rx queuesYelena Krivosheev1-1/+1
When adding the RX queue association with each CPU, a typo was made in the mvneta_cleanup_rxqs() function. This patch fixes it. [gregory.clement@free-electrons.com: add commit log and fixes tag] Cc: stable@vger.kernel.org Fixes: 2dcf75e2793c ("net: mvneta: Associate RX queues with each CPU") Signed-off-by: Yelena Krivosheev <yelena@marvell.com> Tested-by: Dmitri Epshtein <dima@marvell.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20net: mvneta: clear interface link status on port disableYelena Krivosheev1-0/+4
When port connect to PHY in polling mode (with poll interval 1 sec), port and phy link status must be synchronize in order don't loss link change event. [gregory.clement@free-electrons.com: add fixes tag] Cc: <stable@vger.kernel.org> Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit") Signed-off-by: Yelena Krivosheev <yelena@marvell.com> Tested-by: Dmitri Epshtein <dima@marvell.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>