summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2016-05-06samples/bpf: add verifier testsAlexei Starovoitov1-0/+80
add few tests for "pointer to packet" logic of the verifier Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06samples/bpf: add 'pointer to packet' testsAlexei Starovoitov5-0/+281
parse_simple.c - packet parser exapmle with single length check that filters out udp packets for port 9 parse_varlen.c - variable length parser that understand multiple vlan headers, ipip, ipip6 and ip options to filter out udp or tcp packets on port 9. The packet is parsed layer by layer with multitple length checks. parse_ldabs.c - classic style of packet parsing using LD_ABS instruction. Same functionality as parse_simple. simple = 24.1Mpps per core varlen = 22.7Mpps ldabs = 21.4Mpps Parser with LD_ABS instructions is slower than full direct access parser which does more packet accesses and checks. These examples demonstrate the choice bpf program authors can make between flexibility of the parser vs speed. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06bpf: add documentation for 'direct packet access'Alexei Starovoitov1-2/+83
explain how verifier checks safety of packet access and update email addresses. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06bpf: wire in data and data_end for cls_act_bpfAlexei Starovoitov4-6/+65
allow cls_bpf and act_bpf programs access skb->data and skb->data_end pointers. The bpf helpers that change skb->data need to update data_end pointer as well. The verifier checks that programs always reload data, data_end pointers after calls to such bpf helpers. We cannot add 'data_end' pointer to struct qdisc_skb_cb directly, since it's embedded as-is by infiniband ipoib, so wrapper struct is needed. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06bpf: improve verifier state equivalenceAlexei Starovoitov1-20/+3
since UNKNOWN_VALUE type is weaker than CONST_IMM we can un-teach verifier its recognition of constants in conditional branches without affecting safety. Ex: if (reg == 123) { .. here verifier was marking reg->type as CONST_IMM instead keep reg as UNKNOWN_VALUE } Two verifier states with UNKNOWN_VALUE are equivalent, whereas CONST_IMM_X != CONST_IMM_Y, since CONST_IMM is used for stack range verification and other cases. So help search pruning by marking registers as UNKNOWN_VALUE where possible instead of CONST_IMM. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06bpf: direct packet accessAlexei Starovoitov3-8/+440
Extended BPF carried over two instructions from classic to access packet data: LD_ABS and LD_IND. They're highly optimized in JITs, but due to their design they have to do length check for every access. When BPF is processing 20M packets per second single LD_ABS after JIT is consuming 3% cpu. Hence the need to optimize it further by amortizing the cost of 'off < skb_headlen' over multiple packet accesses. One option is to introduce two new eBPF instructions LD_ABS_DW and LD_IND_DW with similar usage as skb_header_pointer(). The kernel part for interpreter and x64 JIT was implemented in [1], but such new insns behave like old ld_abs and abort the program with 'return 0' if access is beyond linear data. Such hidden control flow is hard to workaround plus changing JITs and rolling out new llvm is incovenient. Therefore allow cls_bpf/act_bpf program access skb->data directly: int bpf_prog(struct __sk_buff *skb) { struct iphdr *ip; if (skb->data + sizeof(struct iphdr) + ETH_HLEN > skb->data_end) /* packet too small */ return 0; ip = skb->data + ETH_HLEN; /* access IP header fields with direct loads */ if (ip->version != 4 || ip->saddr == 0x7f000001) return 1; [...] } This solution avoids introduction of new instructions. llvm stays the same and all JITs stay the same, but verifier has to work extra hard to prove safety of the above program. For XDP the direct store instructions can be allowed as well. The skb->data is NET_IP_ALIGNED, so for common cases the verifier can check the alignment. The complex packet parsers where packet pointer is adjusted incrementally cannot be tracked for alignment, so allow byte access in such cases and misaligned access on architectures that define efficient_unaligned_access [1] https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/?h=ld_abs_dw Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06bpf: cleanup verifier codeAlexei Starovoitov1-47/+53
cleanup verifier code and prepare it for addition of "pointer to packet" logic Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2-4/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "This contains two fixes: a boot fix for older SGI/UV systems, and an APIC calibration fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO x86/platform/UV: Bring back the call to map_low_mmrs in uv_system_init
2016-05-06Merge branch '40GbE' of ↵David S. Miller15-1262/+1062
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2016-05-05 This series contains updates to i40e and i40evf. The theme behind this series is code reduction, yeah! Jesse provides most of the changes starting with a refactor of the interpretation of a tunnel which lets us start using the hardware's parsing. Removed the packet split receive routine and ancillary code in preparation for the Rx-refactor. The refactor of the receive routine, aligns the receive routine with the one in ixgbe which was highly optimized. The hardware supports a 16 byte descriptor for receive, but the driver was never using it in production. There was no performance benefit to the real driver of 16 byte descriptors, so drop a whole lot of complexity while getting rid of the code. Fixed a bug where while changing the number of descriptors using ethtool, the driver did not test the limits of the system memory before permanently assuming it would be able to get receive buffer memory. Mitch fixes a memory leak of one page each time the driver is opened by allocating the correct number of receive buffers and do not fiddle with next_to_use in the VF driver. Arnd Bergmann fixed a indentation issue by adding the appropriate curly braces in i40e_vc_config_promiscuous_mode_msg(). Julia Lawall fixed an issue found by Coccinelle, where i40e_client_ops structure can be const since it is never modified. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06Bluetooth: Add support for Intel Bluetooth device 8265 [8087:0a2b]Tedd Ho-Jeong An1-5/+6
This patch adds support for Intel Bluetooth device 8265 also known as Windstorm Peak (WsP). T: Bus=01 Lev=01 Prnt=01 Port=01 Cnt=02 Dev#= 6 Spd=12 MxCh= 0 D: Ver= 2.00 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=8087 ProdID=0a2b Rev= 0.10 C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=81(I) Atr=03(Int.) MxPS= 64 Ivl=1ms E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms I: If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms I: If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms I: If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms I: If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms I: If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms Signed-off-by: Tedd Ho-Jeong An <tedd.an@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-05-06net: vrf: Create FIB tables on link createDavid Ahern3-2/+13
Tables have to exist for VRFs to function. Ensure they exist when VRF device is created. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06qede: prevent chip hang when increasing channelsSudarsana Reddy Kalluru1-4/+2
qede requires qed to provide enough resources to accommodate 16 combined channels, but that upper-bound isn't actually being enforced by it. Instead, qed inform back to qede how many channels can be opened based on available resources - but that calculation doesn't really take into account the resources requested by qede; Instead it considers other FW/HW available resources. As a result, if a user would increase the number of channels to more than 16 [e.g., using ethtool] the chip would hang. This change increments the resources requested by qede to 64 combined channels instead of 16; This value is an upper bound on the possible available channels [due to other FW/HW resources]. Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06net: ipv6: tcp reset, icmp need to consider L3 domainDavid Ahern2-4/+8
Responses for packets to unused ports are getting lost with L3 domains. IPv4 has ip_send_unicast_reply for sending TCP responses which accounts for L3 domains; update the IPv6 counterpart tcp_v6_send_response. For icmp the L3 master check needs to be moved up in icmp6_send to properly respond to UDP packets to a port with no listener. Fixes: ca254490c8df ("net: Add VRF support to IPv6 stack") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06cnic: call cp->stop_hw() in cnic_start_hw() on allocation failureJon Maxwell1-1/+4
We recently had a system crash in the cnic module. Vmcore analysis confirmed that "ip link up" was executed which failed due to an allocation failure because of memory fragmentation. Futher analysis revealed that the cnic irq vector was still allocated after the "ip link up" that failed. When "ip link down" was executed it called free_msi_irqs() which crashed the system because the cnic irq was still inuse. PANIC: "kernel BUG at drivers/pci/msi.c:411!" The code execution was: cnic_netdev_event() if (event == NETDEV_UP) { . . ▹ if (!cnic_start_hw(dev)) cnic_start_hw() calls cnic_cm_open() which failed with -ENOMEM cnic_start_hw() then took the err1 path: err1:↩ cp->free_resc(dev);↩ <---- frees resources but not irq vector pci_dev_put(dev->pcidev);↩ return err;↩ }↩ This returns control back to cnic_netdev_event() but now the cnic irq vector is still allocated even although cnic_cm_open() failed. The next "ip link down" while trigger the crash. The cnic_start_hw() routine is not handling the allocation failure correctly. Fix this by checking whether CNIC_DRV_STATE_HANDLES_IRQ flag is set indicating that the hardware has been started in cnic_start_hw(). If it has then call cp->stop_hw() which frees the cnic irq vector and cnic resources. Otherwise just maintain the previous behaviour and free cnic resources. I reproduced this by injecting an ENOMEM error into cnic_cm_alloc_mem()s return code. # ip link set dev enpX down # ip link set dev enpX up <--- hit's allocation failure # ip link set dev enpX down <--- crashes here With this patch I confirmed there was no crash in the reproducer. Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06Merge tag 'pm+acpi-4.6-rc7' of ↵Linus Torvalds9-27/+45
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI fixes from Rafael Wysocki: "Fixes for problems introduced or discovered recently (intel_pstate, sti-cpufreq, ARM64 cpuidle, Operating Performance Points framework, generic device properties framework) and one fix for a hotplug-related deadlock in ACPICA that's been there forever, but is nasty enough. Specifics: - Fix for a recent regression in the intel_pstate driver causing it to fail to restore the HWP (HW-managed P-states) configuration of the boot CPU after suspend-to-RAM (Rafael Wysocki). - Fix for two recent regressions in the intel_pstate driver, one that can trigger a divide by zero if the driver is accessed via sysfs before it manages to take the first sample and one causing it to fail to update a structure field used in a trace point, so the information coming from it is less useful (Rafael Wysocki). - Fix for a problem in the sti-cpufreq driver introduced during the 4.5 cycle that causes it to break CPU PM in multi-platform kernels by registering cpufreq-dt (which subsequently doesn't work) unconditionally and preventing the driver that would actually work from registering (Sudeep Holla). - Stable-candidate fix for an ARM64 cpuidle issue causing idle state usage counters to be incorrectly updated for idle states that were not entered due to errors (James Morse). - Fix for a recently introduced issue in the OPP (Operating Performance Points) framework causing it to print bogus error messages for missing optional regulators (Viresh Kumar). - Fix for a recently introduced issue in the generic device properties framework that may cause it to attempt to dereferece and invalid pointer in some cases (Heikki Krogerus). - Fix for a deadlock in the ACPICA core that may be triggered by device (eg Thunderbolt) hotplug (Prarit Bhargava)" * tag 'pm+acpi-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / OPP: Remove useless check ACPICA: Dispatcher: Update thread ID for recursive method calls intel_pstate: Fix intel_pstate_get() cpufreq: intel_pstate: Fix HWP on boot CPU after system resume cpufreq: st: enable selective initialization based on the platform ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value device property: Avoid potential dereferences of invalid pointers
2016-05-06Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds1-13/+16
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "This contains a single fix that fixes a nohz tick stopping bug when mixed-poliocy SCHED_FIFO and SCHED_RR tasks are present on a runqueue" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: nohz/full, sched/rt: Fix missed tick-reenabling bug in sched_can_stop_tick()
2016-05-06Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "This tree contains two fixes: new Intel CPU model numbers and an AMD/iommu uncore PMU driver fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/amd/iommu: Do not register a task ctx for uncore like PMUs perf/x86: Add model numbers for Kabylake CPUs
2016-05-06Merge branch 'efi-urgent-for-linus' of ↵Linus Torvalds3-13/+23
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Ingo Molnar: "This tree contains three fixes: a console spam fix, a file pattern fix and a sysfb_efi fix for a bug that triggered on older ThinkPads" * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sysfb_efi: Fix valid BAR address range check x86/efi-bgrt: Switch all pr_err() to pr_notice() for invalid BGRT MAINTAINERS: Remove asterisk from EFI directory names
2016-05-06Merge branch 'parisc-4.6-5' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fix from Helge Deller: "Patch from Dmitry V Levin to fix a kernel crash when a straced process calls the (invalid) syscall which is equal to value of __NR_Linux_syscalls" * 'parisc-4.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: fix a bug when syscall number of tracee is __NR_Linux_syscalls
2016-05-06Merge tag 'arc-4.6-rc7-fixes' of ↵Linus Torvalds6-35/+130
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC fixes from Vineet Gupta: "Late in the cycle, but this has fixes for couple of issues: a PAE40 boot crash and Arnd spotting lack of barriers in BE io-accessors. The 3rd patch for enabling highmem in low physical mem ;-) honestly is more than a "fix" but its been in works for some time, seems to be stable in testing and enables 2 of our customers to go forward with 4.6 kernel. - Fix for PTE truncation in PAE40 builds - Fix for big endian IO accessors lacking IO barrier - Allow HIGHMEM to work with low physical addresses" * tag 'arc-4.6-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: support HIGHMEM even without PAE40 ARC: Fix PAE40 boot failures due to PTE truncation ARC: Add missing io barriers to io{read,write}{16,32}be()
2016-05-06Merge tag 'powerpc-4.6-5' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fix from Michael Ellerman: "Fix bad inline asm constraint in create_zero_mask() from Anton Blanchard" * tag 'powerpc-4.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc: Fix bad inline asm constraint in create_zero_mask()
2016-05-06Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds11-16/+84
Pull drm fixes from Dave Airlie: "Fixes for i915, amdgpu/radeon and imx. The IMX fix is for an autoloading regression found in Fedora. The radeon fixes, are the same fix to amdgpu/radeon to avoid a hardware lockup in some circumstances with a bad mode, and a double free bug I took a few hours chasing down the other morning. The i915 fixes are across the board, all stable material, and fixing some hangs and suspend/resume issues, along with a live status regressions" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading drm/amdgpu: make sure vertical front porch is at least 1 drm/radeon: make sure vertical front porch is at least 1 drm/amdgpu: set metadata pointer to NULL after freeing. drm/i915: Make RPS EI/thresholds multiple of 25 on SNB-BDW drm/i915: Fake HDMI live status drm/i915: Fix eDP low vswing for Broadwell drm/i915/ddi: Fix eDP VDD handling during booting and suspend/resume drm/i915: Fix system resume if PCI device remained enabled drm/i915: Avoid stalling on pending flips for legacy cursor updates
2016-05-06bridge: fix igmp / mld query parsingLinus Lüssing1-5/+7
With the newly introduced helper functions the skb pulling is hidden in the checksumming function - and undone before returning to the caller. The IGMP and MLD query parsing functions in the bridge still assumed that the skb is pointing to the beginning of the IGMP/MLD message while it is now kept at the beginning of the IPv4/6 header. If there is a querier somewhere else, then this either causes the multicast snooping to stay disabled even though it could be enabled. Or, if we have the querier enabled too, then this can create unnecessary IGMP / MLD query messages on the link. Fixing this by taking the offset between IP and IGMP/MLD header into account, too. Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code") Reported-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06parisc: fix a bug when syscall number of tracee is __NR_Linux_syscallsDmitry V. Levin1-1/+1
Do not load one entry beyond the end of the syscall table when the syscall number of a traced process equals to __NR_Linux_syscalls. Similar bug with regular processes was fixed by commit 3bb457af4fa8 ("[PARISC] Fix bug when syscall nr is __NR_Linux_syscalls"). This bug was found by strace test suite. Cc: stable@vger.kernel.org Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Helge Deller <deller@gmx.de>
2016-05-06Merge branches 'pm-opp-fixes', 'pm-cpufreq-fixes' and 'pm-cpuidle-fixes'Rafael J. Wysocki5-23/+38
* pm-opp-fixes: PM / OPP: Remove useless check * pm-cpufreq-fixes: intel_pstate: Fix intel_pstate_get() cpufreq: intel_pstate: Fix HWP on boot CPU after system resume cpufreq: st: enable selective initialization based on the platform * pm-cpuidle-fixes: ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value
2016-05-06Merge branches 'acpica-fixes' and 'device-properties-fixes'Rafael J. Wysocki3-4/+4
* acpica-fixes: ACPICA: Dispatcher: Update thread ID for recursive method calls * device-properties-fixes: device property: Avoid potential dereferences of invalid pointers
2016-05-06Merge tag 'ipvs2-for-v4.7' of ↵Pablo Neira Ayuso2-4/+26
https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next Simon Horman says: ==================== Second Round of IPVS Updates for v4.7 please consider these enhancements to the IPVS. They allow its DoS mitigation strategy effective in conjunction with the SIP persistence engine. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-06x86/tsc: Read all ratio bits from MSR_PLATFORM_INFOChen Yu1-1/+1
Currently we read the tsc radio: ratio = (MSR_PLATFORM_INFO >> 8) & 0x1f; Thus we get bit 8-12 of MSR_PLATFORM_INFO, however according to the SDM (35.5), the ratio bits are bit 8-15. Ignoring the upper bits can result in an incorrect tsc ratio, which causes the TSC calibration and the Local APIC timer frequency to be incorrect. Fix this problem by masking 0xff instead. [ tglx: Massaged changelog ] Fixes: 7da7c1561366 "x86, tsc: Add static (MSR) TSC calibration on Intel Atom SoCs" Signed-off-by: Chen Yu <yu.c.chen@intel.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: stable@vger.kernel.org Cc: Bin Gao <bin.gao@intel.com> Cc: Len Brown <lenb@kernel.org> Link: http://lkml.kernel.org/r/1462505619-5516-1-git-send-email-yu.c.chen@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-05-06netfilter: conntrack: use a single expectation table for all namespacesFlorian Westphal6-33/+25
We already include netns address in the hash and compare the netns pointers during lookup, so even if namespaces have overlapping addresses entries will be spread across the expectation table. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-06netfilter: conntrack: make netns address part of expect hashFlorian Westphal1-7/+10
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-06netfilter: conntrack: check netns when walking expect hashFlorian Westphal3-4/+30
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-06ipvs: make drop_entry protection effective for SIP-peMarco Angaroni2-4/+26
DoS protection policy that deletes connections to avoid out of memory is currently not effective for SIP-pe plus OPS-mode for two reasons: 1) connection templates (holding SIP call-id) are always skipped in ip_vs_random_dropentry() 2) in_pkts counter (used by drop_entry algorithm) is not incremented for connection templates This patch addresses such problems with the following changes: a) connection templates associated (via their dest) to virtual-services configured in OPS mode are included in ip_vs_random_dropentry() monitoring. This applies to SIP-pe over UDP (which requires OPS mode), but is more general principle: when OPS is controlled by templates memory can be used only by templates themselves, since OPS conns are deleted after packet is forwarded. b) OPS connections, if controlled by a template, cause increment of in_pkts counter of their template. This is already happening but only in case director is in master-slave mode (see ip_vs_sync_conn()). Signed-off-by: Marco Angaroni <marcoangaroni@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2016-05-06i40e: constify i40e_client_ops structureJulia Lawall2-2/+2
The i40e_client_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-06i40e: fix misleading indentationArnd Bergmann1-1/+2
Newly added code in i40e_vc_config_promiscuous_mode_msg() is indented in a way that gcc rightly complains about: drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c: In function 'i40e_vc_config_promiscuous_mode_msg': drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:1543:4: error: this 'if' clause does not guard... [-Werror=misleading-indentation] if (f->vlan >= 0 && f->vlan <= I40E_MAX_VLANID) ^~ drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:1550:5: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the 'if' aq_err = pf->hw.aq.asq_last_status; From the context, it looks like the aq_err assignment was meant to be inside of the conditional expression, so I'm adding the appropriate curly braces now. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 5676a8b9cd9a ("i40e: Add VF promiscuous mode driver support") Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-06i40e: Test memory before ethtool alloc succeedsJesse Brandeburg1-3/+31
When testing on systems with very limited amounts of RAM, a bug was found where, while changing the number of descriptors using ethtool, the driver didn't test the limits of system memory before permanently assuming it would be able to get receive buffer memory. Work around this issue by pre-allocation of the receive buffer memory, in the "ghost" ring, which is then used during reinit using the new ring length. Change-Id: I92d7a5fb59a6c884b2efdd1ec652845f101c3359 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-06i40evf: Allocate Rx buffers properlyMitch Williams1-4/+1
Allocate the correct number of RX buffers, and don't fiddle with next_to_use. The common RX code handles all of this. This fixes a memory leak of one page each time the driver is opened. Change-Id: Id06eca353086e084921f047acad28c14745684ee Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-06i40e/i40evf: Remove unused hardware receive descriptor codeJesse Brandeburg5-62/+27
The hardware supports a 16 byte descriptor for receive, but the driver was never using it in production. There was no performance benefit to the real driver of 16 byte descriptors, so drop a whole lot of complexity while getting rid of the code. Also since the previous patch made us use no-split mode all the time, drop any support in the driver for any other value in dtype and assume it is always zero (aka no-split). Hooray for code removal! Change-ID: I2257e902e4dad84a07b94db6d2e6f4ce69b27bc0 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-06i40evf: refactor receive routineJesse Brandeburg5-513/+481
This is part 2 of the Rx refactor series, just including changes to i40evf. This refactor aligns the receive routine with the one in ixgbe which was highly optimized. This reduces the code we have to maintain and allows for (hopefully) more readable and maintainable RX hot path. In order to do this: - consolidate the receive path into a single function that doesn't use packet split but *does* use pages for Rx buffers. - remove the old _1buf routine - consolidate several routines into helper functions - remove VF ethtool control over packet split - remove priv_flags interface since it is unused Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-06i40evf: Drop packet split receive routineJesse Brandeburg7-75/+3
As part of preparation for the rx-refactor, remove the packet split receive routine and ancillary code. Some of the split related context set up code stays in i40e_virtchnl_pf.c in case an older VF driver tries to load and still wants to use packet split. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-06i40e: Refactor receive routineJesse Brandeburg6-303/+531
This is part 1 of the Rx refactor series, just including changes to i40e. This refactor aligns the receive routine with the one in ixgbe which was highly optimized. This reduces the code we have to maintain and allows for (hopefully) more readable and maintainable RX hot path. In order to do this: - consolidate the receive path into a single function that doesn't use packet split but *does* use pages for Rx buffers. - remove the old _1buf routine - consolidate several routines into helper functions - remove ethtool control over packet split Change-ID: I5ca100721de65992aa0114f8b4bac844b84758e0 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-06Merge branch 'akpm' (patches from Andrew)Linus Torvalds16-182/+252
Merge fixes from Andrew Morton: "14 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: byteswap: try to avoid __builtin_constant_p gcc bug lib/stackdepot: avoid to return 0 handle mm: fix kcompactd hang during memory offlining modpost: fix module autoloading for OF devices with generic compatible property proc: prevent accessing /proc/<PID>/environ until it's ready mm/zswap: provide unique zpool name mm: thp: kvm: fix memory corruption in KVM with THP enabled MAINTAINERS: fix Rajendra Nayak's address mm, cma: prevent nr_isolated_* counters from going negative mm: update min_free_kbytes from khugepaged after core initialization huge pagecache: mmap_sem is unlocked when truncation splits pmd rapidio/mport_cdev: fix uapi type definitions mm: memcontrol: let v2 cgroups follow changes in system swappiness mm: thp: correct split_huge_pages file permission
2016-05-06net: bridge: fix old ioctl unlocked net device walkNikolay Aleksandrov1-2/+3
get_bridge_ifindices() is used from the old "deviceless" bridge ioctl calls which aren't called with rtnl held. The comment above says that it is called with rtnl but that is not really the case. Here's a sample output from a test ASSERT_RTNL() which I put in get_bridge_ifindices and executed "brctl show": [ 957.422726] RTNL: assertion failed at net/bridge//br_ioctl.c (30) [ 957.422925] CPU: 0 PID: 1862 Comm: brctl Tainted: G W O 4.6.0-rc4+ #157 [ 957.423009] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014 [ 957.423009] 0000000000000000 ffff880058adfdf0 ffffffff8138dec5 0000000000000400 [ 957.423009] ffffffff81ce8380 ffff880058adfe58 ffffffffa05ead32 0000000000000001 [ 957.423009] 00007ffec1a444b0 0000000000000400 ffff880053c19130 0000000000008940 [ 957.423009] Call Trace: [ 957.423009] [<ffffffff8138dec5>] dump_stack+0x85/0xc0 [ 957.423009] [<ffffffffa05ead32>] br_ioctl_deviceless_stub+0x212/0x2e0 [bridge] [ 957.423009] [<ffffffff81515beb>] sock_ioctl+0x22b/0x290 [ 957.423009] [<ffffffff8126ba75>] do_vfs_ioctl+0x95/0x700 [ 957.423009] [<ffffffff8126c159>] SyS_ioctl+0x79/0x90 [ 957.423009] [<ffffffff8163a4c0>] entry_SYSCALL_64_fastpath+0x23/0xc1 Since it only reads bridge ifindices, we can use rcu to safely walk the net device list. Also remove the wrong rtnl comment above. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06VSOCK: do not disconnect socket when peer has shutdown SEND onlyIan Campbell1-20/+1
The peer may be expecting a reply having sent a request and then done a shutdown(SHUT_WR), so tearing down the whole socket at this point seems wrong and breaks for me with a client which does a SHUT_WR. Looking at other socket family's stream_recvmsg callbacks doing a shutdown here does not seem to be the norm and removing it does not seem to have had any adverse effects that I can see. I'm using Stefan's RFC virtio transport patches, I'm unsure of the impact on the vmci transport. Signed-off-by: Ian Campbell <ian.campbell@docker.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Cc: Andy King <acking@vmware.com> Cc: Dmitry Torokhov <dtor@vmware.com> Cc: Jorgen Hansen <jhansen@vmware.com> Cc: Adit Ranadive <aditr@vmware.com> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06net/mlx4_en: Fix endianness bug in IPV6 csum calculationDaniel Jurgens1-1/+1
Use htons instead of unconditionally byte swapping nexthdr. On a little endian systems shifting the byte is correct behavior, but it results in incorrect csums on big endian architectures. Fixes: f8c6455bb04b ('net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE') Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Carol Soto <clsoto@us.ibm.com> Tested-by: Carol Soto <clsoto@us.ibm.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06net/mlx4: Avoid wrong virtual mappingsHaggai Abramovsky9-125/+68
The dma_alloc_coherent() function returns a virtual address which can be used for coherent access to the underlying memory. On some architectures, like arm64, undefined behavior results if this memory is also accessed via virtual mappings that are not coherent. Because of their undefined nature, operations like virt_to_page() return garbage when passed virtual addresses obtained from dma_alloc_coherent(). Any subsequent mappings via vmap() of the garbage page values are unusable and result in bad things like bus errors (synchronous aborts in ARM64 speak). The mlx4 driver contains code that does the equivalent of: vmap(virt_to_page(dma_alloc_coherent)), this results in an OOPs when the device is opened. Prevent Ethernet driver to run this problematic code by forcing it to allocate contiguous memory. As for the Infiniband driver, at first we are trying to allocate contiguous memory, but in case of failure roll back to work with fragmented memory. Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reported-by: David Daney <david.daney@cavium.com> Tested-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06mailmap: add John Paul Adrian GlaubitzLinus Torvalds1-0/+1
Apparently patchwork ended up truncating the full name. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-06i40e/i40evf: Remove reference to ring->dtypeJesse Brandeburg3-8/+2
As part of the rx-refactor, the dtype variable in the i40e_ring struct is no longer used, so remove it. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-06i40e: Drop packet split receive routineJesse Brandeburg6-317/+10
As part of preparation for the rx-refactor, remove the packet split receive routine and ancillary code. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-06i40e/i40evf: Refactor tunnel interpretationJesse Brandeburg2-14/+12
Refactor the interpretation of a tunnel. This removes some code and lets us start using the hardware's parsing. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-06Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds2-4/+14
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: - a fix for the persistent memory 'struct page' driver. The implementation overlooked the fact that pages are allocated in 2MB units leading to -ENOMEM when establishing some configurations. It's tagged for -stable as the problem was introduced with the initial implementation in 4.5. - The new "error status translation" routine, introduced with the 4.6 updates to the nfit driver, missed a necessary path in acpi_nfit_ctl(). The end result is that we are falsely assuming commands complete successfully when the embedded status says otherwise. * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nfit: fix translation of command status results libnvdimm, pfn: fix memmap reservation sizing