summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2016-05-10block/drbd: align properly u64 in nl messagesNicolas Dichtel1-1/+6
The attribute 0 is never used in drbd, so let's use it as pad attribute in netlink messages. This minimizes the patch. Note that this patch is only compile-tested. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-10net: phy: add phy_ethtool_{get|set}_link_ksettingsPhilippe Reynes1-0/+4
Ethtool callbacks {get|set}_link_ksettings are often the same, so we add two generics functions phy_ethtool_{get|set}_link_ksettings to avoid writing severals times the same function. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Acked-By: David Decotigny <decot@googlers.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-10Merge tag 'linux-can-next-for-4.7-20160509' of ↵David S. Miller1-2/+20
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2016-05-09 this is a pull request of 12 patches for net-next/master. Alexander Gerasiov and Nikita Edward Baruzdin each contribute a patch improving the sja1000 driver. Amitoj Kaur Chawla's patch converts the mcp251x driver to alloc_workqueue(). A patch by Oliver Hartkopp fixes the handling of CAN config options. Andreas Gröger improves the error handling in the janz-ican3 driver. The patch by Maximilian Schneider for the gs_usb improves probing of the USB driver. Finally there are 6 improvement patches by Marek Vasut for the ifi CAN driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller5-273/+28
In netdevice.h we removed the structure in net-next that is being changes in 'net'. In macsec.c and rtnetlink.c we have overlaps between fixes in 'net' and the u64 attribute changes in 'net-next'. The mlx5 conflicts have to do with vxlan support dependencies. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-0/+3
Pull networking fixes from David Miller: 1) Check klogctl failure correctly, from Colin Ian King. 2) Prevent OOM when under memory pressure in flowcache, from Steffen Klassert. 3) Fix info leak in llc and rtnetlink ifmap code, from Kangjie Lu. 4) Memory barrier and multicast handling fixes in bnxt_en, from Michael Chan. 5) Endianness bug in mlx5, from Daniel Jurgens. 6) Fix disconnect handling in VSOCK, from Ian Campbell. 7) Fix locking of netdev list walking in get_bridge_ifindices(), from Nikolay Aleksandrov. 8) Bridge multicast MLD parser can look at wrong packet offsets, fix from Linus Lüssing. 9) Fix chip hang in qede driver, from Sudarsana Reddy Kalluru. 10) Fix missing setting of encapsulation before inner handling completes in udp_offload code, from Jarno Rajahalme. 11) Missing rollbacks during LAG join and flood configuration failures in mlxsw driver, from Ido Schimmel. 12) Fix error code checks in netxen driver, from Dan Carpenter. 13) Fix key size in new macsec driver, from Sabrina Dubroca. 14) Fix mlx5/VXLAN dependencies, from Arnd Bergmann. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits) net/mlx5e: make VXLAN support conditional Revert "net/mlx5: Kconfig: Fix MLX5_EN/VXLAN build issue" macsec: key identifier is 128 bits, not 64 Documentation/networking: more accurate LCO explanation macvtap: segmented packet is consumed tools: bpf_jit_disasm: check for klogctl failure qede: uninitialized variable in qede_start_xmit() netxen: netxen_rom_fast_read() doesn't return -1 netxen: reversed condition in netxen_nic_set_link_parameters() netxen: fix error handling in netxen_get_flash_block() mlxsw: spectrum: Add missing rollback in flood configuration mlxsw: spectrum: Fix rollback order in LAG join failure udp_offload: Set encapsulation before inner completes. udp_tunnel: Remove redundant udp_tunnel_gro_complete(). qede: prevent chip hang when increasing channels net: ipv6: tcp reset, icmp need to consider L3 domain bridge: fix igmp / mld query parsing net: bridge: fix old ioctl unlocked net device walk VSOCK: do not disconnect socket when peer has shutdown SEND only net/mlx4_en: Fix endianness bug in IPV6 csum calculation ...
2016-05-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller1-3/+3
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following large patchset contains Netfilter updates for your net-next tree. My initial intention was to send you this in two goes but when I looked back twice I already had this burden on top of me. Several updates for IPVS from Marco Angaroni: 1) Allow SIP connections originating from real-servers to be load balanced by the SIP persistence engine as is already implemented in the other direction. 2) Release connections immediately for One-packet-scheduling (OPS) in IPVS, instead of making it via timer and rcu callback. 3) Skip deleting conntracks for each one packet in OPS, and don't call nf_conntrack_alter_reply() since no reply is expected. 4) Enable drop on exhaustion for OPS + SIP persistence. Miscelaneous conntrack updates from Florian Westphal, including fix for hash resize: 5) Move conntrack generation counter out of conntrack pernet structure since this is only used by the init_ns to allow hash resizing. 6) Use get_random_once() from packet path to collect hash random seed instead of our compound. 7) Don't disable BH from ____nf_conntrack_find() for statistics, use NF_CT_STAT_INC_ATOMIC() instead. 8) Fix lookup race during conntrack hash resizing. 9) Introduce clash resolution on conntrack insertion for connectionless protocol. Then, Florian's netns rework to get rid of per-netns conntrack table, thus we use one single table for them all. There was consensus on this change during the NFWS 2015 and, on top of that, it has recently been pointed as a source of multiple problems from unpriviledged netns: 11) Use a single conntrack hashtable for all namespaces. Include netns in object comparisons and make it part of the hash calculation. Adapt early_drop() to consider netns. 12) Use single expectation and NAT hashtable for all namespaces. 13) Use a single slab cache for all namespaces for conntrack objects. 14) Skip full table scanning from nf_ct_iterate_cleanup() if the pernet conntrack counter tells us the table is empty (ie. equals zero). Fixes for nf_tables interval set element handling, support to set conntrack connlabels and allow set names up to 32 bytes. 15) Parse element flags from element deletion path and pass it up to the backend set implementation. 16) Allow adjacent intervals in the rbtree set type for dynamic interval updates. 17) Add support to set connlabel from nf_tables, from Florian Westphal. 18) Allow set names up to 32 bytes in nf_tables. Several x_tables fixes and updates: 19) Fix incorrect use of IS_ERR_VALUE() in x_tables, original patch from Andrzej Hajda. And finally, miscelaneous netfilter updates such as: 20) Disable automatic helper assignment by default. Note this proc knob was introduced by a9006892643a ("netfilter: nf_ct_helper: allow to disable automatic helper assignment") 4 years ago to start moving towards explicit conntrack helper configuration via iptables CT target. 21) Get rid of obsolete and inconsistent debugging instrumentation in x_tables. 22) Remove unnecessary check for null after ip6_route_output(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09compiler-gcc: require gcc 4.8 for powerpc __builtin_bswap16()Josh Poimboeuf1-1/+1
gcc support for __builtin_bswap16() was supposedly added for powerpc in gcc 4.6, and was then later added for other architectures in gcc 4.8. However, Stephen Rothwell reported that attempting to use it on powerpc in gcc 4.6 fails with: lib/vsprintf.c:160:2: error: initializer element is not constant lib/vsprintf.c:160:2: error: (near initialization for 'decpair[0]') lib/vsprintf.c:160:2: error: initializer element is not constant lib/vsprintf.c:160:2: error: (near initialization for 'decpair[1]') ... I'm not entirely sure what those errors mean, but I don't see them on gcc 4.8. So let's consider gcc 4.8 to be the official starting point for __builtin_bswap16(). Arnd Bergmann adds: "I found the commit in gcc-4.8 that replaced the powerpc-specific implementation of __builtin_bswap16 with an architecture-independent one. Apparently the powerpc version (gcc-4.6 and 4.7) just mapped to the lhbrx/sthbrx instructions, so it ended up not being a constant, though the intent of the patch was mainly to add support for the builtin to x86: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52624 has the patch that went into gcc-4.8 and more information." Fixes: 7322dd755e7d ("byteswap: try to avoid __builtin_constant_p gcc bug") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-09can: fix handling of unmodifiable configuration optionsOliver Hartkopp1-2/+20
As described in 'can: m_can: tag current CAN FD controllers as non-ISO' (6cfda7fbebe) it is possible to define fixed configuration options by setting the according bit in 'ctrlmode' and clear it in 'ctrlmode_supported'. This leads to the incovenience that the fixed configuration bits can not be passed by netlink even when they have the correct values (e.g. non-ISO, FD). This patch fixes that issue and not only allows fixed set bit values to be set again but now requires(!) to provide these fixed values at configuration time. A valid CAN FD configuration consists of a nominal/arbitration bittiming, a data bittiming and a control mode with CAN_CTRLMODE_FD set - which is now enforced by a new can_validate() function. This fix additionally removed the inconsistency that was prohibiting the support of 'CANFD-only' controller drivers, like the RCar CAN FD. For this reason a new helper can_set_static_ctrlmode() has been introduced to provide a proper interface to handle static enabled CAN controller options. Reported-by: Ramesh Shanmugasundaram <ramesh.shanmugasundaram@bp.renesas.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Reviewed-by: Ramesh Shanmugasundaram <ramesh.shanmugasundaram@bp.renesas.com> Cc: <stable@vger.kernel.org> # >= 3.18 Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2016-05-09net: Add Qualcomm IPC routerCourtney Cavin1-1/+3
Add an implementation of Qualcomm's IPC router protocol, used to communicate with service providing remote processors. Signed-off-by: Courtney Cavin <courtney.cavin@sonymobile.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@sonymobile.com> [bjorn: Cope with 0 being a valid node id and implement RTM_NEWADDR] Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09soc: qcom: smd: Introduce compile stubsBjorn Andersson1-1/+27
Introduce compile stubs for the SMD API, allowing consumers to be compile tested. Acked-by: Andy Gross <andy.gross@linaro.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-07udp_offload: Set encapsulation before inner completes.Jarno Rajahalme1-0/+3
UDP tunnel segmentation code relies on the inner offsets being set for an UDP tunnel GSO packet, but the inner *_complete() functions will set the inner offsets only if 'encapsulation' is set before calling them. Currently, udp_gro_complete() sets 'encapsulation' only after the inner *_complete() functions are done. This causes the inner offsets having invalid values after udp_gro_complete() returns, which in turn will make it impossible to properly segment the packet in case it needs to be forwarded, which would be visible to the user either as invalid packets being sent or as packet loss. This patch fixes this by setting skb's 'encapsulation' in udp_gro_complete() before calling into the inner complete functions, and by making each possible UDP tunnel gro_complete() callback set the inner_mac_header to the beginning of the tunnel payload. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Reviewed-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06bpf: wire in data and data_end for cls_act_bpfAlexei Starovoitov1-0/+16
allow cls_bpf and act_bpf programs access skb->data and skb->data_end pointers. The bpf helpers that change skb->data need to update data_end pointer as well. The verifier checks that programs always reload data, data_end pointers after calls to such bpf helpers. We cannot add 'data_end' pointer to struct qdisc_skb_cb directly, since it's embedded as-is by infiniband ipoib, so wrapper struct is needed. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06Merge tag 'pm+acpi-4.6-rc7' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI fixes from Rafael Wysocki: "Fixes for problems introduced or discovered recently (intel_pstate, sti-cpufreq, ARM64 cpuidle, Operating Performance Points framework, generic device properties framework) and one fix for a hotplug-related deadlock in ACPICA that's been there forever, but is nasty enough. Specifics: - Fix for a recent regression in the intel_pstate driver causing it to fail to restore the HWP (HW-managed P-states) configuration of the boot CPU after suspend-to-RAM (Rafael Wysocki). - Fix for two recent regressions in the intel_pstate driver, one that can trigger a divide by zero if the driver is accessed via sysfs before it manages to take the first sample and one causing it to fail to update a structure field used in a trace point, so the information coming from it is less useful (Rafael Wysocki). - Fix for a problem in the sti-cpufreq driver introduced during the 4.5 cycle that causes it to break CPU PM in multi-platform kernels by registering cpufreq-dt (which subsequently doesn't work) unconditionally and preventing the driver that would actually work from registering (Sudeep Holla). - Stable-candidate fix for an ARM64 cpuidle issue causing idle state usage counters to be incorrectly updated for idle states that were not entered due to errors (James Morse). - Fix for a recently introduced issue in the OPP (Operating Performance Points) framework causing it to print bogus error messages for missing optional regulators (Viresh Kumar). - Fix for a recently introduced issue in the generic device properties framework that may cause it to attempt to dereferece and invalid pointer in some cases (Heikki Krogerus). - Fix for a deadlock in the ACPICA core that may be triggered by device (eg Thunderbolt) hotplug (Prarit Bhargava)" * tag 'pm+acpi-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / OPP: Remove useless check ACPICA: Dispatcher: Update thread ID for recursive method calls intel_pstate: Fix intel_pstate_get() cpufreq: intel_pstate: Fix HWP on boot CPU after system resume cpufreq: st: enable selective initialization based on the platform ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value device property: Avoid potential dereferences of invalid pointers
2016-05-06Merge branches 'acpica-fixes' and 'device-properties-fixes'Rafael J. Wysocki1-1/+1
* acpica-fixes: ACPICA: Dispatcher: Update thread ID for recursive method calls * device-properties-fixes: device property: Avoid potential dereferences of invalid pointers
2016-05-06net/mlx4: Avoid wrong virtual mappingsHaggai Abramovsky1-2/+2
The dma_alloc_coherent() function returns a virtual address which can be used for coherent access to the underlying memory. On some architectures, like arm64, undefined behavior results if this memory is also accessed via virtual mappings that are not coherent. Because of their undefined nature, operations like virt_to_page() return garbage when passed virtual addresses obtained from dma_alloc_coherent(). Any subsequent mappings via vmap() of the garbage page values are unusable and result in bad things like bus errors (synchronous aborts in ARM64 speak). The mlx4 driver contains code that does the equivalent of: vmap(virt_to_page(dma_alloc_coherent)), this results in an OOPs when the device is opened. Prevent Ethernet driver to run this problematic code by forcing it to allocate contiguous memory. As for the Infiniband driver, at first we are trying to allocate contiguous memory, but in case of failure roll back to work with fragmented memory. Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reported-by: David Daney <david.daney@cavium.com> Tested-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06mm: thp: kvm: fix memory corruption in KVM with THP enabledAndrea Arcangeli1-0/+22
After the THP refcounting change, obtaining a compound pages from get_user_pages() no longer allows us to assume the entire compound page is immediately mappable from a secondary MMU. A secondary MMU doesn't want to call get_user_pages() more than once for each compound page, in order to know if it can map the whole compound page. So a secondary MMU needs to know from a single get_user_pages() invocation when it can map immediately the entire compound page to avoid a flood of unnecessary secondary MMU faults and spurious atomic_inc()/atomic_dec() (pages don't have to be pinned by MMU notifier users). Ideally instead of the page->_mapcount < 1 check, get_user_pages() should return the granularity of the "page" mapping in the "mm" passed to get_user_pages(). However it's non trivial change to pass the "pmd" status belonging to the "mm" walked by get_user_pages up the stack (up to the caller of get_user_pages). So the fix just checks if there is not a single pte mapping on the page returned by get_user_pages, and in turn if the caller can assume that the whole compound page is mapped in the current "mm" (in a pmd_trans_huge()). In such case the entire compound page is safe to map into the secondary MMU without additional get_user_pages() calls on the surrounding tail/head pages. In addition of being faster, not having to run other get_user_pages() calls also reduces the memory footprint of the secondary MMU fault in case the pmd split happened as result of memory pressure. Without this fix after a MADV_DONTNEED (like invoked by QEMU during postcopy live migration or balloning) or after generic swapping (with a failure in split_huge_page() that would only result in pmd splitting and not a physical page split), KVM would map the whole compound page into the shadow pagetables, despite regular faults or userfaults (like UFFDIO_COPY) may map regular pages into the primary MMU as result of the pte faults, leading to the guest mode and userland mode going out of sync and not working on the same memory at all times. Any other secondary MMU notifier manager (KVM is just one of the many MMU notifier users) will need the same information if it doesn't want to run a flood of get_user_pages_fast and it can support multiple granularity in the secondary MMU mappings, so I think it is justified to be exposed not just to KVM. The other option would be to move transparent_hugepage_adjust to mm/huge_memory.c but that currently has all kind of KVM data structures in it, so it's definitely not a cut-and-paste work, so I couldn't do a fix as cleaner as this one for 4.6. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: "Li, Liang Z" <liang.z.li@intel.com> Cc: Amit Shah <amit.shah@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-06rapidio/mport_cdev: fix uapi type definitionsAlexandre Bounine1-271/+0
Fix problems in uapi definitions reported by Gabriel Laskar: (see https://lkml.org/lkml/2016/4/5/205 for details) - move public header file rio_mport_cdev.h to include/uapi/linux directory - change types in data structures passed as IOCTL parameters - improve parameter checking in some IOCTL service routines Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Reported-by: Gabriel Laskar <gabriel@lse.epita.fr> Tested-by: Barry Wood <barry.wood@idt.com> Cc: Gabriel Laskar <gabriel@lse.epita.fr> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-06mm: memcontrol: let v2 cgroups follow changes in system swappinessJohannes Weiner1-0/+4
Cgroup2 currently doesn't have a per-cgroup swappiness setting. We might want to add one later - that's a different discussion - but until we do, the cgroups should always follow the system setting. Otherwise it will be unchangeably set to whatever the ancestor inherited from the system setting at the time of cgroup creation. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: <stable@vger.kernel.org> [4.5] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-04tcp: fix lockdep splat in tcp_snd_una_update()Eric Dumazet1-0/+14
tcp_snd_una_update() and tcp_rcv_nxt_update() call u64_stats_update_begin() either from process context or BH handler. This triggers a lockdep splat on 32bit & SMP builds. We could add u64_stats_update_begin_bh() variant but this would slow down 32bit builds with useless local_disable_bh() and local_enable_bh() pairs, since we own the socket lock at this point. I add sock_owned_by_me() helper to have proper lockdep support even on 64bit builds, and new u64_stats_update_begin_raw() and u64_stats_update_end_raw methods. Fixes: c10d9310edf5 ("tcp: do not assume TCP code is non preemptible") Reported-by: Fabio Estevam <festevam@gmail.com> Diagnosed-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Fabio Estevam <fabio.estevam@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04net: remove dev->trans_startFlorian Westphal1-10/+5
previous patches removed all direct accesses to dev->trans_start, so change the netif_trans_update helper to update trans_start of netdev queue 0 instead and then remove trans_start from struct net_device. AFAICS a lot of the netif_trans_update() invocations are now useless because they occur in ndo_start_xmit and driver doesn't set LLTX (i.e. stack already took care of the update). As I can't test any of them it seems better to just leave them alone. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04netdevice: add helper to update trans_startFlorian Westphal1-0/+6
trans_start exists twice: - as member of net_device (legacy) - as member of netdev_queue In order to get rid of the legacy case, add a helper for the dev->trans_update (this patch), then convert spots that do dev->trans_start = jiffies to use this helper (next patch). This would then allow us to change the helper so that it updates the trans_stamp of netdev queue 0 instead. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04net/mlx5: Flow steering, Add vport ACL supportMohamad Haj Yahia3-0/+21
Update the relevant flow steering device structs and commands to support vport. Update the flow steering core API to receive vport number. Add ingress and egress ACL flow table name spaces. Add ACL flow table support: * ACL (Access Control List) flow table is a table that contains only allow/drop steering rules. * We have two types of ACL flow tables - ingress and egress. * ACLs handle traffic sent from/to E-Switch FDB table, Ingress refers to traffic sent from Vport to E-Switch and Egress refers to traffic sent from E-Switch to vport. * Ingress ACL flow table allow/drop rules is checked against traffic sent from VF. * Egress ACL flow table allow/drop rules is checked against traffic sent to VF. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller13-20/+66
Conflicts: net/ipv4/ip_gre.c Minor conflicts between tunnel bug fixes in net and ipv6 tunnel cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-0/+5
Pull networking fixes from David Miller: "Some straggler bug fixes: 1) Batman-adv DAT must consider VLAN IDs when choosing candidate nodes, from Antonio Quartulli. 2) Fix botched reference counting of vlan objects and neigh nodes in batman-adv, from Sven Eckelmann. 3) netem can crash when it sees GSO packets, the fix is to segment then upon ->enqueue. Fix from Neil Horman with help from Eric Dumazet. 4) Fix VXLAN dependencies in mlx5 driver Kconfig, from Matthew Finlay. 5) Handle VXLAN ops outside of rcu lock, via a workqueue, in mlx5, since it can sleep. Fix also from Matthew Finlay. 6) Check mdiobus_scan() return values properly in pxa168_eth and macb drivers. From Sergei Shtylyov. 7) If the netdevice doesn't support checksumming, disable segmentation. From Alexandery Duyck. 8) Fix races between RDS tcp accept and sending, from Sowmini Varadhan. 9) In macb driver, probe MDIO bus before we register the netdev, otherwise we can try to open the device before it is really ready for that. Fix from Florian Fainelli. 10) Netlink attribute size for ILA "tunnels" not calculated properly, fix from Nicolas Dichtel" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: ipv6/ila: fix nlsize calculation for lwtunnel net: macb: Probe MDIO bus before registering netdev RDS: TCP: Synchronize accept() and connect() paths on t_conn_lock. RDS:TCP: Synchronize rds_tcp_accept_one with rds_send_xmit when resetting t_sock vxlan: Add checksum check to the features check function net: Disable segmentation if checksumming is not supported net: mvneta: Remove superfluous SMP function call macb: fix mdiobus_scan() error check pxa168_eth: fix mdiobus_scan() error check net/mlx5e: Use workqueue for vxlan ops net/mlx5e: Implement a mlx5e workqueue net/mlx5: Kconfig: Fix MLX5_EN/VXLAN build issue net/mlx5: Unmap only the relevant IO memory mapping netem: Segment GSO packets on enqueue batman-adv: Fix reference counting of hardif_neigh_node object for neigh_node batman-adv: Fix reference counting of vlan object for tt_local_entry batman-adv: B.A.T.M.A.N V - make sure iface is reactivated upon NETDEV_UP event batman-adv: fix DAT candidate selection (must use vid)
2016-05-03vxlan: Add checksum check to the features check functionAlexander Duyck1-0/+5
We need to perform an additional check on the inner headers to determine if we can offload the checksum for them. Previously this check didn't occur so we would generate an invalid frame in the case of an IPv6 header encapsulated inside of an IPv4 tunnel. To fix this I added a secondary check to vxlan_features_check so that we can verify that we can offload the inner checksum. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-03net: relax expensive skb_unclone() in iptunnel_handle_offloads()Eric Dumazet1-0/+10
Locally generated TCP GSO packets having to go through a GRE/SIT/IPIP tunnel have to go through an expensive skb_unclone() Reallocating skb->head is a lot of work. Test should really check if a 'real clone' of the packet was done. TCP does not care if the original gso_type is changed while the packet travels in the stack. This adds skb_header_unclone() which is a variant of skb_clone() using skb_header_cloned() check instead of skb_cloned(). This variant can probably be used from other points. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-03netdevice: shrink size of struct netdev_queueFlorian Westphal1-7/+6
- trans_timeout is incremented when tx queue timed out (tx watchdog). - tx_maxrate is set via sysfs Moving tx_maxrate to read-mostly part shrinks the struct by 64 bytes. While at it, also move trans_timeout (it is out-of-place in the 'write-mostly' part). Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02Minimal fix-up of bad hashing behavior of hash_64()Linus Torvalds1-2/+18
This is a fairly minimal fixup to the horribly bad behavior of hash_64() with certain input patterns. In particular, because the multiplicative value used for the 64-bit hash was intentionally bit-sparse (so that the multiply could be done with shifts and adds on architectures without hardware multipliers), some bits did not get spread out very much. In particular, certain fairly common bit ranges in the input (roughly bits 12-20: commonly with the most information in them when you hash things like byte offsets in files or memory that have block factors that mean that the low bits are often zero) would not necessarily show up much in the result. There's a bigger patch-series brewing to fix up things more completely, but this is the fairly minimal fix for the 64-bit hashing problem. It simply picks a much better constant multiplier, spreading the bits out a lot better. NOTE! For 32-bit architectures, the bad old hash_64() remains the same for now, since 64-bit multiplies are expensive. The bigger hashing cleanup will replace the 32-bit case with something better. The new constants were picked by George Spelvin who wrote that bigger cleanup series. I just picked out the constants and part of the comment from that series. Cc: stable@vger.kernel.org Cc: George Spelvin <linux@horizon.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds3-3/+12
Pull networking fixes from David Miller: 1) MODULE_FIRMWARE firmware string not correct for iwlwifi 8000 chips, from Sara Sharon. 2) Fix SKB size checks in batman-adv stack on receive, from Sven Eckelmann. 3) Leak fix on mac80211 interface add error paths, from Johannes Berg. 4) Cannot invoke napi_disable() with BH disabled in myri10ge driver, fix from Stanislaw Gruszka. 5) Fix sign extension problem when computing feature masks in net_gso_ok(), from Marcelo Ricardo Leitner. 6) lan78xx driver doesn't count packets and packet lengths in its statistics properly, fix from Woojung Huh. 7) Fix the buffer allocation sizes in pegasus USB driver, from Petko Manolov. 8) Fix refcount overflows in bpf, from Alexei Starovoitov. 9) Unified dst cache handling introduced a preempt warning in ip_tunnel, fix by resetting rather then setting the cached route. From Paolo Abeni. 10) Listener hash collision test fix in soreuseport, from Craig Gallak * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits) gre: do not pull header in ICMP error processing net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case tipc: only process unicast on intended node cxgb3: fix out of bounds read net/smscx5xx: use the device tree for mac address soreuseport: Fix TCP listener hash collision net: l2tp: fix reversed udp6 checksum flags ip_tunnel: fix preempt warning in ip tunnel creation/updating samples/bpf: fix trace_output example bpf: fix check_map_func_compatibility logic bpf: fix refcnt overflow drivers: net: cpsw: use of_phy_connect() in fixed-link case dt: cpsw: phy-handle, phy_id, and fixed-link are mutually exclusive drivers: net: cpsw: don't ignore phy-mode if phy-handle is used drivers: net: cpsw: fix segfault in case of bad phy-handle drivers: net: cpsw: fix parsing of phy-handle DT property in dual_emac config MAINTAINERS: net: Change maintainer for GRETH 10/100/1G Ethernet MAC device driver gre: reject GUE and FOU in collect metadata mode pegasus: fixes reported packet length pegasus: fixes URB buffer allocation size; ...
2016-05-02qed: add infrastructure for device self tests.Sudarsana Reddy Kalluru1-0/+47
This patch adds the functionality and APIs needed for selftests. It adds the ability to configure the link-mode which is required for the implementation of loopback tests. It adds the APIs for clock test, register test, interrupt test and memory test. Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG caseTim Bingham1-1/+9
Prior to commit d92cff89a0c8 ("net_dbg_ratelimited: turn into no-op when !DEBUG") the implementation of net_dbg_ratelimited() was buggy for both the DEBUG and CONFIG_DYNAMIC_DEBUG cases. The bug was that net_ratelimit() was being called and, despite returning true, nothing was being printed to the console. This resulted in messages like the following - "net_ratelimit: %d callbacks suppressed" with no other output nearby. After commit d92cff89a0c8 ("net_dbg_ratelimited: turn into no-op when !DEBUG") the bug is fixed for the DEBUG case. However, there's no output at all for CONFIG_DYNAMIC_DEBUG case. This patch restores debug output (if enabled) for the CONFIG_DYNAMIC_DEBUG case. Add a definition of net_dbg_ratelimited() for the CONFIG_DYNAMIC_DEBUG case. The implementation takes care to check that dynamic debugging is enabled before calling net_ratelimit(). Fixes: d92cff89a0c8 ("net_dbg_ratelimited: turn into no-op when !DEBUG") Signed-off-by: Tim Bingham <tbingham@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-30Merge tag 'for-linus' of ↵Linus Torvalds1-0/+11
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma fixes from Doug Ledford: "Final set of -rc fixes for 4.6. I've collected up a number of patches that are all pretty small with the exception of only a couple. The hfi1 driver has a number of important patches, and it is what really drives the line count of this pull request up. These are all small and I've got this kernel built and running in the test lab (I have most of the hardware, I think nes is the only thing in this patch set that I can't say I've personally tested and have up and running). Summary: - A number of collected fixes for oopses, memory corruptions, deadlocks, etc. All of these fixes are small (many only 5-10 lines), obvious, and tested. - Fix for the security issue related to the use of write for bi-directional communications" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: RDMA/nes: don't leak skb if carrier down IB/security: Restrict use of the write() interface IB/hfi1: Use kernel default llseek for ui device IB/hfi1: Don't attempt to free resources if initialization failed IB/hfi1: Fix missing lock/unlock in verbs drain callback IB/rdmavt: Fix send scheduling IB/hfi1: Prevent unpinning of wrong pages IB/hfi1: Fix deadlock caused by locking with wrong scope IB/hfi1: Prevent NULL pointer deferences in caching code MAINTAINERS: Update iser/isert maintainer contact info IB/mlx5: Expose correct max_sge_rd limit RDMA/iw_cxgb4: Fix bar2 virt addr calculation for T4 chips iw_cxgb4: handle draining an idle qp iw_cxgb3: initialize ibdev.iwcm->ifname for port mapping iw_cxgb4: initialize ibdev.iwcm->ifname for port mapping IB/core: Don't drain non-existent rq queue-pair IB/core: Fix oops in ib_cache_gid_set_default_gid
2016-04-29net/mlx5: Initializing CPU reverse mappingMaor Gottlieb1-0/+3
Allocating CPU rmap and add entry for each IRQ. CPU rmap is used in aRFS to get the RX queue number of the RX completion interrupts. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29net/mlx5: Add user chosen levels when allocating flow tablesMaor Gottlieb1-2/+4
Currently, consumers of the flow steering infrastructure can't choose their own flow table levels and are limited to one flow table per level. This just waste levels. Instead, we introduce here the possibility to use multiple flow tables in a level. The user is free to connect these flow tables, while following the rule (FTEs in FT of level x could only point to FTs of level y where y > x). In addition this patch switch the order of the create/destroy flow tables of the NIC(vlan and main). Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29net/mlx5: Introduce modify flow rule destinationMaor Gottlieb1-0/+3
This API is used for modifying the flow rule destination. This is needed for modifying the pointed flow table by the traffic type classifier rules to point on the aRFS tables. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29net: constify is_skb_forwardable's argumentsNikolay Aleksandrov1-1/+2
is_skb_forwardable is not supposed to change anything so constify its arguments Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-0/+9
Merge fixes from Andrew Morton: "20 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: Documentation/sysctl/vm.txt: update numa_zonelist_order description lib/stackdepot.c: allow the stack trace hash to be zero rapidio: fix potential NULL pointer dereference mm/memory-failure: fix race with compound page split/merge ocfs2/dlm: return zero if deref_done message is successfully handled Ananth has moved kcov: don't profile branches in kcov kcov: don't trace the code coverage code mm: wake kcompactd before kswapd's short sleep .mailmap: add Frank Rowand mm/hwpoison: fix wrong num_poisoned_pages accounting mm: call swap_slot_free_notify() with page lock held mm: vmscan: reclaim highmem zone if buffer_heads is over limit numa: fix /proc/<pid>/numa_maps for THP mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check mailmap: fix Krzysztof Kozlowski's misspelled name thp: keep huge zero page pinned until tlb flush mm: exclude HugeTLB pages from THP page_mapped() logic kexec: export OFFSET(page.compound_head) to find out compound tail page kexec: update VMCOREINFO for compound_order/dtor
2016-04-29netfilter: fix IS_ERR_VALUE usagePablo Neira Ayuso1-3/+3
This is a forward-port of the original patch from Andrzej Hajda, he said: "IS_ERR_VALUE should be used only with unsigned long type. Otherwise it can work incorrectly. To achieve this function xt_percpu_counter_alloc is modified to return unsigned long, and its result is assigned to temporary variable to perform error checking, before assigning to .pcnt field. The patch follows conclusion from discussion on LKML [1][2]. [1]: http://permalink.gmane.org/gmane.linux.kernel/2120927 [2]: http://permalink.gmane.org/gmane.linux.kernel/2150581" Original patch from Andrzej is here: http://patchwork.ozlabs.org/patch/582970/ This patch has clashed with input validation fixes for x_tables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-04-29Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds1-3/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Two lockdep fixes" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lockdep: Fix lock_chain::base size locking/lockdep: Fix ->irq_context calculation
2016-04-29numa: fix /proc/<pid>/numa_maps for THPGerald Schaefer1-0/+2
In gather_pte_stats() a THP pmd is cast into a pte, which is wrong because the layouts may differ depending on the architecture. On s390 this will lead to inaccurate numa_maps accounting in /proc because of misguided pte_present() and pte_dirty() checks on the fake pte. On other architectures pte_present() and pte_dirty() may work by chance, but there may be an issue with direct-access (dax) mappings w/o underlying struct pages when HAVE_PTE_SPECIAL is set and THP is available. In vm_normal_page() the fake pte will be checked with pte_special() and because there is no "special" bit in a pmd, this will always return false and the VM_PFNMAP | VM_MIXEDMAP checking will be skipped. On dax mappings w/o struct pages, an invalid struct page pointer would then be returned that can crash the kernel. This patch fixes the numa_maps THP handling by introducing new "_pmd" variants of the can_gather_numa_stats() and vm_normal_page() functions. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> [4.3+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-29thp: keep huge zero page pinned until tlb flushKirill A. Shutemov1-0/+5
Andrea has found[1] a race condition on MMU-gather based TLB flush vs split_huge_page() or shrinker which frees huge zero under us (patch 1/2 and 2/2 respectively). With new THP refcounting, we don't need patch 1/2: mmu_gather keeps the page pinned until flush is complete and the pin prevents the page from being split under us. We still need patch 2/2. This is simplified version of Andrea's patch. We don't need fancy encoding. [1] http://lkml.kernel.org/r/1447938052-22165-1-git-send-email-aarcange@redhat.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-29mm: exclude HugeTLB pages from THP page_mapped() logicSteve Capper1-0/+2
HugeTLB pages cannot be split, so we use the compound_mapcount to track rmaps. Currently page_mapped() will check the compound_mapcount, but will also go through the constituent pages of a THP compound page and query the individual _mapcount's too. Unfortunately, page_mapped() does not distinguish between HugeTLB and THP compound pages and assumes that a compound page always needs to have HPAGE_PMD_NR pages querying. For most cases when dealing with HugeTLB this is just inefficient, but for scenarios where the HugeTLB page size is less than the pmd block size (e.g. when using contiguous bit on ARM) this can lead to crashes. This patch adjusts the page_mapped function such that we skip the unnecessary THP reference checks for HugeTLB pages. Fixes: e1534ae95004 ("mm: differentiate page_mapped() from page_mapcount() for compound pages") Signed-off-by: Steve Capper <steve.capper@arm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-29Merge branch 'for-linus' of ↵Linus Torvalds2-6/+5
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "There is a lifecycle fix in the auth code, a fix for a narrow race condition on map, and a helpful message in the log when there is a feature mismatch (which happens frequently now that the default server-side options have changed)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: report unsupported features to syslog rbd: fix rbd map vs notify races libceph: make authorizer destruction independent of ceph_auth_client
2016-04-29bpf: fix refcnt overflowAlexei Starovoitov1-1/+2
On a system with >32Gbyte of phyiscal memory and infinite RLIMIT_MEMLOCK, the malicious application may overflow 32-bit bpf program refcnt. It's also possible to overflow map refcnt on 1Tb system. Impose 32k hard limit which means that the same bpf program or map cannot be shared by more than 32k processes. Fixes: 1be7f75d1668 ("bpf: enable non-root eBPF programs") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28tcp: remove SKBTX_ACK_TSTAMP since it is redundantSoheil Hassas Yeganeh1-5/+1
The SKBTX_ACK_TSTAMP flag is set in skb_shinfo->tx_flags when the timestamp of the TCP acknowledgement should be reported on error queue. Since accessing skb_shinfo is likely to incur a cache-line miss at the time of receiving the ack, the txstamp_ack bit was added in tcp_skb_cb, which is set iff the SKBTX_ACK_TSTAMP flag is set for an skb. This makes SKBTX_ACK_TSTAMP flag redundant. Remove the SKBTX_ACK_TSTAMP and instead use the txstamp_ack bit everywhere. Note that this frees one bit in shinfo->tx_flags. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Suggested-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28net: fix net_gso_ok for new GSO types.Marcelo Ricardo Leitner1-1/+1
Fix casting in net_gso_ok. Otherwise the shift on gso_type << NETIF_F_GSO_SHIFT may hit the 32th bit and make it look like a INT_MIN, which is then promoted from signed to uint64 which is 0xffffffff80000000, resulting in wrong behavior when it is and'ed with the feature itself, as in: This test app: #include <stdio.h> #include <stdint.h> int main(int argc, char **argv) { uint64_t feature1; uint64_t feature2; int gso_type = 1 << 15; feature1 = gso_type << 16; feature2 = (uint64_t)gso_type << 16; printf("%lx %lx\n", feature1, feature2); return 0; } Gives: ffffffff80000000 80000000 So that this: return (features & feature) == feature; Actually works on more bits than expected and invalid ones. Fix is to promote it earlier. Issue noted while rebasing SCTP GSO patch but posting separetely as someone else may experience this meanwhile. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28IB/mlx5: Expose correct max_sge_rd limitSagi Grimberg1-0/+11
mlx5 devices (Connect-IB, ConnectX-4, ConnectX-4-LX) has a limitation where rdma read work queue entries cannot exceed 512 bytes. A rdma_read wqe needs to fit in 512 bytes: - wqe control segment (16 bytes) - rdma segment (16 bytes) - scatter elements (16 bytes each) So max_sge_rd should be: (512 - 16 - 16) / 16 = 30. Cc: linux-stable@vger.kernel.org Reported-by: Christoph Hellwig <hch@lst.de> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagig@grimberg.me> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28net-rfs: fix false sharing accessing sd->input_queue_headEric Dumazet1-2/+6
sd->input_queue_head is incremented for each processed packet in process_backlog(), and read from other cpus performing Out Of Order avoidance in get_rps_cpu() Moving this field in a separate cache line keeps it mostly hot for the cpu in process_backlog(), as other cpus will only read it. In a stress test, process_backlog() was consuming 6.80 % of cpu cycles, and the patch reduced the cost to 0.65 % Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28device property: Avoid potential dereferences of invalid pointersHeikki Krogerus1-1/+1
Since fwnode may hold ERR_PTR(-ENODEV) or it may be NULL, the fwnode type checks is_of_node(), is_acpi_node() and is is_pset_node() need to consider it. Using IS_ERR_OR_NULL() to check it. Fixes: 0d67e0fa1664 (device property: fix for a case of use-after-free) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller6-13/+16
Minor overlapping changes in the conflicts. In the macsec case, the change of the default ID macro name overlapped with the 64-bit netlink attribute alignment fixes in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>