summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-09-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller359-1658/+2437
Three cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-02inetpeer: fix RCU lookup()Eric Dumazet1-3/+6
Excess of seafood or something happened while I cooked the commit adding RB tree to inetpeer. Of course, RCU rules need to be respected or bad things can happen. In this particular loop, we need to read *pp once per iteration, not twice. Fixes: b145425f269a ("inetpeer: remove AVL implementation in favor of RB tree") Reported-by: John Sperbeck <jsperbeck@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove()Oleg Nesterov1-16/+26
The race was introduced by me in commit 971316f0503a ("epoll: ep_unregister_pollwait() can use the freed pwq->whead"). I did not realize that nothing can protect eventpoll after ep_poll_callback() sets ->whead = NULL, only whead->lock can save us from the race with ep_free() or ep_remove(). Move ->whead = NULL to the end of ep_poll_callback() and add the necessary barriers. TODO: cleanup the ewake/EPOLLEXCLUSIVE logic, it was confusing even before this patch. Hopefully this explains use-after-free reported by syzcaller: BUG: KASAN: use-after-free in debug_spin_lock_before ... _raw_spin_lock_irqsave+0x4a/0x60 kernel/locking/spinlock.c:159 ep_poll_callback+0x29f/0xff0 fs/eventpoll.c:1148 this is spin_lock(eventpoll->lock), ... Freed by task 17774: ... kfree+0xe8/0x2c0 mm/slub.c:3883 ep_free+0x22c/0x2a0 fs/eventpoll.c:865 Fixes: 971316f0503a ("epoll: ep_unregister_pollwait() can use the freed pwq->whead") Reported-by: 范龙飞 <long7573@126.com> Cc: stable@vger.kernel.org Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds119-559/+816
Pull networking fixes from David Miller: 1) Fix handling of pinned BPF map nodes in hash of maps, from Daniel Borkmann. 2) IPSEC ESP error paths leak memory, from Steffen Klassert. 3) We need an RCU grace period before freeing fib6_node objects, from Wei Wang. 4) Must check skb_put_padto() return value in HSR driver, from FLorian Fainelli. 5) Fix oops on PHY probe failure in ftgmac100 driver, from Andrew Jeffery. 6) Fix infinite loop in UDP queue when using SO_PEEK_OFF, from Eric Dumazet. 7) Use after free when tcf_chain_destroy() called multiple times, from Jiri Pirko. 8) Fix KSZ DSA tag layer multiple free of SKBS, from Florian Fainelli. 9) Fix leak of uninitialized memory in sctp_get_sctp_info(), inet_diag_msg_sctpladdrs_fill() and inet_diag_msg_sctpaddrs_fill(). From Stefano Brivio. 10) L2TP tunnel refcount fixes from Guillaume Nault. 11) Don't leak UDP secpath in udp_set_dev_scratch(), from Yossi Kauperman. 12) Revert a PHY layer change wrt. handling of PHY_HALTED state in phy_stop_machine(), it causes regressions for multiple people. From Florian Fainelli. 13) When packets are sent out of br0 we have to clear the offload_fwdq_mark value. 14) Several NULL pointer deref fixes in packet schedulers when their ->init() routine fails. From Nikolay Aleksandrov. 15) Aquantium devices cannot checksum offload correctly when the packet is <= 60 bytes. From Pavel Belous. 16) Fix vnet header access past end of buffer in AF_PACKET, from Benjamin Poirier. 17) Double free in probe error paths of nfp driver, from Dan Carpenter. 18) QOS capability not checked properly in DCB init paths of mlx5 driver, from Huy Nguyen. 19) Fix conflicts between firmware load failure and health_care timer in mlx5, also from Huy Nguyen. 20) Fix dangling page pointer when DMA mapping errors occur in mlx5, from Eran Ben ELisha. 21) ->ndo_setup_tc() in bnxt_en driver doesn't count rings properly, from Michael Chan. 22) Missing MSIX vector free in bnxt_en, also from Michael Chan. 23) Refcount leak in xfrm layer when using sk_policy, from Lorenzo Colitti. 24) Fix copy of uninitialized data in qlge driver, from Arnd Bergmann. 25) bpf_setsockopts() erroneously always returns -EINVAL even on success. Fix from Yuchung Cheng. 26) tipc_rcv() needs to linearize the SKB before parsing the inner headers, from Parthasarathy Bhuvaragan. 27) Fix deadlock between link status updates and link removal in netvsc driver, from Stephen Hemminger. 28) Missed locking of page fragment handling in ESP output, from Steffen Klassert. 29) Fix refcnt leak in ebpf congestion control code, from Sabrina Dubroca. 30) sxgbe_probe_config_dt() doesn't check devm_kzalloc()'s return value, from Christophe Jaillet. 31) Fix missing ipv6 rx_dst_cookie update when rx_dst is updated during early demux, from Paolo Abeni. 32) Several info leaks in xfrm_user layer, from Mathias Krause. 33) Fix out of bounds read in cxgb4 driver, from Stefano Brivio. 34) Properly propagate obsolete state of route upwards in ipv6 so that upper holders like xfrm can see it. From Xin Long. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (118 commits) udp: fix secpath leak bridge: switchdev: Clear forward mark when transmitting packet mlxsw: spectrum: Forbid linking to devices that have uppers wl1251: add a missing spin_lock_init() Revert "net: phy: Correctly process PHY_HALTED in phy_stop_machine()" net: dsa: bcm_sf2: Fix number of CFP entries for BCM7278 kcm: do not attach PF_KCM sockets to avoid deadlock sch_tbf: fix two null pointer dereferences on init failure sch_sfq: fix null pointer dereference on init failure sch_netem: avoid null pointer deref on init failure sch_fq_codel: avoid double free on init failure sch_cbq: fix null pointer dereferences on init failure sch_hfsc: fix null pointer deref and double free on init failure sch_hhf: fix null pointer dereference on init failure sch_multiq: fix double free on init failure sch_htb: fix crash on init failure net/mlx5e: Fix CQ moderation mode not set properly net/mlx5e: Fix inline header size for small packets net/mlx5: E-Switch, Unload the representors in the correct order net/mlx5e: Properly resolve TC offloaded ipv6 vxlan tunnel source address ...
2017-09-01Merge tag 'ceph-for-4.13-rc8' of git://github.com/ceph/ceph-clientLinus Torvalds2-18/+18
Pull ceph fix from Ilya Dryomov: "ceph fscache page locking fix from Zheng, marked for stable" * tag 'ceph-for-4.13-rc8' of git://github.com/ceph/ceph-client: ceph: fix readpage from fscache
2017-09-01Merge branch 'for-linus' of ↵Linus Torvalds2-20/+39
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: "Just a couple drivers fixes (Synaptics PS/2, Xpad)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: xpad - fix PowerA init quirk for some gamepad models Input: synaptics - fix device info appearing different on reconnect
2017-09-01selftests: correct define in msg_zerocopy.cWillem de Bruijn1-3/+3
The msg_zerocopy test defines SO_ZEROCOPY if necessary, but its value is inconsistent with the one in asm-generic.h. Correct that. Also convert one error to a warning. When the test is complete, report throughput and close cleanly even if the process did not wait for all completions. Reported-by: Dan Melnic <dmm@fb.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01Merge tag 'mmc-v4.13-rc7' of ↵Linus Torvalds2-3/+22
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull two more MMC fixes from Ulf Hansson: "MMC core: - Fix block status codes MMC host: - sdhci-xenon: Fix SD bus voltage select" * tag 'mmc-v4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci-xenon: add set_power callback mmc: block: Fix block status codes
2017-09-01doc: document MSG_ZEROCOPYWillem de Bruijn1-0/+257
Documentation for this feature was missing from the patchset. Copied a lot from the netdev 2.1 paper, addressing some small interface changes since then. Changes v1 -> v2 - change email discussion URL format - clarify that u32 counter is per-syscall, unsigned and wraps after UINT_MAX calls - describe errno on send failure specific to MSG_ZEROCOPY - a few very minor rewordings Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01Merge tag 'sound-4.13-rc8' of ↵Linus Torvalds4-3/+17
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Three regression fixes that should be addressed before the final release: a missing mutex call in OSS PCM emulation ioctl, ASoC rt5670 headset detection breakage, and a regression in simple-card parser code" * tag 'sound-4.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ASoC: simple_card_utils: fix fallback when "label" property isn't present ALSA: pcm: Fix power lock unbalance via OSS emulation ASoC: rt5670: Fix GPIO headset detection regression
2017-09-01bpf: Collapse offset checks in sock_filter_is_valid_accessDavid Ahern1-2/+0
Make sock_filter_is_valid_access consistent with other is_valid_access helpers. Requested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01mvneta: Driver and hardware supports IPv6 offload, so enable itAndrew Pilloud1-1/+1
The mvneta driver and hardware supports IPv6 offload, however it isn't enabled. Set the NETIF_F_IPV6_CSUM feature to inform the network layer that this driver can offload IPV6 TCP and UDP checksums. This change has been tested on an Armada 370 and the feature support confirmed with several device datasheets including the Armada XP and Armada 3700. Signed-off-by: Andrew Pilloud <andrewpilloud@igneoussystems.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01Merge branch 'for-linus' of ↵Linus Torvalds3-3/+10
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Martin Schwidefsky: "Three more bug fixes for v4.13. The two memory management related fixes are quite new, they fix kernel crashes that can be triggered by user space. The third commit fixes a bug in the vfio ccw translation code" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/mm: fix BUG_ON in crst_table_upgrade s390/mm: fork vs. 5 level page tabel vfio: ccw: fix bad ptr math for TIC cda translation
2017-09-01Merge tag 'wireless-drivers-next-for-davem-2017-09-01' of ↵David S. Miller43-401/+931
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for 4.14 Few last patches for 4.14, nothing really major here. Major changes: wil6210 * support FW RSSI reporting (by mistake this was accidentally mentioned already in the previous pull request, but now it's really included) * make debugfs optional, adds new Kconfig option CONFIG_WIL6210_DEBUGFS qtnfmac * implement 64-bit DMA support ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01qlcnic: remove redundant zero check on retries counterColin Ian King1-7/+3
At the end of the do while loop the integer counter retries will always be zero and so the subsequent check to see if it is zero is always true and therefore redundant. Remove the redundant check and always return -EIO on this return path. Also unbreak the literal string in dev_err message to clean up a checkpatch warning. Detected by CoverityScan, CID#744279 ("Logically dead code") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01Merge branch 'linus' of ↵Linus Torvalds4-5/+24
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes the following issues: - Regression in chacha20 handling of chunked input - Crash in algif_skcipher when used with async io - Potential bogus pointer dereference in lib/mpi" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: algif_skcipher - only call put_page on referenced and used pages crypto: testmgr - add chunked test cases for chacha20 crypto: chacha20 - fix handling of chunked input lib/mpi: kunmap after finishing accessing buffer
2017-09-01udp: fix secpath leakYossi Kuperman1-1/+1
After commit dce4551cb2ad ("udp: preserve head state for IP_CMSG_PASSSEC") we preserve the secpath for the whole skb lifecycle, but we also end up leaking a reference to it. We must clear the head state on skb reception, if secpath is present. Fixes: dce4551cb2ad ("udp: preserve head state for IP_CMSG_PASSSEC") Signed-off-by: Yossi Kuperman <yossiku@mellanox.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01Merge branch 'mdio-mux-Misc-fix'David S. Miller2-16/+5
Corentin Labbe says: ==================== net: mdio-mux: Misc fix This patch series fix minor problems found when working on the dwmac-sun8i syscon mdio-mux. Changes since v1: - Removed obsolete comment about of_mdio_find_bus/put_device - removed more DRV_VERSION ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01net: mdio-mux: fix unbalanced put_deviceCorentin Labbe1-4/+2
mdio_mux_uninit() call put_device (unconditionally) because of of_mdio_find_bus() in mdio_mux_init. But of_mdio_find_bus is only called if mux_bus is empty. If mux_bus is set, mdio_mux_uninit will print a "refcount_t: underflow" trace. This patch add a get_device in the other branch of "if (mux_bus)". Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01net: mdio-mux-mmioreg: Can handle 8/16/32 bits registersCorentin Labbe1-1/+1
This patch fix an old information that mdio-mux-mmioreg can only handle 8bit registers. This is not true anymore. Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01net: mdio-mux: printing driver version is uselessCorentin Labbe1-3/+0
Remove the driver version information because this information is not useful in an upstream kernel driver. Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01net: mdio-mux: Remove unnecessary 'out of memory' messageCorentin Labbe1-6/+0
This patch fix checkpatch warning about unnecessary 'out of memory' message. Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01net: mdio-mux: Fix NULL Comparison styleCorentin Labbe1-2/+2
This patch fix checkpatch warning about NULL Comparison style. Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01bridge: switchdev: Clear forward mark when transmitting packetIdo Schimmel1-0/+3
Commit 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked devices") added the 'offload_fwd_mark' bit to the skb in order to allow drivers to indicate to the bridge driver that they already forwarded the packet in L2. In case the bit is set, before transmitting the packet from each port, the port's mark is compared with the mark stored in the skb's control block. If both marks are equal, we know the packet arrived from a switch device that already forwarded the packet and it's not re-transmitted. However, if the packet is transmitted from the bridge device itself (e.g., br0), we should clear the 'offload_fwd_mark' bit as the mark stored in the skb's control block isn't valid. This scenario can happen in rare cases where a packet was trapped during L3 forwarding and forwarded by the kernel to a bridge device. Fixes: 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked devices") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Yotam Gigi <yotamg@mellanox.com> Tested-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01Merge branch 'mvpp2-optional-PHYs-and-GoP-link-irq'David S. Miller2-26/+187
Antoine Tenart says: ==================== net: mvpp2: optional PHYs and GoP link irq This series aims at making the driver work when no PHY is connected between a port and the physical layer and not described as a fixed-phy. This is useful for some usecases such as when a switch is connected directly to the serdes lanes. It can also be used for SFP ports on the 7k-db and 8k-db while waiting for the phylink support to land in (which should be part of another series). This series makes the phy optional in the PPv2 driver, and then adds the support for the GoP port link interrupt to handle link status changes on such ports. This was tested using the SFP ports on the 7k-db and 8k-db boards. Since v1: - Now use phy_interface_mode_is_rgmii() in the GoP link patch. - Added one cosmetic patch to take advantage of phy_interface_mode_is_rgmii() in the whole PPv2 driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01Documentation/bindings: net: marvell-pp2: add the link interruptAntoine Tenart1-1/+1
A link interrupt can be described. Document this valid interrupt name. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Tested-by: Marcin Wojtas <mw@semihalf.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01net: mvpp2: use the GoP interrupt for link status changesAntoine Tenart1-5/+172
This patch adds the GoP link interrupt support for when a port isn't connected to a PHY. Because of this the phylib callback is never called and the link status management isn't done. This patch use the GoP link interrupt in such cases to still have a minimal link management. Without this patch ports not connected to a PHY cannot work. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Tested-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01net: mvpp2: make the phy optionalAntoine Tenart1-8/+11
There is not necessarily a PHY between the GoP and the physical port. However, the driver currently makes the "phy" property mandatory, contrary to what is stated in the device tree bindings. This patch makes the PHY optional, and aligns the PPv2 driver on its device tree documentation. However if a PHY is provided, the GoP link interrupt won't be used. With this patch switches directly connected to the serdes lanes and SFP ports on the Armada 8040-db and Armada 7040-db can be used if the link interrupt is described in the device tree. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Tested-by: Marcin Wojtas <mw@semihalf.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01net: mvpp2: take advantage of the is_rgmii helperAntoine Tenart1-12/+3
Convert all RGMII checks to use the phy_interface_mode_is_rgmii() helper. This is a cosmetic patch. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01Merge branch 'mlxsw-next-fixes'David S. Miller1-4/+1
Jiri Pirko says: ==================== mlxsw: spectrum_router: Couple of fixes Ido Schimmel (2): mlxsw: spectrum_router: Trap packets hitting anycast routes mlxsw: spectrum_router: Set abort trap in all virtual routers ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01mlxsw: spectrum_router: Set abort trap in all virtual routersIdo Schimmel1-3/+0
When the abort mechanism is invoked a default route directing packets to the CPU is programmed in all the virtual routers currently in use. This can result in packet loss in case a new VRF is configured. Upon abort, program the default route in all virtual routers, whether they are in use or not. The patch is directed at net-next since post-abort fixes aren't critical and packet loss due to a missing default route will be insignificant compared to packet loss caused by the CPU port policer. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01mlxsw: spectrum_router: Trap packets hitting anycast routesIdo Schimmel1-1/+1
I relied on the fact that anycast routes use the loopback device as their nexthop device to trap packets hitting them to the CPU. After commit 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address") this is no longer the case and such routes are programmed with a forward action (note the 'offload' flag): anycast cafe:: dev enp3s0np7 proto kernel metric 0 offload pref medium This will prevent the router from locally receiving packets destined to the Subnet-Router anycast address. Fix this by specifically programming anycast routes with action trap, which results in the following output: anycast cafe:: dev enp3s0np7 proto kernel metric 0 pref medium Fixes: 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01mlxsw: spectrum: Forbid linking to devices that have uppersIdo Schimmel3-1/+10
The mlxsw driver relies on NETDEV_CHANGEUPPER events to configure the device in case a port is enslaved to a master netdev such as bridge or bond. Since the driver ignores events unrelated to its ports and their uppers, it's possible to engineer situations in which the device's data path differs from the kernel's. One example to such a situation is when a port is enslaved to a bond that is already enslaved to a bridge. When the bond was enslaved the driver ignored the event - as the bond wasn't one of its uppers - and therefore a bridge port instance isn't created in the device. Until such configurations are supported forbid them by checking that the upper device doesn't have uppers of its own. Fixes: 0d65fc13042f ("mlxsw: spectrum: Implement LAG port join/leave") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Nogah Frankel <nogahf@mellanox.com> Tested-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01Merge branch 'bpf-Improve-LRU-map-lookup-performance'David S. Miller4-10/+138
Martin KaFai Lau says: ==================== bpf: Improve LRU map lookup performance This patchset improves the lookup performance of the LRU map. Please see individual patch for details. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01bpf: Only set node->ref = 1 if it has not been setMartin KaFai Lau2-2/+8
This patch writes 'node->ref = 1' only if node->ref is 0. The number of lookups/s for a ~1M entries LRU map increased by ~30% (260097 to 343313). Other writes on 'node->ref = 0' is not changed. In those cases, the same cache line has to be changed anyway. First column: Size of the LRU hash Second column: Number of lookups/s Before: > echo "$((2**20+1)): $(./map_perf_test 1024 1 $((2**20+1)) 10000000 | awk '{print $3}')" 1048577: 260097 After: > echo "$((2**20+1)): $(./map_perf_test 1024 1 $((2**20+1)) 10000000 | awk '{print $3}')" 1048577: 343313 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01bpf: Inline LRU map lookupMartin KaFai Lau1-0/+19
Inline the lru map lookup to save the cost in making calls to bpf_map_lookup_elem() and htab_lru_map_lookup_elem(). Different LRU hash size is tested. The benefit diminishes when the cache miss starts to dominate in the bigger LRU hash. Considering the change is simple, it is still worth to optimize. First column: Size of the LRU hash Second column: Number of lookups/s Before: > for i in $(seq 9 20); do echo "$((2**i+1)): $(./map_perf_test 1024 1 $((2**i+1)) 10000000 | awk '{print $3}')"; done 513: 1132020 1025: 1056826 2049: 1007024 4097: 853298 8193: 742723 16385: 712600 32769: 688142 65537: 677028 131073: 619437 262145: 498770 524289: 316695 1048577: 260038 After: > for i in $(seq 9 20); do echo "$((2**i+1)): $(./map_perf_test 1024 1 $((2**i+1)) 10000000 | awk '{print $3}')"; done 513: 1221851 1025: 1144695 2049: 1049902 4097: 884460 8193: 773731 16385: 729673 32769: 721989 65537: 715530 131073: 671665 262145: 516987 524289: 321125 1048577: 260048 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01bpf: Add lru_hash_lookup performance testMartin KaFai Lau2-9/+112
Create a new case to test the LRU lookup performance. At the beginning, the LRU map is fully loaded (i.e. the number of keys is equal to map->max_entries). The lookup is done through key 0 to num_map_entries and then repeats from 0 again. This patch also creates an anonymous struct to properly name the test params in stress_lru_hmap_alloc() in map_perf_test_kern.c. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01Merge branch 'master' of ↵David S. Miller5-40/+91
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2017-09-01 This should be the last ipsec-next pull request for this release cycle: 1) Support netdevice ESP trailer removal when decryption is offloaded. From Yossi Kuperman. 2) Fix overwritten return value of copy_sec_ctx(). Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01Merge branch 'bpf-Add-option-to-set-mark-and-priority-in-cgroup-sock-programs'David S. Miller5-65/+401
David Ahern says: ==================== bpf: Add option to set mark and priority in cgroup sock programs Add option to set mark and priority in addition to bound device for newly created sockets. Also, allow the bpf programs to use the get_current_uid_gid helper meaning socket marks, priority and device can be set based on the uid/gid of the running process. Sample programs are updated to demonstrate the new options. v3 - no changes to Patches 1 and 2 which Alexei acked in previous versions - dropped change related to recursive programs in a cgroup - updated tests per dropped patch v2 - added flag to control recursive behavior as requested by Alexei - added comment to sock_filter_func_proto regarding use of get_current_uid_gid helper - updated test programs for recursive option ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01samples/bpf: Update cgroup socket examples to use uid gid helperDavid Ahern2-1/+16
Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01samples/bpf: Update cgrp2 socket testsDavid Ahern1-38/+124
Update cgrp2 bpf sock tests to check that device, mark and priority can all be set on a socket via bpf programs attached to a cgroup. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01samples/bpf: Add option to dump socket settingsDavid Ahern1-2/+73
Add option to dump socket settings. Will be used in the next patch to verify bpf programs are correctly setting mark, priority and device based on the cgroup attachment for the program run. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01samples/bpf: Add detach option to test_cgrp2_sockDavid Ahern1-15/+35
Add option to detach programs from a cgroup. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01samples/bpf: Update sock test to allow setting mark and priorityDavid Ahern2-17/+119
Update sock test to set mark and priority on socket create. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01bpf: Allow cgroup sock filters to use get_current_uid_gid helperDavid Ahern1-1/+15
Allow BPF programs run on sock create to use the get_current_uid_gid helper. IPv4 and IPv6 sockets are created in a process context so there is always a valid uid/gid Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01bpf: Add mark and priority to sock options that can be setDavid Ahern2-0/+28
Add socket mark and priority to fields that can be set by ebpf program when a socket is created. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01Merge tag 'cifs-fixes-for-4.13-rc7-and-stable' of ↵Linus Torvalds2-3/+3
git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: "Two cifs bug fixes for stable" * tag 'cifs-fixes-for-4.13-rc7-and-stable' of git://git.samba.org/sfrench/cifs-2.6: CIFS: remove endian related sparse warning CIFS: Fix maximum SMB2 header size
2017-09-01Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds4-16/+26
Pull block fixes from Jens Axboe: "Unfortunately a few issues that warrant sending another pull request, even if I had hoped to avoid it. This contains: - A fix for multiqueue xen-blkback, on tear down / disconnect. - A few fixups for NVMe, including a wrong bit definition, fix for host memory buffers, and an nvme rdma page size fix" * 'for-linus' of git://git.kernel.dk/linux-block: nvme: fix the definition of the doorbell buffer config support bit nvme-pci: use dma memory for the host memory buffer descriptors nvme-rdma: default MR page size to 4k xen-blkback: stop blkback thread of every queue in xen_blkif_disconnect
2017-09-01Merge tag 'for-4.13/dm-fixes-2' of ↵Linus Torvalds3-42/+13
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - A couple fixes for bugs introduced as part of the blk_status_t block layer changes during the 4.13 merge window - A printk throttling fix to use discrete rate limiting state for each DM log level - A stable@ fix for DM multipath that delays request requeueing to avoid CPU lockup if/when the request queue is "dying" * tag 'for-4.13/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm mpath: do not lock up a CPU with requeuing activity dm: fix printk() rate limiting code dm mpath: retry BLK_STS_RESOURCE errors dm: fix the second dec_pending() argument in __split_and_process_bio()
2017-09-01Merge branch 'akpm' (patches from Andrew)Linus Torvalds7-7/+27
Merge more fixes from Andrew Morton: "6 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: scripts/dtc: fix '%zx' warning include/linux/compiler.h: don't perform compiletime_assert with -O0 mm, madvise: ensure poisoned pages are removed from per-cpu lists mm, uprobes: fix multiple free of ->uprobes_state.xol_area kernel/kthread.c: kthread_worker: don't hog the cpu mm,page_alloc: don't call __node_reclaim() with oom_lock held.