summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2014-04-16net: mdio-gpio: Add support for active low gpio pinsGuenter Roeck2-6/+16
Some systems using mdio-gpio may use active-low gpio pins (eg with inverters or FETs connected to all or some of the gpio pins). Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-16net: mdio-gpio: Use devm_ functions where possibleGuenter Roeck1-14/+5
This simplifies error path and deinit/removal functions. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Chris Healy <cphealy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-16Merge branch 'fib_validate_loopback'David S. Miller8-18/+16
Cong Wang says: ==================== ipv4: fix flowi4_iif for input routing This patchset fixes ->flowi4_iif for input routing and rp filter, based on suggestion from Julian. See per patch for details. v1 -> v2: * merge the first two patches into one * fix fib_check_nh() too * add this cover letter ==================== Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-16ipv4, route: pass 0 instead of LOOPBACK_IFINDEX to fib_validate_source()Cong Wang1-2/+1
In my special case, when a packet is redirected from veth0 to lo, its skb->dev->ifindex would be LOOPBACK_IFINDEX. Meanwhile we pass the hard-coded LOOPBACK_IFINDEX to fib_validate_source() in ip_route_input_slow(). This would cause the following check in fib_validate_source() fail: (dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev)) when rp_filter is disabeld on loopback. As suggested by Julian, the caller should pass 0 here so that we will not end up by calling __fib_validate_source(). Cc: Eric Biederman <ebiederm@xmission.com> Cc: Julian Anastasov <ja@ssi.bg> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-16ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iifCong Wang7-16/+15
As suggested by Julian: Simply, flowi4_iif must not contain 0, it does not look logical to ignore all ip rules with specified iif. because in fib_rule_match() we do: if (rule->iifindex && (rule->iifindex != fl->flowi_iif)) goto out; flowi4_iif should be LOOPBACK_IFINDEX by default. We need to move LOOPBACK_IFINDEX to include/net/flow.h: 1) It is mostly used by flowi_iif 2) Fix the following compile error if we use it in flow.h by the patches latter: In file included from include/linux/netfilter.h:277:0, from include/net/netns/netfilter.h:5, from include/net/net_namespace.h:21, from include/linux/netdevice.h:43, from include/linux/icmpv6.h:12, from include/linux/ipv6.h:61, from include/net/ipv6.h:16, from include/linux/sunrpc/clnt.h:27, from include/linux/nfs_fs.h:30, from init/do_mounts.c:32: include/net/flow.h: In function ‘flowi4_init_output’: include/net/flow.h:84:32: error: ‘LOOPBACK_IFINDEX’ undeclared (first use in this function) Cc: Eric Biederman <ebiederm@xmission.com> Cc: Julian Anastasov <ja@ssi.bg> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-16mlx4_en: don't use napi_synchronize inside mlx4_en_netpollChris Mason3-7/+1
The mlx4 driver is triggering schedules while atomic inside mlx4_en_netpoll: spin_lock_irqsave(&cq->lock, flags); napi_synchronize(&cq->napi); ^^^^^ msleep here mlx4_en_process_rx_cq(dev, cq, 0); spin_unlock_irqrestore(&cq->lock, flags); This was part of a patch by Alexander Guller from Mellanox in 2011, but it still isn't upstream. Signed-off-by: Chris Mason <clm@fb.com> cc: stable@vger.kernel.org Acked-By: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-16Merge branch 'mvneta_qsgmii'David S. Miller3-40/+38
Thomas Petazzoni says: ==================== net: mvneta: fix usage as a module, and support QSGMII properly This set of patches is a new attempt at fixing the operation of the mvneta driver when built as a module. For the record, the previous attempt, merged in commit e3a8786c10e75903f1269474e21fe8cb49c3a670 ('net: mvneta: fix usage as a module on RGMII configurations') caused problems for all RGMII configurations. In fact, it turned out that the MAC to PHY connection on the Armada XP GP, which was described as using RGMII-ID according to its Device Tree, is in fact a QSGMII connection. And the RGMII and QSGMII configurations have to be handled in a different way in the driver, because the SERDES configuration is different in those two cases. So, this patch series fixes that by: * Adding minimal handling of a "qsgmii" connection type in the PHY layer. Mainly to make sure that a "qsgmii" phy-mode in the Device Tree is recognized, and handed over to the driver as PHY_INTERFACE_QSGMII. * Changing the mvneta driver to properly configure the RGMIIEn and PCSEn bits in the GMAC_CTRL_2 register, and configure the SERDES register, in the three possible cases: RGMII, SGMII and QSGMII. * Updating the Device Tree of the Armada XP GP board to reflect the fact that it uses a QSGMII MAC/PHY connection. PATCH 1 and 2 would be merged by David Miller, through the net tree, while PATCH 3 would be merged by the mach-mvebu maintainers, through their tree and arm-soc. This set of patches has been tested on: * Armada XP GP (four QSGMII interfaces) * Armada XP DB (two RGMII interfaces and two SGMII interfaces) * Armada 370 Mirabox (two RGMII interfaces) I've tested both the driver built-in, and compiled as a module. Since the last attempt at fixing this was quite a fiasco, I'd like this new attempt to be tested more widely before being applied. I'll try to do some testing on other Armada boards I have, but independent testing from other persons would also be appreciated. Note that these patches apply after reverting the previous attempt, obviously. ==================== Tested-by: Arnaud Ebalard <arno@natisbad.org> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-16net: mvneta: properly configure the MAC <-> PHY connection in all situationsThomas Petazzoni1-39/+34
Commit 5445eaf309ff ('mvneta: Try to fix mvneta when compiled as module') fixed the mvneta driver to make it work properly when loaded as a module in SGMII configuration, which was tested successful by the author on the Armada XP OpenBlocks AX3, which uses SGMII. However, some other platforms, namely the Armada XP GP don't use SGMII, but a QSGMII connection between the MAC and the PHY, and this case was not supported by the mvneta driver, which was relying on configuration put in place by the bootloader. While this works when the mvneta driver is built-in (because clocks are not gated), it breaks when mvneta is built as a module, because the clock is gated (all configuration is lost) and then re-enabled when the mvneta driver is loaded. In order to support all of RGMII, SGMII and QSGMII, this commit reworks how the PHY interface configuration is done, and simplifies it: it removes the mvneta_port_sgmii_config() and mvneta_gmac_rgmii_set() functions, which were strange because mvneta_gmac_rgmii_set() was called in all cases, even for SGMII configurations. Also, the mvneta_gmac_rgmii_set() function was taking a boolean as argument, which was always true. Instead, all the PHY interface configuration logic is moved into the mvneta_port_power_up() function, in a much simpler 'switch' construct, with four cases: - QSGMII: the RGMIIEn bit, the PCSEn bit in GMAC_CTRL_2 are set, and the SERDES is configured in QSGMII. Technically speaking, configuring the SERDES of the first port would be sufficient, but it is simpler to do it on all ports. - SGMII: the RGMIIEn bit, the PCSEn bit in GMAC_CTRL_2 are set, and the SERDES is configured as SGMII. - RGMII: the RGMIIEn bit in GMAC_CTRL_2 is set. The PCSEn bit is kept cleared, and no SERDES configuration is done, because RGMII is not using SERDES lanes. - other: an error is returned. For this reason, the mvneta_port_power_up() now returns an int instead of nothing, and the return value is checked by mvneta_probe(). This has been successfully tested on: * Armada XP DB, which has two RGMII and two SGMII connections * Armada XP GP, which uses QSGMII for its four interfaces * Armada 370 Mirabox, which has two RGMII connections Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-16net: phy: add minimal support for QSGMII PHYThomas Petazzoni2-1/+4
This commit adds the necessary definitions for the PHY layer to recognize "qsgmii" as a valid PHY interface. A QSMII interface, as defined at http://en.wikipedia.org/wiki/Media_Independent_Interface#Quad_Serial_Gigabit_Media_Independent_Interface, is "is a method of combining four SGMII lines into a 5Gbit/s interface. QSGMII, like SGMII, uses LVDS signalling for the TX and RX data and a single LVDS clock signal. QSGMII uses significantly fewer signal lines than four SGMII busses." This type of MAC <-> PHY connection might require special handling on the MAC driver side, so it should be possible to express this type of MAC <-> PHY connection, for example in the Device Tree. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: devicetree@vger.kernel.org Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-16sfc:On MCDI timeout, issue an FLR (and mark MCDI to fail-fast)Edward Cree10-22/+133
When an MCDI command times out (whether or not we find it completed when we poll), call efx_mcdi_abandon(), which tells all subsequent MCDI calls to fail-fast, and queues up an FLR. Because an FLR doesn't lead to receiving any reboot even from the MC (unlike most other types of reset), we have to call efx_ef10_reset_mc_allocations. In efx_start_all(), if a reset (of any kind) is pending, we bail out. Without this, attempts to reconfigure (e.g. change mtu) can cause driver/mc state inconsistency if the first MCDI call triggers an FLR. For similar reasons, on EF10, in efx_reset_down(method=RESET_TYPE_MCDI_TIMEOUT), set the number of active queues to zero before calling efx_stop_all(). And, on farch, in efx_reset_up(method=RESET_TYPE_MCDI_TIMEOUT), set active_queues and flushes pending & outstanding to zero. efx_mcdi_mode_{poll,event}() should not take us out of fail-fast mode. Instead, this is done by efx_mcdi_reset() after the FLR completes. The new FLR reset_type RESET_TYPE_MCDI_TIMEOUT doesn't really fit into the hierarchy of reset 'scopes' whereby efx_reset() decides some resets subsume others. Thus, it uses separate logic. Also, fixed up some inconsistency around RESET_TYPE_MC_BIST, which was in the wrong place in that hierarchy. Signed-off-by: Shradha Shah <sshah@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds63-279/+454
Pull networking fixes from David Miller: 1) Fix BPF filter validation of netlink attribute accesses, from Mathias Kruase. 2) Netfilter conntrack generation seqcount not initialized properly, from Andrey Vagin. 3) Fix comparison mask computation on big-endian in nft_cmp_fast(), from Patrick McHardy. 4) Properly limit MTU over ipv6, from Eric Dumazet. 5) Fix seccomp system call argument population on 32-bit, from Daniel Borkmann. 6) skb_network_protocol() should not use hard-coded ETH_HLEN, instead skb->mac_len needs to be used. From Vlad Yasevich. 7) We have several cases of using socket based communications to implement a tunnel. For example, some tunnels are encapsulations over UDP so we use an internal kernel UDP socket to do the transmits. These tunnels should behave just like other software devices and pass the packets on down to the next layer. Most importantly we want the top-level socket (eg TCP) that created the traffic to be charged for the SKB memory. However, once you get into the IP output path, we have code that assumed that whatever was attached to skb->sk is an IP socket. To keep the top-level socket being charged for the SKB memory, whilst satisfying the needs of the IP output path, we now pass in an explicit 'sk' argument. From Eric Dumazet. 8) ping_init_sock() leaks group info, from Xiaoming Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits) cxgb4: use the correct max size for firmware flash qlcnic: Fix MSI-X initialization code ip6_gre: don't allow to remove the fb_tunnel_dev ipv4: add a sock pointer to dst->output() path. ipv4: add a sock pointer to ip_queue_xmit() driver/net: cosa driver uses udelay incorrectly at86rf230: fix __at86rf230_read_subreg function at86rf230: remove check if AVDD settled net: cadence: Add architecture dependencies net: Start with correct mac_len in skb_network_protocol Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer" cxgb4: Save the correct mac addr for hw-loopback connections in the L2T net: filter: seccomp: fix wrong decoding of BPF_S_ANC_SECCOMP_LD_W seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF qlcnic: Do not disable SR-IOV when VFs are assigned to VMs qlcnic: Fix QLogic application/driver interface for virtual NIC configuration qlcnic: Fix PVID configuration on eSwitch port. qlcnic: Fix max ring count calculation qlcnic: Fix to send INIT_NIC_FUNC as first mailbox. qlcnic: Fix panic due to uninitialzed delayed_work struct in use. ...
2014-04-15cxgb4: use the correct max size for firmware flashSteve Wise1-1/+1
The wrong max fw size was being used and causing false "too big" errors running ethtool -f. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15qlcnic: Fix MSI-X initialization codeAlexander Gordeev1-12/+16
Function qlcnic_setup_tss_rss_intr() might enter endless loop in case pci_enable_msix() contiguously returns a positive number of MSI-Xs that could have been allocated. Besides, the function contains 'err = -EIO;' assignment that never could be reached. This update fixes the aforementioned issues. Cc: Shahed Shaikh <shahed.shaikh@qlogic.com> Cc: Dept-HSGLinuxNICDev@qlogic.com Cc: netdev@vger.kernel.org Cc: linux-pci@vger.kernel.org Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Acked-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15ip6_gre: don't allow to remove the fb_tunnel_devNicolas Dichtel1-0/+10
It's possible to remove the FB tunnel with the command 'ip link del ip6gre0' but this is unsafe, the module always supposes that this device exists. For example, ip6gre_tunnel_lookup() may use it unconditionally. Let's add a rtnl handler for dellink, which will never remove the FB tunnel (we let ip6gre_destroy_tunnels() do the job). Introduced by commit c12b395a4664 ("gre: Support GRE over IPv6"). CC: Dmitry Kozlov <xeb@mail.ru> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15ipv4: add a sock pointer to dst->output() path.Eric Dumazet19-46/+74
In the dst->output() path for ipv4, the code assumes the skb it has to transmit is attached to an inet socket, specifically via ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the provider of the packet is an AF_PACKET socket. The dst->output() method gets an additional 'struct sock *sk' parameter. This needs a cascade of changes so that this parameter can be propagated from vxlan to final consumer. Fixes: 8f646c922d55 ("vxlan: keep original skb ownership") Reported-by: lucien xin <lucien.xin@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15ipv4: add a sock pointer to ip_queue_xmit()Eric Dumazet10-13/+13
ip_queue_xmit() assumes the skb it has to transmit is attached to an inet socket. Commit 31c70d5956fc ("l2tp: keep original skb ownership") changed l2tp to not change skb ownership and thus broke this assumption. One fix is to add a new 'struct sock *sk' parameter to ip_queue_xmit(), so that we do not assume skb->sk points to the socket used by l2tp tunnel. Fixes: 31c70d5956fc ("l2tp: keep original skb ownership") Reported-by: Zhan Jianyu <nasa4836@gmail.com> Tested-by: Zhan Jianyu <nasa4836@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15driver/net: cosa driver uses udelay incorrectlyLi, Zhen-Hua1-4/+0
In cosa driver, udelay with more than 20000 may cause __bad_udelay. Use msleep for instead. Signed-off-by: Li, Zhen-Hua <zhen-hual@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15at86rf230: fix __at86rf230_read_subreg functionAlexander Aring1-1/+1
The __at86rf230_read_subreg function don't mask and shift register contents which it should do. This patch adds the necessary masks and shift operations in this function. Since we have csma support this can make some trouble on state changes. Since CSMA support turned on some bits in the TRX_STATUS register that used to be zero, not masking broke checking of the TRX_STATUS field after commanding a state change. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15at86rf230: remove check if AVDD settledAlexander Aring1-8/+0
The AVDD regulator is only enabled when the RF section is active TX_ON (PLL_ON) state. Since commit 7dcbd22a97eb0689e6c583ad630ae0e7341e34c1 ("ieee802154: ensure that first RF212 state comes from TRX_OFF"). We are in TRX_OFF state at the time at86rf230_hw_init is run. Note that this test would only fail in case of a severe hardware malfunction (faulty/shorted power supply, etc.) so it wasn't all that useful in the first place. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15net: cadence: Add architecture dependenciesJean Delvare1-3/+3
The Cadence ethernet chipsets are only used on specific ARM architectures. Add Kconfig dependencies so that drivers for these chipsets are only buildable on the relevant architectures. Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15Merge git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds9-29/+113
Pull KVM fixes from Marcelo Tosatti: - Fix for guest triggerable BUG_ON (CVE-2014-0155) - CR4.SMAP support - Spurious WARN_ON() fix * git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: remove WARN_ON from get_kernel_ns() KVM: Rename variable smep to cr4_smep KVM: expose SMAP feature to guest KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode KVM: Add SMAP support when setting CR4 KVM: Remove SMAP bit from CR4_RESERVED_BITS KVM: ioapic: try to recover if pending_eoi goes out of range KVM: ioapic: fix assignment of ioapic->rtc_status.pending_eoi (CVE-2014-0155)
2014-04-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds1-5/+5
Pull bmc2835 crypto fix from Herbert Xu: "This fixes a potential boot crash on bcm2835 due to the recent change that now causes hardware RNGs to be accessed on registration" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: hwrng: bcm2835 - fix oops when rng h/w is accessed during registration
2014-04-15user namespace: fix incorrect memory barriersMikulas Patocka1-6/+5
smp_read_barrier_depends() can be used if there is data dependency between the readers - i.e. if the read operation after the barrier uses address that was obtained from the read operation before the barrier. In this file, there is only control dependency, no data dependecy, so the use of smp_read_barrier_depends() is incorrect. The code could fail in the following way: * the cpu predicts that idx < entries is true and starts executing the body of the for loop * the cpu fetches map->extent[0].first and map->extent[0].count * the cpu fetches map->nr_extents * the cpu verifies that idx < extents is true, so it commits the instructions in the body of the for loop The problem is that in this scenario, the cpu read map->extent[0].first and map->nr_extents in the wrong order. We need a full read memory barrier to prevent it. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller7-25/+15
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains three Netfilter fixes for your net tree, they are: * Fix missing generation sequence initialization which results in a splat if lockdep is enabled, it was introduced in the recent works to improve nf_conntrack scalability, from Andrey Vagin. * Don't flush the GRE keymap list in nf_conntrack when the pptp helper is disabled otherwise this crashes due to a double release, from Andrey Vagin. * Fix nf_tables cmp fast in big endian, from Patrick McHardy. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15net: Start with correct mac_len in skb_network_protocolVlad Yasevich1-1/+1
Sometimes, when the packet arrives at skb_mac_gso_segment() its skb->mac_len already accounts for some of the mac lenght headers in the packet. This seems to happen when forwarding through and OpenSSL tunnel. When we start looking for any vlan headers in skb_network_protocol() we seem to ignore any of the already known mac headers and start with an ETH_HLEN. This results in an incorrect offset, dropped TSO frames and general slowness of the connection. We can start counting from the known skb->mac_len and return at least that much if all mac level headers are known and accounted for. Fixes: 53d6471cef17262d3ad1c7ce8982a234244f68ec (net: Account for all vlan headers in skb_mac_gso_segment) CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Daniel Borkman <dborkman@redhat.com> Tested-by: Martin Filip <nexus+kernel@smoula.net> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15KVM: x86: remove WARN_ON from get_kernel_ns()Marcelo Tosatti1-1/+0
Function and callers can be preempted. https://bugzilla.kernel.org/show_bug.cgi?id=73721 Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2014-04-15KVM: Rename variable smep to cr4_smepFeng Wu1-3/+3
Rename variable smep to cr4_smep, which can better reflect the meaning of the variable. Signed-off-by: Feng Wu <feng.wu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-15KVM: expose SMAP feature to guestFeng Wu1-1/+1
This patch exposes SMAP feature to guest Signed-off-by: Feng Wu <feng.wu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-15KVM: Disable SMAP for guests in EPT realmode and EPT unpaging modeFeng Wu1-5/+6
SMAP is disabled if CPU is in non-paging mode in hardware. However KVM always uses paging mode to emulate guest non-paging mode with TDP. To emulate this behavior, SMAP needs to be manually disabled when guest switches to non-paging mode. Signed-off-by: Feng Wu <feng.wu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-15KVM: Add SMAP support when setting CR4Feng Wu5-13/+84
This patch adds SMAP handling logic when setting CR4 for guests Thanks a lot to Paolo Bonzini for his suggestion to use the branchless way to detect SMAP violation. Signed-off-by: Feng Wu <feng.wu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-15KVM: Remove SMAP bit from CR4_RESERVED_BITSFeng Wu1-1/+1
This patch removes SMAP bit from CR4_RESERVED_BITS. Signed-off-by: Feng Wu <feng.wu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-15Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the ↵Daniel Borkmann5-25/+87
receiver's buffer" This reverts commit ef2820a735f7 ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer") as it introduced a serious performance regression on SCTP over IPv4 and IPv6, though a not as dramatic on the latter. Measurements are on 10Gbit/s with ixgbe NICs. Current state: [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64 Time: Fri, 11 Apr 2014 17:56:21 GMT Connecting to host 192.168.241.3, port 5201 Cookie: Lab200slot2.1397238981.812898.548918 [ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201 Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test [ ID] Interval Transfer Bandwidth [ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec [ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec [ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec [ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec [ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec [ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec [ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec [ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec [ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec [ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec [ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec [ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec [ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec [ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec (etc) [root@Lab200slot2 ~]# iperf3 --sctp -6 -c 2001:db8:0:f101::1 -V -l 1400 -t 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64 Time: Fri, 11 Apr 2014 19:08:41 GMT Connecting to host 2001:db8:0:f101::1, port 5201 Cookie: Lab200slot2.1397243321.714295.2b3f7c [ 4] local 2001:db8:0:f101::2 port 55804 connected to 2001:db8:0:f101::1 port 5201 Starting Test: protocol: SCTP, 1 streams, 1400 byte blocks, omitting 0 seconds, 60 second test [ ID] Interval Transfer Bandwidth [ 4] 0.00-1.00 sec 169 MBytes 1.42 Gbits/sec [ 4] 1.00-2.00 sec 201 MBytes 1.69 Gbits/sec [ 4] 2.00-3.00 sec 188 MBytes 1.58 Gbits/sec [ 4] 3.00-4.00 sec 174 MBytes 1.46 Gbits/sec [ 4] 4.00-5.00 sec 165 MBytes 1.39 Gbits/sec [ 4] 5.00-6.00 sec 199 MBytes 1.67 Gbits/sec [ 4] 6.00-7.00 sec 163 MBytes 1.36 Gbits/sec [ 4] 7.00-8.00 sec 174 MBytes 1.46 Gbits/sec [ 4] 8.00-9.00 sec 193 MBytes 1.62 Gbits/sec [ 4] 9.00-10.00 sec 196 MBytes 1.65 Gbits/sec [ 4] 10.00-11.00 sec 157 MBytes 1.31 Gbits/sec [ 4] 11.00-12.00 sec 175 MBytes 1.47 Gbits/sec [ 4] 12.00-13.00 sec 192 MBytes 1.61 Gbits/sec [ 4] 13.00-14.00 sec 199 MBytes 1.67 Gbits/sec (etc) After patch: [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64 Time: Mon, 14 Apr 2014 16:40:48 GMT Connecting to host 192.168.240.3, port 5201 Cookie: Lab200slot2.1397493648.413274.65e131 [ 4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201 Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test [ ID] Interval Transfer Bandwidth [ 4] 0.00-1.00 sec 240 MBytes 2.02 Gbits/sec [ 4] 1.00-2.00 sec 239 MBytes 2.01 Gbits/sec [ 4] 2.00-3.00 sec 240 MBytes 2.01 Gbits/sec [ 4] 3.00-4.00 sec 239 MBytes 2.00 Gbits/sec [ 4] 4.00-5.00 sec 245 MBytes 2.05 Gbits/sec [ 4] 5.00-6.00 sec 240 MBytes 2.01 Gbits/sec [ 4] 6.00-7.00 sec 240 MBytes 2.02 Gbits/sec [ 4] 7.00-8.00 sec 239 MBytes 2.01 Gbits/sec With the reverted patch applied, the SCTP/IPv4 performance is back to normal on latest upstream for IPv4 and IPv6 and has same throughput as 3.4.2 test kernel, steady and interval reports are smooth again. Fixes: ef2820a735f7 ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer") Reported-by: Peter Butler <pbutler@sonusnet.com> Reported-by: Dongsheng Song <dongsheng.song@gmail.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Peter Butler <pbutler@sonusnet.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com> Cc: Alexander Sverdlin <alexander.sverdlin@nsn.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15cxgb4: Save the correct mac addr for hw-loopback connections in the L2TSteve Wise1-1/+3
Hardware needs the local device mac address to support hw loopback for rdma loopback connections. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15net: filter: seccomp: fix wrong decoding of BPF_S_ANC_SECCOMP_LD_WDaniel Borkmann2-2/+0
While reviewing seccomp code, we found that BPF_S_ANC_SECCOMP_LD_W has been wrongly decoded by commit a8fc927780 ("sk-filter: Add ability to get socket filter program (v2)") into the opcode BPF_LD|BPF_B|BPF_ABS although it should have been decoded as BPF_LD|BPF_W|BPF_ABS. In practice, this should not have much side-effect though, as such conversion is/was being done through prctl(2) PR_SET_SECCOMP. Reverse operation PR_GET_SECCOMP will only return the current seccomp mode, but not the filter itself. Since the transition to the new BPF infrastructure, it's also not used anymore, so we can simply remove this as it's unreachable. Fixes: a8fc927780 ("sk-filter: Add ability to get socket filter program (v2)") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPFDaniel Borkmann1-9/+8
Linus reports that on 32-bit x86 Chromium throws the following seccomp resp. audit log messages: audit: type=1326 audit(1397359304.356:28108): auid=500 uid=500 gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023 pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0 syscall=172 compat=0 ip=0xb2dd9852 code=0x30000 audit: type=1326 audit(1397359304.356:28109): auid=500 uid=500 gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023 pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0 syscall=5 compat=0 ip=0xb2dd9852 code=0x50000 These audit messages are being triggered via audit_seccomp() through __secure_computing() in seccomp mode (BPF) filter with seccomp return codes 0x30000 (== SECCOMP_RET_TRAP) and 0x50000 (== SECCOMP_RET_ERRNO) during filter runtime. Moreover, Linus reports that x86_64 Chromium seems fine. The underlying issue that explains this is that the implementation of populate_seccomp_data() is wrong. Our seccomp data structure sd that is being shared with user ABI is: struct seccomp_data { int nr; __u32 arch; __u64 instruction_pointer; __u64 args[6]; }; Therefore, a simple cast to 'unsigned long *' for storing the value of the syscall argument via syscall_get_arguments() is just wrong as on 32-bit x86 (or any other 32bit arch), it would result in storing a0-a5 at wrong offsets in args[] member, and thus i) could leak stack memory to user space and ii) tampers with the logic of seccomp BPF programs that read out and check for syscall arguments: syscall_get_arguments(task, regs, 0, 1, (unsigned long *) &sd->args[0]); Tested on 32-bit x86 with Google Chrome, unfortunately only via remote test machine through slow ssh X forwarding, but it fixes the issue on my side. So fix it up by storing args in type correct variables, gcc is clever and optimizes the copy away in other cases, e.g. x86_64. Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set") Reported-and-bisected-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14Merge branch 'qlcnic'David S. Miller6-29/+43
Shahed Shaikh says: ==================== qlcnic: Bug fixes This patch series contains following bug fixes - * Send INIT_NIC_FUNC mailbox command as first mailbox * Fix a panic because of uninitialized delayed_work. * Fix inconsistent calculation of max rings count. * Fix PVID configuration issue. Driver needs to clear older PVID before adding new one. * Fix QLogic application/driver interface by packing vNIC information array. * Fix a crash when user tries to disable SR-IOV while VFs are still assigned to VMs. Please apply to net. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14qlcnic: Do not disable SR-IOV when VFs are assigned to VMsManish Chopra1-0/+10
o While disabling SR-IOV when VFs are assigned to VMs causes host crash so return -EPERM when user request to disable SR-IOV using pci sysfs in case of VFs are assigned to VMs. Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14qlcnic: Fix QLogic application/driver interface for virtual NIC configurationJitendra Kalsaria1-14/+17
o Application expect vNIC number as the array index but driver interface return configuration in array index form. o Pack the vNIC information array in the buffer such that application can access it using vNIC number as the array index. Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14qlcnic: Fix PVID configuration on eSwitch port.Jitendra Kalsaria1-0/+1
Clear older PVID before adding a newer PVID to the eSwicth port Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14qlcnic: Fix max ring count calculationShahed Shaikh2-8/+8
Do not read max rings count from qlcnic_get_nic_info(). Use driver defined values for 82xx adapters. In case of 83xx adapters, use minimum of firmware provided and driver defined values. Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14qlcnic: Fix to send INIT_NIC_FUNC as first mailbox.Sucheta Chakraborty3-4/+5
o INIT_NIC_FUNC should be first mailbox sent. Sending DCB capability and parameter query commands after that command. Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14qlcnic: Fix panic due to uninitialzed delayed_work struct in use.Sucheta Chakraborty1-3/+2
o AEN event was being received before initializing delayed_work struct and handlers for it. This was resulting in crash. This patch fixes it. Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14Merge branch 'be2net'David S. Miller2-4/+14
Sathya Perla says: ==================== be2net: patch set Patch 1/2 is a v2 of a patch that was submitted earlier (as a part of a different patch-set). v2 incorporates a suggestion given by David Laight for how long to poll for pending TX completions while disabling a device. Patch 2/2 fixes a crash in be_remove()->be_close() path after be2net has aborted an EEH error recovery due to a permanant failure. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14be2net: Fix invocation of be_close() after be_clear()Kalesh AP2-0/+9
In the EEH error recovery path, when a permanent failure occurs, we clean up adapter structure (i.e. destroy queues etc) by calling be_clear() and return PCI_ERS_RESULT_DISCONNECT. After this the stack tries to remove device from bus and calls be_remove() which invokes netdev_unregister()->be_close(). be_close() operating on destroyed queues results in a NULL dereference. This patch fixes this problem by introducing a flag to keep track of the setup state. Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14be2net: Fix to reap TX compls till HW doesn't respond for some timeVasundhara Volam1-4/+5
be_close() currently waits for a max of 200ms to receive all pending TX compls. This timeout value was roughly calculated based on 10G transmission speeds and the TX queue depth. This timeout may not be enough when the link is operating at lower speeds or in multi-channel/SR-IOV configs with TX-rate limiting setting. It is hard to calculate a "proper timeout value" that works in all configurations. This patch solves this problem by continuing to reap TX completions till the HW is completely silent for a period of 10ms or a HW error is detected. v2: implements the new scheme (as suggested by David Laight) instead of just waiting longer than 200ms for reaping all completions. Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14net/mlx4_core: Defer VF initialization till PF is fully initializedAmir Vadai1-1/+8
Fix in commit [1] is not sufficient since a deferred VF initialization could happen after pci_enable_sriov() is finished, but before the PF is fully initialized. Need to prevent VFs from initializing till the PF is fully ready and comm channel is operational. [1] - 9798935 "net/mlx4_core: mlx4_init_slave() shouldn't access comm channel before PF is ready" CC: Stuart Hayes <Stuart_Hayes@Dell.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14bnx2: Don't build unused suspend/resume functions not enabledDaniel J Blueman1-1/+1
When CONFIG_PM_SLEEP isn't enabled, bnx2_suspend/resume are unused; don't build them when they aren't used. Signed-off-by: Daniel J Blueman <daniel@quora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14ipv6: Limit mtu to 65575 bytesEric Dumazet2-2/+8
Francois reported that setting big mtu on loopback device could prevent tcp sessions making progress. We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets. We must limit the IPv6 MTU to (65535 + 40) bytes in theory. Tested: ifconfig lo mtu 70000 netperf -H ::1 Before patch : Throughput : 0.05 Mbits After patch : Throughput : 35484 Mbits Reported-by: Francois WELLENREITER <f.wellenreiter@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14netfilter: nf_tables: fix nft_cmp_fast failure on big endian for size < 4Patrick McHardy3-3/+12
nft_cmp_fast is used for equality comparisions of size <= 4. For comparisions of size < 4 byte a mask is calculated that is applied to both the data from userspace (during initialization) and the register value (during runtime). Both values are stored using (in effect) memcpy to a memory area that is then interpreted as u32 by nft_cmp_fast. This works fine on little endian since smaller types have the same base address, however on big endian this is not true and the smaller types are interpreted as a big number with trailing zero bytes. The mask therefore must not include the lower bytes, but the higher bytes on big endian. Add a helper function that does a cpu_to_le32 to switch the bytes on big endian. Since we're dealing with a mask of just consequitive bits, this works out fine. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-04-14netfilter: nf_conntrack: initialize net.ct.generationAndrey Vagin1-0/+1
[ 251.920788] INFO: trying to register non-static key. [ 251.921386] the code is fine but needs lockdep annotation. [ 251.921386] turning off the locking correctness validator. [ 251.921386] CPU: 2 PID: 15715 Comm: socket_listen Not tainted 3.14.0+ #294 [ 251.921386] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 251.921386] 0000000000000000 000000009d18c210 ffff880075f039b8 ffffffff816b7ecd [ 251.921386] ffffffff822c3b10 ffff880075f039c8 ffffffff816b36f4 ffff880075f03aa0 [ 251.921386] ffffffff810c65ff ffffffff810c4a85 00000000fffffe01 ffffffffa0075172 [ 251.921386] Call Trace: [ 251.921386] [<ffffffff816b7ecd>] dump_stack+0x45/0x56 [ 251.921386] [<ffffffff816b36f4>] register_lock_class.part.24+0x38/0x3c [ 251.921386] [<ffffffff810c65ff>] __lock_acquire+0x168f/0x1b40 [ 251.921386] [<ffffffff810c4a85>] ? trace_hardirqs_on_caller+0x105/0x1d0 [ 251.921386] [<ffffffffa0075172>] ? nf_nat_setup_info+0x252/0x3a0 [nf_nat] [ 251.921386] [<ffffffff816c1215>] ? _raw_spin_unlock_bh+0x35/0x40 [ 251.921386] [<ffffffffa0075172>] ? nf_nat_setup_info+0x252/0x3a0 [nf_nat] [ 251.921386] [<ffffffff810c7272>] lock_acquire+0xa2/0x120 [ 251.921386] [<ffffffffa008ab90>] ? ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4] [ 251.921386] [<ffffffffa0055989>] __nf_conntrack_confirm+0x129/0x410 [nf_conntrack] [ 251.921386] [<ffffffffa008ab90>] ? ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4] [ 251.921386] [<ffffffffa008ab90>] ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4] [ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0 [ 251.921386] [<ffffffff815d8c5a>] nf_iterate+0xaa/0xc0 [ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0 [ 251.921386] [<ffffffff815d8d14>] nf_hook_slow+0xa4/0x190 [ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0 [ 251.921386] [<ffffffff815e98f2>] ip_output+0x92/0x100 [ 251.921386] [<ffffffff815e8df9>] ip_local_out+0x29/0x90 [ 251.921386] [<ffffffff815e9240>] ip_queue_xmit+0x170/0x4c0 [ 251.921386] [<ffffffff815e90d5>] ? ip_queue_xmit+0x5/0x4c0 [ 251.921386] [<ffffffff81601208>] tcp_transmit_skb+0x498/0x960 [ 251.921386] [<ffffffff81602d82>] tcp_connect+0x812/0x960 [ 251.921386] [<ffffffff810e3dc5>] ? ktime_get_real+0x25/0x70 [ 251.921386] [<ffffffff8159ea2a>] ? secure_tcp_sequence_number+0x6a/0xc0 [ 251.921386] [<ffffffff81606f57>] tcp_v4_connect+0x317/0x470 [ 251.921386] [<ffffffff8161f645>] __inet_stream_connect+0xb5/0x330 [ 251.921386] [<ffffffff8158dfc3>] ? lock_sock_nested+0x33/0xa0 [ 251.921386] [<ffffffff810c4b5d>] ? trace_hardirqs_on+0xd/0x10 [ 251.921386] [<ffffffff81078885>] ? __local_bh_enable_ip+0x75/0xe0 [ 251.921386] [<ffffffff8161f8f8>] inet_stream_connect+0x38/0x50 [ 251.921386] [<ffffffff8158b157>] SYSC_connect+0xe7/0x120 [ 251.921386] [<ffffffff810e3789>] ? current_kernel_time+0x69/0xd0 [ 251.921386] [<ffffffff810c4a85>] ? trace_hardirqs_on_caller+0x105/0x1d0 [ 251.921386] [<ffffffff810c4b5d>] ? trace_hardirqs_on+0xd/0x10 [ 251.921386] [<ffffffff8158c36e>] SyS_connect+0xe/0x10 [ 251.921386] [<ffffffff816caf69>] system_call_fastpath+0x16/0x1b [ 312.014104] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=60003 jiffies, g=42359, c=42358, q=333) [ 312.015097] INFO: Stall ended before state dump start Fixes: 93bb0ceb75be ("netfilter: conntrack: remove central spinlock nf_conntrack_lock") Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>