summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2016-07-02Merge branch 'mlx5-next'David S. Miller20-331/+1840
Saeed Mahameed says: ==================== Mellanox 100G SRIOV E-Switch offload and VF representors We are happy to announce SRIOV E-Switch offload and VF netdev representors. Or Gerlitz says: Currently, the way SR-IOV embedded switches are dealt with in Linux is limited in its expressiveness and flexibility, but this is not necessarily due to hardware limitations. The kernel software model for controlling the SR-IOV switch simply does not allow the configuration of anything more complex than MAC/VLAN based forwarding. Hence the benefits brought by SRIOV come at a price of management flexibility, when compared to software virtual switches which are used in Para-Virtual (PV) schemes and allow implementing complex policies and virtual topologies. Such SW switching typically involved a complex per-packet processing within the host kernel using subsystems such as TC, Bridge, Netfilter and Open-vswitch. We'd like to change that and get the best of both worlds: the performance of SR-IOV with the management flexibility of software switches. This will eventually include a richer model for controlling the SR-IOV switch for flow-based switching and tunneling. Under this model, the e-switch is configured dynamically and a fallback to software exists in case the hardware is unable to offload all required flows. This series from Hadar Hen-Zion and myself, is the 1st step in that direction, specfically, it provides full control on the SRIOV embedded switching by host software and paves the way to offload switching rules and polices with downstream patches. To allow for host based SW control on the SRIOV HW switch, we introduce per VF representor host netdevice. The VF representor plays the same role as TAP devices in PV setup. A packet send through the VF representor on the host arrives to the VF, and a packet sent through the VF is received by its representor. The administrator can hook the representor netdev into a kernel switching component. Once they do that, packets from the VF are subject to steering (matching and actions) of that software component." Doing so indeed hurts the performance benefits of SRIOV as it forces all the traffic to go through the hypervisor. However, this SW representation is what would eventually allow us to introduce hybrid model, where we offload steering for some of the VF/VM traffic to the HW while keeping other VM traffic to go through the hypervisor. Examples for the latter are first packet of flows which are needed for SW switches learning and/or matching against policy database or types of traffic for which offloading is not desired or not supported by the current HW eswitch generation. The embedded switch is managed through a PCI device driver. As such, we introduce a devlink/pci based scheme for setting the mode of the e-switch. The current mode (where steering is done based on mac/vlan, etc) is referred to as "legacy" and the new mode as "offloads". For the mlx5 driver / ConnectX4 HW case, the VF representors implement a functional subset of mlx5e Ethernet netdevices using their own profile. This design buys us robust implementation with code reuse and sharing. The representors are created by the host PCI driver when (1) in SRIOV and (2) the e-switch is set to offloads mode. Currently, in mlx5 the e-switch management is done through the PF vport (0) and hence the VF representors along with the existing PF netdev which represents the uplink share the PCI PF device instance. The series is built from two major components, the first relates to the e-switch management and the second to VF representors. We start with a refactoring that treats the existing SRIOV e-switch code as of operating in legacy mode. Next, we add the code for the offloads mode which programs the e-switch to operate in a way which serves for software based switching: 1. miss rule which matches all packets that do not match any HW other switching rule and forwards them to the e-switch management port (0) for further processing. 2. infrastructure for send-to-vport rules which conceptually bypass other "normal" steering rules which present at the e-switch datapath. Such rules apply only for packets that originate in the e-switch manager vport (0). Since all the VF reps run over the same e-switch port, we use more logic in the host PCI driver to do HW steering of missed packets into the HW queue opened by a the respective VF representor. Finally here, we add the devlink APIs to configure the e-switch mode. The second part from Hadar starts with some refactoring work which allow for multiple mlx5e NIC instances to be created over the same PCI function, use common resources and avoid wrong loopbacks. Next comes the heart of the change which is a profile definition which allow to practically have both "conventional" mlx5e NIC use cases such as native mode (non SRIOV), VF, PF and VF representor to share the Ethernet driver code. This is done by a small surgery that ended up with few internal callbacks that should be implemented by a profile instance. The profile for the conventional NIC is implemented, to preserve the existing functionality. The last two patches add e-switch registration API for the VF representors and the implementation of the VF representors netdevice profile. Being an mlx5e instance, the VF representor uses HW send/recv queues, completions queues and such. It currently doesn't support NIC offloads but some of them could be added later on. The VF representor has switchdev ops, where currently the only supported API is the one to the HW ID, which is needed to identify multiple representors belonging to the same e-switch. The architecture + solution (software and firmware) work were done by a team consisting of Ilya Lesokhin, Haggai Eran, Rony Efraim, Tal Anker, Natan Oppenheimer, Saeed Mahameed, Hadar and Or, thanks you all! v1 --> v2 fixes: * removed unneeded variable (patch #3) * removed unused value DEVLINK_ESWITCH_MODE_NONE (patch #8) * changed the devlink mode name from "offloads" to "switchdev" which better describes what are we referring here, using a known concept (patch #8) * correctly refer to devlink e-switch modes (patch #10) * use the correct mlx5e way to define the VF rep statistics (patch #16) v2 --> v3 fixes: * Rebased on top 6fde0e63eccb 'be2net: signedness bug in be_msix_enable()' * Handled compilation error introduced by rebase on top "f5074d0ce2f8 Merge branch 'mlx5-100G-fixes'" * This series applies perfectly even with 'mlx5 resiliency and xmit path fixes' merged to net-next ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02net/mlx5e: Introduce SRIOV VF representorsHadar Hen Zion6-19/+574
Implement the relevant profile functions to create mlx5e driver instance serving as VF representor. When SRIOV offloads mode is enabled, each VF will have a representor netdevice instance on the host. To do that, we also export set of shared service functions from en_main.c, such that they can be used by both NIC and repsresentors netdevs. The newly created representor netdevice has a basic set of net_device_ops which are the same ndo functions as the NIC netdevice and an ndo of it's own for phys port name. The profiling infrastructure allow sharing code between the NIC and the vport representor even though the representor has only a subset of the NIC functionality. The VF reps and the PF which is used in that mode to represent the uplink, expose switchdev ops. Currently the only op supposed is attr get for the port parent ID which here serves to identify net-devices belonging to the same HW E-Switch. Other than that, no offloading is implemented and hence switching functionality is achieved if one sets SW switching rules, e.g using tc, bridge or ovs. Port phys name (ndo_get_phys_port_name) is implemented to allow exporting to user-space the VF vport number and along with the switchdev port parent id (phys_switch_id) enable a udev base consistent naming scheme: SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="<phys_switch_id>", \ ATTR{phys_port_name}!="", NAME="$PF_NIC$attr{phys_port_name}" where phys_switch_id is exposed by the PF (and VF reps) and $PF_NIC is the name of the PF netdevice. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02net/mlx5: Add Representors registration APIHadar Hen Zion5-7/+97
Introduce E-Switch registration/unregister representors functions. Those functions are called by the mlx5e driver when the PF NIC is created upon pci probe action regardless of the E-Switch mode (NONE, LEGACY or OFFLOADS). Adding basic E-Switch database that will hold the vport represntors upon creation. This patch doesn't add any new functionality. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02net/mlx5e: Add support for multiple profilesHadar Hen Zion2-118/+240
To allow support in representor netdevices where we create more than one netdevice per NIC, add profiles to the mlx5e driver. The profiling allows for creation of mlx5e instances with different characteristics. Each profile implements its own behavior using set of function pointers defined in struct mlx5e_profile. This is done to allow for avoiding complex per profix branching in the code. Currently only the profile for the conventional NIC is implemented, which is of use when a netdev is created upon pci probe. This patch doesn't add any new functionality. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02net/mlx5e: Mark enabled RQTs instances explicitlyHadar Hen Zion3-23/+37
In the current driver implementation two types of receive queue tables (RQTs) are in use - direct and indirect. Change the driver to mark each new created RQT (direct or indirect) as "enabled". This behaviour is needed for introducing new mlx5e instances which serve to represent SRIOV VFs. The VF representors will have only one type of RQTs (direct). An "enabled" flag is added to each RQT to allow better handling and code sharing between the representors and the nic netdevices. This patch doesn't add any new functionality. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02net/mlx5e: TIRs management refactoringHadar Hen Zion6-57/+77
The current refresh tirs self loopback mechanism, refreshes all the tirs belonging to the same mlx5e instance to prevent self loopback by packets sent over any ring of that instance. This mechanism relies on all the tirs/tises of an instance to be created with the same transport domain number (tdn). Change the driver to refresh all the tirs created under the same tdn regardless of which mlx5e netdev instance they belong to. This behaviour is needed for introducing new mlx5e instances which serve to represent SRIOV VFs. The representors and the PF share vport used for E-Switch management, and we want to avoid NIC level HW loopback between them, e.g when sending broadcast packets. To achieve that, both the representors and the PF NIC will share the tdn. This patch doesn't add any new functionality. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02net/mlx5e: Create NIC global resources only onceHadar Hen Zion5-90/+171
To allow creating more than one netdev over the same PCI function, we change the driver such that global NIC resources are created once and later be shared amongst all the mlx5e netdevs running over that port. Move the CQ UAR, PD (pdn), Transport Domain (tdn), MKey resources from being kept in the mlx5e priv part to a new resources structure (mlx5e_resources) placed under the mlx5_core device. This patch doesn't add any new functionality. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02net/mlx5e: Add devlink based SRIOV mode changesOr Gerlitz2-9/+124
Implement handlers for the devlink commands to get and set the SRIOV E-Switch mode. When turning to the switchdev/offloads mode, we disable the e-switch and enable it again in the new mode, create the NIC offloads table and create VF reps. When turning to legacy mode, we remove the VF reps and the offloads table, and re-initiate the e-switch in it's legacy mode. The actual creation/removal of the VF reps is done in downstream patches. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02net/mlx5: Add devlink interfaceOr Gerlitz4-4/+37
The devlink interface is initially used to set/get the mode of the SRIOV e-switch. Currently, these are only stubs for get/set, down-stream patch will actually fill them out. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02net/devlink: Add E-Switch mode controlOr Gerlitz3-0/+98
Add the commands to set and show the mode of SRIOV E-Switch, two modes are supported: * legacy: operating in the "old" L2 based mode (DMAC --> VF vport) * switchdev: the E-Switch is referred to as whitebox switch configured using standard tools such as tc, bridge, openvswitch etc. To allow working with the tools, for each VF, a VF representor netdevice is created by the E-Switch manager vendor device driver instance (e.g PF). Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02net/mlx5: E-Switch, Add API to create vport rx rulesOr Gerlitz2-0/+89
Add the API to create vport rx rules of the form packet meta-data :: vport == $VPORT --> $TIR where the TIR is opened by this VF representor. This logic will by used for packets that didn't match any rule in the e-switch datapath and should be received into the host OS through the netdevice that represents the VF they were sent from. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02net/mlx5: E-Switch, Add offloads tableOr Gerlitz2-0/+36
Belongs to the NIC offloads name-space, and to be used as part of the SRIOV offloads logic to steer packets that hit the e-switch miss rule to the TIR of the relevant VF representor. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02net/mlx5: Introduce offloads steering namespaceOr Gerlitz2-1/+11
Add a new namespace (MLX5_FLOW_NAMESPACE_OFFLOADS) to be populated with flow steering rules that deal with rules that have have to be executed before the EN NIC steering rules are matched. The namespace is located after the bypass name-space and before the kernel name-space. Therefore, it precedes the HW processing done for rules set for the kernel NIC name-space. Under SRIOV, it would allow us to match on e-switch missed packet and forward them to the relevant VF representor TIR. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02net/mlx5: E-Switch, Add API to create send-to-vport rulesOr Gerlitz2-1/+41
Add the API to create send-to-vport e-switch rules of the form packet meta-data :: send-queue-number == $SQN and source-vport == 0 --> $VPORT These rules are to be used for a send-to-vport logic which conceptually bypasses the "normal" steering rules currently present at the e-switch datapath. Such rule should apply only for packets that originate in the e-switch manager vport (0) and are sent for a given SQN which is used by a given VF representor device, and hence the matching logic. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02net/mlx5: E-Switch, Add miss rule for offloads modeOr Gerlitz2-0/+41
In the sriov offloads mode, packets that are not matched by any other rule should be sent towards the e-switch manager for further processing. Add such "miss" rule which matches ANY packet as the last rule in the e-switch FDB and programs the HW to send the packet to vport 0 where the e-switch manager runs. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02net/mlx5: E-Switch, Add support for the sriov offloads modeOr Gerlitz4-20/+168
Unlike the legacy mode, here, forwarding rules are not learned by the driver per events on macs set by VFs/VMs into their vports, but rather should be programmed by higher-level SW entities. Saying that, still, in the offloads mode (SRIOV_OFFLOADS), two flow groups are created by the driver for management (slow path) purposes: The first group will be used for sending packets over e-switch vports from the host OS where the e-switch management code runs, to be received by VFs. The second group will be used by a miss rule which forwards packets toward the e-switch manager. Further logic will trap these packets such that the receiving net-device as seen by the networking stack is the representor of the vport that sent the packet over the e-switch data-path. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02net/mlx5: E-Switch, Add operational mode to the SRIOV e-SwitchOr Gerlitz3-29/+46
Define three modes for the SRIOV e-switch operation, none (SRIOV_NONE, none of the VF vports are enabled), legacy (SRIOV_LEGACY, the current mode) and sriov offloads (SRIOV_OFFLOADS). Currently, when in SRIOV, only the legacy mode is supported, where steering rules are of the form: destination mac --> VF vport This patch does not change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02net: stmmac: Fix null-function call in ISR on stmmac1000Matt Corallo1-1/+1
(resent due to overhelpful mail client corrupting patch) At least on Meson GXBB, the CORE_IRQ_MTL_RX_OVERFLOW interrupt is thrown with the stmmac1000 driver, which does not support set_rx_tail_ptr. With this patch and the clock fixes, 1G ethernet works on ODROID-C2. Signed-off-by: Matt Corallo <git@bluematt.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02Merge tag 'drm-fixes-for-v4.7-rc6' of ↵Linus Torvalds12-78/+160
git://people.freedesktop.org/~airlied/linux Pull drm fixes frlm Dave Airlie: "Just some AMD and Intel fixes, the AMD ones are further production Polaris fixes, and the Intel ones fix some early timeouts, some PCI ID changes and a couple of other fixes. Still a bit Internet challenged here, hopefully end of next week will solve it" * tag 'drm-fixes-for-v4.7-rc6' of git://people.freedesktop.org/~airlied/linux: drm/i915: Fix missing unlock on error in i915_ppgtt_info() drm/amd/powerplay: workaround for UVD clock issue drm/amdgpu: add ACLK_CNTL setting for polaris10 drm/amd/powerplay: fix issue uvd dpm can't enabled on Polaris11. drm/amd/powerplay: Workaround for Memory EDC Error on Polaris10. drm/i915: Removing PCI IDs that are no longer listed as Kabylake. drm/i915: Add more Kabylake PCI IDs. drm/i915: Avoid early timeout during AUX transfers drm/i915/hsw: Avoid early timeout during LCPLL disable/restore drm/i915/lpt: Avoid early timeout during FDI PHY reset drm/i915/bxt: Avoid early timeout during PLL enable drm/i915: Refresh cached DP port register value on resume drm/amd/powerplay: Update CKS on/ CKS off voltage offset calculation drm/amd/powerplay: disable FFC. drm/amd/powerplay: add some definition for FFC feature on polaris.
2016-07-02Merge tag 'spi-fix-v4.7-rc5' of ↵Linus Torvalds4-6/+38
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A few small driver-specific fixes for SPI, all in the normal important if you hit them category especially the rockchip driver fix which addresses a race which has been exposed more frequently with some recent performance improvements" * tag 'spi-fix-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: sunxi: fix transfer timeout spi: sun4i: fix FIFO limit spi: rockchip: Signal unfinished DMA transfers spi: spi-ti-qspi: Suspend the queue before removing the device
2016-07-02Merge tag 'regulator-fix-v4.7-rc5' of ↵Linus Torvalds2-2/+7
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "Two small fixes for the regulator subsystem - one fixing a crash with one of the devices supported by the max77620 driver, another fixing startup for the anatop regulator when it starts up with the regulator in bypass mode" * tag 'regulator-fix-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: max77620: check for valid regulator info regulator: anatop: allow regulator to be in bypass mode
2016-07-02Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds4-17/+11
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "A small fix for the newly added oxnas clk driver and a handful of rockchip clk driver fixes for newly added rk3399 support" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: Fix return value check in oxnas_stdclk_probe() clk: rockchip: release io resource when failing to init clk on rk3399 clk: rockchip: fix cpuclk registration error handling clk: rockchip: Revert "clk: rockchip: reset init state before mmc card initialization" clk: rockchip: fix incorrect parent for rk3399's {c,g}pll_aclk_perihp_src clk: rockchip: mark rk3399 GIC clocks as critical clk: rockchip: initialize flags of clk_init_data in mmc-phase clock
2016-07-02Merge tag 'asoc-fix-v4.7-rc5' of ↵Takashi Iwai992-5881/+9521
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v4.7 A small clutch of hardware specific fixes for various ASoC devices, all small individually and important if you have that device but not otherwise.
2016-07-02Merge tag 'drm-intel-fixes-2016-06-30' of ↵Dave Airlie5-22/+22
git://anongit.freedesktop.org/drm-intel into drm-fixes here's a batch of i915 fixes for 4.7. * tag 'drm-intel-fixes-2016-06-30' of git://anongit.freedesktop.org/drm-intel: drm/i915: Fix missing unlock on error in i915_ppgtt_info() drm/i915: Removing PCI IDs that are no longer listed as Kabylake. drm/i915: Add more Kabylake PCI IDs. drm/i915: Avoid early timeout during AUX transfers drm/i915/hsw: Avoid early timeout during LCPLL disable/restore drm/i915/lpt: Avoid early timeout during FDI PHY reset drm/i915/bxt: Avoid early timeout during PLL enable drm/i915: Refresh cached DP port register value on resume
2016-07-02Merge branch 'drm-fixes-4.7' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie7-56/+138
into drm-fixes Just a few more late fixes for Polaris cards. * 'drm-fixes-4.7' of git://people.freedesktop.org/~agd5f/linux: drm/amd/powerplay: workaround for UVD clock issue drm/amdgpu: add ACLK_CNTL setting for polaris10 drm/amd/powerplay: fix issue uvd dpm can't enabled on Polaris11. drm/amd/powerplay: Workaround for Memory EDC Error on Polaris10. drm/amd/powerplay: Update CKS on/ CKS off voltage offset calculation drm/amd/powerplay: disable FFC. drm/amd/powerplay: add some definition for FFC feature on polaris.
2016-07-02MIPS: Fix possible corruption of cache mode by mprotect.Ralf Baechle1-4/+6
The following testcase may result in a page table entries with a invalid CCA field being generated: static void *bindstack; static int sysrqfd; static void protect_low(int protect) { mprotect(bindstack, BINDSTACK_SIZE, protect); } static void sigbus_handler(int signal, siginfo_t * info, void *context) { void *addr = info->si_addr; write(sysrqfd, "x", 1); printf("sigbus, fault address %p (should not happen, but might)\n", addr); abort(); } static void run_bind_test(void) { unsigned int *p = bindstack; p[0] = 0xf001f001; write(sysrqfd, "x", 1); /* Set trap on access to p[0] */ protect_low(PROT_NONE); write(sysrqfd, "x", 1); /* Clear trap on access to p[0] */ protect_low(PROT_READ | PROT_WRITE | PROT_EXEC); write(sysrqfd, "x", 1); /* Check the contents of p[0] */ if (p[0] != 0xf001f001) { write(sysrqfd, "x", 1); /* Reached, but shouldn't be */ printf("badness, shouldn't happen but does\n"); abort(); } } int main(void) { struct sigaction sa; sysrqfd = open("/proc/sysrq-trigger", O_WRONLY); if (sigprocmask(SIG_BLOCK, NULL, &sa.sa_mask)) { perror("sigprocmask"); return 0; } sa.sa_sigaction = sigbus_handler; sa.sa_flags = SA_SIGINFO | SA_NODEFER | SA_RESTART; if (sigaction(SIGBUS, &sa, NULL)) { perror("sigaction"); return 0; } bindstack = mmap(NULL, BINDSTACK_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); if (bindstack == MAP_FAILED) { perror("mmap bindstack"); return 0; } printf("bindstack: %p\n", bindstack); run_bind_test(); printf("done\n"); return 0; } There are multiple ingredients for this: 1) PAGE_NONE is defined to _CACHE_CACHABLE_NONCOHERENT, which is CCA 3 on all platforms except SB1 where it's CCA 5. 2) _page_cachable_default must have bits set which are not set _CACHE_CACHABLE_NONCOHERENT. 3) Either the defective version of pte_modify for XPA or the standard version must be in used. However pte_modify for the 36 bit address space support is no affected. In that case additional bits in the final CCA mode may generate an invalid value for the CCA field. On the R10000 system where this was tracked down for example a CCA 7 has been observed, which is Uncached Accelerated. Fixed by: 1) Using the proper CCA mode for PAGE_NONE just like for all the other PAGE_* pte/pmd bits. 2) Fix the two affected variants of pte_modify. Further code inspection also shows the same issue to exist in pmd_modify which would affect huge page systems. Issue in pte_modify tracked down by Alastair Bridgewater, PAGE_NONE and pmd_modify issue found by me. The history of this goes back beyond Linus' git history. Chris Dearman's commit 351336929ccf222ae38ff0cb7a8dd5fd5c6236a0 ("[MIPS] Allow setting of the cache attribute at run time.") missed the opportunity to fix this but it was originally introduced in lmo commit d523832cf12007b3242e50bb77d0c9e63e0b6518 ("Missing from last commit.") and 32cc38229ac7538f2346918a09e75413e8861f87 ("New configuration option CONFIG_MIPS_UNCACHED.") Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Reported-by: Alastair Bridgewater <alastair.bridgewater@gmail.com>
2016-07-02Merge tag 'acpi-4.7-rc6' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Fix an expression in the ACPI PCI IRQ management code added by a recent commit that overlooked missing parens in it, so the result of the computation is incorrect in some cases (Sinan Kaya)" * tag 'acpi-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI,PCI,IRQ: correct operator precedence
2016-07-02Merge tag 'pm-4.7-rc6' of ↵Linus Torvalds3-5/+11
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "Three cpufreq fixes, one in the core (stable-candidate) and two in drivers (intel_pstate and cpufreq-dt). Specifics: - Fix a recent intel_pstate regression that caused the number of wakeups to increase significantly on an idle system in some cases due to excessive synchronize_sched() invocations (Rafael Wysocki). - Fix unnecessary invocations of WARN_ON() in the cpufreq core after cpufreq has been suspended introduced during the 4.6 cycla (Rafael Wysocki). - Fix an error code path in the cpufreq-dt-platdev driver that forgets to drop a reference to a DT node (Masahiro Yamada)" * tag 'pm-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: Avoid false-positive WARN_ON()s in cpufreq_update_policy() cpufreq: dt: call of_node_put() before error out intel_pstate: Do not clear utilization update hooks on policy changes
2016-07-02Merge branch 'for-linus' of ↵Linus Torvalds4-48/+78
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Tmpfs readdir throughput regression fix (this cycle) + some -stable fodder all over the place. One missing bit is Miklos' tonight locks.c fix - NFS folks had already grabbed that one by the time I woke up ;-)" [ The locks.c fix came through the nfsd tree just moments ago ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: namespace: update event counter when umounting a deleted dentry 9p: use file_dentry() ceph: fix d_obtain_alias() misuses lockless next_positive() libfs.c: new helper - next_positive() dcache_{readdir,dir_lseek}(): don't bother with nested ->d_lock
2016-07-02Merge tag 'nfsd-4.7-3' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2-4/+11
Pull lockd/locks fixes from Bruce Fields: "One fix for lockd soft lookups in an error path, and one fix for file leases on overlayfs" * tag 'nfsd-4.7-3' of git://linux-nfs.org/~bfields/linux: locks: use file_inode() lockd: unregister notifier blocks if the service fails to come up completely
2016-07-02Merge tag 'mfd-fixes-4.7.1' of ↵Linus Torvalds3-3/+5
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull more MFD fixes from Lee Jones: "Apologies for missing these from the first pull request. Final patches fixing Reset API change" * tag 'mfd-fixes-4.7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: usb: dwc3: st: Use explicit reset_control_get_exclusive() API phy: phy-stih407-usb: Use explicit reset_control_get_exclusive() API phy: miphy28lp: Inform the reset framework that our reset line may be shared
2016-07-02Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds5-26/+60
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "1/ Two regression fixes since v4.6: one for the byte order of a sysfs attribute (bz121161) and another for QEMU 2.6's NVDIMM _DSM (ACPI Device Specific Method) implementation that gets tripped up by new auto-probing behavior in the NFIT driver. 2/ A fix tagged for -stable that stops the kernel from clobbering/ignoring changes to the configuration of a 'pfn' instance ("struct page" driver). For example changing the alignment from 2M to 1G may silently revert to 2M if that value is currently stored on media. 3/ A fix from Eric for an xfstests failure in dax. It is not currently tagged for -stable since it requires an 8-exabyte file system to trigger, and there appear to be no user visible side effects" * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nfit: fix format interface code byte order dax: fix offset overflow in dax_io acpi, nfit: fix acpi_check_dsm() vs zero functions implemented libnvdimm, pfn, dax: fix initialization vs autodetect for mode + alignment
2016-07-02Merge tag 'batadv-next-for-davem-20160701' of ↵David S. Miller26-192/+704
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature patchset includes the following changes: - two patches with minimal clean up work by Antonio Quartulli and Simon Wunderlich - eight patches of B.A.T.M.A.N. V, API and documentation clean up work, by Antonio Quartulli and Marek Lindner - Andrew Lunn fixed the skb priority adoption when forwarding fragmented packets (two patches) - Multicast optimization support is now enabled for bridges which comes with some protocol updates, by Linus Luessing ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01Merge branch 'hns-next'David S. Miller10-69/+158
Yisen Zhuang says: ==================== net: hns: fix the typo of hns This series includes typo fixes which review by Andy, adding the hns maintainer to MAINTAINERS, as below: > from Daode: adds the maintainer for hns driver; > from Daode: fix the typo of hns reviewed by Andy Shevchenko; > from Kejian: one remove redundant function and two fix to get configuration from DT. changlog: v2 -> v3: match all files in and below drivers/net/ethernet/hisilicon/ v1 -> v2: fix the indentations reviewed by David. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01net: hns: get reset registers from DTKejian Yan1-14/+66
Since the registers of subctrl may be different, it is better to mv the registers from hns mdio driver routine to device tree node. Signed-off-by: Kejian Yan <yankejian@huawei.com> Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01net: hns: add media-type property for hnsKejian Yan5-3/+48
It is PORT_TP type if the service port is GE mode. It is wrong to judge the port type by using if it is service port. Adding the media type to know port type. Reported-by: Jinchuan Tian <tianjinchuan1@huawei.com> Signed-off-by: Kejian Yan <yankejian@huawei.com> Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01net: hns: remove redundant hns_mac_dev_to_enet_if()Kejian Yan1-15/+0
The sequence of hns_mac_dev_to_enet_if() is the same as hns_get_enet_interface(), and hns_get_enet_interface() is called by initialization to get the mac mode. And the mode is not changed anywhere. Thus add hns_mac_dev_to_enet_if() function to get the mac mode is obviously redundant. Reported-by: Jinchuan Tian <tianjinchuan1@huawei.com> Signed-off-by: Kejian Yan <yankejian@huawei.com> Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01net: hns: normalize two different loopDaode Huang1-9/+9
There are two approaches to assign data, one does 2 loops, another does 1 loop. This patch normalize the different methods to 1 loop. Signed-off-by: Daode Huang <huangdaode@hisilicon.com> Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01net: hns: add a space before "*/"Daode Huang1-2/+2
In comment line, some time miss a space before */, so this patch adds a space before */. Signed-off-by: Daode Huang <huangdaode@hisilicon.com> Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01net: hns: delete redundant parentheseDaode Huang1-2/+2
According to the previous review comments from Andy, this patch deletes the redundant parens in the patch. Signed-off-by: Daode Huang <huangdaode@hisilicon.com> Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01net: hns: change code style from a = a + x to a += xDaode Huang1-16/+16
This patch fixes the code style in hns driver. Change it from "buff = buff + xxx" to "buff += xxx". The reveiw comments is from andy. Reviewed-by: Andriy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Daode Huang <huangdaode@hisilicon.com> Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01net: hns: fix code style about hns driverDaode Huang1-9/+7
This patch fixes code sytle of hns driver to make it simple. Signed-off-by: Daode Huang <huangdaode@hisilicon.com> Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01MAINTAINERS: add maintainers for hns driverDaode Huang1-0/+9
This patch adds maintainers for hisilicon network subsystem driver Signed-off-by: Daode Huang <huangdaode@hisilicon.com> Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01tipc: fix nl compat regression for link statisticsRichard Alpe1-1/+1
Fix incorrect use of nla_strlcpy() where the first NLA_HDRLEN bytes of the link name where left out. Making the output of tipc-config -ls look something like: Link statistics: dcast-link 1:data0-1.1.2:data0 1:data0-1.1.3:data0 Also, for the record, the patch that introduce this regression claims "Sending the whole object out can cause a leak". Which isn't very likely as this is a compat layer, where the data we are parsing is generated by us and we know the string to be NULL terminated. But you can of course never be to secure. Fixes: 5d2be1422e02 (tipc: fix an infoleak in tipc_nl_compat_link_dump) Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01Merge branch 'rds-multipath-datastructures'David S. Miller17-174/+211
Sowmini Varadhan says: ==================== RDS:TCP data structure changes for multipath support The second installment of changes to enable multipath support in RDS-TCP. This series implements the changes in rds-tcp so that the rds_conn_path has a pointer to the rds_tcp_connection in cp_transport_data. Struct rds_tcp_connection keeps track of the inet_sk per path in t_sock. The ->sk_user_data in turn is a pointer to the rds_conn_path. With this set of changes, rds_tcp has the needed plumbing to handle multiple paths(socket) per rds_connection. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: Do not send a pong to an incoming ping with 0 src portSowmini Varadhan1-0/+4
RDS ping messages are sent with a non-zero src port to a zero dst port, so that the rds pong messages can be sent back to the originators src port. However if a confused/malicious sender sends a ping with a 0 src port, we'd have an infinite ping-pong loop. To avoid this, the receiver should ignore ping messages with a 0 src port. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: Simplify reconnect to avoid duelling reconnnect attemptsSowmini Varadhan2-3/+6
When reconnecting, the peer with the smaller IP address will initiate the reconnect, to avoid needless duelling SYN issues. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: Hooks to set up a single connection pathSowmini Varadhan9-17/+20
This patch adds ->conn_path_connect callbacks in the rds_transport that are used to set up a single connection path. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: make receive path use the rds_conn_pathSowmini Varadhan9-22/+26
The ->sk_user_data contains a pointer to the rds_conn_path for the socket. Use this consistently in the rds_tcp_data_ready callbacks to get the rds_conn_path for rds_recv_incoming. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: make ->sk_user_data point to a rds_conn_pathSowmini Varadhan6-40/+41
The socket callbacks should all operate on a struct rds_conn_path, in preparation for a MP capable RDS-TCP. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>