From fbc81ec5b85d43a4b22e49ec0e643fa7dec2ea40 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Sat, 3 Oct 2020 17:28:27 +0200 Subject: efi/arm: set HSCTLR Thumb2 bit correctly for HVC calls from HYP Commit db227c19e68db353 ("ARM: 8985/1: efi/decompressor: deal with HYP mode boot gracefully") updated the EFI entry code to permit firmware to invoke the EFI stub loader in HYP mode, with the MMU either enabled or disabled, neither of which is permitted by the EFI spec, but which does happen in the field. In the MMU on case, we remain in HYP mode as configured by the firmware, and rely on the fact that any HVC instruction issued in this mode will be dispatched via the SVC slot in the HYP vector table. However, this slot will point to a Thumb2 symbol if the kernel is built in Thumb2 mode, and so we have to configure HSCTLR to ensure that the exception handlers are invoked in Thumb2 mode as well. Fixes: db227c19e68db353 ("ARM: 8985/1: efi/decompressor: deal with HYP mode boot gracefully") Signed-off-by: Ard Biesheuvel --- arch/arm/boot/compressed/head.S | 3 +++ 1 file changed, 3 insertions(+) (limited to 'arch') diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S index 2e04ec5b5446..caa27322a0ab 100644 --- a/arch/arm/boot/compressed/head.S +++ b/arch/arm/boot/compressed/head.S @@ -1472,6 +1472,9 @@ ENTRY(efi_enter_kernel) @ issued from HYP mode take us to the correct handler code. We @ will disable the MMU before jumping to the kernel proper. @ + ARM( bic r1, r1, #(1 << 30) ) @ clear HSCTLR.TE + THUMB( orr r1, r1, #(1 << 30) ) @ set HSCTLR.TE + mcr p15, 4, r1, c1, c0, 0 adr r0, __hyp_reentry_vectors mcr p15, 4, r0, c12, c0, 0 @ set HYP vector base (HVBAR) isb -- cgit v1.2.3 From c20782ad4eb9dfa7f41cb2d85f218d0940f7cef1 Mon Sep 17 00:00:00 2001 From: Tony Lindgren Date: Mon, 26 Oct 2020 10:08:47 +0200 Subject: ARM: OMAP2+: Fix location for select PM_GENERIC_DOMAINS I accidentally misplaced select PM_GENERIC_DOMAINS, it should be selected for all the SoCs instead. Fixes: 58cbff023bfa ("soc: ti: omap-prm: Add basic power domain support") Signed-off-by: Tony Lindgren --- arch/arm/mach-omap2/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/mach-omap2/Kconfig b/arch/arm/mach-omap2/Kconfig index 3ee7bdff86b2..68ba5adfaa23 100644 --- a/arch/arm/mach-omap2/Kconfig +++ b/arch/arm/mach-omap2/Kconfig @@ -7,7 +7,6 @@ config ARCH_OMAP2 depends on ARCH_MULTI_V6 select ARCH_OMAP2PLUS select CPU_V6 - select PM_GENERIC_DOMAINS if PM select SOC_HAS_OMAP2_SDRC config ARCH_OMAP3 @@ -106,6 +105,7 @@ config ARCH_OMAP2PLUS select OMAP_DM_TIMER select OMAP_GPMC select PINCTRL + select PM_GENERIC_DOMAINS if PM select RESET_CONTROLLER select SOC_BUS select TI_SYSC -- cgit v1.2.3 From b69fd00120f8e3348273323099669cb058668263 Mon Sep 17 00:00:00 2001 From: Tony Lindgren Date: Mon, 26 Oct 2020 10:08:47 +0200 Subject: ARM: OMAP2+: Fix missing select PM_GENERIC_DOMAINS_OF We should select also PM_GENERIC_DOMAINS_OF in addition to PM_GENERIC_DOMAINS to have device tree based PM domain providers enabled. Fixes: 58cbff023bfa ("soc: ti: omap-prm: Add basic power domain support") Signed-off-by: Tony Lindgren --- arch/arm/mach-omap2/Kconfig | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/arm/mach-omap2/Kconfig b/arch/arm/mach-omap2/Kconfig index 68ba5adfaa23..3f62a0c9450d 100644 --- a/arch/arm/mach-omap2/Kconfig +++ b/arch/arm/mach-omap2/Kconfig @@ -106,6 +106,7 @@ config ARCH_OMAP2PLUS select OMAP_GPMC select PINCTRL select PM_GENERIC_DOMAINS if PM + select PM_GENERIC_DOMAINS_OF if PM select RESET_CONTROLLER select SOC_BUS select TI_SYSC -- cgit v1.2.3 From a2089ac7f8dc682ef52ed74b52997d36cde76d05 Mon Sep 17 00:00:00 2001 From: Clément Péron Date: Sun, 11 Oct 2020 23:15:14 +0200 Subject: arm64: dts: allwinner: pinetab: Drop unnecessary address/size-cells information MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit make dtbs_check warm about unknown address/size-cells property in the pinetab device-tree. This is because these information are not necessary. Drop them. Signed-off-by: Clément Péron Signed-off-by: Maxime Ripard Link: https://lore.kernel.org/r/20201011211514.155266-1-peron.clem@gmail.com --- arch/arm64/boot/dts/allwinner/sun50i-a64-pinetab.dts | 3 --- 1 file changed, 3 deletions(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64-pinetab.dts b/arch/arm64/boot/dts/allwinner/sun50i-a64-pinetab.dts index 3ab0f0347bc9..0494bfaf2ffa 100644 --- a/arch/arm64/boot/dts/allwinner/sun50i-a64-pinetab.dts +++ b/arch/arm64/boot/dts/allwinner/sun50i-a64-pinetab.dts @@ -122,9 +122,6 @@ status = "okay"; port { - #address-cells = <1>; - #size-cells = <0>; - csi_ep: endpoint { remote-endpoint = <&ov5640_ep>; bus-width = <8>; -- cgit v1.2.3 From 97a38c1c213b162aa577299de698f39c18ba696b Mon Sep 17 00:00:00 2001 From: Clément Péron Date: Sun, 18 Oct 2020 19:24:09 +0200 Subject: arm64: dts: allwinner: beelink-gs1: Enable both RGMII RX/TX delay MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Before the commit bbc4d71d6354 ("net: phy: realtek: fix rtl8211e rx/tx delay config"), the software overwrite for RX/TX delays of the RTL8211e were not working properly and the Beelink GS1 had both RX/TX delay of RGMII interface set using pull-up on the TXDLY and RXDLY pins. Now that these delays are working properly they overwrite the HW config and set this to 'rgmii' meaning no delay on both RX/TX. This makes the ethernet of this board not working anymore. Set the phy-mode to 'rgmii-id' meaning RGMII with RX/TX delays in the device-tree to keep the correct configuration. Fixes: 089bee8dd119 ("arm64: dts: allwinner: h6: Introduce Beelink GS1 board") Signed-off-by: Clément Péron Signed-off-by: Maxime Ripard Acked-by: Chen-Yu Tsai Link: https://lore.kernel.org/r/20201018172409.1754775-1-peron.clem@gmail.com --- arch/arm64/boot/dts/allwinner/sun50i-h6-beelink-gs1.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h6-beelink-gs1.dts b/arch/arm64/boot/dts/allwinner/sun50i-h6-beelink-gs1.dts index 3f7ceeb1a767..7c9dbde645b5 100644 --- a/arch/arm64/boot/dts/allwinner/sun50i-h6-beelink-gs1.dts +++ b/arch/arm64/boot/dts/allwinner/sun50i-h6-beelink-gs1.dts @@ -97,7 +97,7 @@ &emac { pinctrl-names = "default"; pinctrl-0 = <&ext_rgmii_pins>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; phy-handle = <&ext_rgmii_phy>; phy-supply = <®_aldo2>; status = "okay"; -- cgit v1.2.3 From 419c65f5000a6c25597ea52488528d75b287cbd0 Mon Sep 17 00:00:00 2001 From: Corentin Labbe Date: Mon, 19 Oct 2020 06:34:49 +0000 Subject: arm64: dts: allwinner: Pine H64: Enable both RGMII RX/TX delay Since commit bbc4d71d6354 ("net: phy: realtek: fix rtl8211e rx/tx delay config"), the network is unusable on PineH64 model A. This is due to phy-mode incorrectly set to rgmii instead of rgmii-id. Fixes: 729e1ffcf47e ("arm64: allwinner: h6: add support for the Ethernet on Pine H64") Signed-off-by: Corentin Labbe Signed-off-by: Maxime Ripard Link: https://lore.kernel.org/r/20201019063449.33316-1-clabbe@baylibre.com --- arch/arm64/boot/dts/allwinner/sun50i-h6-pine-h64.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h6-pine-h64.dts b/arch/arm64/boot/dts/allwinner/sun50i-h6-pine-h64.dts index af85b2074867..961732c52aa0 100644 --- a/arch/arm64/boot/dts/allwinner/sun50i-h6-pine-h64.dts +++ b/arch/arm64/boot/dts/allwinner/sun50i-h6-pine-h64.dts @@ -100,7 +100,7 @@ &emac { pinctrl-names = "default"; pinctrl-0 = <&ext_rgmii_pins>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; phy-handle = <&ext_rgmii_phy>; phy-supply = <®_gmac_3v3>; allwinner,rx-delay-ps = <200>; -- cgit v1.2.3 From d7cdff444579e6659459b2fe04340ebb27628d5e Mon Sep 17 00:00:00 2001 From: Jernej Skrabec Date: Thu, 22 Oct 2020 20:58:39 +0200 Subject: arm64: dts: allwinner: a64: OrangePi Win: Fix ethernet node RX/TX delay on OrangePi Win board is set on PHY. Reflect that in ethernet node. Fixes: 93d6a27cfcc0 ("arm64: dts: allwinner: a64: Orange Pi Win: Add Ethernet node") Signed-off-by: Jernej Skrabec Signed-off-by: Maxime Ripard Link: https://lore.kernel.org/r/20201022185839.2779245-1-jernej.skrabec@siol.net --- arch/arm64/boot/dts/allwinner/sun50i-a64-orangepi-win.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64-orangepi-win.dts b/arch/arm64/boot/dts/allwinner/sun50i-a64-orangepi-win.dts index d894ec5fa8a1..70e31743f0ba 100644 --- a/arch/arm64/boot/dts/allwinner/sun50i-a64-orangepi-win.dts +++ b/arch/arm64/boot/dts/allwinner/sun50i-a64-orangepi-win.dts @@ -120,7 +120,7 @@ &emac { pinctrl-names = "default"; pinctrl-0 = <&rgmii_pins>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; phy-handle = <&ext_rgmii_phy>; phy-supply = <®_gmac_3v3>; status = "okay"; -- cgit v1.2.3 From 927f42fcc1b4f7d04a2ac5cf02f25612aa8923a4 Mon Sep 17 00:00:00 2001 From: Jernej Skrabec Date: Thu, 22 Oct 2020 23:13:01 +0200 Subject: arm64: dts: allwinner: a64: Pine64 Plus: Fix ethernet node According to board schematic, PHY provides both, RX and TX delays. However, according to "fix" Realtek provided for this board, only TX delay should be provided by PHY. Tests show that both variants work but TX only PHY delay works slightly better. Update ethernet node to reflect the fact that PHY provides TX delay. Fixes: 94f442886711 ("arm64: dts: allwinner: A64: Restore EMAC changes") Signed-off-by: Jernej Skrabec Signed-off-by: Maxime Ripard Link: https://lore.kernel.org/r/20201022211301.3548422-1-jernej.skrabec@siol.net --- arch/arm64/boot/dts/allwinner/sun50i-a64-pine64-plus.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64-pine64-plus.dts b/arch/arm64/boot/dts/allwinner/sun50i-a64-pine64-plus.dts index b26181cf9095..b54099b654c8 100644 --- a/arch/arm64/boot/dts/allwinner/sun50i-a64-pine64-plus.dts +++ b/arch/arm64/boot/dts/allwinner/sun50i-a64-pine64-plus.dts @@ -13,7 +13,7 @@ &emac { pinctrl-names = "default"; pinctrl-0 = <&rgmii_pins>; - phy-mode = "rgmii"; + phy-mode = "rgmii-txid"; phy-handle = <&ext_rgmii_phy>; status = "okay"; }; -- cgit v1.2.3 From b34bf9f6a623ddb82600a5ed5c644224122395e1 Mon Sep 17 00:00:00 2001 From: Jernej Skrabec Date: Fri, 23 Oct 2020 20:48:58 +0200 Subject: arm64: dts: allwinner: h5: OrangePi PC2: Fix ethernet node RX and TX delay are provided by ethernet PHY. Reflect that in ethernet node. Fixes: 44a94c7ef989 ("arm64: dts: allwinner: H5: Restore EMAC changes") Signed-off-by: Jernej Skrabec Signed-off-by: Maxime Ripard Link: https://lore.kernel.org/r/20201023184858.3272918-1-jernej.skrabec@siol.net --- arch/arm64/boot/dts/allwinner/sun50i-h5-orangepi-pc2.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h5-orangepi-pc2.dts b/arch/arm64/boot/dts/allwinner/sun50i-h5-orangepi-pc2.dts index 7d7aad18f078..8bf2db9dcbda 100644 --- a/arch/arm64/boot/dts/allwinner/sun50i-h5-orangepi-pc2.dts +++ b/arch/arm64/boot/dts/allwinner/sun50i-h5-orangepi-pc2.dts @@ -123,7 +123,7 @@ pinctrl-0 = <&emac_rgmii_pins>; phy-supply = <®_gmac_3v3>; phy-handle = <&ext_rgmii_phy>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; status = "okay"; }; -- cgit v1.2.3 From b3eec3212e66ece33f69be0de98d54e67834e798 Mon Sep 17 00:00:00 2001 From: Jernej Skrabec Date: Sun, 25 Oct 2020 09:19:49 +0100 Subject: ARM: dts: sun8i: r40: bananapi-m2-ultra: Fix ethernet node Ethernet PHY on BananaPi M2 Ultra provides RX and TX delays. Fix ethernet node to reflect that fact. Fixes: c36fd5a48bd2 ("ARM: dts: sun8i: r40: bananapi-m2-ultra: Enable GMAC ethernet controller") Signed-off-by: Jernej Skrabec Signed-off-by: Maxime Ripard Link: https://lore.kernel.org/r/20201025081949.783443-1-jernej.skrabec@siol.net --- arch/arm/boot/dts/sun8i-r40-bananapi-m2-ultra.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/boot/dts/sun8i-r40-bananapi-m2-ultra.dts b/arch/arm/boot/dts/sun8i-r40-bananapi-m2-ultra.dts index 2fc62ef0cb3e..a6a1087a0c9b 100644 --- a/arch/arm/boot/dts/sun8i-r40-bananapi-m2-ultra.dts +++ b/arch/arm/boot/dts/sun8i-r40-bananapi-m2-ultra.dts @@ -129,7 +129,7 @@ pinctrl-names = "default"; pinctrl-0 = <&gmac_rgmii_pins>; phy-handle = <&phy1>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; phy-supply = <®_dc1sw>; status = "okay"; }; -- cgit v1.2.3 From 8d80e2f00a42ef10b54e1b2d9e97314f8fd046c0 Mon Sep 17 00:00:00 2001 From: Chen-Yu Tsai Date: Sun, 25 Oct 2020 00:25:06 +0800 Subject: Revert "arm: sun8i: orangepi-pc-plus: Set EMAC activity LEDs to active high" This reverts commit 75ee680cbd2e4d0156b94f9fec50076361ab12f2. Turns out the activity and link LEDs on the RJ45 port are active low, just like on the Orange Pi PC. Revert the commit that says otherwise. Fixes: 75ee680cbd2e ("arm: sun8i: orangepi-pc-plus: Set EMAC activity LEDs to active high") Fixes: 4904337fe34f ("ARM: dts: sunxi: Restore EMAC changes (boards)") Signed-off-by: Chen-Yu Tsai Signed-off-by: Maxime Ripard Tested-by: Jernej Skrabec Acked-by: Jernej Skrabec Link: https://lore.kernel.org/r/20201024162515.30032-1-wens@kernel.org --- arch/arm/boot/dts/sun8i-h3-orangepi-pc-plus.dts | 5 ----- 1 file changed, 5 deletions(-) (limited to 'arch') diff --git a/arch/arm/boot/dts/sun8i-h3-orangepi-pc-plus.dts b/arch/arm/boot/dts/sun8i-h3-orangepi-pc-plus.dts index 71fb73208939..babf4cf1b2f6 100644 --- a/arch/arm/boot/dts/sun8i-h3-orangepi-pc-plus.dts +++ b/arch/arm/boot/dts/sun8i-h3-orangepi-pc-plus.dts @@ -53,11 +53,6 @@ }; }; -&emac { - /* LEDs changed to active high on the plus */ - /delete-property/ allwinner,leds-active-low; -}; - &mmc1 { vmmc-supply = <®_vcc3v3>; bus-width = <4>; -- cgit v1.2.3 From e76724153f5b4539802cc21b2c6131058668a1c6 Mon Sep 17 00:00:00 2001 From: Chen-Yu Tsai Date: Sun, 25 Oct 2020 00:25:07 +0800 Subject: ARM: dts: sun6i: a31-hummingbird: Enable RGMII RX/TX delay on Ethernet PHY The Ethernet PHY on the A31 Hummingbird has the RX and TX delays enabled on the PHY, using pull-ups on the RXDLY and TXDLY pins. Fix the phy-mode description to correct reflect this so that the implementation doesn't reconfigure the delays incorrectly. This happened with commit bbc4d71d6354 ("net: phy: realtek: fix rtl8211e rx/tx delay config"). Fixes: c220aec2bb79 ("ARM: dts: sun6i: Add Merrii A31 Hummingbird support") Signed-off-by: Chen-Yu Tsai Signed-off-by: Maxime Ripard Acked-by: Jernej Skrabec Link: https://lore.kernel.org/r/20201024162515.30032-2-wens@kernel.org --- arch/arm/boot/dts/sun6i-a31-hummingbird.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/boot/dts/sun6i-a31-hummingbird.dts b/arch/arm/boot/dts/sun6i-a31-hummingbird.dts index 049e6ab3cf56..73de34ae37fd 100644 --- a/arch/arm/boot/dts/sun6i-a31-hummingbird.dts +++ b/arch/arm/boot/dts/sun6i-a31-hummingbird.dts @@ -154,7 +154,7 @@ pinctrl-names = "default"; pinctrl-0 = <&gmac_rgmii_pins>; phy-handle = <&phy1>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; status = "okay"; }; -- cgit v1.2.3 From 353c3de1303fc93032164402c0eb8550ecd6f154 Mon Sep 17 00:00:00 2001 From: Chen-Yu Tsai Date: Sun, 25 Oct 2020 00:25:08 +0800 Subject: ARM: dts: sun7i: cubietruck: Enable RGMII RX/TX delay on Ethernet PHY MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The Ethernet PHY on the Cubietruck has the RX and TX delays enabled on the PHY, using pull-ups on the RXDLY and TXDLY pins. Fix the phy-mode description to correct reflect this so that the implementation doesn't reconfigure the delays incorrectly. This happened with commit bbc4d71d6354 ("net: phy: realtek: fix rtl8211e rx/tx delay config"). Fixes: 67073d97672d ("ARM: dts: sun7i: cubietruck: Enable the GMAC") Signed-off-by: Chen-Yu Tsai Signed-off-by: Maxime Ripard Tested-by: Emilio López Reviewed-by: Emilio López Link: https://lore.kernel.org/r/20201024162515.30032-3-wens@kernel.org --- arch/arm/boot/dts/sun7i-a20-cubietruck.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/boot/dts/sun7i-a20-cubietruck.dts b/arch/arm/boot/dts/sun7i-a20-cubietruck.dts index 8c8dee6ea461..9109ca0919ad 100644 --- a/arch/arm/boot/dts/sun7i-a20-cubietruck.dts +++ b/arch/arm/boot/dts/sun7i-a20-cubietruck.dts @@ -151,7 +151,7 @@ pinctrl-names = "default"; pinctrl-0 = <&gmac_rgmii_pins>; phy-handle = <&phy1>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; status = "okay"; }; -- cgit v1.2.3 From f94f78bd93f567c022f594589dbeecdf59931365 Mon Sep 17 00:00:00 2001 From: Chen-Yu Tsai Date: Sun, 25 Oct 2020 00:25:09 +0800 Subject: ARM: dts: sun7i: bananapi-m1-plus: Enable RGMII RX/TX delay on Ethernet PHY The Ethernet PHY on the Bananapi M1+ has the RX and TX delays enabled on the PHY, using pull-ups on the RXDLY and TXDLY pins. Fix the phy-mode description to correct reflect this so that the implementation doesn't reconfigure the delays incorrectly. This happened with commit bbc4d71d6354 ("net: phy: realtek: fix rtl8211e rx/tx delay config"). Fixes: 04c85ecad32a ("ARM: dts: sun7i: Add dts file for Bananapi M1 Plus board") Signed-off-by: Chen-Yu Tsai Signed-off-by: Maxime Ripard Acked-by: Jernej Skrabec Link: https://lore.kernel.org/r/20201024162515.30032-4-wens@kernel.org --- arch/arm/boot/dts/sun7i-a20-bananapi-m1-plus.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/boot/dts/sun7i-a20-bananapi-m1-plus.dts b/arch/arm/boot/dts/sun7i-a20-bananapi-m1-plus.dts index 32d5d45a35c0..8945dbb114a2 100644 --- a/arch/arm/boot/dts/sun7i-a20-bananapi-m1-plus.dts +++ b/arch/arm/boot/dts/sun7i-a20-bananapi-m1-plus.dts @@ -130,7 +130,7 @@ pinctrl-names = "default"; pinctrl-0 = <&gmac_rgmii_pins>; phy-handle = <&phy1>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; phy-supply = <®_gmac_3v3>; status = "okay"; }; -- cgit v1.2.3 From e080ab31a0aa126b0a7e4f67f2b01b371b852c88 Mon Sep 17 00:00:00 2001 From: Chen-Yu Tsai Date: Sun, 25 Oct 2020 00:25:10 +0800 Subject: ARM: dts: sun8i: h3: orangepi-plus2e: Enable RGMII RX/TX delay on Ethernet PHY The Ethernet PHY on the Orange Pi Plus 2E has the RX and TX delays enabled on the PHY, using pull-ups on the RXDLY and TXDLY pins. Fix the phy-mode description to correct reflect this so that the implementation doesn't reconfigure the delays incorrectly. This happened with commit bbc4d71d6354 ("net: phy: realtek: fix rtl8211e rx/tx delay config"). Fixes: 4904337fe34f ("ARM: dts: sunxi: Restore EMAC changes (boards)") Fixes: 7a78ef92cdc5 ("ARM: sun8i: h3: Enable EMAC with external PHY on Orange Pi Plus 2E") Signed-off-by: Chen-Yu Tsai Signed-off-by: Maxime Ripard Tested-by: Jernej Skrabec Acked-by: Jernej Skrabec Link: https://lore.kernel.org/r/20201024162515.30032-5-wens@kernel.org --- arch/arm/boot/dts/sun8i-h3-orangepi-plus2e.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/boot/dts/sun8i-h3-orangepi-plus2e.dts b/arch/arm/boot/dts/sun8i-h3-orangepi-plus2e.dts index 6dbf7b2e0c13..b6ca45d18e51 100644 --- a/arch/arm/boot/dts/sun8i-h3-orangepi-plus2e.dts +++ b/arch/arm/boot/dts/sun8i-h3-orangepi-plus2e.dts @@ -67,7 +67,7 @@ pinctrl-0 = <&emac_rgmii_pins>; phy-supply = <®_gmac_3v3>; phy-handle = <&ext_rgmii_phy>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; status = "okay"; }; -- cgit v1.2.3 From 57dbe558457bf4042169bc1f334e3b53a8480a1c Mon Sep 17 00:00:00 2001 From: Chen-Yu Tsai Date: Sun, 25 Oct 2020 00:25:11 +0800 Subject: ARM: dts: sun8i: a83t: Enable both RGMII RX/TX delay on Ethernet PHY The Ethernet PHY on the Bananapi M3 and Cubietruck Plus have the RX and TX delays enabled on the PHY, using pull-ups on the RXDLY and TXDLY pins. Fix the phy-mode description to correct reflect this so that the implementation doesn't reconfigure the delays incorrectly. This happened with commit bbc4d71d6354 ("net: phy: realtek: fix rtl8211e rx/tx delay config"). Fixes: 039359948a4b ("ARM: dts: sun8i: a83t: Enable Ethernet on two boards") Signed-off-by: Chen-Yu Tsai Signed-off-by: Maxime Ripard Acked-by: Jernej Skrabec Link: https://lore.kernel.org/r/20201024162515.30032-6-wens@kernel.org --- arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts | 2 +- arch/arm/boot/dts/sun8i-a83t-cubietruck-plus.dts | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts b/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts index 9d34eabba121..431f70234d36 100644 --- a/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts +++ b/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts @@ -131,7 +131,7 @@ pinctrl-0 = <&emac_rgmii_pins>; phy-supply = <®_sw>; phy-handle = <&rgmii_phy>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; allwinner,rx-delay-ps = <700>; allwinner,tx-delay-ps = <700>; status = "okay"; diff --git a/arch/arm/boot/dts/sun8i-a83t-cubietruck-plus.dts b/arch/arm/boot/dts/sun8i-a83t-cubietruck-plus.dts index d9be511f054f..d8326a5c681d 100644 --- a/arch/arm/boot/dts/sun8i-a83t-cubietruck-plus.dts +++ b/arch/arm/boot/dts/sun8i-a83t-cubietruck-plus.dts @@ -183,7 +183,7 @@ pinctrl-0 = <&emac_rgmii_pins>; phy-supply = <®_dldo4>; phy-handle = <&rgmii_phy>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; status = "okay"; }; -- cgit v1.2.3 From b1064037e8ecf09d587b7b4966eebe0c362908e5 Mon Sep 17 00:00:00 2001 From: Chen-Yu Tsai Date: Sun, 25 Oct 2020 00:25:12 +0800 Subject: ARM: dts: sun9i: Enable both RGMII RX/TX delay on Ethernet PHY The Ethernet PHY on the Cubieboard 4 and A80 Optimus have the RX and TX delays enabled on the PHY, using pull-ups on the RXDLY and TXDLY pins. Fix the phy-mode description to correct reflect this so that the implementation doesn't reconfigure the delays incorrectly. This happened with commit bbc4d71d6354 ("net: phy: realtek: fix rtl8211e rx/tx delay config"). Fixes: 98048143b7f8 ("ARM: dts: sun9i: cubieboard4: Enable GMAC") Fixes: bc9bd03a44f9 ("ARM: dts: sun9i: a80-optimus: Enable GMAC") Signed-off-by: Chen-Yu Tsai Signed-off-by: Maxime Ripard Acked-by: Jernej Skrabec Link: https://lore.kernel.org/r/20201024162515.30032-7-wens@kernel.org --- arch/arm/boot/dts/sun9i-a80-cubieboard4.dts | 2 +- arch/arm/boot/dts/sun9i-a80-optimus.dts | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/arm/boot/dts/sun9i-a80-cubieboard4.dts b/arch/arm/boot/dts/sun9i-a80-cubieboard4.dts index d3b337b043a1..484b93df20cb 100644 --- a/arch/arm/boot/dts/sun9i-a80-cubieboard4.dts +++ b/arch/arm/boot/dts/sun9i-a80-cubieboard4.dts @@ -129,7 +129,7 @@ pinctrl-names = "default"; pinctrl-0 = <&gmac_rgmii_pins>; phy-handle = <&phy1>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; phy-supply = <®_cldo1>; status = "okay"; }; diff --git a/arch/arm/boot/dts/sun9i-a80-optimus.dts b/arch/arm/boot/dts/sun9i-a80-optimus.dts index bbc6335e5631..5c3580d712e4 100644 --- a/arch/arm/boot/dts/sun9i-a80-optimus.dts +++ b/arch/arm/boot/dts/sun9i-a80-optimus.dts @@ -124,7 +124,7 @@ pinctrl-names = "default"; pinctrl-0 = <&gmac_rgmii_pins>; phy-handle = <&phy1>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; phy-supply = <®_cldo1>; status = "okay"; }; -- cgit v1.2.3 From 3914160ffc0bf762d6d605d4b27036b7b89367ea Mon Sep 17 00:00:00 2001 From: Chen-Yu Tsai Date: Sun, 25 Oct 2020 00:25:13 +0800 Subject: ARM: dts: sunxi: bananapi-m2-plus: Enable RGMII RX/TX delay on Ethernet PHY The Ethernet PHY on the Bananapi M2+ has the RX and TX delays enabled on the PHY, using pull-ups on the RXDLY and TXDLY pins. Fix the phy-mode description to correct reflect this so that the implementation doesn't reconfigure the delays incorrectly. This happened with commit bbc4d71d6354 ("net: phy: realtek: fix rtl8211e rx/tx delay config"). Fixes: 8c7ba536e709 ("ARM: sun8i: bananapi-m2-plus: Enable dwmac-sun8i") Fixes: 4904337fe34f ("ARM: dts: sunxi: Restore EMAC changes (boards)") Fixes: aa8fee415f46 ("ARM: dts: sun8i: h3: Split out non-SoC-specific parts of Bananapi M2 Plus") Signed-off-by: Chen-Yu Tsai Signed-off-by: Maxime Ripard Tested-by: Jernej Skrabec Acked-by: Jernej Skrabec Link: https://lore.kernel.org/r/20201024162515.30032-8-wens@kernel.org --- arch/arm/boot/dts/sunxi-bananapi-m2-plus.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/boot/dts/sunxi-bananapi-m2-plus.dtsi b/arch/arm/boot/dts/sunxi-bananapi-m2-plus.dtsi index 39263e74fbb5..8e5cb3b3fd68 100644 --- a/arch/arm/boot/dts/sunxi-bananapi-m2-plus.dtsi +++ b/arch/arm/boot/dts/sunxi-bananapi-m2-plus.dtsi @@ -126,7 +126,7 @@ pinctrl-0 = <&emac_rgmii_pins>; phy-supply = <®_gmac_3v3>; phy-handle = <&ext_rgmii_phy>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; status = "okay"; }; -- cgit v1.2.3 From 2bd8570d20c88909b8be3251727a26476b02652c Mon Sep 17 00:00:00 2001 From: Chen-Yu Tsai Date: Sun, 25 Oct 2020 00:25:14 +0800 Subject: arm64: dts: allwinner: h5: libretech-all-h5-cc: Enable RGMII RX/TX delay on PHY The Ethernet PHY on the Libre Computer ALL-H5-CC has the RX and TX delays enabled on the PHY, using pull-ups on the RXDLY and TXDLY pins. Fix the phy-mode description to correct reflect this so that the implementation doesn't reconfigure the delays incorrectly. This happened with commit bbc4d71d6354 ("net: phy: realtek: fix rtl8211e rx/tx delay config"). Fixes: 60d0426d7603 ("arm64: dts: allwinner: h5: Add Libre Computer ALL-H5-CC H5 board") Signed-off-by: Chen-Yu Tsai Signed-off-by: Maxime Ripard Acked-by: Jernej Skrabec Link: https://lore.kernel.org/r/20201024162515.30032-9-wens@kernel.org --- arch/arm64/boot/dts/allwinner/sun50i-h5-libretech-all-h5-cc.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h5-libretech-all-h5-cc.dts b/arch/arm64/boot/dts/allwinner/sun50i-h5-libretech-all-h5-cc.dts index df1b9263ad0e..6e30a564c87f 100644 --- a/arch/arm64/boot/dts/allwinner/sun50i-h5-libretech-all-h5-cc.dts +++ b/arch/arm64/boot/dts/allwinner/sun50i-h5-libretech-all-h5-cc.dts @@ -36,7 +36,7 @@ pinctrl-0 = <&emac_rgmii_pins>; phy-supply = <®_gmac_3v3>; phy-handle = <&ext_rgmii_phy>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; /delete-property/ allwinner,leds-active-low; status = "okay"; }; -- cgit v1.2.3 From 1a9a8910b2153cd3c4f3f2f8defcb853ead3b1fd Mon Sep 17 00:00:00 2001 From: Chen-Yu Tsai Date: Sun, 25 Oct 2020 00:25:15 +0800 Subject: arm64: dts: allwinner: a64: bananapi-m64: Enable RGMII RX/TX delay on PHY The Ethernet PHY on the Bananapi M64 has the RX and TX delays enabled on the PHY, using pull-ups on the RXDLY and TXDLY pins. Fix the phy-mode description to correct reflect this so that the implementation doesn't reconfigure the delays incorrectly. This happened with commit bbc4d71d6354 ("net: phy: realtek: fix rtl8211e rx/tx delay config"). Fixes: e7295499903d ("arm64: allwinner: bananapi-m64: Enable dwmac-sun8i") Fixes: 94f442886711 ("arm64: dts: allwinner: A64: Restore EMAC changes") Signed-off-by: Chen-Yu Tsai Signed-off-by: Maxime Ripard Tested-by: Corentin Labbe Acked-by: Jernej Skrabec Link: https://lore.kernel.org/r/20201024162515.30032-10-wens@kernel.org --- arch/arm64/boot/dts/allwinner/sun50i-a64-bananapi-m64.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64-bananapi-m64.dts b/arch/arm64/boot/dts/allwinner/sun50i-a64-bananapi-m64.dts index 3ea5182ca489..e5e840b9fbb4 100644 --- a/arch/arm64/boot/dts/allwinner/sun50i-a64-bananapi-m64.dts +++ b/arch/arm64/boot/dts/allwinner/sun50i-a64-bananapi-m64.dts @@ -105,7 +105,7 @@ &emac { pinctrl-names = "default"; pinctrl-0 = <&rgmii_pins>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; phy-handle = <&ext_rgmii_phy>; phy-supply = <®_dc1sw>; status = "okay"; -- cgit v1.2.3 From dbf563eee0b8cc056744514d91c5ffc2fa6c0982 Mon Sep 17 00:00:00 2001 From: Michael Kelley Date: Mon, 26 Oct 2020 07:52:52 -0700 Subject: x86/hyperv: Clarify comment on x2apic mode The comment about Hyper-V accessors is unclear regarding their potential use in x2apic mode, as is the associated commit message in e211288b72f1. Clarify that while the architectural and synthetic MSRs are equivalent in x2apic mode, the full set of xapic accessors cannot be used because of register layout differences. Fixes: e211288b72f1 ("x86/hyperv: Make vapic support x2apic mode") Signed-off-by: Michael Kelley Link: https://lore.kernel.org/r/1603723972-81303-1-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu --- arch/x86/hyperv/hv_apic.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) (limited to 'arch') diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c index 40e0e322161d..284e73661a18 100644 --- a/arch/x86/hyperv/hv_apic.c +++ b/arch/x86/hyperv/hv_apic.c @@ -273,11 +273,15 @@ void __init hv_apic_init(void) pr_info("Hyper-V: Using enlightened APIC (%s mode)", x2apic_enabled() ? "x2apic" : "xapic"); /* - * With x2apic, architectural x2apic MSRs are equivalent to the - * respective synthetic MSRs, so there's no need to override - * the apic accessors. The only exception is - * hv_apic_eoi_write, because it benefits from lazy EOI when - * available, but it works for both xapic and x2apic modes. + * When in x2apic mode, don't use the Hyper-V specific APIC + * accessors since the field layout in the ICR register is + * different in x2apic mode. Furthermore, the architectural + * x2apic MSRs function just as well as the Hyper-V + * synthetic APIC MSRs, so there's no benefit in having + * separate Hyper-V accessors for x2apic mode. The only + * exception is hv_apic_eoi_write, because it benefits from + * lazy EOI when available, but the same accessor works for + * both xapic and x2apic because the field layout is the same. */ apic_set_eoi_write(hv_apic_eoi_write); if (!x2apic_enabled()) { -- cgit v1.2.3 From 294a3317bef52b189139c813b50dd14d344fa9ec Mon Sep 17 00:00:00 2001 From: Tony Lindgren Date: Wed, 28 Oct 2020 08:03:17 +0200 Subject: ARM: OMAP2+: Manage MPU state properly for omap_enter_idle_coupled() Based on more testing, commit 8ca5ee624b4c ("ARM: OMAP2+: Restore MPU power domain if cpu_cluster_pm_enter() fails") is a poor fix for handling cpu_cluster_pm_enter() returned errors. We should not override the cpuidle states with a hardcoded PWRDM_POWER_ON value. Instead, we should use a configured idle state that does not cause the context to be lost. Otherwise we end up configuring a potentially improper state for the MPUSS. We also want to update the returned state index for the selected state. Let's just select the highest power idle state C1 to ensure no context loss is allowed on cpu_cluster_pm_enter() errors. With these changes we can now unconditionally call omap4_enter_lowpower() for WFI like we did earlier before commit 55be2f50336f ("ARM: OMAP2+: Handle errors for cpu_pm"). And we can return the selected state index. Fixes: 8f04aea048d5 ("ARM: OMAP2+: Restore MPU power domain if cpu_cluster_pm_enter() fails") Fixes: 55be2f50336f ("ARM: OMAP2+: Handle errors for cpu_pm") Signed-off-by: Tony Lindgren --- arch/arm/mach-omap2/cpuidle44xx.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c index a92d277f81a0..c8d317fafe2e 100644 --- a/arch/arm/mach-omap2/cpuidle44xx.c +++ b/arch/arm/mach-omap2/cpuidle44xx.c @@ -175,8 +175,11 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, if (mpuss_can_lose_context) { error = cpu_cluster_pm_enter(); if (error) { - omap_set_pwrdm_state(mpu_pd, PWRDM_POWER_ON); - goto cpu_cluster_pm_out; + index = 0; + cx = state_ptr + index; + pwrdm_set_logic_retst(mpu_pd, cx->mpu_logic_state); + omap_set_pwrdm_state(mpu_pd, cx->mpu_state); + mpuss_can_lose_context = 0; } } } @@ -184,7 +187,6 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, omap4_enter_lowpower(dev->cpu, cx->cpu_state); cpu_done[dev->cpu] = true; -cpu_cluster_pm_out: /* Wakeup CPU1 only if it is not offlined */ if (dev->cpu == 0 && cpumask_test_cpu(1, cpu_online_mask)) { -- cgit v1.2.3 From 31b4d8e172f614adc53ddecb4b6b2f6411a49b84 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Fri, 23 Oct 2020 12:44:40 -0700 Subject: MIPS: export has_transparent_hugepage() for modules MIPS should export its local version of "has_transparent_hugepage" so that loadable modules (dax) can use it. Fixes this build error: ERROR: modpost: "has_transparent_hugepage" [drivers/dax/dax.ko] undefined! Fixes: fd8cfd300019 ("arch: fix has_transparent_hugepage()") Reported-by: kernel test robot Signed-off-by: Randy Dunlap Cc: Thomas Bogendoerfer Cc: linux-mips@vger.kernel.org Cc: Dan Williams Cc: Vishal Verma Cc: Dave Jiang Cc: linux-nvdimm@lists.01.org Cc: Hugh Dickins Cc: Andrew Morton Signed-off-by: Thomas Bogendoerfer --- arch/mips/mm/tlb-r4k.c | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/mips/mm/tlb-r4k.c b/arch/mips/mm/tlb-r4k.c index 38e2894d5fa3..1b939abbe4ca 100644 --- a/arch/mips/mm/tlb-r4k.c +++ b/arch/mips/mm/tlb-r4k.c @@ -438,6 +438,7 @@ int has_transparent_hugepage(void) } return mask == PM_HUGE_MASK; } +EXPORT_SYMBOL(has_transparent_hugepage); #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -- cgit v1.2.3 From 9fa2e7af3d53a4b769136eccc32c02e128a4ee51 Mon Sep 17 00:00:00 2001 From: Andrew Jeffery Date: Thu, 22 Oct 2020 01:43:59 +0100 Subject: ARM: 9019/1: kprobes: Avoid fortify_panic() when copying optprobe template Setting both CONFIG_KPROBES=y and CONFIG_FORTIFY_SOURCE=y on ARM leads to a panic in memcpy() when injecting a kprobe despite the fixes found in commit e46daee53bb5 ("ARM: 8806/1: kprobes: Fix false positive with FORTIFY_SOURCE") and commit 0ac569bf6a79 ("ARM: 8834/1: Fix: kprobes: optimized kprobes illegal instruction"). arch/arm/include/asm/kprobes.h effectively declares the target type of the optprobe_template_entry assembly label as a u32 which leads memcpy()'s __builtin_object_size() call to determine that the pointed-to object is of size four. However, the symbol is used as a handle for the optimised probe assembly template that is at least 96 bytes in size. The symbol's use despite its type blows up the memcpy() in ARM's arch_prepare_optimized_kprobe() with a false-positive fortify_panic() when it should instead copy the optimised probe template into place: ``` $ sudo perf probe -a aspeed_g6_pinctrl_probe [ 158.457252] detected buffer overflow in memcpy [ 158.458069] ------------[ cut here ]------------ [ 158.458283] kernel BUG at lib/string.c:1153! [ 158.458436] Internal error: Oops - BUG: 0 [#1] SMP ARM [ 158.458768] Modules linked in: [ 158.459043] CPU: 1 PID: 99 Comm: perf Not tainted 5.9.0-rc7-00038-gc53ebf8167e9 #158 [ 158.459296] Hardware name: Generic DT based system [ 158.459529] PC is at fortify_panic+0x18/0x20 [ 158.459658] LR is at __irq_work_queue_local+0x3c/0x74 [ 158.459831] pc : [<8047451c>] lr : [<8020ecd4>] psr: 60000013 [ 158.460032] sp : be2d1d50 ip : be2d1c58 fp : be2d1d5c [ 158.460174] r10: 00000006 r9 : 00000000 r8 : 00000060 [ 158.460348] r7 : 8011e434 r6 : b9e0b800 r5 : 7f000000 r4 : b9fe4f0c [ 158.460557] r3 : 80c04cc8 r2 : 00000000 r1 : be7c03cc r0 : 00000022 [ 158.460801] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 158.461037] Control: 10c5387d Table: b9cd806a DAC: 00000051 [ 158.461251] Process perf (pid: 99, stack limit = 0x81c71a69) [ 158.461472] Stack: (0xbe2d1d50 to 0xbe2d2000) [ 158.461757] 1d40: be2d1d84 be2d1d60 8011e724 80474510 [ 158.462104] 1d60: b9e0b800 b9fe4f0c 00000000 b9fe4f14 80c8ec80 be235000 be2d1d9c be2d1d88 [ 158.462436] 1d80: 801cee44 8011e57c b9fe4f0c 00000000 be2d1dc4 be2d1da0 801d0ad0 801cedec [ 158.462742] 1da0: 00000000 00000000 b9fe4f00 ffffffea 00000000 be235000 be2d1de4 be2d1dc8 [ 158.463087] 1dc0: 80204604 801d0738 00000000 00000000 b9fe4004 ffffffea be2d1e94 be2d1de8 [ 158.463428] 1de0: 80205434 80204570 00385c00 00000000 00000000 00000000 be2d1e14 be2d1e08 [ 158.463880] 1e00: 802ba014 b9fe4f00 b9e718c0 b9fe4f84 b9e71ec8 be2d1e24 00000000 00385c00 [ 158.464365] 1e20: 00000000 626f7270 00000065 802b905c be2d1e94 0000002e 00000000 802b9914 [ 158.464829] 1e40: be2d1e84 be2d1e50 802b9914 8028ff78 804629d0 b9e71ec0 0000002e b9e71ec0 [ 158.465141] 1e60: be2d1ea8 80c04cc8 00000cc0 b9e713c4 00000002 80205834 80205834 0000002e [ 158.465488] 1e80: be235000 be235000 be2d1ea4 be2d1e98 80205854 80204e94 be2d1ecc be2d1ea8 [ 158.465806] 1ea0: 801ee4a0 80205840 00000002 80c04cc8 00000000 0000002e 0000002e 00000000 [ 158.466110] 1ec0: be2d1f0c be2d1ed0 801ee5c8 801ee428 00000000 be2d0000 006b1fd0 00000051 [ 158.466398] 1ee0: 00000000 b9eedf00 0000002e 80204410 006b1fd0 be2d1f60 00000000 00000004 [ 158.466763] 1f00: be2d1f24 be2d1f10 8020442c 801ee4c4 80205834 802c613c be2d1f5c be2d1f28 [ 158.467102] 1f20: 802c60ac 8020441c be2d1fac be2d1f38 8010c764 802e9888 be2d1f5c b9eedf00 [ 158.467447] 1f40: b9eedf00 006b1fd0 0000002e 00000000 be2d1f94 be2d1f60 802c634c 802c5fec [ 158.467812] 1f60: 00000000 00000000 00000000 80c04cc8 006b1fd0 00000003 76f7a610 00000004 [ 158.468155] 1f80: 80100284 be2d0000 be2d1fa4 be2d1f98 802c63ec 802c62e8 00000000 be2d1fa8 [ 158.468508] 1fa0: 80100080 802c63e0 006b1fd0 00000003 00000003 006b1fd0 0000002e 00000000 [ 158.468858] 1fc0: 006b1fd0 00000003 76f7a610 00000004 006b1fb0 0026d348 00000017 7ef2738c [ 158.469202] 1fe0: 76f3431c 7ef272d8 0014ec50 76f34338 60000010 00000003 00000000 00000000 [ 158.469461] Backtrace: [ 158.469683] [<80474504>] (fortify_panic) from [<8011e724>] (arch_prepare_optimized_kprobe+0x1b4/0x1f8) [ 158.470021] [<8011e570>] (arch_prepare_optimized_kprobe) from [<801cee44>] (alloc_aggr_kprobe+0x64/0x70) [ 158.470287] r9:be235000 r8:80c8ec80 r7:b9fe4f14 r6:00000000 r5:b9fe4f0c r4:b9e0b800 [ 158.470478] [<801cede0>] (alloc_aggr_kprobe) from [<801d0ad0>] (register_kprobe+0x3a4/0x5a0) [ 158.470685] r5:00000000 r4:b9fe4f0c [ 158.470790] [<801d072c>] (register_kprobe) from [<80204604>] (__register_trace_kprobe+0xa0/0xa4) [ 158.471001] r9:be235000 r8:00000000 r7:ffffffea r6:b9fe4f00 r5:00000000 r4:00000000 [ 158.471188] [<80204564>] (__register_trace_kprobe) from [<80205434>] (trace_kprobe_create+0x5ac/0x9ac) [ 158.471408] r7:ffffffea r6:b9fe4004 r5:00000000 r4:00000000 [ 158.471553] [<80204e88>] (trace_kprobe_create) from [<80205854>] (create_or_delete_trace_kprobe+0x20/0x3c) [ 158.471766] r10:be235000 r9:be235000 r8:0000002e r7:80205834 r6:80205834 r5:00000002 [ 158.471949] r4:b9e713c4 [ 158.472027] [<80205834>] (create_or_delete_trace_kprobe) from [<801ee4a0>] (trace_run_command+0x84/0x9c) [ 158.472255] [<801ee41c>] (trace_run_command) from [<801ee5c8>] (trace_parse_run_command+0x110/0x1f8) [ 158.472471] r6:00000000 r5:0000002e r4:0000002e [ 158.472594] [<801ee4b8>] (trace_parse_run_command) from [<8020442c>] (probes_write+0x1c/0x28) [ 158.472800] r10:00000004 r9:00000000 r8:be2d1f60 r7:006b1fd0 r6:80204410 r5:0000002e [ 158.472968] r4:b9eedf00 [ 158.473046] [<80204410>] (probes_write) from [<802c60ac>] (vfs_write+0xcc/0x1e8) [ 158.473226] [<802c5fe0>] (vfs_write) from [<802c634c>] (ksys_write+0x70/0xf8) [ 158.473400] r8:00000000 r7:0000002e r6:006b1fd0 r5:b9eedf00 r4:b9eedf00 [ 158.473567] [<802c62dc>] (ksys_write) from [<802c63ec>] (sys_write+0x18/0x1c) [ 158.473745] r9:be2d0000 r8:80100284 r7:00000004 r6:76f7a610 r5:00000003 r4:006b1fd0 [ 158.473932] [<802c63d4>] (sys_write) from [<80100080>] (ret_fast_syscall+0x0/0x54) [ 158.474126] Exception stack(0xbe2d1fa8 to 0xbe2d1ff0) [ 158.474305] 1fa0: 006b1fd0 00000003 00000003 006b1fd0 0000002e 00000000 [ 158.474573] 1fc0: 006b1fd0 00000003 76f7a610 00000004 006b1fb0 0026d348 00000017 7ef2738c [ 158.474811] 1fe0: 76f3431c 7ef272d8 0014ec50 76f34338 [ 158.475171] Code: e24cb004 e1a01000 e59f0004 ebf40dd3 (e7f001f2) [ 158.475847] ---[ end trace 55a5b31c08a29f00 ]--- [ 158.476088] Kernel panic - not syncing: Fatal exception [ 158.476375] CPU0: stopping [ 158.476709] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G D 5.9.0-rc7-00038-gc53ebf8167e9 #158 [ 158.477176] Hardware name: Generic DT based system [ 158.477411] Backtrace: [ 158.477604] [<8010dd28>] (dump_backtrace) from [<8010dfd4>] (show_stack+0x20/0x24) [ 158.477990] r7:00000000 r6:60000193 r5:00000000 r4:80c2f634 [ 158.478323] [<8010dfb4>] (show_stack) from [<8046390c>] (dump_stack+0xcc/0xe8) [ 158.478686] [<80463840>] (dump_stack) from [<80110750>] (handle_IPI+0x334/0x3a0) [ 158.479063] r7:00000000 r6:00000004 r5:80b65cc8 r4:80c78278 [ 158.479352] [<8011041c>] (handle_IPI) from [<801013f8>] (gic_handle_irq+0x88/0x94) [ 158.479757] r10:10c5387d r9:80c01ed8 r8:00000000 r7:c0802000 r6:80c0537c r5:000003ff [ 158.480146] r4:c080200c r3:fffffff4 [ 158.480364] [<80101370>] (gic_handle_irq) from [<80100b6c>] (__irq_svc+0x6c/0x90) [ 158.480748] Exception stack(0x80c01ed8 to 0x80c01f20) [ 158.481031] 1ec0: 000128bc 00000000 [ 158.481499] 1ee0: be7b8174 8011d3a0 80c00000 00000000 80c04cec 80c04d28 80c5d7c2 80a026d4 [ 158.482091] 1f00: 10c5387d 80c01f34 80c01f38 80c01f28 80109554 80109558 60000013 ffffffff [ 158.482621] r9:80c00000 r8:80c5d7c2 r7:80c01f0c r6:ffffffff r5:60000013 r4:80109558 [ 158.482983] [<80109518>] (arch_cpu_idle) from [<80818780>] (default_idle_call+0x38/0x120) [ 158.483360] [<80818748>] (default_idle_call) from [<801585a8>] (do_idle+0xd4/0x158) [ 158.483945] r5:00000000 r4:80c00000 [ 158.484237] [<801584d4>] (do_idle) from [<801588f4>] (cpu_startup_entry+0x28/0x2c) [ 158.484784] r9:80c78000 r8:00000000 r7:80c78000 r6:80c78040 r5:80c04cc0 r4:000000d6 [ 158.485328] [<801588cc>] (cpu_startup_entry) from [<80810a78>] (rest_init+0x9c/0xbc) [ 158.485930] [<808109dc>] (rest_init) from [<80b00ae4>] (arch_call_rest_init+0x18/0x1c) [ 158.486503] r5:80c04cc0 r4:00000001 [ 158.486857] [<80b00acc>] (arch_call_rest_init) from [<80b00fcc>] (start_kernel+0x46c/0x548) [ 158.487589] [<80b00b60>] (start_kernel) from [<00000000>] (0x0) ``` Fixes: e46daee53bb5 ("ARM: 8806/1: kprobes: Fix false positive with FORTIFY_SOURCE") Fixes: 0ac569bf6a79 ("ARM: 8834/1: Fix: kprobes: optimized kprobes illegal instruction") Suggested-by: Kees Cook Signed-off-by: Andrew Jeffery Tested-by: Luka Oreskovic Tested-by: Joel Stanley Reviewed-by: Joel Stanley Acked-by: Masami Hiramatsu Cc: Luka Oreskovic Cc: Juraj Vijtiuk Signed-off-by: Russell King --- arch/arm/include/asm/kprobes.h | 22 +++++++++++----------- arch/arm/probes/kprobes/opt-arm.c | 18 +++++++++--------- 2 files changed, 20 insertions(+), 20 deletions(-) (limited to 'arch') diff --git a/arch/arm/include/asm/kprobes.h b/arch/arm/include/asm/kprobes.h index 213607a1f45c..e26a278d301a 100644 --- a/arch/arm/include/asm/kprobes.h +++ b/arch/arm/include/asm/kprobes.h @@ -44,20 +44,20 @@ int kprobe_exceptions_notify(struct notifier_block *self, unsigned long val, void *data); /* optinsn template addresses */ -extern __visible kprobe_opcode_t optprobe_template_entry; -extern __visible kprobe_opcode_t optprobe_template_val; -extern __visible kprobe_opcode_t optprobe_template_call; -extern __visible kprobe_opcode_t optprobe_template_end; -extern __visible kprobe_opcode_t optprobe_template_sub_sp; -extern __visible kprobe_opcode_t optprobe_template_add_sp; -extern __visible kprobe_opcode_t optprobe_template_restore_begin; -extern __visible kprobe_opcode_t optprobe_template_restore_orig_insn; -extern __visible kprobe_opcode_t optprobe_template_restore_end; +extern __visible kprobe_opcode_t optprobe_template_entry[]; +extern __visible kprobe_opcode_t optprobe_template_val[]; +extern __visible kprobe_opcode_t optprobe_template_call[]; +extern __visible kprobe_opcode_t optprobe_template_end[]; +extern __visible kprobe_opcode_t optprobe_template_sub_sp[]; +extern __visible kprobe_opcode_t optprobe_template_add_sp[]; +extern __visible kprobe_opcode_t optprobe_template_restore_begin[]; +extern __visible kprobe_opcode_t optprobe_template_restore_orig_insn[]; +extern __visible kprobe_opcode_t optprobe_template_restore_end[]; #define MAX_OPTIMIZED_LENGTH 4 #define MAX_OPTINSN_SIZE \ - ((unsigned long)&optprobe_template_end - \ - (unsigned long)&optprobe_template_entry) + ((unsigned long)optprobe_template_end - \ + (unsigned long)optprobe_template_entry) #define RELATIVEJUMP_SIZE 4 struct arch_optimized_insn { diff --git a/arch/arm/probes/kprobes/opt-arm.c b/arch/arm/probes/kprobes/opt-arm.c index 7a449df0b359..c78180172120 100644 --- a/arch/arm/probes/kprobes/opt-arm.c +++ b/arch/arm/probes/kprobes/opt-arm.c @@ -85,21 +85,21 @@ asm ( "optprobe_template_end:\n"); #define TMPL_VAL_IDX \ - ((unsigned long *)&optprobe_template_val - (unsigned long *)&optprobe_template_entry) + ((unsigned long *)optprobe_template_val - (unsigned long *)optprobe_template_entry) #define TMPL_CALL_IDX \ - ((unsigned long *)&optprobe_template_call - (unsigned long *)&optprobe_template_entry) + ((unsigned long *)optprobe_template_call - (unsigned long *)optprobe_template_entry) #define TMPL_END_IDX \ - ((unsigned long *)&optprobe_template_end - (unsigned long *)&optprobe_template_entry) + ((unsigned long *)optprobe_template_end - (unsigned long *)optprobe_template_entry) #define TMPL_ADD_SP \ - ((unsigned long *)&optprobe_template_add_sp - (unsigned long *)&optprobe_template_entry) + ((unsigned long *)optprobe_template_add_sp - (unsigned long *)optprobe_template_entry) #define TMPL_SUB_SP \ - ((unsigned long *)&optprobe_template_sub_sp - (unsigned long *)&optprobe_template_entry) + ((unsigned long *)optprobe_template_sub_sp - (unsigned long *)optprobe_template_entry) #define TMPL_RESTORE_BEGIN \ - ((unsigned long *)&optprobe_template_restore_begin - (unsigned long *)&optprobe_template_entry) + ((unsigned long *)optprobe_template_restore_begin - (unsigned long *)optprobe_template_entry) #define TMPL_RESTORE_ORIGN_INSN \ - ((unsigned long *)&optprobe_template_restore_orig_insn - (unsigned long *)&optprobe_template_entry) + ((unsigned long *)optprobe_template_restore_orig_insn - (unsigned long *)optprobe_template_entry) #define TMPL_RESTORE_END \ - ((unsigned long *)&optprobe_template_restore_end - (unsigned long *)&optprobe_template_entry) + ((unsigned long *)optprobe_template_restore_end - (unsigned long *)optprobe_template_entry) /* * ARM can always optimize an instruction when using ARM ISA, except @@ -234,7 +234,7 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *or } /* Copy arch-dep-instance from template. */ - memcpy(code, (unsigned long *)&optprobe_template_entry, + memcpy(code, (unsigned long *)optprobe_template_entry, TMPL_END_IDX * sizeof(kprobe_opcode_t)); /* Adjust buffer according to instruction. */ -- cgit v1.2.3 From 107954afc5df667da438644aa4982606663f9b17 Mon Sep 17 00:00:00 2001 From: Nenad Peric Date: Wed, 28 Oct 2020 12:58:17 +0100 Subject: arm64: dts: allwinner: h5: OrangePi Prime: Fix ethernet node RX and TX delay are provided by ethernet PHY. Reflect that in ethernet node. Fixes: 44a94c7ef989 ("arm64: dts: allwinner: H5: Restore EMAC changes") Signed-off-by: Nenad Peric Signed-off-by: Maxime Ripard Acked-by: Jernej Skrabec Link: https://lore.kernel.org/r/20201028115817.68113-1-nperic@gmail.com --- arch/arm64/boot/dts/allwinner/sun50i-h5-orangepi-prime.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h5-orangepi-prime.dts b/arch/arm64/boot/dts/allwinner/sun50i-h5-orangepi-prime.dts index cb44bfa5981f..33ab44072e6d 100644 --- a/arch/arm64/boot/dts/allwinner/sun50i-h5-orangepi-prime.dts +++ b/arch/arm64/boot/dts/allwinner/sun50i-h5-orangepi-prime.dts @@ -124,7 +124,7 @@ pinctrl-0 = <&emac_rgmii_pins>; phy-supply = <®_gmac_3v3>; phy-handle = <&ext_rgmii_phy>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; status = "okay"; }; -- cgit v1.2.3 From 3ad84246a4097010f3ae3d6944120c0be00e9e7a Mon Sep 17 00:00:00 2001 From: Joerg Roedel Date: Wed, 28 Oct 2020 17:46:55 +0100 Subject: x86/boot/compressed/64: Introduce sev_status Introduce sev_status and initialize it together with sme_me_mask to have an indicator which SEV features are enabled. Signed-off-by: Joerg Roedel Signed-off-by: Borislav Petkov Reviewed-by: Tom Lendacky Link: https://lkml.kernel.org/r/20201028164659.27002-2-joro@8bytes.org --- arch/x86/boot/compressed/mem_encrypt.S | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/boot/compressed/mem_encrypt.S b/arch/x86/boot/compressed/mem_encrypt.S index dd07e7b41b11..3092ae173f94 100644 --- a/arch/x86/boot/compressed/mem_encrypt.S +++ b/arch/x86/boot/compressed/mem_encrypt.S @@ -81,6 +81,19 @@ SYM_FUNC_START(set_sev_encryption_mask) bts %rax, sme_me_mask(%rip) /* Create the encryption mask */ + /* + * Read MSR_AMD64_SEV again and store it to sev_status. Can't do this in + * get_sev_encryption_bit() because this function is 32-bit code and + * shared between 64-bit and 32-bit boot path. + */ + movl $MSR_AMD64_SEV, %ecx /* Read the SEV MSR */ + rdmsr + + /* Store MSR value in sev_status */ + shlq $32, %rdx + orq %rdx, %rax + movq %rax, sev_status(%rip) + .Lno_sev_mask: movq %rbp, %rsp /* Restore original stack pointer */ @@ -96,5 +109,6 @@ SYM_FUNC_END(set_sev_encryption_mask) #ifdef CONFIG_AMD_MEM_ENCRYPT .balign 8 -SYM_DATA(sme_me_mask, .quad 0) +SYM_DATA(sme_me_mask, .quad 0) +SYM_DATA(sev_status, .quad 0) #endif -- cgit v1.2.3 From ed7b895f3efb5df184722f5a30f8164fcaffceb1 Mon Sep 17 00:00:00 2001 From: Joerg Roedel Date: Wed, 28 Oct 2020 17:46:56 +0100 Subject: x86/boot/compressed/64: Sanity-check CPUID results in the early #VC handler The early #VC handler which doesn't have a GHCB can only handle CPUID exit codes. It is needed by the early boot code to handle #VC exceptions raised in verify_cpu() and to get the position of the C-bit. But the CPUID information comes from the hypervisor which is untrusted and might return results which trick the guest into the no-SEV boot path with no C-bit set in the page-tables. All data written to memory would then be unencrypted and could leak sensitive data to the hypervisor. Add sanity checks to the early #VC handler to make sure the hypervisor can not pretend that SEV is disabled. [ bp: Massage a bit. ] Signed-off-by: Joerg Roedel Signed-off-by: Borislav Petkov Reviewed-by: Tom Lendacky Link: https://lkml.kernel.org/r/20201028164659.27002-3-joro@8bytes.org --- arch/x86/kernel/sev-es-shared.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) (limited to 'arch') diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c index 5f83ccaab877..7d04b356d44d 100644 --- a/arch/x86/kernel/sev-es-shared.c +++ b/arch/x86/kernel/sev-es-shared.c @@ -178,6 +178,32 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code) goto fail; regs->dx = val >> 32; + /* + * This is a VC handler and the #VC is only raised when SEV-ES is + * active, which means SEV must be active too. Do sanity checks on the + * CPUID results to make sure the hypervisor does not trick the kernel + * into the no-sev path. This could map sensitive data unencrypted and + * make it accessible to the hypervisor. + * + * In particular, check for: + * - Hypervisor CPUID bit + * - Availability of CPUID leaf 0x8000001f + * - SEV CPUID bit. + * + * The hypervisor might still report the wrong C-bit position, but this + * can't be checked here. + */ + + if ((fn == 1 && !(regs->cx & BIT(31)))) + /* Hypervisor bit */ + goto fail; + else if (fn == 0x80000000 && (regs->ax < 0x8000001f)) + /* SEV leaf check */ + goto fail; + else if ((fn == 0x8000001f && !(regs->ax & BIT(1)))) + /* SEV bit */ + goto fail; + /* Skip over the CPUID two-byte opcode */ regs->ip += 2; -- cgit v1.2.3 From 86ce43f7dde81562f58b24b426cef068bd9f7595 Mon Sep 17 00:00:00 2001 From: Joerg Roedel Date: Wed, 28 Oct 2020 17:46:57 +0100 Subject: x86/boot/compressed/64: Check SEV encryption in 64-bit boot-path Check whether the hypervisor reported the correct C-bit when running as an SEV guest. Using a wrong C-bit position could be used to leak sensitive data from the guest to the hypervisor. The check function is in a separate file: arch/x86/kernel/sev_verify_cbit.S so that it can be re-used in the running kernel image. [ bp: Massage. ] Signed-off-by: Joerg Roedel Signed-off-by: Borislav Petkov Reviewed-by: Tom Lendacky Link: https://lkml.kernel.org/r/20201028164659.27002-4-joro@8bytes.org --- arch/x86/boot/compressed/ident_map_64.c | 1 + arch/x86/boot/compressed/mem_encrypt.S | 4 ++ arch/x86/boot/compressed/misc.h | 2 + arch/x86/kernel/sev_verify_cbit.S | 89 +++++++++++++++++++++++++++++++++ 4 files changed, 96 insertions(+) create mode 100644 arch/x86/kernel/sev_verify_cbit.S (limited to 'arch') diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c index a5e5db6ada3c..39b2eded7bc2 100644 --- a/arch/x86/boot/compressed/ident_map_64.c +++ b/arch/x86/boot/compressed/ident_map_64.c @@ -164,6 +164,7 @@ void initialize_identity_maps(void *rmode) add_identity_map(cmdline, cmdline + COMMAND_LINE_SIZE); /* Load the new page-table. */ + sev_verify_cbit(top_level_pgt); write_cr3(top_level_pgt); } diff --git a/arch/x86/boot/compressed/mem_encrypt.S b/arch/x86/boot/compressed/mem_encrypt.S index 3092ae173f94..aa561795efd1 100644 --- a/arch/x86/boot/compressed/mem_encrypt.S +++ b/arch/x86/boot/compressed/mem_encrypt.S @@ -68,6 +68,9 @@ SYM_FUNC_START(get_sev_encryption_bit) SYM_FUNC_END(get_sev_encryption_bit) .code64 + +#include "../../kernel/sev_verify_cbit.S" + SYM_FUNC_START(set_sev_encryption_mask) #ifdef CONFIG_AMD_MEM_ENCRYPT push %rbp @@ -111,4 +114,5 @@ SYM_FUNC_END(set_sev_encryption_mask) .balign 8 SYM_DATA(sme_me_mask, .quad 0) SYM_DATA(sev_status, .quad 0) +SYM_DATA(sev_check_data, .quad 0) #endif diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h index 6d31f1b4c4d1..d9a631c5973c 100644 --- a/arch/x86/boot/compressed/misc.h +++ b/arch/x86/boot/compressed/misc.h @@ -159,4 +159,6 @@ void boot_page_fault(void); void boot_stage1_vc(void); void boot_stage2_vc(void); +unsigned long sev_verify_cbit(unsigned long cr3); + #endif /* BOOT_COMPRESSED_MISC_H */ diff --git a/arch/x86/kernel/sev_verify_cbit.S b/arch/x86/kernel/sev_verify_cbit.S new file mode 100644 index 000000000000..ee04941a6546 --- /dev/null +++ b/arch/x86/kernel/sev_verify_cbit.S @@ -0,0 +1,89 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * sev_verify_cbit.S - Code for verification of the C-bit position reported + * by the Hypervisor when running with SEV enabled. + * + * Copyright (c) 2020 Joerg Roedel (jroedel@suse.de) + * + * sev_verify_cbit() is called before switching to a new long-mode page-table + * at boot. + * + * Verify that the C-bit position is correct by writing a random value to + * an encrypted memory location while on the current page-table. Then it + * switches to the new page-table to verify the memory content is still the + * same. After that it switches back to the current page-table and when the + * check succeeded it returns. If the check failed the code invalidates the + * stack pointer and goes into a hlt loop. The stack-pointer is invalidated to + * make sure no interrupt or exception can get the CPU out of the hlt loop. + * + * New page-table pointer is expected in %rdi (first parameter) + * + */ +SYM_FUNC_START(sev_verify_cbit) +#ifdef CONFIG_AMD_MEM_ENCRYPT + /* First check if a C-bit was detected */ + movq sme_me_mask(%rip), %rsi + testq %rsi, %rsi + jz 3f + + /* sme_me_mask != 0 could mean SME or SEV - Check also for SEV */ + movq sev_status(%rip), %rsi + testq %rsi, %rsi + jz 3f + + /* Save CR4 in %rsi */ + movq %cr4, %rsi + + /* Disable Global Pages */ + movq %rsi, %rdx + andq $(~X86_CR4_PGE), %rdx + movq %rdx, %cr4 + + /* + * Verified that running under SEV - now get a random value using + * RDRAND. This instruction is mandatory when running as an SEV guest. + * + * Don't bail out of the loop if RDRAND returns errors. It is better to + * prevent forward progress than to work with a non-random value here. + */ +1: rdrand %rdx + jnc 1b + + /* Store value to memory and keep it in %rdx */ + movq %rdx, sev_check_data(%rip) + + /* Backup current %cr3 value to restore it later */ + movq %cr3, %rcx + + /* Switch to new %cr3 - This might unmap the stack */ + movq %rdi, %cr3 + + /* + * Compare value in %rdx with memory location. If C-bit is incorrect + * this would read the encrypted data and make the check fail. + */ + cmpq %rdx, sev_check_data(%rip) + + /* Restore old %cr3 */ + movq %rcx, %cr3 + + /* Restore previous CR4 */ + movq %rsi, %cr4 + + /* Check CMPQ result */ + je 3f + + /* + * The check failed, prevent any forward progress to prevent ROP + * attacks, invalidate the stack and go into a hlt loop. + */ + xorq %rsp, %rsp + subq $0x1000, %rsp +2: hlt + jmp 2b +3: +#endif + /* Return page-table pointer */ + movq %rdi, %rax + ret +SYM_FUNC_END(sev_verify_cbit) -- cgit v1.2.3 From c9f09539e16e281f92a27760fdfae71e8af036f6 Mon Sep 17 00:00:00 2001 From: Joerg Roedel Date: Wed, 28 Oct 2020 17:46:58 +0100 Subject: x86/head/64: Check SEV encryption before switching to kernel page-table When SEV is enabled, the kernel requests the C-bit position again from the hypervisor to build its own page-table. Since the hypervisor is an untrusted source, the C-bit position needs to be verified before the kernel page-table is used. Call sev_verify_cbit() before writing the CR3. [ bp: Massage. ] Signed-off-by: Joerg Roedel Signed-off-by: Borislav Petkov Reviewed-by: Tom Lendacky Link: https://lkml.kernel.org/r/20201028164659.27002-5-joro@8bytes.org --- arch/x86/kernel/head_64.S | 16 ++++++++++++++++ arch/x86/mm/mem_encrypt.c | 1 + 2 files changed, 17 insertions(+) (limited to 'arch') diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 7eb2a1c87969..3c417734790f 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -161,6 +161,21 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL) /* Setup early boot stage 4-/5-level pagetables. */ addq phys_base(%rip), %rax + + /* + * For SEV guests: Verify that the C-bit is correct. A malicious + * hypervisor could lie about the C-bit position to perform a ROP + * attack on the guest by writing to the unencrypted stack and wait for + * the next RET instruction. + * %rsi carries pointer to realmode data and is callee-clobbered. Save + * and restore it. + */ + pushq %rsi + movq %rax, %rdi + call sev_verify_cbit + popq %rsi + + /* Switch to new page-table */ movq %rax, %cr3 /* Ensure I am executing from virtual addresses */ @@ -279,6 +294,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL) SYM_CODE_END(secondary_startup_64) #include "verify_cpu.S" +#include "sev_verify_cbit.S" #ifdef CONFIG_HOTPLUG_CPU /* diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c index efbb3de472df..bc0833713be9 100644 --- a/arch/x86/mm/mem_encrypt.c +++ b/arch/x86/mm/mem_encrypt.c @@ -39,6 +39,7 @@ */ u64 sme_me_mask __section(".data") = 0; u64 sev_status __section(".data") = 0; +u64 sev_check_data __section(".data") = 0; EXPORT_SYMBOL(sme_me_mask); DEFINE_STATIC_KEY_FALSE(sev_enable_key); EXPORT_SYMBOL_GPL(sev_enable_key); -- cgit v1.2.3 From 2411cd82112397bfb9d8f0f19cd46c3d71e0ce67 Mon Sep 17 00:00:00 2001 From: Joerg Roedel Date: Wed, 28 Oct 2020 17:46:59 +0100 Subject: x86/sev-es: Do not support MMIO to/from encrypted memory MMIO memory is usually not mapped encrypted, so there is no reason to support emulated MMIO when it is mapped encrypted. Prevent a possible hypervisor attack where a RAM page is mapped as an MMIO page in the nested page-table, so that any guest access to it will trigger a #VC exception and leak the data on that page to the hypervisor via the GHCB (like with valid MMIO). On the read side this attack would allow the HV to inject data into the guest. Signed-off-by: Joerg Roedel Signed-off-by: Borislav Petkov Reviewed-by: Tom Lendacky Link: https://lkml.kernel.org/r/20201028164659.27002-6-joro@8bytes.org --- arch/x86/kernel/sev-es.c | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c index 4a96726fbaf8..0bd1a0fc587e 100644 --- a/arch/x86/kernel/sev-es.c +++ b/arch/x86/kernel/sev-es.c @@ -374,8 +374,8 @@ fault: return ES_EXCEPTION; } -static bool vc_slow_virt_to_phys(struct ghcb *ghcb, struct es_em_ctxt *ctxt, - unsigned long vaddr, phys_addr_t *paddr) +static enum es_result vc_slow_virt_to_phys(struct ghcb *ghcb, struct es_em_ctxt *ctxt, + unsigned long vaddr, phys_addr_t *paddr) { unsigned long va = (unsigned long)vaddr; unsigned int level; @@ -394,15 +394,19 @@ static bool vc_slow_virt_to_phys(struct ghcb *ghcb, struct es_em_ctxt *ctxt, if (user_mode(ctxt->regs)) ctxt->fi.error_code |= X86_PF_USER; - return false; + return ES_EXCEPTION; } + if (WARN_ON_ONCE(pte_val(*pte) & _PAGE_ENC)) + /* Emulated MMIO to/from encrypted memory not supported */ + return ES_UNSUPPORTED; + pa = (phys_addr_t)pte_pfn(*pte) << PAGE_SHIFT; pa |= va & ~page_level_mask(level); *paddr = pa; - return true; + return ES_OK; } /* Include code shared with pre-decompression boot stage */ @@ -731,6 +735,7 @@ static enum es_result vc_do_mmio(struct ghcb *ghcb, struct es_em_ctxt *ctxt, { u64 exit_code, exit_info_1, exit_info_2; unsigned long ghcb_pa = __pa(ghcb); + enum es_result res; phys_addr_t paddr; void __user *ref; @@ -740,11 +745,12 @@ static enum es_result vc_do_mmio(struct ghcb *ghcb, struct es_em_ctxt *ctxt, exit_code = read ? SVM_VMGEXIT_MMIO_READ : SVM_VMGEXIT_MMIO_WRITE; - if (!vc_slow_virt_to_phys(ghcb, ctxt, (unsigned long)ref, &paddr)) { - if (!read) + res = vc_slow_virt_to_phys(ghcb, ctxt, (unsigned long)ref, &paddr); + if (res != ES_OK) { + if (res == ES_EXCEPTION && !read) ctxt->fi.error_code |= X86_PF_WRITE; - return ES_EXCEPTION; + return res; } exit_info_1 = paddr; -- cgit v1.2.3 From 00203737867c8b63ca247e71ada1b32bb0b0dd3d Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Sun, 27 Sep 2020 18:59:42 +0200 Subject: arm64: dts: imx8mm-var-som: fix missing PMIC's interrupt line pull-up The PMIC's interrupt is level low and should be pulled up. The PMIC's device node had pinctrl-0 property but it lacked pinctrl-names which is required to apply the pin configuration. The actual problem in DTS was pointed out by Felix Radensky from Variscite. Reported-by: Felix Radensky Fixes: 5f67317bd967 ("arm64: dts: imx8mm: correct interrupt flags") Signed-off-by: Krzysztof Kozlowski Reviewed-by: Robin Gong Signed-off-by: Shawn Guo --- arch/arm64/boot/dts/freescale/imx8mm-var-som.dtsi | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/freescale/imx8mm-var-som.dtsi b/arch/arm64/boot/dts/freescale/imx8mm-var-som.dtsi index 4107fe914d08..49082529764f 100644 --- a/arch/arm64/boot/dts/freescale/imx8mm-var-som.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8mm-var-som.dtsi @@ -135,13 +135,10 @@ pmic@4b { compatible = "rohm,bd71847"; reg = <0x4b>; + pinctrl-names = "default"; pinctrl-0 = <&pinctrl_pmic>; interrupt-parent = <&gpio2>; - /* - * The interrupt is not correct. It should be level low, - * however with internal pull up this causes IRQ storm. - */ - interrupts = <8 IRQ_TYPE_EDGE_RISING>; + interrupts = <8 IRQ_TYPE_LEVEL_LOW>; rohm,reset-snvs-powered; #clock-cells = <0>; @@ -398,7 +395,7 @@ pinctrl_pmic: pmicirqgrp { fsl,pins = < - MX8MM_IOMUXC_SD1_DATA6_GPIO2_IO8 0x41 + MX8MM_IOMUXC_SD1_DATA6_GPIO2_IO8 0x141 >; }; -- cgit v1.2.3 From 0710e4385c9c978952333393396061ed1672d145 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Sun, 27 Sep 2020 18:59:43 +0200 Subject: arm64: dts: imx8mm-beacon-som: fix missing PMIC's interrupt line pull-up The PMIC's interrupt is level low and should be pulled up. The PMIC's device node had pinctrl-0 property but it lacked pinctrl-names which is required to apply the pin configuration. Fixes: 5f67317bd967 ("arm64: dts: imx8mm: correct interrupt flags") Fixes: 593816fa2f35 ("arm64: dts: imx: Add Beacon i.MX8m-Mini development kit") Signed-off-by: Krzysztof Kozlowski Tested-by: Adam Ford Reviewed-by: Robin Gong Signed-off-by: Shawn Guo --- arch/arm64/boot/dts/freescale/imx8mm-beacon-som.dtsi | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/arm64/boot/dts/freescale/imx8mm-beacon-som.dtsi b/arch/arm64/boot/dts/freescale/imx8mm-beacon-som.dtsi index 6de86a4f0ec4..55b36bddd513 100644 --- a/arch/arm64/boot/dts/freescale/imx8mm-beacon-som.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8mm-beacon-som.dtsi @@ -72,6 +72,7 @@ pmic@4b { compatible = "rohm,bd71847"; reg = <0x4b>; + pinctrl-names = "default"; pinctrl-0 = <&pinctrl_pmic>; interrupt-parent = <&gpio1>; interrupts = <3 IRQ_TYPE_LEVEL_LOW>; -- cgit v1.2.3 From ce6fc31f388d45b9f7135169f911cd27f4d21126 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Sun, 27 Sep 2020 18:59:44 +0200 Subject: arm64: dts: imx8mm-evk: fix missing PMIC's interrupt line pull-up The PMIC's interrupt is level low and should be pulled up. The PMIC's device node had pinctrl-0 property but it lacked pinctrl-names which is required to apply the pin configuration. Fixes: 5f67317bd967 ("arm64: dts: imx8mm: correct interrupt flags") Fixes: aa71d0648318 ("arm64: dts: imx8mm: Split the imx8mm evk board dts to a common dtsi") Signed-off-by: Krzysztof Kozlowski Reviewed-by: Robin Gong Signed-off-by: Shawn Guo --- arch/arm64/boot/dts/freescale/imx8mm-evk.dtsi | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/arm64/boot/dts/freescale/imx8mm-evk.dtsi b/arch/arm64/boot/dts/freescale/imx8mm-evk.dtsi index f305a530ff6f..521eb3a5a12e 100644 --- a/arch/arm64/boot/dts/freescale/imx8mm-evk.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8mm-evk.dtsi @@ -121,6 +121,7 @@ pmic@4b { compatible = "rohm,bd71847"; reg = <0x4b>; + pinctrl-names = "default"; pinctrl-0 = <&pinctrl_pmic>; interrupt-parent = <&gpio1>; interrupts = <3 IRQ_TYPE_LEVEL_LOW>; -- cgit v1.2.3 From 34a1c5e39b670fd7a324b5620c9ad4ac80c2f018 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Sun, 27 Sep 2020 18:59:45 +0200 Subject: arm64: dts: imx8mn-var-som: fix missing PMIC's interrupt line pull-up The PMIC's interrupt is level low and should be pulled up. The PMIC's device node had pinctrl-0 property but it lacked pinctrl-names which is required to apply the pin configuration. The actual problem in DTS was pointed out by Felix Radensky from Variscite. Reported-by: Felix Radensky Fixes: ade0176dd8a0 ("arm64: dts: imx8mn-var-som: Add Variscite VAR-SOM-MX8MN System on Module") Signed-off-by: Krzysztof Kozlowski Reviewed-by: Robin Gong Signed-off-by: Shawn Guo --- arch/arm64/boot/dts/freescale/imx8mn-var-som.dtsi | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/freescale/imx8mn-var-som.dtsi b/arch/arm64/boot/dts/freescale/imx8mn-var-som.dtsi index a2d0190921e4..7f356edf9f91 100644 --- a/arch/arm64/boot/dts/freescale/imx8mn-var-som.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8mn-var-som.dtsi @@ -116,13 +116,10 @@ pmic@4b { compatible = "rohm,bd71847"; reg = <0x4b>; + pinctrl-names = "default"; pinctrl-0 = <&pinctrl_pmic>; interrupt-parent = <&gpio2>; - /* - * The interrupt is not correct. It should be level low, - * however with internal pull up this causes IRQ storm. - */ - interrupts = <8 IRQ_TYPE_EDGE_RISING>; + interrupts = <8 IRQ_TYPE_LEVEL_LOW>; rohm,reset-snvs-powered; regulators { @@ -388,7 +385,7 @@ pinctrl_pmic: pmicirqgrp { fsl,pins = < - MX8MN_IOMUXC_SD1_DATA6_GPIO2_IO8 0x101 + MX8MN_IOMUXC_SD1_DATA6_GPIO2_IO8 0x141 >; }; -- cgit v1.2.3 From 4d20fa1dac2e3cf5aa0cd317b3436f4fda680b04 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Sun, 27 Sep 2020 18:59:46 +0200 Subject: arm64: dts: imx8mn-ddr4-evk: fix missing PMIC's interrupt line pull-up The PMIC's interrupt is level low and should be pulled up. The PMIC's device node had pinctrl-0 property but it lacked pinctrl-names which is required to apply the pin configuration. Fixes: 4153f7811a9b ("arm64: dts: imx8mn: correct interrupt flags") Fixes: 3e44dd09736d ("arm64: dts: imx8mn-ddr4-evk: Add rohm,bd71847 PMIC support") Signed-off-by: Krzysztof Kozlowski Reviewed-by: Robin Gong Signed-off-by: Shawn Guo --- arch/arm64/boot/dts/freescale/imx8mn-ddr4-evk.dts | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/arm64/boot/dts/freescale/imx8mn-ddr4-evk.dts b/arch/arm64/boot/dts/freescale/imx8mn-ddr4-evk.dts index 46e76cf32b2f..7dfee715a2c4 100644 --- a/arch/arm64/boot/dts/freescale/imx8mn-ddr4-evk.dts +++ b/arch/arm64/boot/dts/freescale/imx8mn-ddr4-evk.dts @@ -53,6 +53,7 @@ pmic@4b { compatible = "rohm,bd71847"; reg = <0x4b>; + pinctrl-names = "default"; pinctrl-0 = <&pinctrl_pmic>; interrupt-parent = <&gpio1>; interrupts = <3 IRQ_TYPE_LEVEL_LOW>; -- cgit v1.2.3 From 6efb099a1da4e954409e241b47257a637120e5c2 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Sun, 27 Sep 2020 18:59:47 +0200 Subject: arm64: dts: imx8mn-evk: fix missing PMIC's interrupt line pull-up The PMIC's interrupt is level low and should be pulled up. The PMIC's device node had pinctrl-0 property but it lacked pinctrl-names which is required to apply the pin configuration. Fixes: 4153f7811a9b ("arm64: dts: imx8mn: correct interrupt flags") Fixes: 6386156eb279 ("arm64: dts: imx8mn-evk: add pca9450 for i.mx8mn-evk board") Signed-off-by: Krzysztof Kozlowski Reviewed-by: Robin Gong Signed-off-by: Shawn Guo --- arch/arm64/boot/dts/freescale/imx8mn-evk.dts | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/arm64/boot/dts/freescale/imx8mn-evk.dts b/arch/arm64/boot/dts/freescale/imx8mn-evk.dts index 707d8486b4d8..8311b95dee49 100644 --- a/arch/arm64/boot/dts/freescale/imx8mn-evk.dts +++ b/arch/arm64/boot/dts/freescale/imx8mn-evk.dts @@ -18,6 +18,7 @@ pmic: pmic@25 { compatible = "nxp,pca9450b"; reg = <0x25>; + pinctrl-names = "default"; pinctrl-0 = <&pinctrl_pmic>; interrupt-parent = <&gpio1>; interrupts = <3 IRQ_TYPE_LEVEL_LOW>; -- cgit v1.2.3 From d92454287ee25d78f1caac3734a1864f8a5a5275 Mon Sep 17 00:00:00 2001 From: Biwen Li Date: Tue, 29 Sep 2020 09:30:21 +0800 Subject: arm64: dts: fsl: fix endianness issue of rcpm MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add little-endian property to RCPM node (for ls1028a,ls1088a,ls208xa), otherwise RCPM driver will program hardware with incorrect setting, causing system (such as LS1028ARDB) failed to be waked by wakeup source. Fixes: 791c88ca5713 (“arm64: dts: ls1028a: Add ftm_alarm0 DT node”) Fixes: f4fe3a8665495 (“arm64: dts: layerscape: add ftm_alarm0 node”) Signed-off-by: Biwen Li Signed-off-by: Ran Wang Acked-by: Li Yang Signed-off-by: Shawn Guo --- arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi | 1 + arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi | 1 + arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi | 1 + 3 files changed, 3 insertions(+) (limited to 'arch') diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi index 73e4f9466887..7a6fb7e1fb82 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi @@ -1012,6 +1012,7 @@ compatible = "fsl,ls1028a-rcpm", "fsl,qoriq-rcpm-2.1+"; reg = <0x0 0x1e34040 0x0 0x1c>; #fsl,rcpm-wakeup-cells = <7>; + little-endian; }; ftm_alarm0: timer@2800000 { diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi index ff5805206a28..692d8f4a206d 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi @@ -805,6 +805,7 @@ compatible = "fsl,ls1088a-rcpm", "fsl,qoriq-rcpm-2.1+"; reg = <0x0 0x1e34040 0x0 0x18>; #fsl,rcpm-wakeup-cells = <6>; + little-endian; }; ftm_alarm0: timer@2800000 { diff --git a/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi index bf72918fe545..e7abb74bd816 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi @@ -892,6 +892,7 @@ compatible = "fsl,ls208xa-rcpm", "fsl,qoriq-rcpm-2.1+"; reg = <0x0 0x1e34040 0x0 0x18>; #fsl,rcpm-wakeup-cells = <6>; + little-endian; }; ftm_alarm0: timer@2800000 { -- cgit v1.2.3 From 054b5d97448714ae4a0bcd6f36b0515ac7aed21e Mon Sep 17 00:00:00 2001 From: Madalin Bucur Date: Mon, 5 Oct 2020 15:46:39 +0300 Subject: arm64: dts: fsl: DPAA FMan DMA operations are coherent Although the DPAA 1 FMan operations are coherent, the device tree node for the FMan does not indicate that, resulting in a needless loss of performance. Adding the missing dma-coherent property. Fixes: 1ffbecdd8321 ("arm64: dts: add DPAA FMan nodes") Signed-off-by: Madalin Bucur Tested-by: Camelia Groza Acked-by: Li Yang Signed-off-by: Shawn Guo --- arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi b/arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi index 8bc6caa9167d..4338db14c5da 100644 --- a/arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi +++ b/arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi @@ -19,6 +19,7 @@ fman0: fman@1a00000 { clock-names = "fmanclk"; fsl,qman-channel-range = <0x800 0x10>; ptimer-handle = <&ptp_timer0>; + dma-coherent; muram@0 { compatible = "fsl,fman-muram"; -- cgit v1.2.3 From 587258edd94c305077923ec458e04c032fca83e6 Mon Sep 17 00:00:00 2001 From: Adam Ford Date: Wed, 7 Oct 2020 08:02:37 -0500 Subject: arm64: dts: imx8mm-beacon-som: Fix Choppy BT audio When streaming bluetooth audio, the sound is choppy due to the fact that the default baud rate of the HCI interface is too slow to handle 16-bit stereo at 48KHz. The Bluetooth chip is capable of up to 4M baud on the serial port, so this patch sets the max-speed to 4000000 in order to properly stream audio over the Bluetooth. Fixes: 593816fa2f35 ("arm64: dts: imx: Add Beacon i.MX8m-Mini development kit") Signed-off-by: Adam Ford Signed-off-by: Shawn Guo --- arch/arm64/boot/dts/freescale/imx8mm-beacon-som.dtsi | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/arm64/boot/dts/freescale/imx8mm-beacon-som.dtsi b/arch/arm64/boot/dts/freescale/imx8mm-beacon-som.dtsi index 55b36bddd513..b88c3c99b007 100644 --- a/arch/arm64/boot/dts/freescale/imx8mm-beacon-som.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8mm-beacon-som.dtsi @@ -211,6 +211,7 @@ host-wakeup-gpios = <&gpio2 8 GPIO_ACTIVE_HIGH>; device-wakeup-gpios = <&gpio2 7 GPIO_ACTIVE_HIGH>; clocks = <&osc_32k>; + max-speed = <4000000>; clock-names = "extclk"; }; }; -- cgit v1.2.3 From cf5abb0132193767c07c83e06f91b777d22ba495 Mon Sep 17 00:00:00 2001 From: Adam Ford Date: Thu, 8 Oct 2020 13:33:00 -0500 Subject: arm64: dts imx8mn: Remove non-existent USB OTG2 According to the i.MX8MN TRM, there is only one OTG port. The address for OTG2 is reserved on Nano. This patch removes the non-existent OTG2, usbphynop2, and the usbmisc2 nodes. Fixes: 6c3debcbae47 ("arm64: dts: freescale: Add i.MX8MN dtsi support") Signed-off-by: Adam Ford Reviewed-by: Krzysztof Kozlowski Signed-off-by: Shawn Guo --- arch/arm64/boot/dts/freescale/imx8mn.dtsi | 30 ------------------------------ 1 file changed, 30 deletions(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/freescale/imx8mn.dtsi b/arch/arm64/boot/dts/freescale/imx8mn.dtsi index 746faf1cf2fb..16c7202885d7 100644 --- a/arch/arm64/boot/dts/freescale/imx8mn.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8mn.dtsi @@ -790,28 +790,6 @@ #index-cells = <1>; reg = <0x32e40200 0x200>; }; - - usbotg2: usb@32e50000 { - compatible = "fsl,imx8mn-usb", "fsl,imx7d-usb"; - reg = <0x32e50000 0x200>; - interrupts = ; - clocks = <&clk IMX8MN_CLK_USB1_CTRL_ROOT>; - clock-names = "usb1_ctrl_root_clk"; - assigned-clocks = <&clk IMX8MN_CLK_USB_BUS>, - <&clk IMX8MN_CLK_USB_CORE_REF>; - assigned-clock-parents = <&clk IMX8MN_SYS_PLL2_500M>, - <&clk IMX8MN_SYS_PLL1_100M>; - fsl,usbphy = <&usbphynop2>; - fsl,usbmisc = <&usbmisc2 0>; - status = "disabled"; - }; - - usbmisc2: usbmisc@32e50200 { - compatible = "fsl,imx8mn-usbmisc", "fsl,imx7d-usbmisc"; - #index-cells = <1>; - reg = <0x32e50200 0x200>; - }; - }; dma_apbh: dma-controller@33000000 { @@ -876,12 +854,4 @@ assigned-clock-parents = <&clk IMX8MN_SYS_PLL1_100M>; clock-names = "main_clk"; }; - - usbphynop2: usbphynop2 { - compatible = "usb-nop-xceiv"; - clocks = <&clk IMX8MN_CLK_USB_PHY_REF>; - assigned-clocks = <&clk IMX8MN_CLK_USB_PHY_REF>; - assigned-clock-parents = <&clk IMX8MN_SYS_PLL1_100M>; - clock-names = "main_clk"; - }; }; -- cgit v1.2.3 From b0c0aa7aa4b919e02e0a24aa3a46dfbf2bbc34dc Mon Sep 17 00:00:00 2001 From: David Bauer Date: Mon, 26 Oct 2020 17:27:21 +0100 Subject: arm64: dts: rockchip: fix NanoPi R2S GMAC clock name This commit fixes the name for the GMAC clock to gmac_clkin, as this is the name of the clock provided by the rk3328-clk driver. Without this commit, the GMAC will not work in TX direction. Fixes: f1ec83f880db ("arm64: dts: rockchip: Add support for FriendlyARM NanoPi R2S") Suggested-by: Tobias Waldvogel Signed-off-by: David Bauer Link: https://lore.kernel.org/r/20201026162721.70672-1-mail@david-bauer.net Signed-off-by: Heiko Stuebner --- arch/arm64/boot/dts/rockchip/rk3328-nanopi-r2s.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/rockchip/rk3328-nanopi-r2s.dts b/arch/arm64/boot/dts/rockchip/rk3328-nanopi-r2s.dts index be7a31d81632..2ee07d15a6e3 100644 --- a/arch/arm64/boot/dts/rockchip/rk3328-nanopi-r2s.dts +++ b/arch/arm64/boot/dts/rockchip/rk3328-nanopi-r2s.dts @@ -20,7 +20,7 @@ gmac_clk: gmac-clock { compatible = "fixed-clock"; clock-frequency = <125000000>; - clock-output-names = "gmac_clk"; + clock-output-names = "gmac_clkin"; #clock-cells = <0>; }; -- cgit v1.2.3 From 01fe332800d0d2f94337b45c1973f4cf28ae6195 Mon Sep 17 00:00:00 2001 From: Maciej Matuszczyk Date: Fri, 23 Oct 2020 20:16:29 +0200 Subject: arm64: dts: rockchip: Remove system-power-controller from pmic on Odroid Go Advance This fixes a poweroff issue when this is supposed to happen via PSCI. Signed-off-by: Maciej Matuszczyk Link: https://lore.kernel.org/r/20201023181629.119727-1-maccraft123mc@gmail.com Signed-off-by: Heiko Stuebner --- arch/arm64/boot/dts/rockchip/rk3326-odroid-go2.dts | 1 - 1 file changed, 1 deletion(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/rockchip/rk3326-odroid-go2.dts b/arch/arm64/boot/dts/rockchip/rk3326-odroid-go2.dts index 35bd6b904b9c..337681038519 100644 --- a/arch/arm64/boot/dts/rockchip/rk3326-odroid-go2.dts +++ b/arch/arm64/boot/dts/rockchip/rk3326-odroid-go2.dts @@ -243,7 +243,6 @@ interrupts = ; pinctrl-names = "default"; pinctrl-0 = <&pmic_int>; - rockchip,system-power-controller; wakeup-source; #clock-cells = <1>; clock-output-names = "rk808-clkout1", "xin32k"; -- cgit v1.2.3 From 7dd8f0ba88fce98e2953267a66af74c6f4792a56 Mon Sep 17 00:00:00 2001 From: Sergey Matyukevich Date: Sat, 24 Oct 2020 23:11:20 +0300 Subject: arm: dts: imx6qdl-udoo: fix rgmii phy-mode for ksz9031 phy Commit bcf3440c6dd7 ("net: phy: micrel: add phy-mode support for the KSZ9031 PHY") fixed micrel phy driver adding proper support for phy modes. Adapt imx6q-udoo board phy settings : explicitly set required delay configuration using "rgmii-id". Fixes: cbd54fe0b2bc ("ARM: dts: imx6dl-udoo: Add board support based off imx6q-udoo") Signed-off-by: Sergey Matyukevich Reviewed-by: Fabio Estevam Signed-off-by: Shawn Guo --- arch/arm/boot/dts/imx6qdl-udoo.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/boot/dts/imx6qdl-udoo.dtsi b/arch/arm/boot/dts/imx6qdl-udoo.dtsi index 828dd20cd27d..d07d8f83456d 100644 --- a/arch/arm/boot/dts/imx6qdl-udoo.dtsi +++ b/arch/arm/boot/dts/imx6qdl-udoo.dtsi @@ -98,7 +98,7 @@ &fec { pinctrl-names = "default"; pinctrl-0 = <&pinctrl_enet>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; status = "okay"; }; -- cgit v1.2.3 From f8b5a33707c9a19ec905d2826be0acd151997a09 Mon Sep 17 00:00:00 2001 From: Andrew Lunn Date: Fri, 30 Oct 2020 01:55:28 +0100 Subject: ARM: dts: vf610-zii-dev-rev-b: Fix MDIO over clocking The ZII devel B board has two generations of Marvell Switches. The mv88e6352 supports an MDIO clock of 12MHz. However the older 88e6185 does not like 12MHz, and often fails to probe. Reduce the clock speed to 5MHz, which seems to work reliably. Cc: Chris Healy Fixes: b955387667ec ("ARM: dts: ZII: update MDIO speed and preamble") Signed-off-by: Andrew Lunn Reviewed-by: Chris Healy Signed-off-by: Shawn Guo --- arch/arm/boot/dts/vf610-zii-dev-rev-b.dts | 3 +++ 1 file changed, 3 insertions(+) (limited to 'arch') diff --git a/arch/arm/boot/dts/vf610-zii-dev-rev-b.dts b/arch/arm/boot/dts/vf610-zii-dev-rev-b.dts index e500911ce0a5..6f1e0f0d4f0a 100644 --- a/arch/arm/boot/dts/vf610-zii-dev-rev-b.dts +++ b/arch/arm/boot/dts/vf610-zii-dev-rev-b.dts @@ -406,6 +406,9 @@ }; }; +&mdio1 { + clock-frequency = <5000000>; +}; &iomuxc { pinctrl_gpio_e6185_eeprom_sel: pinctrl-gpio-e6185-eeprom-spi0 { -- cgit v1.2.3 From e402599e5e5e0b2758d7766fd9f6d7953d4ccd85 Mon Sep 17 00:00:00 2001 From: Oleksij Rempel Date: Mon, 12 Oct 2020 09:18:16 +0200 Subject: ARM: dts: imx6q-prti6q: fix PHY address Due to bug in the bootloader, the PHY has floating address and may randomly change on each PHY reset. To avoid it, the updated bootloader with the following patch[0] should be used: | ARM: protonic: disable on-die termination to fix PHY bootstrapping | | If on-die termination is enabled, the RXC pin of iMX6 will be pulled | high. Since we already have an 10K pull-down on board, the RXC level on | PHY reset will be ~800mV, which is mostly interpreted as 1. On some | reboots we get 0 instead and kernel can't detect the PHY properly. | | Since the default 0x020e07ac value is 0, it is sufficient to remove this | entry from the affected imxcfg files. | | Since we get stable 0 on pin PHYADDR[2], the PHY address is changed from | 4 to 0. With latest bootloader update, the PHY address will be fixed to "0". [0] https://git.pengutronix.de/cgit/barebox/commit/?id=93f7dcf631edfcda19e7757b28d66017ea274b81 Fixes: 0d446a505592 ("ARM: dts: add Protonic PRTI6Q board") Signed-off-by: Oleksij Rempel Signed-off-by: Shawn Guo --- arch/arm/boot/dts/imx6q-prti6q.dts | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/arm/boot/dts/imx6q-prti6q.dts b/arch/arm/boot/dts/imx6q-prti6q.dts index d112b50f8c5d..b4605edfd2ab 100644 --- a/arch/arm/boot/dts/imx6q-prti6q.dts +++ b/arch/arm/boot/dts/imx6q-prti6q.dts @@ -213,8 +213,8 @@ #size-cells = <0>; /* Microchip KSZ9031RNX PHY */ - rgmii_phy: ethernet-phy@4 { - reg = <4>; + rgmii_phy: ethernet-phy@0 { + reg = <0>; interrupts-extended = <&gpio1 28 IRQ_TYPE_LEVEL_LOW>; reset-gpios = <&gpio1 25 GPIO_ACTIVE_LOW>; reset-assert-us = <10000>; -- cgit v1.2.3 From fd552e0542b4532483289cce48fdbd27b692984b Mon Sep 17 00:00:00 2001 From: Qian Cai Date: Wed, 28 Oct 2020 11:27:17 -0400 Subject: powerpc/eeh_cache: Fix a possible debugfs deadlock Lockdep complains that a possible deadlock below in eeh_addr_cache_show() because it is acquiring a lock with IRQ enabled, but eeh_addr_cache_insert_dev() needs to acquire the same lock with IRQ disabled. Let's just make eeh_addr_cache_show() acquire the lock with IRQ disabled as well. CPU0 CPU1 ---- ---- lock(&pci_io_addr_cache_root.piar_lock); local_irq_disable(); lock(&tp->lock); lock(&pci_io_addr_cache_root.piar_lock); lock(&tp->lock); *** DEADLOCK *** lock_acquire+0x140/0x5f0 _raw_spin_lock_irqsave+0x64/0xb0 eeh_addr_cache_insert_dev+0x48/0x390 eeh_probe_device+0xb8/0x1a0 pnv_pcibios_bus_add_device+0x3c/0x80 pcibios_bus_add_device+0x118/0x290 pci_bus_add_device+0x28/0xe0 pci_bus_add_devices+0x54/0xb0 pcibios_init+0xc4/0x124 do_one_initcall+0xac/0x528 kernel_init_freeable+0x35c/0x3fc kernel_init+0x24/0x148 ret_from_kernel_thread+0x5c/0x80 lock_acquire+0x140/0x5f0 _raw_spin_lock+0x4c/0x70 eeh_addr_cache_show+0x38/0x110 seq_read+0x1a0/0x660 vfs_read+0xc8/0x1f0 ksys_read+0x74/0x130 system_call_exception+0xf8/0x1d0 system_call_common+0xe8/0x218 Fixes: 5ca85ae6318d ("powerpc/eeh_cache: Add a way to dump the EEH address cache") Signed-off-by: Qian Cai Reviewed-by: Oliver O'Halloran Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20201028152717.8967-1-cai@redhat.com --- arch/powerpc/kernel/eeh_cache.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c index 6b50bf15d8c1..bf3270426d82 100644 --- a/arch/powerpc/kernel/eeh_cache.c +++ b/arch/powerpc/kernel/eeh_cache.c @@ -264,8 +264,9 @@ static int eeh_addr_cache_show(struct seq_file *s, void *v) { struct pci_io_addr_range *piar; struct rb_node *n; + unsigned long flags; - spin_lock(&pci_io_addr_cache_root.piar_lock); + spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); for (n = rb_first(&pci_io_addr_cache_root.rb_root); n; n = rb_next(n)) { piar = rb_entry(n, struct pci_io_addr_range, rb_node); @@ -273,7 +274,7 @@ static int eeh_addr_cache_show(struct seq_file *s, void *v) (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", &piar->addr_lo, &piar->addr_hi, pci_name(piar->pcidev)); } - spin_unlock(&pci_io_addr_cache_root.piar_lock); + spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); return 0; } -- cgit v1.2.3 From 99f070b62322a4b8c1252952735806d09eb44b68 Mon Sep 17 00:00:00 2001 From: Qian Cai Date: Wed, 28 Oct 2020 14:23:34 -0400 Subject: powerpc/smp: Call rcu_cpu_starting() earlier The call to rcu_cpu_starting() in start_secondary() is not early enough in the CPU-hotplug onlining process, which results in lockdep splats as follows (with CONFIG_PROVE_RCU_LIST=y): WARNING: suspicious RCU usage ----------------------------- kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! other info that might help us debug this: RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1 no locks held by swapper/1/0. Call Trace: dump_stack+0xec/0x144 (unreliable) lockdep_rcu_suspicious+0x128/0x14c __lock_acquire+0x1060/0x1c60 lock_acquire+0x140/0x5f0 _raw_spin_lock_irqsave+0x64/0xb0 clockevents_register_device+0x74/0x270 register_decrementer_clockevent+0x94/0x110 start_secondary+0x134/0x800 start_secondary_prolog+0x10/0x14 This is avoided by adding a call to rcu_cpu_starting() near the beginning of the start_secondary() function. Note that the raw_smp_processor_id() is required in order to avoid calling into lockdep before RCU has declared the CPU to be watched for readers. It's safe to call rcu_cpu_starting() in the arch code as well as later in generic code, as explained by Paul: It uses a per-CPU variable so that RCU pays attention only to the first call to rcu_cpu_starting() if there is more than one of them. This is even intentional, due to there being a generic arch-independent call to rcu_cpu_starting() in notify_cpu_starting(). So multiple calls to rcu_cpu_starting() are fine by design. Fixes: 4d004099a668 ("lockdep: Fix lockdep recursion") Signed-off-by: Qian Cai Acked-by: Paul E. McKenney [mpe: Add Fixes tag, reword slightly & expand change log] Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20201028182334.13466-1-cai@redhat.com --- arch/powerpc/kernel/smp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c index 3c6b9822f978..8c2857cbd960 100644 --- a/arch/powerpc/kernel/smp.c +++ b/arch/powerpc/kernel/smp.c @@ -1393,13 +1393,14 @@ static void add_cpu_to_masks(int cpu) /* Activate a secondary processor. */ void start_secondary(void *unused) { - unsigned int cpu = smp_processor_id(); + unsigned int cpu = raw_smp_processor_id(); mmgrab(&init_mm); current->active_mm = &init_mm; smp_store_cpu_info(cpu); set_dec(tb_ticks_per_jiffy); + rcu_cpu_starting(cpu); preempt_disable(); cpu_callin_map[cpu] = 1; -- cgit v1.2.3 From f126b6702e7354d6247a36f20b9172457af5c15a Mon Sep 17 00:00:00 2001 From: Dinh Nguyen Date: Sun, 1 Nov 2020 14:02:56 -0600 Subject: arm64: dts: agilex/stratix10: Fix qspi node compatible The QSPI flash node needs to have the required "jedec,spi-nor" in the compatible string. Fixes: 0cb140d07fc7 ("arm64: dts: stratix10: Add QSPI support for Stratix10") Cc: stable@vger.kernel.org Suggested-by: Vignesh Raghavendra Signed-off-by: Dinh Nguyen --- arch/arm64/boot/dts/altera/socfpga_stratix10_socdk.dts | 2 +- arch/arm64/boot/dts/altera/socfpga_stratix10_socdk_nand.dts | 2 +- arch/arm64/boot/dts/intel/socfpga_agilex_socdk.dts | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/altera/socfpga_stratix10_socdk.dts b/arch/arm64/boot/dts/altera/socfpga_stratix10_socdk.dts index feadd21bc0dc..46e558ab7729 100644 --- a/arch/arm64/boot/dts/altera/socfpga_stratix10_socdk.dts +++ b/arch/arm64/boot/dts/altera/socfpga_stratix10_socdk.dts @@ -159,7 +159,7 @@ flash@0 { #address-cells = <1>; #size-cells = <1>; - compatible = "n25q00a"; + compatible = "micron,mt25qu02g", "jedec,spi-nor"; reg = <0>; spi-max-frequency = <100000000>; diff --git a/arch/arm64/boot/dts/altera/socfpga_stratix10_socdk_nand.dts b/arch/arm64/boot/dts/altera/socfpga_stratix10_socdk_nand.dts index c07966740e14..f9b4a39683cf 100644 --- a/arch/arm64/boot/dts/altera/socfpga_stratix10_socdk_nand.dts +++ b/arch/arm64/boot/dts/altera/socfpga_stratix10_socdk_nand.dts @@ -192,7 +192,7 @@ flash@0 { #address-cells = <1>; #size-cells = <1>; - compatible = "n25q00a"; + compatible = "micron,mt25qu02g", "jedec,spi-nor"; reg = <0>; spi-max-frequency = <100000000>; diff --git a/arch/arm64/boot/dts/intel/socfpga_agilex_socdk.dts b/arch/arm64/boot/dts/intel/socfpga_agilex_socdk.dts index 96c50d48289d..a7a83f29f00b 100644 --- a/arch/arm64/boot/dts/intel/socfpga_agilex_socdk.dts +++ b/arch/arm64/boot/dts/intel/socfpga_agilex_socdk.dts @@ -110,7 +110,7 @@ flash@0 { #address-cells = <1>; #size-cells = <1>; - compatible = "mt25qu02g"; + compatible = "micron,mt25qu02g", "jedec,spi-nor"; reg = <0>; spi-max-frequency = <100000000>; -- cgit v1.2.3 From 328d2168ca524d501fc4b133d6be076142bd305c Mon Sep 17 00:00:00 2001 From: Vineet Gupta Date: Tue, 27 Oct 2020 15:01:17 -0700 Subject: ARC: stack unwinding: avoid indefinite looping Currently stack unwinder is a while(1) loop which relies on the dwarf unwinder to signal termination, which in turn relies on dwarf info to do so. This in theory could cause an infinite loop if the dwarf info was somehow messed up or the register contents were etc. This fix thus detects the excessive looping and breaks the loop. | Mem: 26184K used, 1009136K free, 0K shrd, 0K buff, 14416K cached | CPU: 0.0% usr 72.8% sys 0.0% nic 27.1% idle 0.0% io 0.0% irq 0.0% sirq | Load average: 4.33 2.60 1.11 2/74 139 | PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND | 133 2 root SWN 0 0.0 3 22.9 [rcu_torture_rea] | 132 2 root SWN 0 0.0 0 22.0 [rcu_torture_rea] | 131 2 root SWN 0 0.0 3 21.5 [rcu_torture_rea] | 126 2 root RW 0 0.0 2 5.4 [rcu_torture_wri] | 129 2 root SWN 0 0.0 0 0.2 [rcu_torture_fak] | 137 2 root SW 0 0.0 0 0.2 [rcu_torture_cbf] | 127 2 root SWN 0 0.0 0 0.1 [rcu_torture_fak] | 138 115 root R 1464 0.1 2 0.1 top | 130 2 root SWN 0 0.0 0 0.1 [rcu_torture_fak] | 128 2 root SWN 0 0.0 0 0.1 [rcu_torture_fak] | 115 1 root S 1472 0.1 1 0.0 -/bin/sh | 104 1 root S 1464 0.1 0 0.0 inetd | 1 0 root S 1456 0.1 2 0.0 init | 78 1 root S 1456 0.1 0 0.0 syslogd -O /var/log/messages | 134 2 root SW 0 0.0 2 0.0 [rcu_torture_sta] | 10 2 root IW 0 0.0 1 0.0 [rcu_preempt] | 88 2 root IW 0 0.0 1 0.0 [kworker/1:1-eve] | 66 2 root IW 0 0.0 2 0.0 [kworker/2:2-eve] | 39 2 root IW 0 0.0 2 0.0 [kworker/2:1-eve] | unwinder looping too long, aborting ! Cc: Signed-off-by: Vineet Gupta --- arch/arc/kernel/stacktrace.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arc/kernel/stacktrace.c b/arch/arc/kernel/stacktrace.c index feba91c9d969..b23986f98450 100644 --- a/arch/arc/kernel/stacktrace.c +++ b/arch/arc/kernel/stacktrace.c @@ -112,7 +112,7 @@ arc_unwind_core(struct task_struct *tsk, struct pt_regs *regs, int (*consumer_fn) (unsigned int, void *), void *arg) { #ifdef CONFIG_ARC_DW2_UNWIND - int ret = 0; + int ret = 0, cnt = 0; unsigned int address; struct unwind_frame_info frame_info; @@ -132,6 +132,11 @@ arc_unwind_core(struct task_struct *tsk, struct pt_regs *regs, break; frame_info.regs.r63 = frame_info.regs.r31; + + if (cnt++ > 128) { + printk("unwinder looping too long, aborting !\n"); + return 0; + } } return address; /* return the last address it saw */ -- cgit v1.2.3 From 3b57533b460c8dc22a432684b7e8d22571f34d2e Mon Sep 17 00:00:00 2001 From: Vineet Gupta Date: Thu, 29 Oct 2020 19:18:34 -0700 Subject: ARC: [plat-hsdk] Remap CCMs super early in asm boot trampoline ARC HSDK platform stopped booting on released v5.10-rc1, getting stuck in startup of non master SMP cores. This was bisected to upstream commit 7fef431be9c9ac25 "(mm/page_alloc: place pages to tail in __free_pages_core())" That commit itself is harmless, it just exposed a subtle assumption in our platform code (hence CC'ing linux-mm just as FYI in case some other arches / platforms trip on it). The upstream commit is semantically disruptive as it reverses the order of page allocations (actually it can be good test for hardware verification to exercise different memory patterns altogether). For ARC HSDK platform that meant a remapped memory region (pertaining to unused Closely Coupled Memory) started getting used early for dynamice allocations, while not effectively remapped on all the cores, triggering memory error exception on those cores. The fix is to move the CCM remapping from early platform code to to early core boot code. And while it is undesirable to riddle common boot code with platform quirks, there is no other way to do this since the faltering code involves setting up stack itself so even function calls are not allowed at that point. If anyone is interested, all the gory details can be found at Link below. Link: https://github.com/foss-for-synopsys-dwc-arc-processors/linux/issues/32 Cc: David Hildenbrand Cc: linux-mm@kvack.org Signed-off-by: Vineet Gupta --- arch/arc/kernel/head.S | 17 ++++++++++++++++- arch/arc/plat-hsdk/platform.c | 17 ----------------- 2 files changed, 16 insertions(+), 18 deletions(-) (limited to 'arch') diff --git a/arch/arc/kernel/head.S b/arch/arc/kernel/head.S index 17fd1ed700cc..9152782444b5 100644 --- a/arch/arc/kernel/head.S +++ b/arch/arc/kernel/head.S @@ -67,7 +67,22 @@ sr r5, [ARC_REG_LPB_CTRL] 1: #endif /* CONFIG_ARC_LPB_DISABLE */ -#endif + + /* On HSDK, CCMs need to remapped super early */ +#ifdef CONFIG_ARC_SOC_HSDK + mov r6, 0x60000000 + lr r5, [ARC_REG_ICCM_BUILD] + breq r5, 0, 1f + sr r6, [ARC_REG_AUX_ICCM] +1: + lr r5, [ARC_REG_DCCM_BUILD] + breq r5, 0, 2f + sr r6, [ARC_REG_AUX_DCCM] +2: +#endif /* CONFIG_ARC_SOC_HSDK */ + +#endif /* CONFIG_ISA_ARCV2 */ + ; Config DSP_CTRL properly, so kernel may use integer multiply, ; multiply-accumulate, and divide operations DSP_EARLY_INIT diff --git a/arch/arc/plat-hsdk/platform.c b/arch/arc/plat-hsdk/platform.c index 0b63fc095b99..b3ea1fa11f87 100644 --- a/arch/arc/plat-hsdk/platform.c +++ b/arch/arc/plat-hsdk/platform.c @@ -17,22 +17,6 @@ int arc_hsdk_axi_dmac_coherent __section(".data") = 0; #define ARC_CCM_UNUSED_ADDR 0x60000000 -static void __init hsdk_init_per_cpu(unsigned int cpu) -{ - /* - * By default ICCM is mapped to 0x7z while this area is used for - * kernel virtual mappings, so move it to currently unused area. - */ - if (cpuinfo_arc700[cpu].iccm.sz) - write_aux_reg(ARC_REG_AUX_ICCM, ARC_CCM_UNUSED_ADDR); - - /* - * By default DCCM is mapped to 0x8z while this area is used by kernel, - * so move it to currently unused area. - */ - if (cpuinfo_arc700[cpu].dccm.sz) - write_aux_reg(ARC_REG_AUX_DCCM, ARC_CCM_UNUSED_ADDR); -} #define ARC_PERIPHERAL_BASE 0xf0000000 #define CREG_BASE (ARC_PERIPHERAL_BASE + 0x1000) @@ -339,5 +323,4 @@ static const char *hsdk_compat[] __initconst = { MACHINE_START(SIMULATION, "hsdk") .dt_compat = hsdk_compat, .init_early = hsdk_init_early, - .init_per_cpu = hsdk_init_per_cpu, MACHINE_END -- cgit v1.2.3 From 2a13c13b39a8aea4c69a31549e4cb0094f30103b Mon Sep 17 00:00:00 2001 From: Vanshidhar Konda Date: Fri, 30 Oct 2020 10:30:50 -0700 Subject: arm64: NUMA: Kconfig: Increase NODES_SHIFT to 4 The current arm64 default config limits max NUMA nodes available on system to 4 (NODES_SHIFT = 2). Today's arm64 systems can reach or exceed 16 NUMA nodes. To accomodate current hardware and to fit NODES_SHIFT within page flags on arm64, increase NODES_SHIFT to 4. Signed-off-by: Vanshidhar Konda Acked-by: Catalin Marinas Link: https://lore.kernel.org/r/20201020173409.1266576-1-vanshikonda@os.amperecomputing.com/ Link: https://lore.kernel.org/r/20201030173050.1182876-1-vanshikonda@os.amperecomputing.com Signed-off-by: Will Deacon --- arch/arm64/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 1d466addb078..1515f6f153a0 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -1002,7 +1002,7 @@ config NUMA config NODES_SHIFT int "Maximum NUMA Nodes (as a power of 2)" range 1 10 - default "2" + default "4" depends on NEED_MULTIPLE_NODES help Specify the maximum number of NUMA Nodes available on the target -- cgit v1.2.3 From 7ee31a3aa8f490c6507bc4294df6b70bed1c593e Mon Sep 17 00:00:00 2001 From: Jean-Philippe Brucker Date: Tue, 3 Nov 2020 14:49:01 +0100 Subject: arm64: kprobes: Use BRK instead of single-step when executing instructions out-of-line Commit 36dadef23fcc ("kprobes: Init kprobes in early_initcall") enabled using kprobes from early_initcall. Unfortunately at this point the hardware debug infrastructure is not operational. The OS lock may still be locked, and the hardware watchpoints may have unknown values when kprobe enables debug monitors to single-step instructions. Rather than using hardware single-step, append a BRK instruction after the instruction to be executed out-of-line. Fixes: 36dadef23fcc ("kprobes: Init kprobes in early_initcall") Suggested-by: Will Deacon Signed-off-by: Jean-Philippe Brucker Acked-by: Masami Hiramatsu Link: https://lore.kernel.org/r/20201103134900.337243-1-jean-philippe@linaro.org Signed-off-by: Will Deacon --- arch/arm64/include/asm/brk-imm.h | 2 + arch/arm64/include/asm/debug-monitors.h | 1 + arch/arm64/include/asm/kprobes.h | 2 +- arch/arm64/kernel/probes/kprobes.c | 69 +++++++++++---------------------- 4 files changed, 27 insertions(+), 47 deletions(-) (limited to 'arch') diff --git a/arch/arm64/include/asm/brk-imm.h b/arch/arm64/include/asm/brk-imm.h index e3d47b52161d..ec7720dbe2c8 100644 --- a/arch/arm64/include/asm/brk-imm.h +++ b/arch/arm64/include/asm/brk-imm.h @@ -10,6 +10,7 @@ * #imm16 values used for BRK instruction generation * 0x004: for installing kprobes * 0x005: for installing uprobes + * 0x006: for kprobe software single-step * Allowed values for kgdb are 0x400 - 0x7ff * 0x100: for triggering a fault on purpose (reserved) * 0x400: for dynamic BRK instruction @@ -19,6 +20,7 @@ */ #define KPROBES_BRK_IMM 0x004 #define UPROBES_BRK_IMM 0x005 +#define KPROBES_BRK_SS_IMM 0x006 #define FAULT_BRK_IMM 0x100 #define KGDB_DYN_DBG_BRK_IMM 0x400 #define KGDB_COMPILED_DBG_BRK_IMM 0x401 diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h index 0b298f48f5bf..657c921fd784 100644 --- a/arch/arm64/include/asm/debug-monitors.h +++ b/arch/arm64/include/asm/debug-monitors.h @@ -53,6 +53,7 @@ /* kprobes BRK opcodes with ESR encoding */ #define BRK64_OPCODE_KPROBES (AARCH64_BREAK_MON | (KPROBES_BRK_IMM << 5)) +#define BRK64_OPCODE_KPROBES_SS (AARCH64_BREAK_MON | (KPROBES_BRK_SS_IMM << 5)) /* uprobes BRK opcodes with ESR encoding */ #define BRK64_OPCODE_UPROBES (AARCH64_BREAK_MON | (UPROBES_BRK_IMM << 5)) diff --git a/arch/arm64/include/asm/kprobes.h b/arch/arm64/include/asm/kprobes.h index 97e511d645a2..8699ce30f587 100644 --- a/arch/arm64/include/asm/kprobes.h +++ b/arch/arm64/include/asm/kprobes.h @@ -16,7 +16,7 @@ #include #define __ARCH_WANT_KPROBES_INSN_SLOT -#define MAX_INSN_SIZE 1 +#define MAX_INSN_SIZE 2 #define flush_insn_slot(p) do { } while (0) #define kretprobe_blacklist_size 0 diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c index deba738142ed..f11a1a1f7026 100644 --- a/arch/arm64/kernel/probes/kprobes.c +++ b/arch/arm64/kernel/probes/kprobes.c @@ -36,25 +36,16 @@ DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk); static void __kprobes post_kprobe_handler(struct kprobe_ctlblk *, struct pt_regs *); -static int __kprobes patch_text(kprobe_opcode_t *addr, u32 opcode) -{ - void *addrs[1]; - u32 insns[1]; - - addrs[0] = addr; - insns[0] = opcode; - - return aarch64_insn_patch_text(addrs, insns, 1); -} - static void __kprobes arch_prepare_ss_slot(struct kprobe *p) { + kprobe_opcode_t *addr = p->ainsn.api.insn; + void *addrs[] = {addr, addr + 1}; + u32 insns[] = {p->opcode, BRK64_OPCODE_KPROBES_SS}; + /* prepare insn slot */ - patch_text(p->ainsn.api.insn, p->opcode); + aarch64_insn_patch_text(addrs, insns, 2); - flush_icache_range((uintptr_t) (p->ainsn.api.insn), - (uintptr_t) (p->ainsn.api.insn) + - MAX_INSN_SIZE * sizeof(kprobe_opcode_t)); + flush_icache_range((uintptr_t)addr, (uintptr_t)(addr + MAX_INSN_SIZE)); /* * Needs restoring of return address after stepping xol. @@ -128,13 +119,18 @@ void *alloc_insn_page(void) /* arm kprobe: install breakpoint in text */ void __kprobes arch_arm_kprobe(struct kprobe *p) { - patch_text(p->addr, BRK64_OPCODE_KPROBES); + void *addr = p->addr; + u32 insn = BRK64_OPCODE_KPROBES; + + aarch64_insn_patch_text(&addr, &insn, 1); } /* disarm kprobe: remove breakpoint from text */ void __kprobes arch_disarm_kprobe(struct kprobe *p) { - patch_text(p->addr, p->opcode); + void *addr = p->addr; + + aarch64_insn_patch_text(&addr, &p->opcode, 1); } void __kprobes arch_remove_kprobe(struct kprobe *p) @@ -163,20 +159,15 @@ static void __kprobes set_current_kprobe(struct kprobe *p) } /* - * Interrupts need to be disabled before single-step mode is set, and not - * reenabled until after single-step mode ends. - * Without disabling interrupt on local CPU, there is a chance of - * interrupt occurrence in the period of exception return and start of - * out-of-line single-step, that result in wrongly single stepping - * into the interrupt handler. + * Mask all of DAIF while executing the instruction out-of-line, to keep things + * simple and avoid nesting exceptions. Interrupts do have to be disabled since + * the kprobe state is per-CPU and doesn't get migrated. */ static void __kprobes kprobes_save_local_irqflag(struct kprobe_ctlblk *kcb, struct pt_regs *regs) { kcb->saved_irqflag = regs->pstate & DAIF_MASK; - regs->pstate |= PSR_I_BIT; - /* Unmask PSTATE.D for enabling software step exceptions. */ - regs->pstate &= ~PSR_D_BIT; + regs->pstate |= DAIF_MASK; } static void __kprobes kprobes_restore_local_irqflag(struct kprobe_ctlblk *kcb, @@ -219,10 +210,7 @@ static void __kprobes setup_singlestep(struct kprobe *p, slot = (unsigned long)p->ainsn.api.insn; set_ss_context(kcb, slot); /* mark pending ss */ - - /* IRQs and single stepping do not mix well. */ kprobes_save_local_irqflag(kcb, regs); - kernel_enable_single_step(regs); instruction_pointer_set(regs, slot); } else { /* insn simulation */ @@ -273,12 +261,8 @@ post_kprobe_handler(struct kprobe_ctlblk *kcb, struct pt_regs *regs) } /* call post handler */ kcb->kprobe_status = KPROBE_HIT_SSDONE; - if (cur->post_handler) { - /* post_handler can hit breakpoint and single step - * again, so we enable D-flag for recursive exception. - */ + if (cur->post_handler) cur->post_handler(cur, regs, 0); - } reset_current_kprobe(); } @@ -302,8 +286,6 @@ int __kprobes kprobe_fault_handler(struct pt_regs *regs, unsigned int fsr) if (!instruction_pointer(regs)) BUG(); - kernel_disable_single_step(); - if (kcb->kprobe_status == KPROBE_REENTER) restore_previous_kprobe(kcb); else @@ -365,10 +347,6 @@ static void __kprobes kprobe_handler(struct pt_regs *regs) * pre-handler and it returned non-zero, it will * modify the execution path and no need to single * stepping. Let's just reset current kprobe and exit. - * - * pre_handler can hit a breakpoint and can step thru - * before return, keep PSTATE D-flag enabled until - * pre_handler return back. */ if (!p->pre_handler || !p->pre_handler(p, regs)) { setup_singlestep(p, regs, kcb, 0); @@ -399,7 +377,7 @@ kprobe_ss_hit(struct kprobe_ctlblk *kcb, unsigned long addr) } static int __kprobes -kprobe_single_step_handler(struct pt_regs *regs, unsigned int esr) +kprobe_breakpoint_ss_handler(struct pt_regs *regs, unsigned int esr) { struct kprobe_ctlblk *kcb = get_kprobe_ctlblk(); int retval; @@ -409,16 +387,15 @@ kprobe_single_step_handler(struct pt_regs *regs, unsigned int esr) if (retval == DBG_HOOK_HANDLED) { kprobes_restore_local_irqflag(kcb, regs); - kernel_disable_single_step(); - post_kprobe_handler(kcb, regs); } return retval; } -static struct step_hook kprobes_step_hook = { - .fn = kprobe_single_step_handler, +static struct break_hook kprobes_break_ss_hook = { + .imm = KPROBES_BRK_SS_IMM, + .fn = kprobe_breakpoint_ss_handler, }; static int __kprobes @@ -486,7 +463,7 @@ int __kprobes arch_trampoline_kprobe(struct kprobe *p) int __init arch_init_kprobes(void) { register_kernel_break_hook(&kprobes_break_hook); - register_kernel_step_hook(&kprobes_step_hook); + register_kernel_break_hook(&kprobes_break_ss_hook); return 0; } -- cgit v1.2.3 From b0e98aa9c411585eb586b2fa98873c936735008e Mon Sep 17 00:00:00 2001 From: Gerald Schaefer Date: Tue, 20 Oct 2020 20:20:07 +0200 Subject: s390/mm: make pmd/pud_deref() large page aware pmd/pud_deref() assume that they will never operate on large pmd/pud entries, and therefore only use the non-large _xxx_ENTRY_ORIGIN mask. With commit 9ec8fa8dc331b ("s390/vmemmap: extend modify_pagetable() to handle vmemmap"), that assumption is no longer true, at least for pmd_deref(). In theory, we could end up with wrong addresses because some of the non-address bits of a large entry would not be masked out. In practice, this does not (yet) show any impact, because vmemmap_free() is currently never used for s390. Fix pmd/pud_deref() to check for the entry type and use the _xxx_ENTRY_ORIGIN_LARGE mask for large entries. While at it, also move pmd/pud_pfn() around, in order to avoid code duplication, because they do the same thing. Fixes: 9ec8fa8dc331b ("s390/vmemmap: extend modify_pagetable() to handle vmemmap") Cc: # 5.9 Signed-off-by: Gerald Schaefer Reviewed-by: Alexander Gordeev Signed-off-by: Heiko Carstens --- arch/s390/include/asm/pgtable.h | 52 ++++++++++++++++++++++++----------------- 1 file changed, 30 insertions(+), 22 deletions(-) (limited to 'arch') diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h index 6b8d8c69b1a1..b5dbae78969b 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -692,16 +692,6 @@ static inline int pud_large(pud_t pud) return !!(pud_val(pud) & _REGION3_ENTRY_LARGE); } -static inline unsigned long pud_pfn(pud_t pud) -{ - unsigned long origin_mask; - - origin_mask = _REGION_ENTRY_ORIGIN; - if (pud_large(pud)) - origin_mask = _REGION3_ENTRY_ORIGIN_LARGE; - return (pud_val(pud) & origin_mask) >> PAGE_SHIFT; -} - #define pmd_leaf pmd_large static inline int pmd_large(pmd_t pmd) { @@ -747,16 +737,6 @@ static inline int pmd_none(pmd_t pmd) return pmd_val(pmd) == _SEGMENT_ENTRY_EMPTY; } -static inline unsigned long pmd_pfn(pmd_t pmd) -{ - unsigned long origin_mask; - - origin_mask = _SEGMENT_ENTRY_ORIGIN; - if (pmd_large(pmd)) - origin_mask = _SEGMENT_ENTRY_ORIGIN_LARGE; - return (pmd_val(pmd) & origin_mask) >> PAGE_SHIFT; -} - #define pmd_write pmd_write static inline int pmd_write(pmd_t pmd) { @@ -1238,11 +1218,39 @@ static inline pte_t mk_pte(struct page *page, pgprot_t pgprot) #define pud_index(address) (((address) >> PUD_SHIFT) & (PTRS_PER_PUD-1)) #define pmd_index(address) (((address) >> PMD_SHIFT) & (PTRS_PER_PMD-1)) -#define pmd_deref(pmd) (pmd_val(pmd) & _SEGMENT_ENTRY_ORIGIN) -#define pud_deref(pud) (pud_val(pud) & _REGION_ENTRY_ORIGIN) #define p4d_deref(pud) (p4d_val(pud) & _REGION_ENTRY_ORIGIN) #define pgd_deref(pgd) (pgd_val(pgd) & _REGION_ENTRY_ORIGIN) +static inline unsigned long pmd_deref(pmd_t pmd) +{ + unsigned long origin_mask; + + origin_mask = _SEGMENT_ENTRY_ORIGIN; + if (pmd_large(pmd)) + origin_mask = _SEGMENT_ENTRY_ORIGIN_LARGE; + return pmd_val(pmd) & origin_mask; +} + +static inline unsigned long pmd_pfn(pmd_t pmd) +{ + return pmd_deref(pmd) >> PAGE_SHIFT; +} + +static inline unsigned long pud_deref(pud_t pud) +{ + unsigned long origin_mask; + + origin_mask = _REGION_ENTRY_ORIGIN; + if (pud_large(pud)) + origin_mask = _REGION3_ENTRY_ORIGIN_LARGE; + return pud_val(pud) & origin_mask; +} + +static inline unsigned long pud_pfn(pud_t pud) +{ + return pud_deref(pud) >> PAGE_SHIFT; +} + /* * The pgd_offset function *always* adds the index for the top-level * region/segment table. This is done to get a sequence like the -- cgit v1.2.3 From e99198661ecd02545b926ba40d1e91626bb29647 Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Mon, 26 Oct 2020 16:15:22 +0100 Subject: s390/vdso: remove empty unused file Signed-off-by: Heiko Carstens --- arch/s390/include/asm/vdso/vdso.h | 0 1 file changed, 0 insertions(+), 0 deletions(-) delete mode 100644 arch/s390/include/asm/vdso/vdso.h (limited to 'arch') diff --git a/arch/s390/include/asm/vdso/vdso.h b/arch/s390/include/asm/vdso/vdso.h deleted file mode 100644 index e69de29bb2d1..000000000000 -- cgit v1.2.3 From cfef9aa69a7382a205661a83e621114b37824474 Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Mon, 26 Oct 2020 16:15:22 +0100 Subject: s390/vdso: remove unused constants Signed-off-by: Heiko Carstens --- arch/s390/kernel/asm-offsets.c | 8 -------- 1 file changed, 8 deletions(-) (limited to 'arch') diff --git a/arch/s390/kernel/asm-offsets.c b/arch/s390/kernel/asm-offsets.c index ece58f2217cb..2012c1cf0853 100644 --- a/arch/s390/kernel/asm-offsets.c +++ b/arch/s390/kernel/asm-offsets.c @@ -61,14 +61,6 @@ int main(void) BLANK(); OFFSET(__VDSO_GETCPU_VAL, vdso_per_cpu_data, getcpu_val); BLANK(); - /* constants used by the vdso */ - DEFINE(__CLOCK_REALTIME, CLOCK_REALTIME); - DEFINE(__CLOCK_MONOTONIC, CLOCK_MONOTONIC); - DEFINE(__CLOCK_REALTIME_COARSE, CLOCK_REALTIME_COARSE); - DEFINE(__CLOCK_MONOTONIC_COARSE, CLOCK_MONOTONIC_COARSE); - DEFINE(__CLOCK_THREAD_CPUTIME_ID, CLOCK_THREAD_CPUTIME_ID); - DEFINE(__CLOCK_COARSE_RES, LOW_RES_NSEC); - BLANK(); /* idle data offsets */ OFFSET(__CLOCK_IDLE_ENTER, s390_idle_data, clock_idle_enter); OFFSET(__CLOCK_IDLE_EXIT, s390_idle_data, clock_idle_exit); -- cgit v1.2.3 From c3d9cdca73d0e49f01a71cdc477a09b04b1b30fc Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Mon, 26 Oct 2020 19:52:04 +0100 Subject: s390: update defconfigs Signed-off-by: Heiko Carstens --- arch/s390/configs/debug_defconfig | 10 ++++++---- arch/s390/configs/defconfig | 9 +++++---- arch/s390/configs/zfcpdump_defconfig | 2 +- 3 files changed, 12 insertions(+), 9 deletions(-) (limited to 'arch') diff --git a/arch/s390/configs/debug_defconfig b/arch/s390/configs/debug_defconfig index 0784bf3caf43..a4d3c578fbd8 100644 --- a/arch/s390/configs/debug_defconfig +++ b/arch/s390/configs/debug_defconfig @@ -93,9 +93,10 @@ CONFIG_CLEANCACHE=y CONFIG_FRONTSWAP=y CONFIG_CMA_DEBUG=y CONFIG_CMA_DEBUGFS=y +CONFIG_CMA_AREAS=7 CONFIG_MEM_SOFT_DIRTY=y CONFIG_ZSWAP=y -CONFIG_ZSMALLOC=m +CONFIG_ZSMALLOC=y CONFIG_ZSMALLOC_STAT=y CONFIG_DEFERRED_STRUCT_PAGE_INIT=y CONFIG_IDLE_PAGE_TRACKING=y @@ -378,7 +379,6 @@ CONFIG_NETLINK_DIAG=m CONFIG_CGROUP_NET_PRIO=y CONFIG_BPF_JIT=y CONFIG_NET_PKTGEN=m -# CONFIG_NET_DROP_MONITOR is not set CONFIG_PCI=y # CONFIG_PCIEASPM is not set CONFIG_PCI_DEBUG=y @@ -386,7 +386,7 @@ CONFIG_HOTPLUG_PCI=y CONFIG_HOTPLUG_PCI_S390=y CONFIG_DEVTMPFS=y CONFIG_CONNECTOR=y -CONFIG_ZRAM=m +CONFIG_ZRAM=y CONFIG_BLK_DEV_LOOP=m CONFIG_BLK_DEV_CRYPTOLOOP=m CONFIG_BLK_DEV_DRBD=m @@ -689,6 +689,7 @@ CONFIG_CRYPTO_TEST=m CONFIG_CRYPTO_DH=m CONFIG_CRYPTO_ECDH=m CONFIG_CRYPTO_ECRDSA=m +CONFIG_CRYPTO_SM2=m CONFIG_CRYPTO_CURVE25519=m CONFIG_CRYPTO_GCM=y CONFIG_CRYPTO_CHACHA20POLY1305=m @@ -709,7 +710,6 @@ CONFIG_CRYPTO_RMD160=m CONFIG_CRYPTO_RMD256=m CONFIG_CRYPTO_RMD320=m CONFIG_CRYPTO_SHA3=m -CONFIG_CRYPTO_SM3=m CONFIG_CRYPTO_TGR192=m CONFIG_CRYPTO_WP512=m CONFIG_CRYPTO_AES_TI=m @@ -753,6 +753,7 @@ CONFIG_CRYPTO_DES_S390=m CONFIG_CRYPTO_AES_S390=m CONFIG_CRYPTO_GHASH_S390=m CONFIG_CRYPTO_CRC32_S390=y +CONFIG_CRYPTO_DEV_VIRTIO=m CONFIG_CORDIC=m CONFIG_CRC32_SELFTEST=y CONFIG_CRC4=m @@ -829,6 +830,7 @@ CONFIG_NETDEV_NOTIFIER_ERROR_INJECT=m CONFIG_FAULT_INJECTION=y CONFIG_FAILSLAB=y CONFIG_FAIL_PAGE_ALLOC=y +CONFIG_FAULT_INJECTION_USERCOPY=y CONFIG_FAIL_MAKE_REQUEST=y CONFIG_FAIL_IO_TIMEOUT=y CONFIG_FAIL_FUTEX=y diff --git a/arch/s390/configs/defconfig b/arch/s390/configs/defconfig index 905bc8c4cfaf..17d5df2c1eff 100644 --- a/arch/s390/configs/defconfig +++ b/arch/s390/configs/defconfig @@ -87,9 +87,10 @@ CONFIG_KSM=y CONFIG_TRANSPARENT_HUGEPAGE=y CONFIG_CLEANCACHE=y CONFIG_FRONTSWAP=y +CONFIG_CMA_AREAS=7 CONFIG_MEM_SOFT_DIRTY=y CONFIG_ZSWAP=y -CONFIG_ZSMALLOC=m +CONFIG_ZSMALLOC=y CONFIG_ZSMALLOC_STAT=y CONFIG_DEFERRED_STRUCT_PAGE_INIT=y CONFIG_IDLE_PAGE_TRACKING=y @@ -371,7 +372,6 @@ CONFIG_NETLINK_DIAG=m CONFIG_CGROUP_NET_PRIO=y CONFIG_BPF_JIT=y CONFIG_NET_PKTGEN=m -# CONFIG_NET_DROP_MONITOR is not set CONFIG_PCI=y # CONFIG_PCIEASPM is not set CONFIG_HOTPLUG_PCI=y @@ -379,7 +379,7 @@ CONFIG_HOTPLUG_PCI_S390=y CONFIG_UEVENT_HELPER=y CONFIG_DEVTMPFS=y CONFIG_CONNECTOR=y -CONFIG_ZRAM=m +CONFIG_ZRAM=y CONFIG_BLK_DEV_LOOP=m CONFIG_BLK_DEV_CRYPTOLOOP=m CONFIG_BLK_DEV_DRBD=m @@ -680,6 +680,7 @@ CONFIG_CRYPTO_TEST=m CONFIG_CRYPTO_DH=m CONFIG_CRYPTO_ECDH=m CONFIG_CRYPTO_ECRDSA=m +CONFIG_CRYPTO_SM2=m CONFIG_CRYPTO_CURVE25519=m CONFIG_CRYPTO_GCM=y CONFIG_CRYPTO_CHACHA20POLY1305=m @@ -701,7 +702,6 @@ CONFIG_CRYPTO_RMD160=m CONFIG_CRYPTO_RMD256=m CONFIG_CRYPTO_RMD320=m CONFIG_CRYPTO_SHA3=m -CONFIG_CRYPTO_SM3=m CONFIG_CRYPTO_TGR192=m CONFIG_CRYPTO_WP512=m CONFIG_CRYPTO_AES_TI=m @@ -745,6 +745,7 @@ CONFIG_CRYPTO_DES_S390=m CONFIG_CRYPTO_AES_S390=m CONFIG_CRYPTO_GHASH_S390=m CONFIG_CRYPTO_CRC32_S390=y +CONFIG_CRYPTO_DEV_VIRTIO=m CONFIG_CORDIC=m CONFIG_PRIME_NUMBERS=m CONFIG_CRC4=m diff --git a/arch/s390/configs/zfcpdump_defconfig b/arch/s390/configs/zfcpdump_defconfig index 8f67c55625f9..a302630341ef 100644 --- a/arch/s390/configs/zfcpdump_defconfig +++ b/arch/s390/configs/zfcpdump_defconfig @@ -17,11 +17,11 @@ CONFIG_HZ_100=y # CONFIG_CHSC_SCH is not set # CONFIG_SCM_BUS is not set CONFIG_CRASH_DUMP=y -# CONFIG_SECCOMP is not set # CONFIG_PFAULT is not set # CONFIG_S390_HYPFS_FS is not set # CONFIG_VIRTUALIZATION is not set # CONFIG_S390_GUEST is not set +# CONFIG_SECCOMP is not set CONFIG_PARTITION_ADVANCED=y CONFIG_IBM_PARTITION=y # CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set -- cgit v1.2.3 From de5d9dae150ca1c1b5c7676711a9ca139d1a8dec Mon Sep 17 00:00:00 2001 From: Qian Cai Date: Wed, 28 Oct 2020 14:27:42 -0400 Subject: s390/smp: move rcu_cpu_starting() earlier The call to rcu_cpu_starting() in smp_init_secondary() is not early enough in the CPU-hotplug onlining process, which results in lockdep splats as follows: WARNING: suspicious RCU usage ----------------------------- kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! other info that might help us debug this: RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1 no locks held by swapper/1/0. Call Trace: show_stack+0x158/0x1f0 dump_stack+0x1f2/0x238 __lock_acquire+0x2640/0x4dd0 lock_acquire+0x3a8/0xd08 _raw_spin_lock_irqsave+0xc0/0xf0 clockevents_register_device+0xa8/0x528 init_cpu_timer+0x33e/0x468 smp_init_secondary+0x11a/0x328 smp_start_secondary+0x82/0x88 This is avoided by moving the call to rcu_cpu_starting up near the beginning of the smp_init_secondary() function. Note that the raw_smp_processor_id() is required in order to avoid calling into lockdep before RCU has declared the CPU to be watched for readers. Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/ Signed-off-by: Qian Cai Acked-by: Paul E. McKenney Signed-off-by: Heiko Carstens --- arch/s390/kernel/smp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c index ebfe86d097f0..390d97daa2b3 100644 --- a/arch/s390/kernel/smp.c +++ b/arch/s390/kernel/smp.c @@ -855,13 +855,14 @@ void __init smp_detect_cpus(void) static void smp_init_secondary(void) { - int cpu = smp_processor_id(); + int cpu = raw_smp_processor_id(); S390_lowcore.last_update_clock = get_tod_clock(); restore_access_regs(S390_lowcore.access_regs_save_area); set_cpu_flag(CIF_ASCE_PRIMARY); set_cpu_flag(CIF_ASCE_SECONDARY); cpu_init(); + rcu_cpu_starting(cpu); preempt_disable(); init_cpu_timer(); vtime_init(); -- cgit v1.2.3 From 0b2ca2c7d0c9e2731d01b6c862375d44a7e13923 Mon Sep 17 00:00:00 2001 From: Niklas Schnelle Date: Mon, 2 Nov 2020 11:33:04 +0100 Subject: s390/pci: fix hot-plug of PCI function missing bus Under some circumstances in particular with "Reconfigure I/O Path" a zPCI function may first appear in Standby through a PCI event with PEC 0x0302 which initially makes it visible to the zPCI subsystem, Only after that is it configured with a zPCI event with PEC 0x0301. If the zbus is still missing a PCI function zero (devfn == 0) when the PCI event 0x0301 is handled zdev->zbus->bus is still NULL and gets dereferenced in common code. Check for this case and enable but don't scan the zPCI function. This matches what would happen if we immediately got the 0x0301 configuration request or the function was included in CLP List PCI. In all cases the PCI functions with devfn != 0 will be scanned once function 0 appears. Fixes: 3047766bc6ec ("s390/pci: fix enabling a reserved PCI function") Cc: # 5.8 Signed-off-by: Niklas Schnelle Acked-by: Pierre Morel Signed-off-by: Heiko Carstens --- arch/s390/pci/pci_event.c | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'arch') diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c index d33f21545dfd..9a6bae503fe6 100644 --- a/arch/s390/pci/pci_event.c +++ b/arch/s390/pci/pci_event.c @@ -101,6 +101,10 @@ static void __zpci_event_availability(struct zpci_ccdf_avail *ccdf) if (ret) break; + /* the PCI function will be scanned once function 0 appears */ + if (!zdev->zbus->bus) + break; + pdev = pci_scan_single_device(zdev->zbus->bus, zdev->devfn); if (!pdev) break; -- cgit v1.2.3 From b9bc36704cca500e2b41be4c5bf615c1d7ddc3ce Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Sat, 31 Oct 2020 11:43:45 +0200 Subject: ARM, xtensa: highmem: avoid clobbering non-page aligned memory reservations free_highpages() iterates over the free memblock regions in high memory, and marks each page as available for the memory management system. Until commit cddb5ddf2b76 ("arm, xtensa: simplify initialization of high memory pages") it rounded beginning of each region upwards and end of each region downwards. However, after that commit free_highmem() rounds the beginning and end of each region downwards, and we may end up freeing a page that is memblock_reserve()d, resulting in memory corruption. Restore the original rounding of the region boundaries to avoid freeing reserved pages. Fixes: cddb5ddf2b76 ("arm, xtensa: simplify initialization of high memory pages") Link: https://lore.kernel.org/r/20201029110334.4118-1-ardb@kernel.org/ Link: https://lore.kernel.org/r/20201031094345.6984-1-rppt@kernel.org Signed-off-by: Ard Biesheuvel Co-developed-by: Mike Rapoport Signed-off-by: Mike Rapoport Acked-by: Max Filippov --- arch/arm/mm/init.c | 4 ++-- arch/xtensa/mm/init.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) (limited to 'arch') diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c index d57112a276f5..c23dbf8bebee 100644 --- a/arch/arm/mm/init.c +++ b/arch/arm/mm/init.c @@ -354,8 +354,8 @@ static void __init free_highpages(void) /* set highmem page free */ for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &range_start, &range_end, NULL) { - unsigned long start = PHYS_PFN(range_start); - unsigned long end = PHYS_PFN(range_end); + unsigned long start = PFN_UP(range_start); + unsigned long end = PFN_DOWN(range_end); /* Ignore complete lowmem entries */ if (end <= max_low) diff --git a/arch/xtensa/mm/init.c b/arch/xtensa/mm/init.c index c6fc83efee0c..8731b7ad9308 100644 --- a/arch/xtensa/mm/init.c +++ b/arch/xtensa/mm/init.c @@ -89,8 +89,8 @@ static void __init free_highpages(void) /* set highmem page free */ for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &range_start, &range_end, NULL) { - unsigned long start = PHYS_PFN(range_start); - unsigned long end = PHYS_PFN(range_end); + unsigned long start = PFN_UP(range_start); + unsigned long end = PFN_DOWN(range_end); /* Ignore complete lowmem entries */ if (end <= max_low) -- cgit v1.2.3 From 4d6ffa27b8e5116c0abb318790fd01d4e12d75e6 Mon Sep 17 00:00:00 2001 From: Fangrui Song Date: Mon, 2 Nov 2020 17:23:58 -0800 Subject: x86/lib: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem*_64.S Commit 393f203f5fd5 ("x86_64: kasan: add interceptors for memset/memmove/memcpy functions") added .weak directives to arch/x86/lib/mem*_64.S instead of changing the existing ENTRY macros to WEAK. This can lead to the assembly snippet .weak memcpy ... .globl memcpy which will produce a STB_WEAK memcpy with GNU as but STB_GLOBAL memcpy with LLVM's integrated assembler before LLVM 12. LLVM 12 (since https://reviews.llvm.org/D90108) will error on such an overridden symbol binding. Commit ef1e03152cb0 ("x86/asm: Make some functions local") changed ENTRY in arch/x86/lib/memcpy_64.S to SYM_FUNC_START_LOCAL, which was ineffective due to the preceding .weak directive. Use the appropriate SYM_FUNC_START_WEAK instead. Fixes: 393f203f5fd5 ("x86_64: kasan: add interceptors for memset/memmove/memcpy functions") Fixes: ef1e03152cb0 ("x86/asm: Make some functions local") Reported-by: Sami Tolvanen Signed-off-by: Fangrui Song Signed-off-by: Borislav Petkov Reviewed-by: Nick Desaulniers Tested-by: Nathan Chancellor Tested-by: Nick Desaulniers Cc: Link: https://lkml.kernel.org/r/20201103012358.168682-1-maskray@google.com --- arch/x86/lib/memcpy_64.S | 4 +--- arch/x86/lib/memmove_64.S | 4 +--- arch/x86/lib/memset_64.S | 4 +--- 3 files changed, 3 insertions(+), 9 deletions(-) (limited to 'arch') diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S index 037faac46b0c..1e299ac73c86 100644 --- a/arch/x86/lib/memcpy_64.S +++ b/arch/x86/lib/memcpy_64.S @@ -16,8 +16,6 @@ * to a jmp to memcpy_erms which does the REP; MOVSB mem copy. */ -.weak memcpy - /* * memcpy - Copy a memory block. * @@ -30,7 +28,7 @@ * rax original destination */ SYM_FUNC_START_ALIAS(__memcpy) -SYM_FUNC_START_LOCAL(memcpy) +SYM_FUNC_START_WEAK(memcpy) ALTERNATIVE_2 "jmp memcpy_orig", "", X86_FEATURE_REP_GOOD, \ "jmp memcpy_erms", X86_FEATURE_ERMS diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S index 7ff00ea64e4f..41902fe8b859 100644 --- a/arch/x86/lib/memmove_64.S +++ b/arch/x86/lib/memmove_64.S @@ -24,9 +24,7 @@ * Output: * rax: dest */ -.weak memmove - -SYM_FUNC_START_ALIAS(memmove) +SYM_FUNC_START_WEAK(memmove) SYM_FUNC_START(__memmove) mov %rdi, %rax diff --git a/arch/x86/lib/memset_64.S b/arch/x86/lib/memset_64.S index 9ff15ee404a4..0bfd26e4ca9e 100644 --- a/arch/x86/lib/memset_64.S +++ b/arch/x86/lib/memset_64.S @@ -6,8 +6,6 @@ #include #include -.weak memset - /* * ISO C memset - set a memory block to a byte value. This function uses fast * string to get better performance than the original function. The code is @@ -19,7 +17,7 @@ * * rax original destination */ -SYM_FUNC_START_ALIAS(memset) +SYM_FUNC_START_WEAK(memset) SYM_FUNC_START(__memset) /* * Some CPUs support enhanced REP MOVSB/STOSB feature. It is recommended -- cgit v1.2.3 From 9d750c75bd2c3fcf20a3c15378d1bc6b2d4ec31f Mon Sep 17 00:00:00 2001 From: Ryan Kosta Date: Sat, 10 Oct 2020 20:03:51 -0700 Subject: risc-v: kernel: ftrace: Fixes improper SPDX comment style Signed-off-by: Ryan Kosta Signed-off-by: Palmer Dabbelt --- arch/riscv/kernel/ftrace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c index 99e12faa5498..765b62434f30 100644 --- a/arch/riscv/kernel/ftrace.c +++ b/arch/riscv/kernel/ftrace.c @@ -1,4 +1,4 @@ -/* SPDX-License-Identifier: GPL-2.0 */ +// SPDX-License-Identifier: GPL-2.0 /* * Copyright (C) 2013 Linaro Limited * Author: AKASHI Takahiro -- cgit v1.2.3 From 1344a232016dbb0492be81f8517c4bf8fc1c6610 Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Wed, 4 Nov 2020 22:17:42 +1100 Subject: powerpc: Use asm_goto_volatile for put_user() Andreas reported that commit ee0a49a6870e ("powerpc/uaccess: Switch __put_user_size_allowed() to __put_user_asm_goto()") broke CLONE_CHILD_SETTID. Further inspection showed that the put_user() in schedule_tail() was missing entirely, the store not emitted by the compiler. <.schedule_tail>: mflr r0 std r0,16(r1) stdu r1,-112(r1) bl <.finish_task_switch> ld r9,2496(r3) cmpdi cr7,r9,0 bne cr7,<.schedule_tail+0x60> ld r3,392(r13) ld r9,1392(r3) cmpdi cr7,r9,0 beq cr7,<.schedule_tail+0x3c> li r4,0 li r5,0 bl <.__task_pid_nr_ns> nop bl <.calculate_sigpending> nop addi r1,r1,112 ld r0,16(r1) mtlr r0 blr nop nop nop bl <.__balance_callback> b <.schedule_tail+0x1c> Notice there are no stores other than to the stack. There should be a stw in there for the store to current->set_child_tid. This is only seen with GCC 4.9 era compilers (tested with 4.9.3 and 4.9.4), and only when CONFIG_PPC_KUAP is disabled. When CONFIG_PPC_KUAP=y, the inline asm that's part of the isync() and mtspr() inlined via allow_user_access() seems to be enough to avoid the bug. We already have a macro to work around this (or a similar bug), called asm_volatile_goto which includes an empty asm block to tickle the compiler into generating the right code. So use that. With this applied the code generation looks more like it will work: <.schedule_tail>: mflr r0 std r31,-8(r1) std r0,16(r1) stdu r1,-144(r1) std r3,112(r1) bl <._mcount> nop ld r3,112(r1) bl <.finish_task_switch> ld r9,2624(r3) cmpdi cr7,r9,0 bne cr7,<.schedule_tail+0xa0> ld r3,2408(r13) ld r31,1856(r3) cmpdi cr7,r31,0 beq cr7,<.schedule_tail+0x80> li r4,0 li r5,0 bl <.__task_pid_nr_ns> nop li r9,-1 clrldi r9,r9,12 cmpld cr7,r31,r9 bgt cr7,<.schedule_tail+0x80> lis r9,16 rldicr r9,r9,32,31 subf r9,r31,r9 cmpldi cr7,r9,3 ble cr7,<.schedule_tail+0x80> li r9,0 stw r3,0(r31) <-- stw nop bl <.calculate_sigpending> nop addi r1,r1,144 ld r0,16(r1) ld r31,-8(r1) mtlr r0 blr nop bl <.__balance_callback> b <.schedule_tail+0x30> Fixes: ee0a49a6870e ("powerpc/uaccess: Switch __put_user_size_allowed() to __put_user_asm_goto()") Reported-by: Andreas Schwab Tested-by: Andreas Schwab Suggested-by: Christophe Leroy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20201104111742.672142-1-mpe@ellerman.id.au --- arch/powerpc/include/asm/uaccess.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h index ef5bbb705c08..501c9a79038c 100644 --- a/arch/powerpc/include/asm/uaccess.h +++ b/arch/powerpc/include/asm/uaccess.h @@ -178,7 +178,7 @@ do { \ * are no aliasing issues. */ #define __put_user_asm_goto(x, addr, label, op) \ - asm volatile goto( \ + asm_volatile_goto( \ "1: " op "%U1%X1 %0,%1 # put_user\n" \ EX_TABLE(1b, %l2) \ : \ @@ -191,7 +191,7 @@ do { \ __put_user_asm_goto(x, ptr, label, "std") #else /* __powerpc64__ */ #define __put_user_asm2_goto(x, addr, label) \ - asm volatile goto( \ + asm_volatile_goto( \ "1: stw%X1 %0, %1\n" \ "2: stw%X1 %L0, %L1\n" \ EX_TABLE(1b, %l2) \ -- cgit v1.2.3 From 11522448e641e8f1690c9db06e01985e8e19b401 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Sat, 10 Oct 2020 15:14:30 +0000 Subject: powerpc/603: Always fault when _PAGE_ACCESSED is not set The kernel expects pte_young() to work regardless of CONFIG_SWAP. Make sure a minor fault is taken to set _PAGE_ACCESSED when it is not already set, regardless of the selection of CONFIG_SWAP. Fixes: 84de6ab0e904 ("powerpc/603: don't handle PAGE_ACCESSED in TLB miss handlers.") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/a44367744de54e2315b2f1a8cbbd7f88488072e0.1602342806.git.christophe.leroy@csgroup.eu --- arch/powerpc/kernel/head_book3s_32.S | 12 ------------ 1 file changed, 12 deletions(-) (limited to 'arch') diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S index 5eb9eedac920..2aa16d5368e1 100644 --- a/arch/powerpc/kernel/head_book3s_32.S +++ b/arch/powerpc/kernel/head_book3s_32.S @@ -457,11 +457,7 @@ InstructionTLBMiss: cmplw 0,r1,r3 #endif mfspr r2, SPRN_SPRG_PGDIR -#ifdef CONFIG_SWAP li r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC -#else - li r1,_PAGE_PRESENT | _PAGE_EXEC -#endif #if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC) bgt- 112f lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */ @@ -523,11 +519,7 @@ DataLoadTLBMiss: lis r1, TASK_SIZE@h /* check if kernel address */ cmplw 0,r1,r3 mfspr r2, SPRN_SPRG_PGDIR -#ifdef CONFIG_SWAP li r1, _PAGE_PRESENT | _PAGE_ACCESSED -#else - li r1, _PAGE_PRESENT -#endif bgt- 112f lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */ addi r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l /* kernel page table */ @@ -603,11 +595,7 @@ DataStoreTLBMiss: lis r1, TASK_SIZE@h /* check if kernel address */ cmplw 0,r1,r3 mfspr r2, SPRN_SPRG_PGDIR -#ifdef CONFIG_SWAP li r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED -#else - li r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT -#endif bgt- 112f lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */ addi r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l /* kernel page table */ -- cgit v1.2.3 From 0540b0d2ce9073fd2a736d636218faa61c99e572 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Sat, 10 Oct 2020 15:14:29 +0000 Subject: powerpc/40x: Always fault when _PAGE_ACCESSED is not set The kernel expects pte_young() to work regardless of CONFIG_SWAP. Make sure a minor fault is taken to set _PAGE_ACCESSED when it is not already set, regardless of the selection of CONFIG_SWAP. Fixes: 2c74e2586bb9 ("powerpc/40x: Rework 40x PTE access and TLB miss") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/b02ca2ed2d3676a096219b48c0f69ec982a75bcf.1602342801.git.christophe.leroy@csgroup.eu --- arch/powerpc/kernel/head_40x.S | 8 -------- 1 file changed, 8 deletions(-) (limited to 'arch') diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S index 44c9018aed1b..a1ae00689e0f 100644 --- a/arch/powerpc/kernel/head_40x.S +++ b/arch/powerpc/kernel/head_40x.S @@ -284,11 +284,7 @@ _ENTRY(saved_ksp_limit) rlwimi r11, r10, 22, 20, 29 /* Compute PTE address */ lwz r11, 0(r11) /* Get Linux PTE */ -#ifdef CONFIG_SWAP li r9, _PAGE_PRESENT | _PAGE_ACCESSED -#else - li r9, _PAGE_PRESENT -#endif andc. r9, r9, r11 /* Check permission */ bne 5f @@ -369,11 +365,7 @@ _ENTRY(saved_ksp_limit) rlwimi r11, r10, 22, 20, 29 /* Compute PTE address */ lwz r11, 0(r11) /* Get Linux PTE */ -#ifdef CONFIG_SWAP li r9, _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC -#else - li r9, _PAGE_PRESENT | _PAGE_EXEC -#endif andc. r9, r9, r11 /* Check permission */ bne 5f -- cgit v1.2.3 From 29daf869cbab69088fe1755d9dd224e99ba78b56 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Mon, 12 Oct 2020 08:54:31 +0000 Subject: powerpc/8xx: Always fault when _PAGE_ACCESSED is not set The kernel expects pte_young() to work regardless of CONFIG_SWAP. Make sure a minor fault is taken to set _PAGE_ACCESSED when it is not already set, regardless of the selection of CONFIG_SWAP. This adds at least 3 instructions to the TLB miss exception handlers fast path. Following patch will reduce this overhead. Also update the rotation instruction to the correct number of bits to reflect all changes done to _PAGE_ACCESSED over time. Fixes: d069cb4373fe ("powerpc/8xx: Don't touch ACCESSED when no SWAP.") Fixes: 5f356497c384 ("powerpc/8xx: remove unused _PAGE_WRITETHRU") Fixes: e0a8e0d90a9f ("powerpc/8xx: Handle PAGE_USER via APG bits") Fixes: 5b2753fc3e8a ("powerpc/8xx: Implementation of PAGE_EXEC") Fixes: a891c43b97d3 ("powerpc/8xx: Prepare handlers for _PAGE_HUGE for 512k pages.") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/af834e8a0f1fa97bfae65664950f0984a70c4750.1602492856.git.christophe.leroy@csgroup.eu --- arch/powerpc/kernel/head_8xx.S | 14 ++------------ 1 file changed, 2 insertions(+), 12 deletions(-) (limited to 'arch') diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index 9f359d3fba74..6f3799a04121 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -202,9 +202,7 @@ SystemCall: InstructionTLBMiss: mtspr SPRN_SPRG_SCRATCH0, r10 -#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP) || defined(CONFIG_HUGETLBFS) mtspr SPRN_SPRG_SCRATCH1, r11 -#endif /* If we are faulting a kernel address, we have to use the * kernel page tables. @@ -238,11 +236,9 @@ InstructionTLBMiss: rlwimi r11, r10, 32 - 9, _PMD_PAGE_512K mtspr SPRN_MI_TWC, r11 #endif -#ifdef CONFIG_SWAP - rlwinm r11, r10, 32-5, _PAGE_PRESENT + rlwinm r11, r10, 32-7, _PAGE_PRESENT and r11, r11, r10 rlwimi r10, r11, 0, _PAGE_PRESENT -#endif /* The Linux PTE won't go exactly into the MMU TLB. * Software indicator bits 20 and 23 must be clear. * Software indicator bits 22, 24, 25, 26, and 27 must be @@ -256,9 +252,7 @@ InstructionTLBMiss: /* Restore registers */ 0: mfspr r10, SPRN_SPRG_SCRATCH0 -#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP) || defined(CONFIG_HUGETLBFS) mfspr r11, SPRN_SPRG_SCRATCH1 -#endif rfi patch_site 0b, patch__itlbmiss_exit_1 @@ -268,9 +262,7 @@ InstructionTLBMiss: addi r10, r10, 1 stw r10, (itlb_miss_counter - PAGE_OFFSET)@l(0) mfspr r10, SPRN_SPRG_SCRATCH0 -#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP) mfspr r11, SPRN_SPRG_SCRATCH1 -#endif rfi #endif @@ -316,11 +308,9 @@ DataStoreTLBMiss: * r11 = ((r10 & PRESENT) & ((r10 & ACCESSED) >> 5)); * r10 = (r10 & ~PRESENT) | r11; */ -#ifdef CONFIG_SWAP - rlwinm r11, r10, 32-5, _PAGE_PRESENT + rlwinm r11, r10, 32-7, _PAGE_PRESENT and r11, r11, r10 rlwimi r10, r11, 0, _PAGE_PRESENT -#endif /* The Linux PTE won't go exactly into the MMU TLB. * Software indicator bits 24, 25, 26, and 27 must be * set. All other Linux PTE bits control the behavior -- cgit v1.2.3 From 33fe43cfd9b1c20f6f9899b44bf04e91823ff1c9 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Mon, 12 Oct 2020 08:54:33 +0000 Subject: powerpc/8xx: Manage _PAGE_ACCESSED through APG bits in L1 entry When _PAGE_ACCESSED is not set, a minor fault is expected. To do this, TLB miss exception ANDs _PAGE_PRESENT and _PAGE_ACCESSED into the L2 entry valid bit. To simplify the processing and reduce the number of instructions in TLB miss exceptions, manage it as an APG bit and get it next to _PAGE_GUARDED bit to allow a copy in one go. Then declare the corresponding groups as handling all accesses as user accesses. As the PP bits always define user as No Access, it will generate a fault. Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/80f488db230c6b0e7b3b990d72bd94a8a069e93e.1602492856.git.christophe.leroy@csgroup.eu --- arch/powerpc/include/asm/nohash/32/kup-8xx.h | 2 +- arch/powerpc/include/asm/nohash/32/mmu-8xx.h | 47 +++++++++------------------- arch/powerpc/include/asm/nohash/32/pte-8xx.h | 9 +++--- arch/powerpc/kernel/head_8xx.S | 36 +++++---------------- 4 files changed, 28 insertions(+), 66 deletions(-) (limited to 'arch') diff --git a/arch/powerpc/include/asm/nohash/32/kup-8xx.h b/arch/powerpc/include/asm/nohash/32/kup-8xx.h index 85ed2390fb99..567cdc557402 100644 --- a/arch/powerpc/include/asm/nohash/32/kup-8xx.h +++ b/arch/powerpc/include/asm/nohash/32/kup-8xx.h @@ -63,7 +63,7 @@ static inline void restore_user_access(unsigned long flags) static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) { - return WARN(!((regs->kuap ^ MD_APG_KUAP) & 0xf0000000), + return WARN(!((regs->kuap ^ MD_APG_KUAP) & 0xff000000), "Bug: fault blocked by AP register !"); } diff --git a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h index 1d9ac0f9c794..0bd1b144eb76 100644 --- a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h +++ b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h @@ -33,19 +33,18 @@ * respectively NA for All or X for Supervisor and no access for User. * Then we use the APG to say whether accesses are according to Page rules or * "all Supervisor" rules (Access to all) - * Therefore, we define 2 APG groups. lsb is _PMD_USER - * 0 => Kernel => 01 (all accesses performed according to page definition) - * 1 => User => 00 (all accesses performed as supervisor iaw page definition) - * 2-15 => Not Used - */ -#define MI_APG_INIT 0x40000000 - -/* - * 0 => Kernel => 01 (all accesses performed according to page definition) - * 1 => User => 10 (all accesses performed according to swaped page definition) - * 2-15 => Not Used - */ -#define MI_APG_KUEP 0x60000000 + * _PAGE_ACCESSED is also managed via APG. When _PAGE_ACCESSED is not set, say + * "all User" rules, that will lead to NA for all. + * Therefore, we define 4 APG groups. lsb is _PAGE_ACCESSED + * 0 => Kernel => 11 (all accesses performed according as user iaw page definition) + * 1 => Kernel+Accessed => 01 (all accesses performed according to page definition) + * 2 => User => 11 (all accesses performed according as user iaw page definition) + * 3 => User+Accessed => 00 (all accesses performed as supervisor iaw page definition) for INIT + * => 10 (all accesses performed according to swaped page definition) for KUEP + * 4-15 => Not Used + */ +#define MI_APG_INIT 0xdc000000 +#define MI_APG_KUEP 0xde000000 /* The effective page number register. When read, contains the information * about the last instruction TLB miss. When MI_RPN is written, bits in @@ -106,25 +105,9 @@ #define MD_Ks 0x80000000 /* Should not be set */ #define MD_Kp 0x40000000 /* Should always be set */ -/* - * All pages' PP data bits are set to either 000 or 011 or 001, which means - * respectively RW for Supervisor and no access for User, or RO for - * Supervisor and no access for user and NA for ALL. - * Then we use the APG to say whether accesses are according to Page rules or - * "all Supervisor" rules (Access to all) - * Therefore, we define 2 APG groups. lsb is _PMD_USER - * 0 => Kernel => 01 (all accesses performed according to page definition) - * 1 => User => 00 (all accesses performed as supervisor iaw page definition) - * 2-15 => Not Used - */ -#define MD_APG_INIT 0x40000000 - -/* - * 0 => No user => 01 (all accesses performed according to page definition) - * 1 => User => 10 (all accesses performed according to swaped page definition) - * 2-15 => Not Used - */ -#define MD_APG_KUAP 0x60000000 +/* See explanation above at the definition of MI_APG_INIT */ +#define MD_APG_INIT 0xdc000000 +#define MD_APG_KUAP 0xde000000 /* The effective page number register. When read, contains the information * about the last instruction TLB miss. When MD_RPN is written, bits in diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h b/arch/powerpc/include/asm/nohash/32/pte-8xx.h index 66f403a7da44..1581204467e1 100644 --- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h +++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h @@ -39,9 +39,9 @@ * into the TLB. */ #define _PAGE_GUARDED 0x0010 /* Copied to L1 G entry in DTLB */ -#define _PAGE_SPECIAL 0x0020 /* SW entry */ +#define _PAGE_ACCESSED 0x0020 /* Copied to L1 APG 1 entry in I/DTLB */ #define _PAGE_EXEC 0x0040 /* Copied to PP (bit 21) in ITLB */ -#define _PAGE_ACCESSED 0x0080 /* software: page referenced */ +#define _PAGE_SPECIAL 0x0080 /* SW entry */ #define _PAGE_NA 0x0200 /* Supervisor NA, User no access */ #define _PAGE_RO 0x0600 /* Supervisor RO, User no access */ @@ -59,11 +59,12 @@ #define _PMD_PRESENT 0x0001 #define _PMD_PRESENT_MASK _PMD_PRESENT -#define _PMD_BAD 0x0fd0 +#define _PMD_BAD 0x0f90 #define _PMD_PAGE_MASK 0x000c #define _PMD_PAGE_8M 0x000c #define _PMD_PAGE_512K 0x0004 -#define _PMD_USER 0x0020 /* APG 1 */ +#define _PMD_ACCESSED 0x0020 /* APG 1 */ +#define _PMD_USER 0x0040 /* APG 2 */ #define _PTE_NONE_MASK 0 diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index 6f3799a04121..ee0bfebc375f 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -222,23 +222,13 @@ InstructionTLBMiss: 3: mtcr r11 #endif -#if defined(CONFIG_HUGETLBFS) || !defined(CONFIG_PIN_TLB_TEXT) lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r10) /* Get level 1 entry */ mtspr SPRN_MD_TWC, r11 -#else - lwz r10, (swapper_pg_dir-PAGE_OFFSET)@l(r10) /* Get level 1 entry */ - mtspr SPRN_MI_TWC, r10 /* Set segment attributes */ - mtspr SPRN_MD_TWC, r10 -#endif mfspr r10, SPRN_MD_TWC lwz r10, 0(r10) /* Get the pte */ -#if defined(CONFIG_HUGETLBFS) || !defined(CONFIG_PIN_TLB_TEXT) + rlwimi r11, r10, 0, _PAGE_GUARDED | _PAGE_ACCESSED rlwimi r11, r10, 32 - 9, _PMD_PAGE_512K mtspr SPRN_MI_TWC, r11 -#endif - rlwinm r11, r10, 32-7, _PAGE_PRESENT - and r11, r11, r10 - rlwimi r10, r11, 0, _PAGE_PRESENT /* The Linux PTE won't go exactly into the MMU TLB. * Software indicator bits 20 and 23 must be clear. * Software indicator bits 22, 24, 25, 26, and 27 must be @@ -289,28 +279,16 @@ DataStoreTLBMiss: mfspr r10, SPRN_MD_TWC lwz r10, 0(r10) /* Get the pte */ - /* Insert the Guarded flag into the TWC from the Linux PTE. + /* Insert Guarded and Accessed flags into the TWC from the Linux PTE. * It is bit 27 of both the Linux PTE and the TWC (at least * I got that right :-). It will be better when we can put * this into the Linux pgd/pmd and load it in the operation * above. */ - rlwimi r11, r10, 0, _PAGE_GUARDED + rlwimi r11, r10, 0, _PAGE_GUARDED | _PAGE_ACCESSED rlwimi r11, r10, 32 - 9, _PMD_PAGE_512K mtspr SPRN_MD_TWC, r11 - /* Both _PAGE_ACCESSED and _PAGE_PRESENT has to be set. - * We also need to know if the insn is a load/store, so: - * Clear _PAGE_PRESENT and load that which will - * trap into DTLB Error with store bit set accordinly. - */ - /* PRESENT=0x1, ACCESSED=0x20 - * r11 = ((r10 & PRESENT) & ((r10 & ACCESSED) >> 5)); - * r10 = (r10 & ~PRESENT) | r11; - */ - rlwinm r11, r10, 32-7, _PAGE_PRESENT - and r11, r11, r10 - rlwimi r10, r11, 0, _PAGE_PRESENT /* The Linux PTE won't go exactly into the MMU TLB. * Software indicator bits 24, 25, 26, and 27 must be * set. All other Linux PTE bits control the behavior @@ -701,7 +679,7 @@ initial_mmu: li r9, 4 /* up to 4 pages of 8M */ mtctr r9 lis r9, KERNELBASE@h /* Create vaddr for TLB */ - li r10, MI_PS8MEG | MI_SVALID /* Set 8M byte page */ + li r10, MI_PS8MEG | _PMD_ACCESSED | MI_SVALID li r11, MI_BOOTINIT /* Create RPN for address 0 */ 1: mtspr SPRN_MI_CTR, r8 /* Set instruction MMU control */ @@ -765,7 +743,7 @@ _GLOBAL(mmu_pin_tlb) #ifdef CONFIG_PIN_TLB_TEXT LOAD_REG_IMMEDIATE(r5, 28 << 8) LOAD_REG_IMMEDIATE(r6, PAGE_OFFSET) - LOAD_REG_IMMEDIATE(r7, MI_SVALID | MI_PS8MEG) + LOAD_REG_IMMEDIATE(r7, MI_SVALID | MI_PS8MEG | _PMD_ACCESSED) LOAD_REG_IMMEDIATE(r8, 0xf0 | _PAGE_RO | _PAGE_SPS | _PAGE_SH | _PAGE_PRESENT) LOAD_REG_ADDR(r9, _sinittext) li r0, 4 @@ -787,7 +765,7 @@ _GLOBAL(mmu_pin_tlb) LOAD_REG_IMMEDIATE(r5, 28 << 8 | MD_TWAM) #ifdef CONFIG_PIN_TLB_DATA LOAD_REG_IMMEDIATE(r6, PAGE_OFFSET) - LOAD_REG_IMMEDIATE(r7, MI_SVALID | MI_PS8MEG) + LOAD_REG_IMMEDIATE(r7, MI_SVALID | MI_PS8MEG | _PMD_ACCESSED) #ifdef CONFIG_PIN_TLB_IMMR li r0, 3 #else @@ -824,7 +802,7 @@ _GLOBAL(mmu_pin_tlb) #endif #ifdef CONFIG_PIN_TLB_IMMR LOAD_REG_IMMEDIATE(r0, VIRT_IMMR_BASE | MD_EVALID) - LOAD_REG_IMMEDIATE(r7, MD_SVALID | MD_PS512K | MD_GUARDED) + LOAD_REG_IMMEDIATE(r7, MD_SVALID | MD_PS512K | MD_GUARDED | _PMD_ACCESSED) mfspr r8, SPRN_IMMR rlwinm r8, r8, 0, 0xfff80000 ori r8, r8, 0xf0 | _PAGE_DIRTY | _PAGE_SPS | _PAGE_SH | \ -- cgit v1.2.3 From 1bd14a66ee5200d6a24419cbd2e0a0fccd4da36f Mon Sep 17 00:00:00 2001 From: Atish Patra Date: Wed, 7 Oct 2020 14:51:59 -0700 Subject: RISC-V: Remove any memblock representing unusable memory area RISC-V limits the physical memory size by -PAGE_OFFSET. Any memory beyond that size from DRAM start is unusable. Just remove any memblock pointing to those memory region without worrying about computing the maximum size. Signed-off-by: Atish Patra Reviewed-by: Mike Rapoport Signed-off-by: Palmer Dabbelt --- arch/riscv/mm/init.c | 18 +++++++----------- 1 file changed, 7 insertions(+), 11 deletions(-) (limited to 'arch') diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c index ea933b789a88..42c14eb6bc43 100644 --- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -154,9 +154,8 @@ disable: void __init setup_bootmem(void) { - phys_addr_t mem_size = 0; - phys_addr_t total_mem = 0; - phys_addr_t mem_start, start, end = 0; + phys_addr_t mem_start = 0; + phys_addr_t start, end = 0; phys_addr_t vmlinux_end = __pa_symbol(&_end); phys_addr_t vmlinux_start = __pa_symbol(&_start); u64 i; @@ -164,21 +163,18 @@ void __init setup_bootmem(void) /* Find the memory region containing the kernel */ for_each_mem_range(i, &start, &end) { phys_addr_t size = end - start; - if (!total_mem) + if (!mem_start) mem_start = start; if (start <= vmlinux_start && vmlinux_end <= end) BUG_ON(size == 0); - total_mem = total_mem + size; } /* - * Remove memblock from the end of usable area to the - * end of region + * The maximal physical memory size is -PAGE_OFFSET. + * Make sure that any memory beyond mem_start + (-PAGE_OFFSET) is removed + * as it is unusable by kernel. */ - mem_size = min(total_mem, (phys_addr_t)-PAGE_OFFSET); - if (mem_start + mem_size < end) - memblock_remove(mem_start + mem_size, - end - mem_start - mem_size); + memblock_enforce_memory_limit(mem_start - PAGE_OFFSET); /* Reserve from the start of the kernel to the end of the kernel */ memblock_reserve(vmlinux_start, vmlinux_end - vmlinux_start); -- cgit v1.2.3 From 1978b3a53a74e3230cd46932b149c6e62e832e9a Mon Sep 17 00:00:00 2001 From: Anand K Mistry Date: Thu, 5 Nov 2020 16:33:04 +1100 Subject: x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP On AMD CPUs which have the feature X86_FEATURE_AMD_STIBP_ALWAYS_ON, STIBP is set to on and spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED At the same time, IBPB can be set to conditional. However, this leads to the case where it's impossible to turn on IBPB for a process because in the PR_SPEC_DISABLE case in ib_prctl_set() the spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED condition leads to a return before the task flag is set. Similarly, ib_prctl_get() will return PR_SPEC_DISABLE even though IBPB is set to conditional. More generally, the following cases are possible: 1. STIBP = conditional && IBPB = on for spectre_v2_user=seccomp,ibpb 2. STIBP = on && IBPB = conditional for AMD CPUs with X86_FEATURE_AMD_STIBP_ALWAYS_ON The first case functions correctly today, but only because spectre_v2_user_ibpb isn't updated to reflect the IBPB mode. At a high level, this change does one thing. If either STIBP or IBPB is set to conditional, allow the prctl to change the task flag. Also, reflect that capability when querying the state. This isn't perfect since it doesn't take into account if only STIBP or IBPB is unconditionally on. But it allows the conditional feature to work as expected, without affecting the unconditional one. [ bp: Massage commit message and comment; space out statements for better readability. ] Fixes: 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.") Signed-off-by: Anand K Mistry Signed-off-by: Borislav Petkov Acked-by: Thomas Gleixner Acked-by: Tom Lendacky Link: https://lkml.kernel.org/r/20201105163246.v2.1.Ifd7243cd3e2c2206a893ad0a5b9a4f19549e22c6@changeid --- arch/x86/kernel/cpu/bugs.c | 51 ++++++++++++++++++++++++++++++---------------- 1 file changed, 33 insertions(+), 18 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index d3f0db463f96..581fb7223ad0 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -1254,6 +1254,14 @@ static int ssb_prctl_set(struct task_struct *task, unsigned long ctrl) return 0; } +static bool is_spec_ib_user_controlled(void) +{ + return spectre_v2_user_ibpb == SPECTRE_V2_USER_PRCTL || + spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP || + spectre_v2_user_stibp == SPECTRE_V2_USER_PRCTL || + spectre_v2_user_stibp == SPECTRE_V2_USER_SECCOMP; +} + static int ib_prctl_set(struct task_struct *task, unsigned long ctrl) { switch (ctrl) { @@ -1261,16 +1269,26 @@ static int ib_prctl_set(struct task_struct *task, unsigned long ctrl) if (spectre_v2_user_ibpb == SPECTRE_V2_USER_NONE && spectre_v2_user_stibp == SPECTRE_V2_USER_NONE) return 0; + /* - * Indirect branch speculation is always disabled in strict - * mode. It can neither be enabled if it was force-disabled - * by a previous prctl call. + * With strict mode for both IBPB and STIBP, the instruction + * code paths avoid checking this task flag and instead, + * unconditionally run the instruction. However, STIBP and IBPB + * are independent and either can be set to conditionally + * enabled regardless of the mode of the other. + * + * If either is set to conditional, allow the task flag to be + * updated, unless it was force-disabled by a previous prctl + * call. Currently, this is possible on an AMD CPU which has the + * feature X86_FEATURE_AMD_STIBP_ALWAYS_ON. In this case, if the + * kernel is booted with 'spectre_v2_user=seccomp', then + * spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP and + * spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED. */ - if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED || + if (!is_spec_ib_user_controlled() || task_spec_ib_force_disable(task)) return -EPERM; + task_clear_spec_ib_disable(task); task_update_spec_tif(task); break; @@ -1283,10 +1301,10 @@ static int ib_prctl_set(struct task_struct *task, unsigned long ctrl) if (spectre_v2_user_ibpb == SPECTRE_V2_USER_NONE && spectre_v2_user_stibp == SPECTRE_V2_USER_NONE) return -EPERM; - if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED) + + if (!is_spec_ib_user_controlled()) return 0; + task_set_spec_ib_disable(task); if (ctrl == PR_SPEC_FORCE_DISABLE) task_set_spec_ib_force_disable(task); @@ -1351,20 +1369,17 @@ static int ib_prctl_get(struct task_struct *task) if (spectre_v2_user_ibpb == SPECTRE_V2_USER_NONE && spectre_v2_user_stibp == SPECTRE_V2_USER_NONE) return PR_SPEC_ENABLE; - else if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED) - return PR_SPEC_DISABLE; - else if (spectre_v2_user_ibpb == SPECTRE_V2_USER_PRCTL || - spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP || - spectre_v2_user_stibp == SPECTRE_V2_USER_PRCTL || - spectre_v2_user_stibp == SPECTRE_V2_USER_SECCOMP) { + else if (is_spec_ib_user_controlled()) { if (task_spec_ib_force_disable(task)) return PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE; if (task_spec_ib_disable(task)) return PR_SPEC_PRCTL | PR_SPEC_DISABLE; return PR_SPEC_PRCTL | PR_SPEC_ENABLE; - } else + } else if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT || + spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT || + spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED) + return PR_SPEC_DISABLE; + else return PR_SPEC_NOT_AFFECTED; } -- cgit v1.2.3 From 108aa503657ee2fe8aa071dc620d96372c252ecd Mon Sep 17 00:00:00 2001 From: Benjamin Gwin Date: Tue, 3 Nov 2020 12:11:06 -0800 Subject: arm64: kexec_file: try more regions if loading segments fails It's possible that the first region picked for the new kernel will make it impossible to fit the other segments in the required 32GB window, especially if we have a very large initrd. Instead of giving up, we can keep testing other regions for the kernel until we find one that works. Suggested-by: Ryan O'Leary Signed-off-by: Benjamin Gwin Link: https://lore.kernel.org/r/20201103201106.2397844-1-bgwin@google.com Signed-off-by: Will Deacon --- arch/arm64/kernel/kexec_image.c | 41 +++++++++++++++++++++++++--------- arch/arm64/kernel/machine_kexec_file.c | 9 +++++++- 2 files changed, 39 insertions(+), 11 deletions(-) (limited to 'arch') diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c index af9987c154ca..66adee8b5fc8 100644 --- a/arch/arm64/kernel/kexec_image.c +++ b/arch/arm64/kernel/kexec_image.c @@ -43,7 +43,7 @@ static void *image_load(struct kimage *image, u64 flags, value; bool be_image, be_kernel; struct kexec_buf kbuf; - unsigned long text_offset; + unsigned long text_offset, kernel_segment_number; struct kexec_segment *kernel_segment; int ret; @@ -88,11 +88,37 @@ static void *image_load(struct kimage *image, /* Adjust kernel segment with TEXT_OFFSET */ kbuf.memsz += text_offset; - ret = kexec_add_buffer(&kbuf); - if (ret) + kernel_segment_number = image->nr_segments; + + /* + * The location of the kernel segment may make it impossible to satisfy + * the other segment requirements, so we try repeatedly to find a + * location that will work. + */ + while ((ret = kexec_add_buffer(&kbuf)) == 0) { + /* Try to load additional data */ + kernel_segment = &image->segment[kernel_segment_number]; + ret = load_other_segments(image, kernel_segment->mem, + kernel_segment->memsz, initrd, + initrd_len, cmdline); + if (!ret) + break; + + /* + * We couldn't find space for the other segments; erase the + * kernel segment and try the next available hole. + */ + image->nr_segments -= 1; + kbuf.buf_min = kernel_segment->mem + kernel_segment->memsz; + kbuf.mem = KEXEC_BUF_MEM_UNKNOWN; + } + + if (ret) { + pr_err("Could not find any suitable kernel location!"); return ERR_PTR(ret); + } - kernel_segment = &image->segment[image->nr_segments - 1]; + kernel_segment = &image->segment[kernel_segment_number]; kernel_segment->mem += text_offset; kernel_segment->memsz -= text_offset; image->start = kernel_segment->mem; @@ -101,12 +127,7 @@ static void *image_load(struct kimage *image, kernel_segment->mem, kbuf.bufsz, kernel_segment->memsz); - /* Load additional data */ - ret = load_other_segments(image, - kernel_segment->mem, kernel_segment->memsz, - initrd, initrd_len, cmdline); - - return ERR_PTR(ret); + return 0; } #ifdef CONFIG_KEXEC_IMAGE_VERIFY_SIG diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c index 5b0e67b93cdc..03210f644790 100644 --- a/arch/arm64/kernel/machine_kexec_file.c +++ b/arch/arm64/kernel/machine_kexec_file.c @@ -240,6 +240,11 @@ static int prepare_elf_headers(void **addr, unsigned long *sz) return ret; } +/* + * Tries to add the initrd and DTB to the image. If it is not possible to find + * valid locations, this function will undo changes to the image and return non + * zero. + */ int load_other_segments(struct kimage *image, unsigned long kernel_load_addr, unsigned long kernel_size, @@ -248,7 +253,8 @@ int load_other_segments(struct kimage *image, { struct kexec_buf kbuf; void *headers, *dtb = NULL; - unsigned long headers_sz, initrd_load_addr = 0, dtb_len; + unsigned long headers_sz, initrd_load_addr = 0, dtb_len, + orig_segments = image->nr_segments; int ret = 0; kbuf.image = image; @@ -334,6 +340,7 @@ int load_other_segments(struct kimage *image, return 0; out_err: + image->nr_segments = orig_segments; vfree(dtb); return ret; } -- cgit v1.2.3 From 79605f1394261995c2b955c906a5a20fb27cdc84 Mon Sep 17 00:00:00 2001 From: Sean Anderson Date: Thu, 22 Oct 2020 16:30:12 -0400 Subject: riscv: Set text_offset correctly for M-Mode M-Mode Linux is loaded at the start of RAM, not 2MB later. Perhaps this should be calculated based on PAGE_OFFSET somehow? Even better would be to deprecate text_offset and instead introduce something absolute. Signed-off-by: Sean Anderson Signed-off-by: Palmer Dabbelt --- arch/riscv/kernel/head.S | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'arch') diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S index 11e2a4fe66e0..7e849797c9c3 100644 --- a/arch/riscv/kernel/head.S +++ b/arch/riscv/kernel/head.S @@ -35,12 +35,17 @@ ENTRY(_start) .word 0 #endif .balign 8 +#ifdef CONFIG_RISCV_M_MODE + /* Image load offset (0MB) from start of RAM for M-mode */ + .dword 0 +#else #if __riscv_xlen == 64 /* Image load offset(2MB) from start of RAM */ .dword 0x200000 #else /* Image load offset(4MB) from start of RAM */ .dword 0x400000 +#endif #endif /* Effective size of kernel image */ .dword _end - _start -- cgit v1.2.3 From 3fb4a8fa28b740709bdd3229b80279957f4d37ed Mon Sep 17 00:00:00 2001 From: Scott Cheloha Date: Thu, 5 Nov 2020 16:30:40 -0600 Subject: powerpc/numa: Fix build when CONFIG_NUMA=n Add a non-NUMA definition for of_drconf_to_nid_single() to topology.h so we have one even if powerpc/mm/numa.c is not compiled. On a non-NUMA kernel the appropriate node id is always first_online_node. Fixes: 72cdd117c449 ("pseries/hotplug-memory: hot-add: skip redundant LMB lookup") Reported-by: kernel test robot Signed-off-by: Scott Cheloha Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20201105223040.3612663-1-cheloha@linux.ibm.com --- arch/powerpc/include/asm/topology.h | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h index 8728590f514a..3beeb030cd78 100644 --- a/arch/powerpc/include/asm/topology.h +++ b/arch/powerpc/include/asm/topology.h @@ -6,6 +6,7 @@ struct device; struct device_node; +struct drmem_lmb; #ifdef CONFIG_NUMA @@ -61,6 +62,9 @@ static inline int early_cpu_to_node(int cpu) */ return (nid < 0) ? 0 : nid; } + +int of_drconf_to_nid_single(struct drmem_lmb *lmb); + #else static inline int early_cpu_to_node(int cpu) { return 0; } @@ -84,10 +88,12 @@ static inline int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc) return 0; } -#endif /* CONFIG_NUMA */ +static inline int of_drconf_to_nid_single(struct drmem_lmb *lmb) +{ + return first_online_node; +} -struct drmem_lmb; -int of_drconf_to_nid_single(struct drmem_lmb *lmb); +#endif /* CONFIG_NUMA */ #if defined(CONFIG_NUMA) && defined(CONFIG_PPC_SPLPAR) extern int find_and_online_cpu_nid(int cpu); -- cgit v1.2.3 From bcacf5f6f239a9e60287680514f392748cb4ec39 Mon Sep 17 00:00:00 2001 From: Liu Shaohua Date: Mon, 26 Oct 2020 20:26:54 +0800 Subject: riscv: fix pfn_to_virt err in do_page_fault(). The argument to pfn_to_virt() should be pfn not the value of CSR_SATP. Reviewed-by: Palmer Dabbelt Reviewed-by: Anup Patel Signed-off-by: liush Reviewed-by: Pekka Enberg Signed-off-by: Palmer Dabbelt --- arch/riscv/mm/fault.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c index 1359e21c0c62..3c8b9e433c67 100644 --- a/arch/riscv/mm/fault.c +++ b/arch/riscv/mm/fault.c @@ -86,6 +86,7 @@ static inline void vmalloc_fault(struct pt_regs *regs, int code, unsigned long a pmd_t *pmd, *pmd_k; pte_t *pte_k; int index; + unsigned long pfn; /* User mode accesses just cause a SIGSEGV */ if (user_mode(regs)) @@ -100,7 +101,8 @@ static inline void vmalloc_fault(struct pt_regs *regs, int code, unsigned long a * of a task switch. */ index = pgd_index(addr); - pgd = (pgd_t *)pfn_to_virt(csr_read(CSR_SATP)) + index; + pfn = csr_read(CSR_SATP) & SATP_PPN; + pgd = (pgd_t *)pfn_to_virt(pfn) + index; pgd_k = init_mm.pgd + index; if (!pgd_present(*pgd_k)) { -- cgit v1.2.3 From 635e3f3e47f24b2506bc9daf91d70ddf3cd024a9 Mon Sep 17 00:00:00 2001 From: Changbin Du Date: Mon, 2 Nov 2020 15:30:52 +0800 Subject: riscv: uaccess: fix __put_kernel_nofault() The copy_from_kernel_nofault() is broken on riscv because the 'dst' and 'src' are mistakenly reversed in __put_kernel_nofault() macro. copy_to_kernel_nofault: ... 0xffffffe0003159b8 <+30>: sd a4,0(a1) # a1 aka 'src' Fixes: d464118cdc ("riscv: implement __get_kernel_nofault and __put_user_nofault") Signed-off-by: Changbin Du Reviewed-by: Christoph Hellwig Reviewed-by: Anup Patel Tested-by: Anup Patel Signed-off-by: Palmer Dabbelt --- arch/riscv/include/asm/uaccess.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/riscv/include/asm/uaccess.h b/arch/riscv/include/asm/uaccess.h index c47e6b35c551..824b2c9da75b 100644 --- a/arch/riscv/include/asm/uaccess.h +++ b/arch/riscv/include/asm/uaccess.h @@ -476,7 +476,7 @@ do { \ do { \ long __kr_err; \ \ - __put_user_nocheck(*((type *)(dst)), (type *)(src), __kr_err); \ + __put_user_nocheck(*((type *)(src)), (type *)(dst), __kr_err); \ if (unlikely(__kr_err)) \ goto err_label; \ } while (0) -- cgit v1.2.3 From 1074dd44c5ba377f90e2d0d99a784f73dbea6ff7 Mon Sep 17 00:00:00 2001 From: Anup Patel Date: Wed, 4 Nov 2020 12:07:13 +0530 Subject: RISC-V: Use non-PGD mappings for early DTB access Currently, we use PGD mappings for early DTB mapping in early_pgd but this breaks Linux kernel on SiFive Unleashed because on SiFive Unleashed PMP checks don't work correctly for PGD mappings. To fix early DTB mappings on SiFive Unleashed, we use non-PGD mappings (i.e. PMD) for early DTB access. Fixes: 8f3a2b4a96dc ("RISC-V: Move DT mapping outof fixmap") Signed-off-by: Anup Patel Reviewed-by: Atish Patra Tested-by: Atish Patra Signed-off-by: Palmer Dabbelt --- arch/riscv/mm/init.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'arch') diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c index 42c14eb6bc43..8e577f14f120 100644 --- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -293,6 +293,7 @@ pmd_t fixmap_pmd[PTRS_PER_PMD] __page_aligned_bss; #define NUM_EARLY_PMDS (1UL + MAX_EARLY_MAPPING_SIZE / PGDIR_SIZE) #endif pmd_t early_pmd[PTRS_PER_PMD * NUM_EARLY_PMDS] __initdata __aligned(PAGE_SIZE); +pmd_t early_dtb_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE); static pmd_t *__init get_pmd_virt_early(phys_addr_t pa) { @@ -490,6 +491,18 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa) load_pa + (va - PAGE_OFFSET), map_size, PAGE_KERNEL_EXEC); +#ifndef __PAGETABLE_PMD_FOLDED + /* Setup early PMD for DTB */ + create_pgd_mapping(early_pg_dir, DTB_EARLY_BASE_VA, + (uintptr_t)early_dtb_pmd, PGDIR_SIZE, PAGE_TABLE); + /* Create two consecutive PMD mappings for FDT early scan */ + pa = dtb_pa & ~(PMD_SIZE - 1); + create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA, + pa, PMD_SIZE, PAGE_KERNEL); + create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA + PMD_SIZE, + pa + PMD_SIZE, PMD_SIZE, PAGE_KERNEL); + dtb_early_va = (void *)DTB_EARLY_BASE_VA + (dtb_pa & (PMD_SIZE - 1)); +#else /* Create two consecutive PGD mappings for FDT early scan */ pa = dtb_pa & ~(PGDIR_SIZE - 1); create_pgd_mapping(early_pg_dir, DTB_EARLY_BASE_VA, @@ -497,6 +510,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa) create_pgd_mapping(early_pg_dir, DTB_EARLY_BASE_VA + PGDIR_SIZE, pa + PGDIR_SIZE, PGDIR_SIZE, PAGE_KERNEL); dtb_early_va = (void *)DTB_EARLY_BASE_VA + (dtb_pa & (PGDIR_SIZE - 1)); +#endif dtb_early_pa = dtb_pa; /* -- cgit v1.2.3 From c2c81bb2f69138f902e1a58d3bef6ad97fb8a92c Mon Sep 17 00:00:00 2001 From: Palmer Dabbelt Date: Fri, 23 Oct 2020 21:50:47 -0700 Subject: RISC-V: Fix the VDSO symbol generaton for binutils-2.35+ We were relying on GNU ld's ability to re-link executable files in order to extract our VDSO symbols. This behavior was deemed a bug as of binutils-2.35 (specifically the binutils-gdb commit a87e1817a4 ("Have the linker fail if any attempt to link in an executable is made."), but as that has been backported to at least Debian's binutils-2.34 in may manifest in other places. The previous version of this was a bit of a mess: we were linking a static executable version of the VDSO, containing only a subset of the input symbols, which we then linked into the kernel. This worked, but certainly wasn't a supported path through the toolchain. Instead this new version parses the textual output of nm to produce a symbol table. Both rely on near-zero addresses being linkable, but as we rely on weak undefined symbols being linkable elsewhere I don't view this as a major issue. Fixes: e2c0cdfba7f6 ("RISC-V: User-facing API") Signed-off-by: Palmer Dabbelt --- arch/riscv/kernel/vdso/.gitignore | 1 + arch/riscv/kernel/vdso/Makefile | 18 +++++++++--------- arch/riscv/kernel/vdso/so2s.sh | 6 ++++++ 3 files changed, 16 insertions(+), 9 deletions(-) create mode 100755 arch/riscv/kernel/vdso/so2s.sh (limited to 'arch') diff --git a/arch/riscv/kernel/vdso/.gitignore b/arch/riscv/kernel/vdso/.gitignore index 11ebee9e4c1d..3a19def868ec 100644 --- a/arch/riscv/kernel/vdso/.gitignore +++ b/arch/riscv/kernel/vdso/.gitignore @@ -1,3 +1,4 @@ # SPDX-License-Identifier: GPL-2.0-only vdso.lds *.tmp +vdso-syms.S diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile index 7d6a94d45ec9..cb8f9e4cfcbf 100644 --- a/arch/riscv/kernel/vdso/Makefile +++ b/arch/riscv/kernel/vdso/Makefile @@ -43,19 +43,14 @@ $(obj)/vdso.o: $(obj)/vdso.so SYSCFLAGS_vdso.so.dbg = $(c_flags) $(obj)/vdso.so.dbg: $(src)/vdso.lds $(obj-vdso) FORCE $(call if_changed,vdsold) +SYSCFLAGS_vdso.so.dbg = -shared -s -Wl,-soname=linux-vdso.so.1 \ + -Wl,--build-id -Wl,--hash-style=both # We also create a special relocatable object that should mirror the symbol # table and layout of the linked DSO. With ld --just-symbols we can then # refer to these symbols in the kernel code rather than hand-coded addresses. - -SYSCFLAGS_vdso.so.dbg = -shared -s -Wl,-soname=linux-vdso.so.1 \ - -Wl,--build-id=sha1 -Wl,--hash-style=both -$(obj)/vdso-dummy.o: $(src)/vdso.lds $(obj)/rt_sigreturn.o FORCE - $(call if_changed,vdsold) - -LDFLAGS_vdso-syms.o := -r --just-symbols -$(obj)/vdso-syms.o: $(obj)/vdso-dummy.o FORCE - $(call if_changed,ld) +$(obj)/vdso-syms.S: $(obj)/vdso.so FORCE + $(call if_changed,so2s) # strip rule for the .so file $(obj)/%.so: OBJCOPYFLAGS := -S @@ -73,6 +68,11 @@ quiet_cmd_vdsold = VDSOLD $@ $(patsubst %, -G __vdso_%, $(vdso-syms)) $@.tmp $@ && \ rm $@.tmp +# Extracts symbol offsets from the VDSO, converting them into an assembly file +# that contains the same symbols at the same offsets. +quiet_cmd_so2s = SO2S $@ + cmd_so2s = $(NM) -D $< | $(srctree)/$(src)/so2s.sh > $@ + # install commands for the unstripped file quiet_cmd_vdso_install = INSTALL $@ cmd_vdso_install = cp $(obj)/$@.dbg $(MODLIB)/vdso/$@ diff --git a/arch/riscv/kernel/vdso/so2s.sh b/arch/riscv/kernel/vdso/so2s.sh new file mode 100755 index 000000000000..e64cb6d9440e --- /dev/null +++ b/arch/riscv/kernel/vdso/so2s.sh @@ -0,0 +1,6 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0+ +# Copyright 2020 Palmer Dabbelt + +sed 's!\([0-9a-f]*\) T \([a-z0-9_]*\)\(@@LINUX_4.15\)*!.global \2\n.set \2,0x\1!' \ +| grep '^\.' -- cgit v1.2.3 From faf000397e7f103df9953a312e1df21df1dc797f Mon Sep 17 00:00:00 2001 From: Gavin Shan Date: Tue, 3 Nov 2020 11:30:09 +1100 Subject: KVM: arm64: Fix build error in user_mem_abort() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The PUD and PMD are folded into PGD when the following options are enabled. In that case, PUD_SHIFT is equal to PMD_SHIFT and we fail to build with the indicated errors: CONFIG_ARM64_VA_BITS_42=y CONFIG_ARM64_PAGE_SHIFT=16 CONFIG_PGTABLE_LEVELS=3 arch/arm64/kvm/mmu.c: In function ‘user_mem_abort’: arch/arm64/kvm/mmu.c:798:2: error: duplicate case value case PMD_SHIFT: ^~~~ arch/arm64/kvm/mmu.c:791:2: note: previously used here case PUD_SHIFT: ^~~~ This fixes the issue by skipping the check on PUD huge page when PUD and PMD are folded into PGD. Fixes: 2f40c46021bbb ("KVM: arm64: Use fallback mapping sizes for contiguous huge page sizes") Reported-by: Eric Auger Signed-off-by: Gavin Shan Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20201103003009.32955-1-gshan@redhat.com --- arch/arm64/kvm/mmu.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch') diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index c7c6df6309d5..a109c5001827 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -788,10 +788,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, } switch (vma_shift) { +#ifndef __PAGETABLE_PMD_FOLDED case PUD_SHIFT: if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE)) break; fallthrough; +#endif case CONT_PMD_SHIFT: vma_shift = PMD_SHIFT; fallthrough; -- cgit v1.2.3 From f81cb2c3ad41ac6d8cb2650e3d72d5f67db1aa28 Mon Sep 17 00:00:00 2001 From: Andrew Jones Date: Thu, 5 Nov 2020 10:10:19 +0100 Subject: KVM: arm64: Don't hide ID registers from userspace MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ID registers are RAZ until they've been allocated a purpose, but that doesn't mean they should be removed from the KVM_GET_REG_LIST list. So far we only have one register, SYS_ID_AA64ZFR0_EL1, that is hidden from userspace when its function, SVE, is not present. Expose SYS_ID_AA64ZFR0_EL1 to userspace as RAZ when SVE is not implemented. Removing the userspace visibility checks is enough to reexpose it, as it will already return zero to userspace when SVE is not present. The register already behaves as RAZ for the guest when SVE is not present. Fixes: 73433762fcae ("KVM: arm64/sve: System register context switch and access support") Reported-by: 张东旭 Signed-off-by: Andrew Jones Signed-off-by: Marc Zyngier Cc: stable@vger.kernel.org#v5.2+ Link: https://lore.kernel.org/r/20201105091022.15373-2-drjones@redhat.com --- arch/arm64/kvm/sys_regs.c | 18 +----------------- 1 file changed, 1 insertion(+), 17 deletions(-) (limited to 'arch') diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 983994f01a63..3af306e6b9cd 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1193,16 +1193,6 @@ static unsigned int sve_visibility(const struct kvm_vcpu *vcpu, return REG_HIDDEN_USER | REG_HIDDEN_GUEST; } -/* Visibility overrides for SVE-specific ID registers */ -static unsigned int sve_id_visibility(const struct kvm_vcpu *vcpu, - const struct sys_reg_desc *rd) -{ - if (vcpu_has_sve(vcpu)) - return 0; - - return REG_HIDDEN_USER; -} - /* Generate the emulated ID_AA64ZFR0_EL1 value exposed to the guest */ static u64 guest_id_aa64zfr0_el1(const struct kvm_vcpu *vcpu) { @@ -1229,9 +1219,6 @@ static int get_id_aa64zfr0_el1(struct kvm_vcpu *vcpu, { u64 val; - if (WARN_ON(!vcpu_has_sve(vcpu))) - return -ENOENT; - val = guest_id_aa64zfr0_el1(vcpu); return reg_to_user(uaddr, &val, reg->id); } @@ -1244,9 +1231,6 @@ static int set_id_aa64zfr0_el1(struct kvm_vcpu *vcpu, int err; u64 val; - if (WARN_ON(!vcpu_has_sve(vcpu))) - return -ENOENT; - err = reg_from_user(&val, uaddr, id); if (err) return err; @@ -1509,7 +1493,7 @@ static const struct sys_reg_desc sys_reg_descs[] = { ID_SANITISED(ID_AA64PFR1_EL1), ID_UNALLOCATED(4,2), ID_UNALLOCATED(4,3), - { SYS_DESC(SYS_ID_AA64ZFR0_EL1), access_id_aa64zfr0_el1, .get_user = get_id_aa64zfr0_el1, .set_user = set_id_aa64zfr0_el1, .visibility = sve_id_visibility }, + { SYS_DESC(SYS_ID_AA64ZFR0_EL1), access_id_aa64zfr0_el1, .get_user = get_id_aa64zfr0_el1, .set_user = set_id_aa64zfr0_el1, }, ID_UNALLOCATED(4,5), ID_UNALLOCATED(4,6), ID_UNALLOCATED(4,7), -- cgit v1.2.3 From 01fe5ace92ddb8732e3331355e7ba9cb6f2ef787 Mon Sep 17 00:00:00 2001 From: Andrew Jones Date: Thu, 5 Nov 2020 10:10:20 +0100 Subject: KVM: arm64: Consolidate REG_HIDDEN_GUEST/USER REG_HIDDEN_GUEST and REG_HIDDEN_USER are always used together. Consolidate them into a single REG_HIDDEN flag. We can always add another flag later if some register needs to expose itself differently to the guest than it does to userspace. No functional change intended. Signed-off-by: Andrew Jones Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20201105091022.15373-3-drjones@redhat.com --- arch/arm64/kvm/sys_regs.c | 12 ++++++------ arch/arm64/kvm/sys_regs.h | 18 ++++-------------- 2 files changed, 10 insertions(+), 20 deletions(-) (limited to 'arch') diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 3af306e6b9cd..26a285127620 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1069,7 +1069,7 @@ static bool trap_ptrauth(struct kvm_vcpu *vcpu, static unsigned int ptrauth_visibility(const struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd) { - return vcpu_has_ptrauth(vcpu) ? 0 : REG_HIDDEN_USER | REG_HIDDEN_GUEST; + return vcpu_has_ptrauth(vcpu) ? 0 : REG_HIDDEN; } #define __PTRAUTH_KEY(k) \ @@ -1190,7 +1190,7 @@ static unsigned int sve_visibility(const struct kvm_vcpu *vcpu, if (vcpu_has_sve(vcpu)) return 0; - return REG_HIDDEN_USER | REG_HIDDEN_GUEST; + return REG_HIDDEN; } /* Generate the emulated ID_AA64ZFR0_EL1 value exposed to the guest */ @@ -2153,7 +2153,7 @@ static void perform_access(struct kvm_vcpu *vcpu, trace_kvm_sys_access(*vcpu_pc(vcpu), params, r); /* Check for regs disabled by runtime config */ - if (sysreg_hidden_from_guest(vcpu, r)) { + if (sysreg_hidden(vcpu, r)) { kvm_inject_undefined(vcpu); return; } @@ -2652,7 +2652,7 @@ int kvm_arm_sys_reg_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg return get_invariant_sys_reg(reg->id, uaddr); /* Check for regs disabled by runtime config */ - if (sysreg_hidden_from_user(vcpu, r)) + if (sysreg_hidden(vcpu, r)) return -ENOENT; if (r->get_user) @@ -2677,7 +2677,7 @@ int kvm_arm_sys_reg_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg return set_invariant_sys_reg(reg->id, uaddr); /* Check for regs disabled by runtime config */ - if (sysreg_hidden_from_user(vcpu, r)) + if (sysreg_hidden(vcpu, r)) return -ENOENT; if (r->set_user) @@ -2748,7 +2748,7 @@ static int walk_one_sys_reg(const struct kvm_vcpu *vcpu, if (!(rd->reg || rd->get_user)) return 0; - if (sysreg_hidden_from_user(vcpu, rd)) + if (sysreg_hidden(vcpu, rd)) return 0; if (!copy_reg_to_user(rd, uind)) diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h index 5a6fc30f5989..2641b2ee6a91 100644 --- a/arch/arm64/kvm/sys_regs.h +++ b/arch/arm64/kvm/sys_regs.h @@ -59,8 +59,7 @@ struct sys_reg_desc { const struct sys_reg_desc *rd); }; -#define REG_HIDDEN_USER (1 << 0) /* hidden from userspace ioctls */ -#define REG_HIDDEN_GUEST (1 << 1) /* hidden from guest */ +#define REG_HIDDEN (1 << 0) /* hidden from userspace and guest */ static __printf(2, 3) inline void print_sys_reg_msg(const struct sys_reg_params *p, @@ -111,22 +110,13 @@ static inline void reset_val(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r __vcpu_sys_reg(vcpu, r->reg) = r->val; } -static inline bool sysreg_hidden_from_guest(const struct kvm_vcpu *vcpu, - const struct sys_reg_desc *r) -{ - if (likely(!r->visibility)) - return false; - - return r->visibility(vcpu, r) & REG_HIDDEN_GUEST; -} - -static inline bool sysreg_hidden_from_user(const struct kvm_vcpu *vcpu, - const struct sys_reg_desc *r) +static inline bool sysreg_hidden(const struct kvm_vcpu *vcpu, + const struct sys_reg_desc *r) { if (likely(!r->visibility)) return false; - return r->visibility(vcpu, r) & REG_HIDDEN_USER; + return r->visibility(vcpu, r) & REG_HIDDEN; } static inline int cmp_sys_reg(const struct sys_reg_desc *i1, -- cgit v1.2.3 From 912dee572691ffb2b387dd8b4f183d549a6b24d1 Mon Sep 17 00:00:00 2001 From: Andrew Jones Date: Thu, 5 Nov 2020 10:10:21 +0100 Subject: KVM: arm64: Check RAZ visibility in ID register accessors The instruction encodings of ID registers are preallocated. Until an encoding is assigned a purpose the register is RAZ. KVM's general ID register accessor functions already support both paths, RAZ or not. If for each ID register we can determine if it's RAZ or not, then all ID registers can build on the general functions. The register visibility function allows us to check whether a register should be completely hidden or not, extending it to also report when the register should be RAZ or not allows us to use it for ID registers as well. Check for RAZ visibility in the ID register accessor functions, allowing the RAZ case to be handled in a generic way for all system registers. The new REG_RAZ flag will be used in a later patch. This patch has no intended functional change. Signed-off-by: Andrew Jones Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20201105091022.15373-4-drjones@redhat.com --- arch/arm64/kvm/sys_regs.c | 19 ++++++++++++++++--- arch/arm64/kvm/sys_regs.h | 10 ++++++++++ 2 files changed, 26 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 26a285127620..4ee098abd87c 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1151,6 +1151,12 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu, return val; } +static unsigned int id_visibility(const struct kvm_vcpu *vcpu, + const struct sys_reg_desc *r) +{ + return 0; +} + /* cpufeature ID register access trap handlers */ static bool __access_id_reg(struct kvm_vcpu *vcpu, @@ -1169,7 +1175,9 @@ static bool access_id_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *p, const struct sys_reg_desc *r) { - return __access_id_reg(vcpu, p, r, false); + bool raz = sysreg_visible_as_raz(vcpu, r); + + return __access_id_reg(vcpu, p, r, raz); } static bool access_raz_id_reg(struct kvm_vcpu *vcpu, @@ -1281,13 +1289,17 @@ static int __set_id_reg(const struct kvm_vcpu *vcpu, static int get_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, const struct kvm_one_reg *reg, void __user *uaddr) { - return __get_id_reg(vcpu, rd, uaddr, false); + bool raz = sysreg_visible_as_raz(vcpu, rd); + + return __get_id_reg(vcpu, rd, uaddr, raz); } static int set_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, const struct kvm_one_reg *reg, void __user *uaddr) { - return __set_id_reg(vcpu, rd, uaddr, false); + bool raz = sysreg_visible_as_raz(vcpu, rd); + + return __set_id_reg(vcpu, rd, uaddr, raz); } static int get_raz_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, @@ -1372,6 +1384,7 @@ static bool access_ccsidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p, .access = access_id_reg, \ .get_user = get_id_reg, \ .set_user = set_id_reg, \ + .visibility = id_visibility, \ } /* diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h index 2641b2ee6a91..0f95964339b1 100644 --- a/arch/arm64/kvm/sys_regs.h +++ b/arch/arm64/kvm/sys_regs.h @@ -60,6 +60,7 @@ struct sys_reg_desc { }; #define REG_HIDDEN (1 << 0) /* hidden from userspace and guest */ +#define REG_RAZ (1 << 1) /* RAZ from userspace and guest */ static __printf(2, 3) inline void print_sys_reg_msg(const struct sys_reg_params *p, @@ -119,6 +120,15 @@ static inline bool sysreg_hidden(const struct kvm_vcpu *vcpu, return r->visibility(vcpu, r) & REG_HIDDEN; } +static inline bool sysreg_visible_as_raz(const struct kvm_vcpu *vcpu, + const struct sys_reg_desc *r) +{ + if (likely(!r->visibility)) + return false; + + return r->visibility(vcpu, r) & REG_RAZ; +} + static inline int cmp_sys_reg(const struct sys_reg_desc *i1, const struct sys_reg_desc *i2) { -- cgit v1.2.3 From c512298eed0360923d0cbc4a1f30bc0509af0d50 Mon Sep 17 00:00:00 2001 From: Andrew Jones Date: Thu, 5 Nov 2020 10:10:22 +0100 Subject: KVM: arm64: Remove AA64ZFR0_EL1 accessors The AA64ZFR0_EL1 accessors are just the general accessors with its visibility function open-coded. It also skips the if-else chain in read_id_reg, but there's no reason not to go there. Indeed consolidating ID register accessors and removing lines of code make it worthwhile. Remove the AA64ZFR0_EL1 accessors, replacing them with the general accessors for sanitized ID registers. No functional change intended. Signed-off-by: Andrew Jones Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20201105091022.15373-5-drjones@redhat.com --- arch/arm64/kvm/sys_regs.c | 61 +++++++++-------------------------------------- 1 file changed, 11 insertions(+), 50 deletions(-) (limited to 'arch') diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 4ee098abd87c..436043cd327e 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1154,6 +1154,16 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu, static unsigned int id_visibility(const struct kvm_vcpu *vcpu, const struct sys_reg_desc *r) { + u32 id = sys_reg((u32)r->Op0, (u32)r->Op1, + (u32)r->CRn, (u32)r->CRm, (u32)r->Op2); + + switch (id) { + case SYS_ID_AA64ZFR0_EL1: + if (!vcpu_has_sve(vcpu)) + return REG_RAZ; + break; + } + return 0; } @@ -1201,55 +1211,6 @@ static unsigned int sve_visibility(const struct kvm_vcpu *vcpu, return REG_HIDDEN; } -/* Generate the emulated ID_AA64ZFR0_EL1 value exposed to the guest */ -static u64 guest_id_aa64zfr0_el1(const struct kvm_vcpu *vcpu) -{ - if (!vcpu_has_sve(vcpu)) - return 0; - - return read_sanitised_ftr_reg(SYS_ID_AA64ZFR0_EL1); -} - -static bool access_id_aa64zfr0_el1(struct kvm_vcpu *vcpu, - struct sys_reg_params *p, - const struct sys_reg_desc *rd) -{ - if (p->is_write) - return write_to_read_only(vcpu, p, rd); - - p->regval = guest_id_aa64zfr0_el1(vcpu); - return true; -} - -static int get_id_aa64zfr0_el1(struct kvm_vcpu *vcpu, - const struct sys_reg_desc *rd, - const struct kvm_one_reg *reg, void __user *uaddr) -{ - u64 val; - - val = guest_id_aa64zfr0_el1(vcpu); - return reg_to_user(uaddr, &val, reg->id); -} - -static int set_id_aa64zfr0_el1(struct kvm_vcpu *vcpu, - const struct sys_reg_desc *rd, - const struct kvm_one_reg *reg, void __user *uaddr) -{ - const u64 id = sys_reg_to_index(rd); - int err; - u64 val; - - err = reg_from_user(&val, uaddr, id); - if (err) - return err; - - /* This is what we mean by invariant: you can't change it. */ - if (val != guest_id_aa64zfr0_el1(vcpu)) - return -EINVAL; - - return 0; -} - /* * cpufeature ID register user accessors * @@ -1506,7 +1467,7 @@ static const struct sys_reg_desc sys_reg_descs[] = { ID_SANITISED(ID_AA64PFR1_EL1), ID_UNALLOCATED(4,2), ID_UNALLOCATED(4,3), - { SYS_DESC(SYS_ID_AA64ZFR0_EL1), access_id_aa64zfr0_el1, .get_user = get_id_aa64zfr0_el1, .set_user = set_id_aa64zfr0_el1, }, + ID_SANITISED(ID_AA64ZFR0_EL1), ID_UNALLOCATED(4,5), ID_UNALLOCATED(4,6), ID_UNALLOCATED(4,7), -- cgit v1.2.3 From 1aec69ae56be28b5fd3c9daead5f3840c30153c8 Mon Sep 17 00:00:00 2001 From: Mike Travis Date: Thu, 5 Nov 2020 16:27:39 -0600 Subject: x86/platform/uv: Fix missing OEM_TABLE_ID Testing shows a problem in that the OEM_TABLE_ID was missing for hubless systems. This is used to determine the APIC type (legacy or extended). Add the OEM_TABLE_ID to the early hubless processing. Fixes: 1e61f5a95f191 ("Add and decode Arch Type in UVsystab") Signed-off-by: Mike Travis Signed-off-by: Thomas Gleixner Link: https://lore.kernel.org/r/20201105222741.157029-2-mike.travis@hpe.com --- arch/x86/kernel/apic/x2apic_uv_x.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c index 714233cee0b5..a5794794ea59 100644 --- a/arch/x86/kernel/apic/x2apic_uv_x.c +++ b/arch/x86/kernel/apic/x2apic_uv_x.c @@ -366,7 +366,7 @@ static int __init early_get_arch_type(void) return ret; } -static int __init uv_set_system_type(char *_oem_id) +static int __init uv_set_system_type(char *_oem_id, char *_oem_table_id) { /* Save OEM_ID passed from ACPI MADT */ uv_stringify(sizeof(oem_id), oem_id, _oem_id); @@ -394,6 +394,9 @@ static int __init uv_set_system_type(char *_oem_id) else uv_hubless_system = 0x9; + /* Copy APIC type */ + uv_stringify(sizeof(oem_table_id), oem_table_id, _oem_table_id); + pr_info("UV: OEM IDs %s/%s, SystemType %d, HUBLESS ID %x\n", oem_id, oem_table_id, uv_system_type, uv_hubless_system); return 0; @@ -456,7 +459,7 @@ static int __init uv_acpi_madt_oem_check(char *_oem_id, char *_oem_table_id) uv_cpu_info->p_uv_hub_info = &uv_hub_info_node0; /* If not UV, return. */ - if (likely(uv_set_system_type(_oem_id) == 0)) + if (uv_set_system_type(_oem_id, _oem_table_id) == 0) return 0; /* Save and Decode OEM Table ID */ -- cgit v1.2.3 From 1aee505e0171fc38fd5ed70c7f0dcbb7398c759f Mon Sep 17 00:00:00 2001 From: Mike Travis Date: Thu, 5 Nov 2020 16:27:40 -0600 Subject: x86/platform/uv: Remove spaces from OEM IDs Testing shows that trailing spaces caused problems with the OEM_ID and the OEM_TABLE_ID. One being that the OEM_ID would not string compare correctly. Another the OEM_ID and OEM_TABLE_ID would be concatenated in the printout. Remove any trailing spaces. Fixes: 1e61f5a95f191 ("Add and decode Arch Type in UVsystab") Signed-off-by: Mike Travis Signed-off-by: Thomas Gleixner Link: https://lore.kernel.org/r/20201105222741.157029-3-mike.travis@hpe.com --- arch/x86/kernel/apic/x2apic_uv_x.c | 3 +++ 1 file changed, 3 insertions(+) (limited to 'arch') diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c index a5794794ea59..0f848d6dddc9 100644 --- a/arch/x86/kernel/apic/x2apic_uv_x.c +++ b/arch/x86/kernel/apic/x2apic_uv_x.c @@ -290,6 +290,9 @@ static void __init uv_stringify(int len, char *to, char *from) { /* Relies on 'to' being NULL chars so result will be NULL terminated */ strncpy(to, from, len-1); + + /* Trim trailing spaces */ + (void)strim(to); } /* Find UV arch type entry in UVsystab */ -- cgit v1.2.3 From 801284f9737883a2b2639bd494455a72c82fdedf Mon Sep 17 00:00:00 2001 From: Mike Travis Date: Thu, 5 Nov 2020 16:27:41 -0600 Subject: x86/platform/uv: Recognize UV5 hubless system identifier Testing shows a problem in that UV5 hubless systems were not being recognized. Add them to the list of OEM IDs checked. Fixes: 6c7794423a998 ("Add UV5 direct references") Signed-off-by: Mike Travis Signed-off-by: Thomas Gleixner Link: https://lore.kernel.org/r/20201105222741.157029-4-mike.travis@hpe.com --- arch/x86/kernel/apic/x2apic_uv_x.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c index 0f848d6dddc9..3115caa7d7d0 100644 --- a/arch/x86/kernel/apic/x2apic_uv_x.c +++ b/arch/x86/kernel/apic/x2apic_uv_x.c @@ -389,13 +389,20 @@ static int __init uv_set_system_type(char *_oem_id, char *_oem_table_id) /* (Not hubless), not a UV */ return 0; + /* Is UV hubless system */ + uv_hubless_system = 0x01; + + /* UV5 Hubless */ + if (strncmp(uv_archtype, "NSGI5", 5) == 0) + uv_hubless_system |= 0x20; + /* UV4 Hubless: CH */ - if (strncmp(uv_archtype, "NSGI4", 5) == 0) - uv_hubless_system = 0x11; + else if (strncmp(uv_archtype, "NSGI4", 5) == 0) + uv_hubless_system |= 0x10; /* UV3 Hubless: UV300/MC990X w/o hub */ else - uv_hubless_system = 0x9; + uv_hubless_system |= 0x8; /* Copy APIC type */ uv_stringify(sizeof(oem_table_id), oem_table_id, _oem_table_id); -- cgit v1.2.3 From 0011c6d182774fc781fb9e115ebe8baa356029ae Mon Sep 17 00:00:00 2001 From: Markus Reichl Date: Wed, 4 Nov 2020 17:23:55 +0100 Subject: arm64: dts: rockchip: Assign a fixed index to mmc devices on rk3399 boards. Recently introduced async probe on mmc devices can shuffle block IDs. Pin them to fixed values to ease booting in environments where UUIDs are not practical. Use newly introduced aliases for mmcblk devices from [1]. [1] https://patchwork.kernel.org/patch/11747669/ Signed-off-by: Markus Reichl Reviewed-by: Douglas Anderson Link: https://lore.kernel.org/r/20201104162356.1251-1-m.reichl@fivetechno.de Signed-off-by: Heiko Stuebner --- arch/arm64/boot/dts/rockchip/rk3399.dtsi | 3 +++ 1 file changed, 3 insertions(+) (limited to 'arch') diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi b/arch/arm64/boot/dts/rockchip/rk3399.dtsi index ada724b12f01..7a9a7aca86c6 100644 --- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi @@ -29,6 +29,9 @@ i2c6 = &i2c6; i2c7 = &i2c7; i2c8 = &i2c8; + mmc0 = &sdio0; + mmc1 = &sdmmc; + mmc2 = &sdhci; serial0 = &uart0; serial1 = &uart1; serial2 = &uart2; -- cgit v1.2.3 From 7327c8b98e2e14c47021eea14d1ab268086a6408 Mon Sep 17 00:00:00 2001 From: Markus Reichl Date: Wed, 4 Nov 2020 20:29:31 +0100 Subject: arm64: dts: rockchip: Reorder LED triggers from mmc devices on rk3399-roc-pc. After patch [1] SD-card becomes mmc1 and eMMC becomes mmc2. Correct trigger of LEDs accordingly. [1] https://patchwork.kernel.org/patch/11881427 Signed-off-by: Markus Reichl Link: https://lore.kernel.org/r/20201104192933.1001-1-m.reichl@fivetechno.de Signed-off-by: Heiko Stuebner --- arch/arm64/boot/dts/rockchip/rk3399-roc-pc.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/rockchip/rk3399-roc-pc.dtsi b/arch/arm64/boot/dts/rockchip/rk3399-roc-pc.dtsi index e7a459fa4322..20309076dbac 100644 --- a/arch/arm64/boot/dts/rockchip/rk3399-roc-pc.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk3399-roc-pc.dtsi @@ -74,14 +74,14 @@ label = "red:diy"; gpios = <&gpio0 RK_PB5 GPIO_ACTIVE_HIGH>; default-state = "off"; - linux,default-trigger = "mmc1"; + linux,default-trigger = "mmc2"; }; yellow_led: led-2 { label = "yellow:yellow-led"; gpios = <&gpio0 RK_PA2 GPIO_ACTIVE_HIGH>; default-state = "off"; - linux,default-trigger = "mmc0"; + linux,default-trigger = "mmc1"; }; }; -- cgit v1.2.3 From 01776f070ffcbf336be3bf1672bd3c589548d6c4 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Sat, 7 Nov 2020 09:07:40 +0000 Subject: powerpc/32s: Use relocation offset when setting early hash table When calling early_hash_table(), the kernel hasn't been yet relocated to its linking address, so data must be addressed with relocation offset. Add relocation offset to write into Hash in early_hash_table(). Fixes: 69a1593abdbc ("powerpc/32s: Setup the early hash table at all time.") Reported-by: Erhard Furtner Reported-by: Andreas Schwab Signed-off-by: Christophe Leroy Tested-by: Serge Belyshev Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/9e225a856a8b22e0e77587ee22ab7a2f5bca8753.1604740029.git.christophe.leroy@csgroup.eu --- arch/powerpc/kernel/head_book3s_32.S | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S index 2aa16d5368e1..a0dda2a1f2df 100644 --- a/arch/powerpc/kernel/head_book3s_32.S +++ b/arch/powerpc/kernel/head_book3s_32.S @@ -156,6 +156,7 @@ __after_mmu_off: bl initial_bats bl load_segment_registers BEGIN_MMU_FTR_SECTION + bl reloc_offset bl early_hash_table END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE) #if defined(CONFIG_BOOTX_TEXT) @@ -920,7 +921,7 @@ early_hash_table: ori r6, r6, 3 /* 256kB table */ mtspr SPRN_SDR1, r6 lis r6, early_hash@h - lis r3, Hash@ha + addis r3, r3, Hash@ha stw r6, Hash@l(r3) blr -- cgit v1.2.3 From c6c4f961cb879aed67b1343bdef2087c899fdaa9 Mon Sep 17 00:00:00 2001 From: Li RongQing Date: Sun, 27 Sep 2020 16:44:57 +0800 Subject: KVM: x86/mmu: fix counting of rmap entries in pte_list_add Fix an off-by-one style bug in pte_list_add() where it failed to account the last full set of SPTEs, i.e. when desc->sptes is full and desc->more is NULL. Merge the two "PTE_LIST_EXT-1" checks as part of the fix to avoid an extra comparison. Signed-off-by: Li RongQing Reviewed-by: Sean Christopherson Message-Id: <1601196297-24104-1-git-send-email-lirongqing@baidu.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/mmu/mmu.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 1f96adff8dc4..5bb1939b65d8 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -856,12 +856,14 @@ static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte, } else { rmap_printk("pte_list_add: %p %llx many->many\n", spte, *spte); desc = (struct pte_list_desc *)(rmap_head->val & ~1ul); - while (desc->sptes[PTE_LIST_EXT-1] && desc->more) { - desc = desc->more; + while (desc->sptes[PTE_LIST_EXT-1]) { count += PTE_LIST_EXT; - } - if (desc->sptes[PTE_LIST_EXT-1]) { - desc->more = mmu_alloc_pte_list_desc(vcpu); + + if (!desc->more) { + desc->more = mmu_alloc_pte_list_desc(vcpu); + desc = desc->more; + break; + } desc = desc->more; } for (i = 0; desc->sptes[i]; ++i) -- cgit v1.2.3 From cc4cb017678aa66d3fb4501b2f7424ed28fc7f4d Mon Sep 17 00:00:00 2001 From: Maxim Levitsky Date: Sun, 1 Nov 2020 13:55:23 +0200 Subject: KVM: x86: use positive error values for msr emulation that causes #GP Recent introduction of the userspace msr filtering added code that uses negative error codes for cases that result in either #GP delivery to the guest, or handled by the userspace msr filtering. This breaks an assumption that a negative error code returned from the msr emulation code is a semi-fatal error which should be returned to userspace via KVM_RUN ioctl and usually kill the guest. Fix this by reusing the already existing KVM_MSR_RET_INVALID error code, and by adding a new KVM_MSR_RET_FILTERED error code for the userspace filtered msrs. Fixes: 291f35fb2c1d1 ("KVM: x86: report negative values from wrmsr emulation to userspace") Reported-by: Qian Cai Signed-off-by: Maxim Levitsky Message-Id: <20201101115523.115780-1-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/x86.c | 28 +++++++++++++++------------- arch/x86/kvm/x86.h | 8 +++++++- 2 files changed, 22 insertions(+), 14 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f5ede41bf9e6..a4c59be8d566 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -255,11 +255,10 @@ static struct kmem_cache *x86_emulator_cache; /* * When called, it means the previous get/set msr reached an invalid msr. - * Return 0 if we want to ignore/silent this failed msr access, or 1 if we want - * to fail the caller. + * Return true if we want to ignore/silent this failed msr access. */ -static int kvm_msr_ignored_check(struct kvm_vcpu *vcpu, u32 msr, - u64 data, bool write) +static bool kvm_msr_ignored_check(struct kvm_vcpu *vcpu, u32 msr, + u64 data, bool write) { const char *op = write ? "wrmsr" : "rdmsr"; @@ -268,11 +267,11 @@ static int kvm_msr_ignored_check(struct kvm_vcpu *vcpu, u32 msr, kvm_pr_unimpl("ignored %s: 0x%x data 0x%llx\n", op, msr, data); /* Mask the error */ - return 0; + return true; } else { kvm_debug_ratelimited("unhandled %s: 0x%x data 0x%llx\n", op, msr, data); - return -ENOENT; + return false; } } @@ -1416,7 +1415,8 @@ static int do_get_msr_feature(struct kvm_vcpu *vcpu, unsigned index, u64 *data) if (r == KVM_MSR_RET_INVALID) { /* Unconditionally clear the output for simplicity */ *data = 0; - r = kvm_msr_ignored_check(vcpu, index, 0, false); + if (kvm_msr_ignored_check(vcpu, index, 0, false)) + r = 0; } if (r) @@ -1540,7 +1540,7 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data, struct msr_data msr; if (!host_initiated && !kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_WRITE)) - return -EPERM; + return KVM_MSR_RET_FILTERED; switch (index) { case MSR_FS_BASE: @@ -1581,7 +1581,8 @@ static int kvm_set_msr_ignored_check(struct kvm_vcpu *vcpu, int ret = __kvm_set_msr(vcpu, index, data, host_initiated); if (ret == KVM_MSR_RET_INVALID) - ret = kvm_msr_ignored_check(vcpu, index, data, true); + if (kvm_msr_ignored_check(vcpu, index, data, true)) + ret = 0; return ret; } @@ -1599,7 +1600,7 @@ int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data, int ret; if (!host_initiated && !kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ)) - return -EPERM; + return KVM_MSR_RET_FILTERED; msr.index = index; msr.host_initiated = host_initiated; @@ -1618,7 +1619,8 @@ static int kvm_get_msr_ignored_check(struct kvm_vcpu *vcpu, if (ret == KVM_MSR_RET_INVALID) { /* Unconditionally clear *data for simplicity */ *data = 0; - ret = kvm_msr_ignored_check(vcpu, index, 0, false); + if (kvm_msr_ignored_check(vcpu, index, 0, false)) + ret = 0; } return ret; @@ -1662,9 +1664,9 @@ static int complete_emulated_wrmsr(struct kvm_vcpu *vcpu) static u64 kvm_msr_reason(int r) { switch (r) { - case -ENOENT: + case KVM_MSR_RET_INVALID: return KVM_MSR_EXIT_REASON_UNKNOWN; - case -EPERM: + case KVM_MSR_RET_FILTERED: return KVM_MSR_EXIT_REASON_FILTER; default: return KVM_MSR_EXIT_REASON_INVAL; diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 3900ab0c6004..e7ca622a468f 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -376,7 +376,13 @@ int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r, int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva); bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type); -#define KVM_MSR_RET_INVALID 2 +/* + * Internal error codes that are used to indicate that MSR emulation encountered + * an error that should result in #GP in the guest, unless userspace + * handles it. + */ +#define KVM_MSR_RET_INVALID 2 /* in-kernel MSR emulation #GP condition */ +#define KVM_MSR_RET_FILTERED 3 /* #GP due to userspace MSR filter */ #define __cr4_reserved_bits(__cpu_has, __c) \ ({ \ -- cgit v1.2.3 From 1930e5ddcead2c23567131e62c86b15efce054be Mon Sep 17 00:00:00 2001 From: Oliver Upton Date: Tue, 27 Oct 2020 16:10:41 -0700 Subject: kvm: x86: reads of restricted pv msrs should also result in #GP commit 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID") only protects against disallowed guest writes to KVM paravirtual msrs, leaving msr reads unchecked. Fix this by enforcing KVM_CPUID_FEATURES for msr reads as well. Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID") Signed-off-by: Oliver Upton Reviewed-by: Peter Shier Message-Id: <20201027231044.655110-4-oupton@google.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/x86.c | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) (limited to 'arch') diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a4c59be8d566..d12820acc548 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3465,29 +3465,63 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) msr_info->data = vcpu->arch.efer; break; case MSR_KVM_WALL_CLOCK: + if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE)) + return 1; + + msr_info->data = vcpu->kvm->arch.wall_clock; + break; case MSR_KVM_WALL_CLOCK_NEW: + if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2)) + return 1; + msr_info->data = vcpu->kvm->arch.wall_clock; break; case MSR_KVM_SYSTEM_TIME: + if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE)) + return 1; + + msr_info->data = vcpu->arch.time; + break; case MSR_KVM_SYSTEM_TIME_NEW: + if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2)) + return 1; + msr_info->data = vcpu->arch.time; break; case MSR_KVM_ASYNC_PF_EN: + if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF)) + return 1; + msr_info->data = vcpu->arch.apf.msr_en_val; break; case MSR_KVM_ASYNC_PF_INT: + if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT)) + return 1; + msr_info->data = vcpu->arch.apf.msr_int_val; break; case MSR_KVM_ASYNC_PF_ACK: + if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF)) + return 1; + msr_info->data = 0; break; case MSR_KVM_STEAL_TIME: + if (!guest_pv_has(vcpu, KVM_FEATURE_STEAL_TIME)) + return 1; + msr_info->data = vcpu->arch.st.msr_val; break; case MSR_KVM_PV_EOI_EN: + if (!guest_pv_has(vcpu, KVM_FEATURE_PV_EOI)) + return 1; + msr_info->data = vcpu->arch.pv_eoi.msr_val; break; case MSR_KVM_POLL_CONTROL: + if (!guest_pv_has(vcpu, KVM_FEATURE_POLL_CONTROL)) + return 1; + msr_info->data = vcpu->arch.msr_kvm_poll_control; break; case MSR_IA32_P5_MC_ADDR: -- cgit v1.2.3 From 01b4f510b9f467abfc781e198e810e1ecffb782e Mon Sep 17 00:00:00 2001 From: Oliver Upton Date: Tue, 27 Oct 2020 16:10:42 -0700 Subject: kvm: x86: ensure pv_cpuid.features is initialized when enabling cap Make the paravirtual cpuid enforcement mechanism idempotent to ioctl() ordering by updating pv_cpuid.features whenever userspace requests the capability. Extract this update out of kvm_update_cpuid_runtime() into a new helper function and move its other call site into kvm_vcpu_after_set_cpuid() where it more likely belongs. Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID") Signed-off-by: Oliver Upton Reviewed-by: Peter Shier Message-Id: <20201027231044.655110-5-oupton@google.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/cpuid.c | 23 ++++++++++++++++------- arch/x86/kvm/cpuid.h | 1 + arch/x86/kvm/x86.c | 2 ++ 3 files changed, 19 insertions(+), 7 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 06a278b3701d..d50041f570e8 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -90,6 +90,20 @@ static int kvm_check_cpuid(struct kvm_cpuid_entry2 *entries, int nent) return 0; } +void kvm_update_pv_runtime(struct kvm_vcpu *vcpu) +{ + struct kvm_cpuid_entry2 *best; + + best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0); + + /* + * save the feature bitmap to avoid cpuid lookup for every PV + * operation + */ + if (best) + vcpu->arch.pv_cpuid.features = best->eax; +} + void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; @@ -124,13 +138,6 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu) (best->eax & (1 << KVM_FEATURE_PV_UNHALT))) best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT); - /* - * save the feature bitmap to avoid cpuid lookup for every PV - * operation - */ - if (best) - vcpu->arch.pv_cpuid.features = best->eax; - if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) { best = kvm_find_cpuid_entry(vcpu, 0x1, 0); if (best) @@ -162,6 +169,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) vcpu->arch.guest_supported_xcr0 = (best->eax | ((u64)best->edx << 32)) & supported_xcr0; + kvm_update_pv_runtime(vcpu); + vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu); kvm_mmu_reset_context(vcpu); diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index bf8577947ed2..f7a6e8f83783 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -11,6 +11,7 @@ extern u32 kvm_cpu_caps[NCAPINTS] __read_mostly; void kvm_set_cpu_caps(void); void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu); +void kvm_update_pv_runtime(struct kvm_vcpu *vcpu); struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, u32 function, u32 index); int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index d12820acc548..948561183fd1 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4611,6 +4611,8 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu, case KVM_CAP_ENFORCE_PV_FEATURE_CPUID: vcpu->arch.pv_cpuid.enforce = cap->args[0]; + if (vcpu->arch.pv_cpuid.enforce) + kvm_update_pv_runtime(vcpu); return 0; -- cgit v1.2.3 From 1e293d1ae88cd0e2a0ad4c275f5dc2d8ae7b4387 Mon Sep 17 00:00:00 2001 From: Oliver Upton Date: Tue, 27 Oct 2020 16:10:43 -0700 Subject: kvm: x86: request masterclock update any time guest uses different msr Commit 5b9bb0ebbcdc ("kvm: x86: encapsulate wrmsr(MSR_KVM_SYSTEM_TIME) emulation in helper fn", 2020-10-21) subtly changed the behavior of guest writes to MSR_KVM_SYSTEM_TIME(_NEW). Restore the previous behavior; update the masterclock any time the guest uses a different msr than before. Fixes: 5b9bb0ebbcdc ("kvm: x86: encapsulate wrmsr(MSR_KVM_SYSTEM_TIME) emulation in helper fn", 2020-10-21) Signed-off-by: Oliver Upton Reviewed-by: Peter Shier Message-Id: <20201027231044.655110-6-oupton@google.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/x86.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 948561183fd1..d9651faabe68 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1967,7 +1967,7 @@ static void kvm_write_system_time(struct kvm_vcpu *vcpu, gpa_t system_time, struct kvm_arch *ka = &vcpu->kvm->arch; if (vcpu->vcpu_id == 0 && !host_initiated) { - if (ka->boot_vcpu_runs_old_kvmclock && old_msr) + if (ka->boot_vcpu_runs_old_kvmclock != old_msr) kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu); ka->boot_vcpu_runs_old_kvmclock = old_msr; -- cgit v1.2.3 From 2cdef91cf882abc74dd2f6bfae16db782b44c6ce Mon Sep 17 00:00:00 2001 From: Pankaj Gupta Date: Thu, 5 Nov 2020 16:39:32 +0100 Subject: KVM: x86: handle MSR_IA32_DEBUGCTLMSR with report_ignored_msrs Windows2016 guest tries to enable LBR by setting the corresponding bits in MSR_IA32_DEBUGCTLMSR. KVM does not emulate MSR_IA32_DEBUGCTLMSR and spams the host kernel logs with error messages like: kvm [...]: vcpu1, guest rIP: 0xfffff800a8b687d3 kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x1, nop" This patch fixes this by enabling error logging only with 'report_ignored_msrs=1'. Signed-off-by: Pankaj Gupta Message-Id: <20201105153932.24316-1-pankaj.gupta.linux@gmail.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/x86.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index d9651faabe68..447edc0d1d5a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3065,9 +3065,9 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) /* Values other than LBR and BTF are vendor-specific, thus reserved and should throw a #GP */ return 1; - } - vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n", - __func__, data); + } else if (report_ignored_msrs) + vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n", + __func__, data); break; case 0x200 ... 0x2ff: return kvm_mtrr_set_msr(vcpu, msr, data); -- cgit v1.2.3 From ce9dfafe29bed86fe3cda330ac6072ce84e1ff81 Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Tue, 3 Nov 2020 16:55:43 +0100 Subject: s390: fix system call exit path The system call exit path is running with interrupts enabled while checking for TIF/PIF/CIF bits which require special handling. If all bits have been checked interrupts are disabled and the kernel exits to user space. The problem is that after checking all bits and before interrupts are disabled bits can be set already again, due to interrupt handling. This means that the kernel can exit to user space with some TIF/PIF/CIF bits set, which should never happen. E.g. TIF_NEED_RESCHED might be set, which might lead to additional latencies, since that bit will only be recognized with next exit to user space. Fix this by checking the corresponding bits only when interrupts are disabled. Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S") Cc: # 5.8 Acked-by: Sven Schnelle Signed-off-by: Heiko Carstens --- arch/s390/kernel/entry.S | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch') diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S index 86235919c2d1..5346545b9860 100644 --- a/arch/s390/kernel/entry.S +++ b/arch/s390/kernel/entry.S @@ -422,6 +422,7 @@ ENTRY(system_call) #endif LOCKDEP_SYS_EXIT .Lsysc_tif: + DISABLE_INTS TSTMSK __PT_FLAGS(%r11),_PIF_WORK jnz .Lsysc_work TSTMSK __TI_flags(%r12),_TIF_WORK @@ -444,6 +445,7 @@ ENTRY(system_call) # One of the work bits is on. Find out which one. # .Lsysc_work: + ENABLE_INTS TSTMSK __TI_flags(%r12),_TIF_NEED_RESCHED jo .Lsysc_reschedule TSTMSK __PT_FLAGS(%r11),_PIF_SYSCALL_RESTART -- cgit v1.2.3 From 7de8bfaa095fcbc2db2952d4b561be102a41c2a6 Mon Sep 17 00:00:00 2001 From: Lad Prabhakar Date: Wed, 4 Nov 2020 10:55:08 +0000 Subject: arm64: dts: renesas: r8a774e1: Add missing audio_clk_b Add audio_clk_b configured as 0 Hz, this will be overridden by the boards providing the audio clock. Fixes: 8183a7938cfec ("arm64: dts: renesas: r8a774e1: Add audio support") Reported-by: Nobuhiro Iwamatsu Signed-off-by: Lad Prabhakar Link: https://lore.kernel.org/r/20201104105508.21197-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Geert Uytterhoeven --- arch/arm64/boot/dts/renesas/r8a774e1.dtsi | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'arch') diff --git a/arch/arm64/boot/dts/renesas/r8a774e1.dtsi b/arch/arm64/boot/dts/renesas/r8a774e1.dtsi index 9cbf963aa068..c29643442e91 100644 --- a/arch/arm64/boot/dts/renesas/r8a774e1.dtsi +++ b/arch/arm64/boot/dts/renesas/r8a774e1.dtsi @@ -28,6 +28,12 @@ clock-frequency = <0>; }; + audio_clk_b: audio_clk_b { + compatible = "fixed-clock"; + #clock-cells = <0>; + clock-frequency = <0>; + }; + audio_clk_c: audio_clk_c { compatible = "fixed-clock"; #clock-cells = <0>; -- cgit v1.2.3 From 52d9edbe6efc5042cf57fae6a25d07572ddf398b Mon Sep 17 00:00:00 2001 From: Marek Vasut Date: Thu, 8 Oct 2020 21:35:38 +0200 Subject: ARM: dts: stm32: Fix TA3-GPIO-C key on STM32MP1 DHCOM PDK2 On the prototype DHCOM, the TA3-GPIO-C button was connected to pin PI11 of the STM32MP15xx, however on the production SoM this was changed to pin PG0 to free up the IRQ line 11 for LAN8710i PHY IRQ. Update the connection in the DT. Since the IRQ line 0 is used for PMIC as well and cannot be shared with the button, make the button polled. Fixes: 87cabf9405cb ("ARM: dts: stm32: Add GPIO keys for STM32MP1 DHCOM PDK2") Signed-off-by: Marek Vasut Cc: Alexandre Torgue Cc: Maxime Coquelin Cc: Patrice Chotard Cc: Patrick Delaunay Cc: linux-stm32@st-md-mailman.stormreply.com To: linux-arm-kernel@lists.infradead.org Signed-off-by: Alexandre Torgue --- arch/arm/boot/dts/stm32mp15xx-dhcom-pdk2.dtsi | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) (limited to 'arch') diff --git a/arch/arm/boot/dts/stm32mp15xx-dhcom-pdk2.dtsi b/arch/arm/boot/dts/stm32mp15xx-dhcom-pdk2.dtsi index 5dff24e39af8..9a0a59678097 100644 --- a/arch/arm/boot/dts/stm32mp15xx-dhcom-pdk2.dtsi +++ b/arch/arm/boot/dts/stm32mp15xx-dhcom-pdk2.dtsi @@ -46,6 +46,16 @@ linux,code = ; gpios = <&gpiof 3 GPIO_ACTIVE_LOW>; }; + + /* + * The EXTi IRQ line 0 is shared with PMIC, + * so mark this as polled GPIO key. + */ + button-2 { + label = "TA3-GPIO-C"; + linux,code = ; + gpios = <&gpiog 0 GPIO_ACTIVE_LOW>; + }; }; gpio-keys { @@ -59,13 +69,6 @@ wakeup-source; }; - button-2 { - label = "TA3-GPIO-C"; - linux,code = ; - gpios = <&gpioi 11 GPIO_ACTIVE_LOW>; - wakeup-source; - }; - button-3 { label = "TA4-GPIO-D"; linux,code = ; -- cgit v1.2.3 From 7e5f3155dcbb4d724386b30cc232002d9b9d81f5 Mon Sep 17 00:00:00 2001 From: Marek Vasut Date: Wed, 28 Oct 2020 21:46:17 +0100 Subject: ARM: dts: stm32: Fix LED5 on STM32MP1 DHCOM PDK2 On the prototype DHCOM, the LED5 was connected to pin PG2 of the STM32MP15xx, however on the production SoM this was changed to pin PC6. Update the connection in the DT. Fixes: 81d5fc719798 ("ARM: dts: stm32: Add GPIO LEDs for STM32MP1 DHCOM PDK2") Signed-off-by: Marek Vasut Cc: Alexandre Torgue Cc: Maxime Coquelin Cc: Patrice Chotard Cc: Patrick Delaunay Cc: linux-stm32@st-md-mailman.stormreply.com To: linux-arm-kernel@lists.infradead.org Signed-off-by: Alexandre Torgue --- arch/arm/boot/dts/stm32mp15xx-dhcom-pdk2.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/boot/dts/stm32mp15xx-dhcom-pdk2.dtsi b/arch/arm/boot/dts/stm32mp15xx-dhcom-pdk2.dtsi index 9a0a59678097..8456f172d4b1 100644 --- a/arch/arm/boot/dts/stm32mp15xx-dhcom-pdk2.dtsi +++ b/arch/arm/boot/dts/stm32mp15xx-dhcom-pdk2.dtsi @@ -82,7 +82,7 @@ led-0 { label = "green:led5"; - gpios = <&gpiog 2 GPIO_ACTIVE_HIGH>; + gpios = <&gpioc 6 GPIO_ACTIVE_HIGH>; default-state = "off"; }; -- cgit v1.2.3 From 1f3d7fc279b1a299bb8b1b225d80309a2062ab8a Mon Sep 17 00:00:00 2001 From: Marek Vasut Date: Thu, 29 Oct 2020 20:46:17 +0100 Subject: ARM: dts: stm32: Define VIO regulator supply on DHCOM The VIO regulator is supplied by PMIC Buck3, describe this in the DT. Fixes: 34e0c7847dcf ("ARM: dts: stm32: Add DH Electronics DHCOM STM32MP1 SoM and PDK2 board") Signed-off-by: Marek Vasut Cc: Alexandre Torgue Cc: Maxime Coquelin Cc: Patrice Chotard Cc: Patrick Delaunay Cc: linux-stm32@st-md-mailman.stormreply.com To: linux-arm-kernel@lists.infradead.org Signed-off-by: Alexandre Torgue --- arch/arm/boot/dts/stm32mp15xx-dhcom-som.dtsi | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/arm/boot/dts/stm32mp15xx-dhcom-som.dtsi b/arch/arm/boot/dts/stm32mp15xx-dhcom-som.dtsi index b4b52cf634af..c89ebfaa115d 100644 --- a/arch/arm/boot/dts/stm32mp15xx-dhcom-som.dtsi +++ b/arch/arm/boot/dts/stm32mp15xx-dhcom-som.dtsi @@ -68,6 +68,7 @@ gpio = <&gpiog 3 GPIO_ACTIVE_LOW>; regulator-always-on; regulator-boot-on; + vin-supply = <&vdd>; }; }; -- cgit v1.2.3 From e5ace7f62695656ef8a66ad5a4c3edd055894876 Mon Sep 17 00:00:00 2001 From: Marek Vasut Date: Thu, 24 Sep 2020 01:25:35 +0200 Subject: ARM: dts: stm32: Enable thermal sensor support on stm32mp15xx-dhcor Enable STM32 Digital Thermal Sensor driver for stm32mp15xx-dhcor SoMs. Fixes: 94cafe1b6482 ("ARM: dts: stm32: Add Avenger96 devicetree support based on STM32MP157A") Signed-off-by: Marek Vasut Cc: Alexandre Torgue Cc: Manivannan Sadhasivam Cc: linux-stm32@st-md-mailman.stormreply.com To: linux-arm-kernel@lists.infradead.org Reviewed-by: Manivannan Sadhasivam Signed-off-by: Alexandre Torgue --- arch/arm/boot/dts/stm32mp15xx-dhcor-som.dtsi | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'arch') diff --git a/arch/arm/boot/dts/stm32mp15xx-dhcor-som.dtsi b/arch/arm/boot/dts/stm32mp15xx-dhcor-som.dtsi index 04fbb324a541..803eb8bc9c85 100644 --- a/arch/arm/boot/dts/stm32mp15xx-dhcor-som.dtsi +++ b/arch/arm/boot/dts/stm32mp15xx-dhcor-som.dtsi @@ -21,6 +21,10 @@ }; }; +&dts { + status = "okay"; +}; + &i2c4 { pinctrl-names = "default"; pinctrl-0 = <&i2c4_pins_a>; -- cgit v1.2.3 From f4c7fa39415da6db1fa0bc26162ac23a0fbae8bb Mon Sep 17 00:00:00 2001 From: Marek Vasut Date: Thu, 29 Oct 2020 20:46:52 +0100 Subject: ARM: dts: stm32: Keep VDDA LDO1 always on on DHCOM The VDDA LDO1 PMIC output supplies the analog VDDA input of the STM32MP1 on DHCOM, keep it always on, otherwise there could be leakage through the SoC. Fixes: 34e0c7847dcf ("ARM: dts: stm32: Add DH Electronics DHCOM STM32MP1 SoM and PDK2 board") Signed-off-by: Marek Vasut Cc: Alexandre Torgue Cc: Maxime Coquelin Cc: Patrice Chotard Cc: Patrick Delaunay Cc: linux-stm32@st-md-mailman.stormreply.com To: linux-arm-kernel@lists.infradead.org Signed-off-by: Alexandre Torgue --- arch/arm/boot/dts/stm32mp15xx-dhcom-som.dtsi | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/arm/boot/dts/stm32mp15xx-dhcom-som.dtsi b/arch/arm/boot/dts/stm32mp15xx-dhcom-som.dtsi index c89ebfaa115d..f796a6150313 100644 --- a/arch/arm/boot/dts/stm32mp15xx-dhcom-som.dtsi +++ b/arch/arm/boot/dts/stm32mp15xx-dhcom-som.dtsi @@ -203,6 +203,7 @@ vdda: ldo1 { regulator-name = "vdda"; + regulator-always-on; regulator-min-microvolt = <2900000>; regulator-max-microvolt = <2900000>; interrupts = ; -- cgit v1.2.3 From 65cae18882f943215d0505ddc7e70495877308e6 Mon Sep 17 00:00:00 2001 From: Brian Masney Date: Fri, 6 Nov 2020 20:11:19 -0500 Subject: x86/xen: don't unbind uninitialized lock_kicker_irq When booting a hyperthreaded system with the kernel parameter 'mitigations=auto,nosmt', the following warning occurs: WARNING: CPU: 0 PID: 1 at drivers/xen/events/events_base.c:1112 unbind_from_irqhandler+0x4e/0x60 ... Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006 ... Call Trace: xen_uninit_lock_cpu+0x28/0x62 xen_hvm_cpu_die+0x21/0x30 takedown_cpu+0x9c/0xe0 ? trace_suspend_resume+0x60/0x60 cpuhp_invoke_callback+0x9a/0x530 _cpu_up+0x11a/0x130 cpu_up+0x7e/0xc0 bringup_nonboot_cpus+0x48/0x50 smp_init+0x26/0x79 kernel_init_freeable+0xea/0x229 ? rest_init+0xaa/0xaa kernel_init+0xa/0x106 ret_from_fork+0x35/0x40 The secondary CPUs are not activated with the nosmt mitigations and only the primary thread on each CPU core is used. In this situation, xen_hvm_smp_prepare_cpus(), and more importantly xen_init_lock_cpu(), is not called, so the lock_kicker_irq is not initialized for the secondary CPUs. Let's fix this by exiting early in xen_uninit_lock_cpu() if the irq is not set to avoid the warning from above for each secondary CPU. Signed-off-by: Brian Masney Link: https://lore.kernel.org/r/20201107011119.631442-1-bmasney@redhat.com Reviewed-by: Juergen Gross Signed-off-by: Boris Ostrovsky --- arch/x86/xen/spinlock.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c index 799f4eba0a62..043c73dfd2c9 100644 --- a/arch/x86/xen/spinlock.c +++ b/arch/x86/xen/spinlock.c @@ -93,10 +93,20 @@ void xen_init_lock_cpu(int cpu) void xen_uninit_lock_cpu(int cpu) { + int irq; + if (!xen_pvspin) return; - unbind_from_irqhandler(per_cpu(lock_kicker_irq, cpu), NULL); + /* + * When booting the kernel with 'mitigations=auto,nosmt', the secondary + * CPUs are not activated, and lock_kicker_irq is not initialized. + */ + irq = per_cpu(lock_kicker_irq, cpu); + if (irq == -1) + return; + + unbind_from_irqhandler(irq, NULL); per_cpu(lock_kicker_irq, cpu) = -1; kfree(per_cpu(irq_name, cpu)); per_cpu(irq_name, cpu) = NULL; -- cgit v1.2.3 From 267fb27352b6fc9fdbad753127a239f75618ecbc Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Fri, 30 Oct 2020 15:50:32 +0100 Subject: perf: Reduce stack usage of perf_output_begin() __perf_output_begin() has an on-stack struct perf_sample_data in the unlikely case it needs to generate a LOST record. However, every call to perf_output_begin() must already have a perf_sample_data on-stack. Reported-by: Thomas Gleixner Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20201030151954.985416146@infradead.org --- arch/powerpc/perf/imc-pmu.c | 2 +- arch/s390/kernel/perf_cpum_sf.c | 2 +- arch/x86/events/intel/ds.c | 4 ++-- include/linux/perf_event.h | 7 +++++-- kernel/events/core.c | 32 +++++++++++++++++--------------- kernel/events/ring_buffer.c | 20 +++++++++++--------- 6 files changed, 37 insertions(+), 30 deletions(-) (limited to 'arch') diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c index 9ed4fcccf8a9..7b25548ec42b 100644 --- a/arch/powerpc/perf/imc-pmu.c +++ b/arch/powerpc/perf/imc-pmu.c @@ -1336,7 +1336,7 @@ static void dump_trace_imc_data(struct perf_event *event) /* If this is a valid record, create the sample */ struct perf_output_handle handle; - if (perf_output_begin(&handle, event, header.size)) + if (perf_output_begin(&handle, &data, event, header.size)) return; perf_output_sample(&handle, &header, &data, event); diff --git a/arch/s390/kernel/perf_cpum_sf.c b/arch/s390/kernel/perf_cpum_sf.c index 4f9e4626df55..00255ae3979d 100644 --- a/arch/s390/kernel/perf_cpum_sf.c +++ b/arch/s390/kernel/perf_cpum_sf.c @@ -672,7 +672,7 @@ static void cpumsf_output_event_pid(struct perf_event *event, rcu_read_lock(); perf_prepare_sample(&header, data, event, regs); - if (perf_output_begin(&handle, event, header.size)) + if (perf_output_begin(&handle, data, event, header.size)) goto out; /* Update the process ID (see also kernel/events/core.c) */ diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index 404315df1e16..cd2ae14a0a98 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -642,8 +642,8 @@ int intel_pmu_drain_bts_buffer(void) rcu_read_lock(); perf_prepare_sample(&header, &data, event, ®s); - if (perf_output_begin(&handle, event, header.size * - (top - base - skip))) + if (perf_output_begin(&handle, &data, event, + header.size * (top - base - skip))) goto unlock; for (at = base; at < top; at++) { diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 0c19d279b97f..b775ae0a8c87 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1400,11 +1400,14 @@ perf_event_addr_filters(struct perf_event *event) extern void perf_event_addr_filters_sync(struct perf_event *event); extern int perf_output_begin(struct perf_output_handle *handle, + struct perf_sample_data *data, struct perf_event *event, unsigned int size); extern int perf_output_begin_forward(struct perf_output_handle *handle, - struct perf_event *event, - unsigned int size); + struct perf_sample_data *data, + struct perf_event *event, + unsigned int size); extern int perf_output_begin_backward(struct perf_output_handle *handle, + struct perf_sample_data *data, struct perf_event *event, unsigned int size); diff --git a/kernel/events/core.c b/kernel/events/core.c index 5a29ab09e72d..fc681c7c1e03 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7186,6 +7186,7 @@ __perf_event_output(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs, int (*output_begin)(struct perf_output_handle *, + struct perf_sample_data *, struct perf_event *, unsigned int)) { @@ -7198,7 +7199,7 @@ __perf_event_output(struct perf_event *event, perf_prepare_sample(&header, data, event, regs); - err = output_begin(&handle, event, header.size); + err = output_begin(&handle, data, event, header.size); if (err) goto exit; @@ -7264,7 +7265,7 @@ perf_event_read_event(struct perf_event *event, int ret; perf_event_header__init_id(&read_event.header, &sample, event); - ret = perf_output_begin(&handle, event, read_event.header.size); + ret = perf_output_begin(&handle, &sample, event, read_event.header.size); if (ret) return; @@ -7533,7 +7534,7 @@ static void perf_event_task_output(struct perf_event *event, perf_event_header__init_id(&task_event->event_id.header, &sample, event); - ret = perf_output_begin(&handle, event, + ret = perf_output_begin(&handle, &sample, event, task_event->event_id.header.size); if (ret) goto out; @@ -7636,7 +7637,7 @@ static void perf_event_comm_output(struct perf_event *event, return; perf_event_header__init_id(&comm_event->event_id.header, &sample, event); - ret = perf_output_begin(&handle, event, + ret = perf_output_begin(&handle, &sample, event, comm_event->event_id.header.size); if (ret) @@ -7736,7 +7737,7 @@ static void perf_event_namespaces_output(struct perf_event *event, perf_event_header__init_id(&namespaces_event->event_id.header, &sample, event); - ret = perf_output_begin(&handle, event, + ret = perf_output_begin(&handle, &sample, event, namespaces_event->event_id.header.size); if (ret) goto out; @@ -7863,7 +7864,7 @@ static void perf_event_cgroup_output(struct perf_event *event, void *data) perf_event_header__init_id(&cgroup_event->event_id.header, &sample, event); - ret = perf_output_begin(&handle, event, + ret = perf_output_begin(&handle, &sample, event, cgroup_event->event_id.header.size); if (ret) goto out; @@ -7989,7 +7990,7 @@ static void perf_event_mmap_output(struct perf_event *event, } perf_event_header__init_id(&mmap_event->event_id.header, &sample, event); - ret = perf_output_begin(&handle, event, + ret = perf_output_begin(&handle, &sample, event, mmap_event->event_id.header.size); if (ret) goto out; @@ -8299,7 +8300,7 @@ void perf_event_aux_event(struct perf_event *event, unsigned long head, int ret; perf_event_header__init_id(&rec.header, &sample, event); - ret = perf_output_begin(&handle, event, rec.header.size); + ret = perf_output_begin(&handle, &sample, event, rec.header.size); if (ret) return; @@ -8333,7 +8334,7 @@ void perf_log_lost_samples(struct perf_event *event, u64 lost) perf_event_header__init_id(&lost_samples_event.header, &sample, event); - ret = perf_output_begin(&handle, event, + ret = perf_output_begin(&handle, &sample, event, lost_samples_event.header.size); if (ret) return; @@ -8388,7 +8389,7 @@ static void perf_event_switch_output(struct perf_event *event, void *data) perf_event_header__init_id(&se->event_id.header, &sample, event); - ret = perf_output_begin(&handle, event, se->event_id.header.size); + ret = perf_output_begin(&handle, &sample, event, se->event_id.header.size); if (ret) return; @@ -8463,7 +8464,7 @@ static void perf_log_throttle(struct perf_event *event, int enable) perf_event_header__init_id(&throttle_event.header, &sample, event); - ret = perf_output_begin(&handle, event, + ret = perf_output_begin(&handle, &sample, event, throttle_event.header.size); if (ret) return; @@ -8506,7 +8507,7 @@ static void perf_event_ksymbol_output(struct perf_event *event, void *data) perf_event_header__init_id(&ksymbol_event->event_id.header, &sample, event); - ret = perf_output_begin(&handle, event, + ret = perf_output_begin(&handle, &sample, event, ksymbol_event->event_id.header.size); if (ret) return; @@ -8596,7 +8597,7 @@ static void perf_event_bpf_output(struct perf_event *event, void *data) perf_event_header__init_id(&bpf_event->event_id.header, &sample, event); - ret = perf_output_begin(&handle, event, + ret = perf_output_begin(&handle, data, event, bpf_event->event_id.header.size); if (ret) return; @@ -8705,7 +8706,8 @@ static void perf_event_text_poke_output(struct perf_event *event, void *data) perf_event_header__init_id(&text_poke_event->event_id.header, &sample, event); - ret = perf_output_begin(&handle, event, text_poke_event->event_id.header.size); + ret = perf_output_begin(&handle, &sample, event, + text_poke_event->event_id.header.size); if (ret) return; @@ -8786,7 +8788,7 @@ static void perf_log_itrace_start(struct perf_event *event) rec.tid = perf_event_tid(event, current); perf_event_header__init_id(&rec.header, &sample, event); - ret = perf_output_begin(&handle, event, rec.header.size); + ret = perf_output_begin(&handle, &sample, event, rec.header.size); if (ret) return; diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c index 192b8abc6330..ef91ae75ca56 100644 --- a/kernel/events/ring_buffer.c +++ b/kernel/events/ring_buffer.c @@ -147,6 +147,7 @@ ring_buffer_has_space(unsigned long head, unsigned long tail, static __always_inline int __perf_output_begin(struct perf_output_handle *handle, + struct perf_sample_data *data, struct perf_event *event, unsigned int size, bool backward) { @@ -237,18 +238,16 @@ __perf_output_begin(struct perf_output_handle *handle, handle->size = (1UL << page_shift) - offset; if (unlikely(have_lost)) { - struct perf_sample_data sample_data; - lost_event.header.size = sizeof(lost_event); lost_event.header.type = PERF_RECORD_LOST; lost_event.header.misc = 0; lost_event.id = event->id; lost_event.lost = local_xchg(&rb->lost, 0); - perf_event_header__init_id(&lost_event.header, - &sample_data, event); + /* XXX mostly redundant; @data is already fully initializes */ + perf_event_header__init_id(&lost_event.header, data, event); perf_output_put(handle, lost_event); - perf_event__output_id_sample(event, handle, &sample_data); + perf_event__output_id_sample(event, handle, data); } return 0; @@ -263,22 +262,25 @@ out: } int perf_output_begin_forward(struct perf_output_handle *handle, - struct perf_event *event, unsigned int size) + struct perf_sample_data *data, + struct perf_event *event, unsigned int size) { - return __perf_output_begin(handle, event, size, false); + return __perf_output_begin(handle, data, event, size, false); } int perf_output_begin_backward(struct perf_output_handle *handle, + struct perf_sample_data *data, struct perf_event *event, unsigned int size) { - return __perf_output_begin(handle, event, size, true); + return __perf_output_begin(handle, data, event, size, true); } int perf_output_begin(struct perf_output_handle *handle, + struct perf_sample_data *data, struct perf_event *event, unsigned int size) { - return __perf_output_begin(handle, event, size, + return __perf_output_begin(handle, data, event, size, unlikely(is_write_backward(event))); } -- cgit v1.2.3 From 9dfa9a5c9bae3417b87824e7ac73b00c10b6a874 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Fri, 30 Oct 2020 14:58:48 +0100 Subject: perf/x86: Reduce stack usage for x86_pmu::drain_pebs() intel_pmu_drain_pebs_*() is typically called from handle_pmi_common(), both have an on-stack struct perf_sample_data, which is *big*. Rewire things so that drain_pebs() can use the one handle_pmi_common() has. Reported-by: Thomas Gleixner Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20201030151955.054099690@infradead.org --- arch/x86/events/intel/core.c | 2 +- arch/x86/events/intel/ds.c | 47 +++++++++++++++++++++++--------------------- arch/x86/events/perf_event.h | 2 +- 3 files changed, 27 insertions(+), 24 deletions(-) (limited to 'arch') diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index f1926e9f2143..c37387c8212a 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -2630,7 +2630,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status) u64 pebs_enabled = cpuc->pebs_enabled; handled++; - x86_pmu.drain_pebs(regs); + x86_pmu.drain_pebs(regs, &data); status &= x86_pmu.intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI; /* diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index cd2ae14a0a98..276b29da732f 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -670,7 +670,9 @@ unlock: static inline void intel_pmu_drain_pebs_buffer(void) { - x86_pmu.drain_pebs(NULL); + struct perf_sample_data data; + + x86_pmu.drain_pebs(NULL, &data); } /* @@ -1719,19 +1721,20 @@ intel_pmu_save_and_restart_reload(struct perf_event *event, int count) return 0; } -static void __intel_pmu_pebs_event(struct perf_event *event, - struct pt_regs *iregs, - void *base, void *top, - int bit, int count, - void (*setup_sample)(struct perf_event *, - struct pt_regs *, - void *, - struct perf_sample_data *, - struct pt_regs *)) +static __always_inline void +__intel_pmu_pebs_event(struct perf_event *event, + struct pt_regs *iregs, + struct perf_sample_data *data, + void *base, void *top, + int bit, int count, + void (*setup_sample)(struct perf_event *, + struct pt_regs *, + void *, + struct perf_sample_data *, + struct pt_regs *)) { struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); struct hw_perf_event *hwc = &event->hw; - struct perf_sample_data data; struct x86_perf_regs perf_regs; struct pt_regs *regs = &perf_regs.regs; void *at = get_next_pebs_record_by_bit(base, top, bit); @@ -1752,14 +1755,14 @@ static void __intel_pmu_pebs_event(struct perf_event *event, iregs = &dummy_iregs; while (count > 1) { - setup_sample(event, iregs, at, &data, regs); - perf_event_output(event, &data, regs); + setup_sample(event, iregs, at, data, regs); + perf_event_output(event, data, regs); at += cpuc->pebs_record_size; at = get_next_pebs_record_by_bit(at, top, bit); count--; } - setup_sample(event, iregs, at, &data, regs); + setup_sample(event, iregs, at, data, regs); if (iregs == &dummy_iregs) { /* * The PEBS records may be drained in the non-overflow context, @@ -1767,18 +1770,18 @@ static void __intel_pmu_pebs_event(struct perf_event *event, * last record the same as other PEBS records, and doesn't * invoke the generic overflow handler. */ - perf_event_output(event, &data, regs); + perf_event_output(event, data, regs); } else { /* * All but the last records are processed. * The last one is left to be able to call the overflow handler. */ - if (perf_event_overflow(event, &data, regs)) + if (perf_event_overflow(event, data, regs)) x86_pmu_stop(event, 0); } } -static void intel_pmu_drain_pebs_core(struct pt_regs *iregs) +static void intel_pmu_drain_pebs_core(struct pt_regs *iregs, struct perf_sample_data *data) { struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); struct debug_store *ds = cpuc->ds; @@ -1812,7 +1815,7 @@ static void intel_pmu_drain_pebs_core(struct pt_regs *iregs) return; } - __intel_pmu_pebs_event(event, iregs, at, top, 0, n, + __intel_pmu_pebs_event(event, iregs, data, at, top, 0, n, setup_pebs_fixed_sample_data); } @@ -1835,7 +1838,7 @@ static void intel_pmu_pebs_event_update_no_drain(struct cpu_hw_events *cpuc, int } } -static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs) +static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_data *data) { struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); struct debug_store *ds = cpuc->ds; @@ -1942,14 +1945,14 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs) } if (counts[bit]) { - __intel_pmu_pebs_event(event, iregs, base, + __intel_pmu_pebs_event(event, iregs, data, base, top, bit, counts[bit], setup_pebs_fixed_sample_data); } } } -static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs) +static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_data *data) { short counts[INTEL_PMC_IDX_FIXED + MAX_FIXED_PEBS_EVENTS] = {}; struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); @@ -1997,7 +2000,7 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs) if (WARN_ON_ONCE(!event->attr.precise_ip)) continue; - __intel_pmu_pebs_event(event, iregs, base, + __intel_pmu_pebs_event(event, iregs, data, base, top, bit, counts[bit], setup_pebs_adaptive_sample_data); } diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index ee2b9b9fc2a5..1d1fe46552ba 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -727,7 +727,7 @@ struct x86_pmu { int pebs_record_size; int pebs_buffer_size; int max_pebs_events; - void (*drain_pebs)(struct pt_regs *regs); + void (*drain_pebs)(struct pt_regs *regs, struct perf_sample_data *data); struct event_constraint *pebs_constraints; void (*pebs_aliases)(struct perf_event *event); unsigned long large_pebs_flags; -- cgit v1.2.3 From 76a4efa80900fc40e0fdf243b42aec9fb8c35d24 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Fri, 30 Oct 2020 12:14:21 +0100 Subject: perf/arch: Remove perf_sample_data::regs_user_copy struct perf_sample_data lives on-stack, we should be careful about it's size. Furthermore, the pt_regs copy in there is only because x86_64 is a trainwreck, solve it differently. Reported-by: Thomas Gleixner Signed-off-by: Peter Zijlstra (Intel) Tested-by: Steven Rostedt Link: https://lkml.kernel.org/r/20201030151955.258178461@infradead.org --- arch/arm/kernel/perf_regs.c | 3 +-- arch/arm64/kernel/perf_regs.c | 3 +-- arch/csky/kernel/perf_regs.c | 3 +-- arch/powerpc/perf/perf_regs.c | 3 +-- arch/riscv/kernel/perf_regs.c | 3 +-- arch/s390/kernel/perf_regs.c | 3 +-- arch/x86/kernel/perf_regs.c | 15 +++++++++++---- include/linux/perf_event.h | 6 ------ include/linux/perf_regs.h | 6 ++---- kernel/events/core.c | 8 +++----- 10 files changed, 22 insertions(+), 31 deletions(-) (limited to 'arch') diff --git a/arch/arm/kernel/perf_regs.c b/arch/arm/kernel/perf_regs.c index 05fe92aa7d98..0529f90395c9 100644 --- a/arch/arm/kernel/perf_regs.c +++ b/arch/arm/kernel/perf_regs.c @@ -32,8 +32,7 @@ u64 perf_reg_abi(struct task_struct *task) } void perf_get_regs_user(struct perf_regs *regs_user, - struct pt_regs *regs, - struct pt_regs *regs_user_copy) + struct pt_regs *regs) { regs_user->regs = task_pt_regs(current); regs_user->abi = perf_reg_abi(current); diff --git a/arch/arm64/kernel/perf_regs.c b/arch/arm64/kernel/perf_regs.c index 94e8718e7229..f6f58e6265df 100644 --- a/arch/arm64/kernel/perf_regs.c +++ b/arch/arm64/kernel/perf_regs.c @@ -73,8 +73,7 @@ u64 perf_reg_abi(struct task_struct *task) } void perf_get_regs_user(struct perf_regs *regs_user, - struct pt_regs *regs, - struct pt_regs *regs_user_copy) + struct pt_regs *regs) { regs_user->regs = task_pt_regs(current); regs_user->abi = perf_reg_abi(current); diff --git a/arch/csky/kernel/perf_regs.c b/arch/csky/kernel/perf_regs.c index eb32838b8210..09b7f88a2d6a 100644 --- a/arch/csky/kernel/perf_regs.c +++ b/arch/csky/kernel/perf_regs.c @@ -32,8 +32,7 @@ u64 perf_reg_abi(struct task_struct *task) } void perf_get_regs_user(struct perf_regs *regs_user, - struct pt_regs *regs, - struct pt_regs *regs_user_copy) + struct pt_regs *regs) { regs_user->regs = task_pt_regs(current); regs_user->abi = perf_reg_abi(current); diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs.c index 8e53f2fc3fe0..6f681b105eec 100644 --- a/arch/powerpc/perf/perf_regs.c +++ b/arch/powerpc/perf/perf_regs.c @@ -144,8 +144,7 @@ u64 perf_reg_abi(struct task_struct *task) } void perf_get_regs_user(struct perf_regs *regs_user, - struct pt_regs *regs, - struct pt_regs *regs_user_copy) + struct pt_regs *regs) { regs_user->regs = task_pt_regs(current); regs_user->abi = (regs_user->regs) ? perf_reg_abi(current) : diff --git a/arch/riscv/kernel/perf_regs.c b/arch/riscv/kernel/perf_regs.c index 04a38fbeb9c7..fd304a248de6 100644 --- a/arch/riscv/kernel/perf_regs.c +++ b/arch/riscv/kernel/perf_regs.c @@ -36,8 +36,7 @@ u64 perf_reg_abi(struct task_struct *task) } void perf_get_regs_user(struct perf_regs *regs_user, - struct pt_regs *regs, - struct pt_regs *regs_user_copy) + struct pt_regs *regs) { regs_user->regs = task_pt_regs(current); regs_user->abi = perf_reg_abi(current); diff --git a/arch/s390/kernel/perf_regs.c b/arch/s390/kernel/perf_regs.c index 4352a504f235..6e9e5d5e927e 100644 --- a/arch/s390/kernel/perf_regs.c +++ b/arch/s390/kernel/perf_regs.c @@ -53,8 +53,7 @@ u64 perf_reg_abi(struct task_struct *task) } void perf_get_regs_user(struct perf_regs *regs_user, - struct pt_regs *regs, - struct pt_regs *regs_user_copy) + struct pt_regs *regs) { /* * Use the regs from the first interruption and let diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c index bb7e1132290b..f9e5352b3bef 100644 --- a/arch/x86/kernel/perf_regs.c +++ b/arch/x86/kernel/perf_regs.c @@ -101,8 +101,7 @@ u64 perf_reg_abi(struct task_struct *task) } void perf_get_regs_user(struct perf_regs *regs_user, - struct pt_regs *regs, - struct pt_regs *regs_user_copy) + struct pt_regs *regs) { regs_user->regs = task_pt_regs(current); regs_user->abi = perf_reg_abi(current); @@ -129,12 +128,20 @@ u64 perf_reg_abi(struct task_struct *task) return PERF_SAMPLE_REGS_ABI_64; } +static DEFINE_PER_CPU(struct pt_regs, nmi_user_regs); + void perf_get_regs_user(struct perf_regs *regs_user, - struct pt_regs *regs, - struct pt_regs *regs_user_copy) + struct pt_regs *regs) { + struct pt_regs *regs_user_copy = this_cpu_ptr(&nmi_user_regs); struct pt_regs *user_regs = task_pt_regs(current); + if (!in_nmi()) { + regs_user->regs = user_regs; + regs_user->abi = perf_reg_abi(current); + return; + } + /* * If we're in an NMI that interrupted task_pt_regs setup, then * we can't sample user regs at all. This check isn't really diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index b775ae0a8c87..96450f6fb1de 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1022,13 +1022,7 @@ struct perf_sample_data { struct perf_callchain_entry *callchain; u64 aux_size; - /* - * regs_user may point to task_pt_regs or to regs_user_copy, depending - * on arch details. - */ struct perf_regs regs_user; - struct pt_regs regs_user_copy; - struct perf_regs regs_intr; u64 stack_user_size; diff --git a/include/linux/perf_regs.h b/include/linux/perf_regs.h index 2d12e97d5e7b..f632c5725f16 100644 --- a/include/linux/perf_regs.h +++ b/include/linux/perf_regs.h @@ -20,8 +20,7 @@ u64 perf_reg_value(struct pt_regs *regs, int idx); int perf_reg_validate(u64 mask); u64 perf_reg_abi(struct task_struct *task); void perf_get_regs_user(struct perf_regs *regs_user, - struct pt_regs *regs, - struct pt_regs *regs_user_copy); + struct pt_regs *regs); #else #define PERF_REG_EXTENDED_MASK 0 @@ -42,8 +41,7 @@ static inline u64 perf_reg_abi(struct task_struct *task) } static inline void perf_get_regs_user(struct perf_regs *regs_user, - struct pt_regs *regs, - struct pt_regs *regs_user_copy) + struct pt_regs *regs) { regs_user->regs = task_pt_regs(current); regs_user->abi = perf_reg_abi(current); diff --git a/kernel/events/core.c b/kernel/events/core.c index fc681c7c1e03..d67c9cbb0f6a 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -6374,14 +6374,13 @@ perf_output_sample_regs(struct perf_output_handle *handle, } static void perf_sample_regs_user(struct perf_regs *regs_user, - struct pt_regs *regs, - struct pt_regs *regs_user_copy) + struct pt_regs *regs) { if (user_mode(regs)) { regs_user->abi = perf_reg_abi(current); regs_user->regs = regs; } else if (!(current->flags & PF_KTHREAD)) { - perf_get_regs_user(regs_user, regs, regs_user_copy); + perf_get_regs_user(regs_user, regs); } else { regs_user->abi = PERF_SAMPLE_REGS_ABI_NONE; regs_user->regs = NULL; @@ -7083,8 +7082,7 @@ void perf_prepare_sample(struct perf_event_header *header, } if (sample_type & (PERF_SAMPLE_REGS_USER | PERF_SAMPLE_STACK_USER)) - perf_sample_regs_user(&data->regs_user, regs, - &data->regs_user_copy); + perf_sample_regs_user(&data->regs_user, regs); if (sample_type & PERF_SAMPLE_REGS_USER) { /* regs dump ABI info */ -- cgit v1.2.3 From e506d1dac0edb2df82f2aa0582e814f9cd9aa07d Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Fri, 30 Oct 2020 12:15:06 +0100 Subject: perf/x86: Make dummy_iregs static Having pt_regs on-stack is unfortunate, it's 168 bytes. Since it isn't actually used, make it a static variable. This both gets if off the stack and ensures it gets 0 initialized, just in case someone does look at it. Reported-by: Steven Rostedt Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20201030151955.324273677@infradead.org --- arch/x86/events/intel/ds.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index 276b29da732f..b47cc4226934 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -1738,7 +1738,7 @@ __intel_pmu_pebs_event(struct perf_event *event, struct x86_perf_regs perf_regs; struct pt_regs *regs = &perf_regs.regs; void *at = get_next_pebs_record_by_bit(base, top, bit); - struct pt_regs dummy_iregs; + static struct pt_regs dummy_iregs; if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) { /* -- cgit v1.2.3 From cadbaa039b99a6d5c26ce1c7f2fc0325943e605a Mon Sep 17 00:00:00 2001 From: Stephane Eranian Date: Wed, 28 Oct 2020 12:42:47 -0700 Subject: perf/x86/intel: Make anythread filter support conditional Starting with Arch Perfmon v5, the anythread filter on generic counters may be deprecated. The current kernel was exporting the any filter without checking. On Icelake, it means you could do cpu/event=0x3c,any/ even though the filter does not exist. This patch corrects the problem by relying on the CPUID 0xa leaf function to determine if anythread is supported or not as described in the Intel SDM Vol3b 18.2.5.1 AnyThread Deprecation section. Signed-off-by: Stephane Eranian Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20201028194247.3160610-1-eranian@google.com --- arch/x86/events/intel/core.c | 10 ++++++++++ arch/x86/events/perf_event.h | 1 + arch/x86/include/asm/perf_event.h | 4 +++- arch/x86/kvm/cpuid.c | 4 +++- 4 files changed, 17 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index c37387c8212a..af457f8cb29d 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -4987,6 +4987,12 @@ __init int intel_pmu_init(void) x86_add_quirk(intel_arch_events_quirk); /* Install first, so it runs last */ + if (version >= 5) { + x86_pmu.intel_cap.anythread_deprecated = edx.split.anythread_deprecated; + if (x86_pmu.intel_cap.anythread_deprecated) + pr_cont(" AnyThread deprecated, "); + } + /* * Install the hw-cache-events table: */ @@ -5512,6 +5518,10 @@ __init int intel_pmu_init(void) x86_pmu.intel_ctrl |= ((1LL << x86_pmu.num_counters_fixed)-1) << INTEL_PMC_IDX_FIXED; + /* AnyThread may be deprecated on arch perfmon v5 or later */ + if (x86_pmu.intel_cap.anythread_deprecated) + x86_pmu.format_attrs = intel_arch_formats_attr; + if (x86_pmu.event_constraints) { /* * event on fixed counter2 (REF_CYCLES) only works on this diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 1d1fe46552ba..6a8edfe59b09 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -585,6 +585,7 @@ union perf_capabilities { u64 pebs_baseline:1; u64 perf_metrics:1; u64 pebs_output_pt_available:1; + u64 anythread_deprecated:1; }; u64 capabilities; }; diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h index 6960cd6d1f23..b9a7fd0a27e2 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -137,7 +137,9 @@ union cpuid10_edx { struct { unsigned int num_counters_fixed:5; unsigned int bit_width_fixed:8; - unsigned int reserved:19; + unsigned int reserved1:2; + unsigned int anythread_deprecated:1; + unsigned int reserved2:16; } split; unsigned int full; }; diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 06a278b3701d..0752dec66e29 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -672,7 +672,9 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) edx.split.num_counters_fixed = min(cap.num_counters_fixed, MAX_FIXED_COUNTERS); edx.split.bit_width_fixed = cap.bit_width_fixed; - edx.split.reserved = 0; + edx.split.anythread_deprecated = 1; + edx.split.reserved1 = 0; + edx.split.reserved2 = 0; entry->eax = eax.full; entry->ebx = cap.events_mask; -- cgit v1.2.3 From d19d2152ca055baf20339cfacbf039c2cfb8d936 Mon Sep 17 00:00:00 2001 From: Lucas Stach Date: Thu, 5 Nov 2020 18:06:12 +0100 Subject: arm64: dts: imx8mm: fix voltage for 1.6GHz CPU operating point The datasheet for both the industrial and consumer variant of the SoC lists a typical voltage of 0.95V for the 1.6GHz CPU operating point. Fixes: e85c9d0faa75 (arm64: dts: imx8mm: Add cpufreq properties) Signed-off-by: Lucas Stach Reviewed-by: Fabio Estevam Signed-off-by: Shawn Guo --- arch/arm64/boot/dts/freescale/imx8mm.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/freescale/imx8mm.dtsi b/arch/arm64/boot/dts/freescale/imx8mm.dtsi index b83f400def8b..05ee062548e4 100644 --- a/arch/arm64/boot/dts/freescale/imx8mm.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8mm.dtsi @@ -129,7 +129,7 @@ opp-1600000000 { opp-hz = /bits/ 64 <1600000000>; - opp-microvolt = <900000>; + opp-microvolt = <950000>; opp-supported-hw = <0xc>, <0x7>; clock-latency-ns = <150000>; opp-suspend; -- cgit v1.2.3 From 33d0d843872c5ddbe28457a92fc6f2487315fb9f Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Thu, 5 Nov 2020 18:13:20 -0300 Subject: ARM: dts: imx50-evk: Fix the chip select 1 IOMUX The SPI chip selects are represented as: cs-gpios = <&gpio4 11 GPIO_ACTIVE_LOW>, <&gpio4 13 GPIO_ACTIVE_LOW>; , which means that they are used in GPIO function instead of native SPI mode. Fix the IOMUX for the chip select 1 to use GPIO4_13 instead of the native CSPI_SSI function. Fixes: c605cbf5e135 ("ARM: dts: imx: add device tree support for Freescale imx50evk board") Signed-off-by: Fabio Estevam Signed-off-by: Shawn Guo --- arch/arm/boot/dts/imx50-evk.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/boot/dts/imx50-evk.dts b/arch/arm/boot/dts/imx50-evk.dts index 878e89c20190..4ea5c23f181b 100644 --- a/arch/arm/boot/dts/imx50-evk.dts +++ b/arch/arm/boot/dts/imx50-evk.dts @@ -59,7 +59,7 @@ MX50_PAD_CSPI_MISO__CSPI_MISO 0x00 MX50_PAD_CSPI_MOSI__CSPI_MOSI 0x00 MX50_PAD_CSPI_SS0__GPIO4_11 0xc4 - MX50_PAD_ECSPI1_MOSI__CSPI_SS1 0xf4 + MX50_PAD_ECSPI1_MOSI__GPIO4_13 0x84 >; }; -- cgit v1.2.3 From f969f03888b9438fdb227b6460d99ede5737326d Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Fri, 6 Nov 2020 11:14:26 +0000 Subject: arm64: errata: Fix handling of 1418040 with late CPU onlining In a surprising turn of events, it transpires that CPU capabilities configured as ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE are never set as the result of late-onlining. Therefore our handling of erratum 1418040 does not get activated if it is not required by any of the boot CPUs, even though we allow late-onlining of an affected CPU. In order to get things working again, replace the cpus_have_const_cap() invocation with an explicit check for the current CPU using this_cpu_has_cap(). Cc: Sai Prakash Ranjan Cc: Stephen Boyd Cc: Catalin Marinas Cc: Mark Rutland Reviewed-by: Suzuki K Poulose Acked-by: Marc Zyngier Link: https://lore.kernel.org/r/20201106114952.10032-1-will@kernel.org Signed-off-by: Will Deacon --- arch/arm64/include/asm/cpufeature.h | 2 ++ arch/arm64/kernel/process.c | 5 ++--- 2 files changed, 4 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index f7e7144af174..c59c16a6ea8b 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -268,6 +268,8 @@ extern struct arm64_ftr_reg arm64_ftr_reg_ctrel0; /* * CPU feature detected at boot time based on feature of one or more CPUs. * All possible conflicts for a late CPU are ignored. + * NOTE: this means that a late CPU with the feature will *not* cause the + * capability to be advertised by cpus_have_*cap()! */ #define ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE \ (ARM64_CPUCAP_SCOPE_LOCAL_CPU | \ diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index 4784011cecac..a47a40ec6ad9 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -522,14 +522,13 @@ static void erratum_1418040_thread_switch(struct task_struct *prev, bool prev32, next32; u64 val; - if (!(IS_ENABLED(CONFIG_ARM64_ERRATUM_1418040) && - cpus_have_const_cap(ARM64_WORKAROUND_1418040))) + if (!IS_ENABLED(CONFIG_ARM64_ERRATUM_1418040)) return; prev32 = is_compat_thread(task_thread_info(prev)); next32 = is_compat_thread(task_thread_info(next)); - if (prev32 == next32) + if (prev32 == next32 || !this_cpu_has_cap(ARM64_WORKAROUND_1418040)) return; val = read_sysreg(cntkctl_el1); -- cgit v1.2.3 From 85f0b2fc917f8de4bca02d169ef7d23dbfc29155 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Mon, 9 Nov 2020 10:49:23 +0000 Subject: arm64: kexec_file: Fix sparse warning Sparse gets cross about us returning 0 from image_load(), which has a return type of 'void *': >> arch/arm64/kernel/kexec_image.c:130:16: sparse: sparse: Using plain integer as NULL pointer Return NULL instead, as we don't use the return value for anything if it does not indicate an error. Cc: Benjamin Gwin Reported-by: kernel test robot Fixes: 108aa503657e ("arm64: kexec_file: try more regions if loading segments fails") Link: https://lore.kernel.org/r/202011091736.T0zH8kaC-lkp@intel.com Signed-off-by: Will Deacon --- arch/arm64/kernel/kexec_image.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c index 66adee8b5fc8..9ec34690e255 100644 --- a/arch/arm64/kernel/kexec_image.c +++ b/arch/arm64/kernel/kexec_image.c @@ -127,7 +127,7 @@ static void *image_load(struct kimage *image, kernel_segment->mem, kbuf.bufsz, kernel_segment->memsz); - return 0; + return NULL; } #ifdef CONFIG_KEXEC_IMAGE_VERIFY_SIG -- cgit v1.2.3 From 891deb87585017d526b67b59c15d38755b900fea Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Fri, 6 Nov 2020 09:57:55 +0000 Subject: arm64: psci: Avoid printing in cpu_psci_cpu_die() cpu_psci_cpu_die() is called in the context of the dying CPU, which will no longer be online or tracked by RCU. It is therefore not generally safe to call printk() if the PSCI "cpu off" request fails, so remove the pr_crit() invocation. Cc: Qian Cai Cc: "Paul E. McKenney" Cc: Catalin Marinas Link: https://lore.kernel.org/r/20201106103602.9849-2-will@kernel.org Signed-off-by: Will Deacon --- arch/arm64/kernel/psci.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) (limited to 'arch') diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c index 43ae4e0c968f..62d2bda7adb8 100644 --- a/arch/arm64/kernel/psci.c +++ b/arch/arm64/kernel/psci.c @@ -66,7 +66,6 @@ static int cpu_psci_cpu_disable(unsigned int cpu) static void cpu_psci_cpu_die(unsigned int cpu) { - int ret; /* * There are no known implementations of PSCI actually using the * power state field, pass a sensible default for now. @@ -74,9 +73,7 @@ static void cpu_psci_cpu_die(unsigned int cpu) u32 state = PSCI_POWER_STATE_TYPE_POWER_DOWN << PSCI_0_2_POWER_STATE_TYPE_SHIFT; - ret = psci_ops.cpu_off(state); - - pr_crit("unable to power off CPU%u (%d)\n", cpu, ret); + psci_ops.cpu_off(state); } static int cpu_psci_cpu_kill(unsigned int cpu) -- cgit v1.2.3 From 04e613ded8c26489b3e0f9101b44462f780d1a35 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Fri, 6 Nov 2020 10:25:49 +0000 Subject: arm64: smp: Tell RCU about CPUs that fail to come online Commit ce3d31ad3cac ("arm64/smp: Move rcu_cpu_starting() earlier") ensured that RCU is informed early about incoming CPUs that might end up calling into printk() before they are online. However, if such a CPU fails the early CPU feature compatibility checks in check_local_cpu_capabilities(), then it will be powered off or parked without informing RCU, leading to an endless stream of stalls: | rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: | rcu: 2-O...: (0 ticks this GP) idle=002/1/0x4000000000000000 softirq=0/0 fqs=2593 | (detected by 0, t=5252 jiffies, g=9317, q=136) | Task dump for CPU 2: | task:swapper/2 state:R running task stack: 0 pid: 0 ppid: 1 flags:0x00000028 | Call trace: | ret_from_fork+0x0/0x30 Ensure that the dying CPU invokes rcu_report_dead() prior to being powered off or parked. Cc: Qian Cai Cc: "Paul E. McKenney" Reviewed-by: Paul E. McKenney Suggested-by: Qian Cai Link: https://lore.kernel.org/r/20201105222242.GA8842@willie-the-truck Link: https://lore.kernel.org/r/20201106103602.9849-3-will@kernel.org Signed-off-by: Will Deacon --- arch/arm64/kernel/smp.c | 1 + kernel/rcu/tree.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 09c96f57818c..18e9727d3f64 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -413,6 +413,7 @@ void cpu_die_early(void) /* Mark this CPU absent */ set_cpu_present(cpu, 0); + rcu_report_dead(cpu); if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) { update_cpu_boot_status(CPU_KILL_ME); diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 06895ef85d69..946e7c0c4cf7 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -4077,7 +4077,6 @@ void rcu_cpu_starting(unsigned int cpu) smp_mb(); /* Ensure RCU read-side usage follows above initialization. */ } -#ifdef CONFIG_HOTPLUG_CPU /* * The outgoing function has no further need of RCU, so remove it from * the rcu_node tree's ->qsmaskinitnext bit masks. @@ -4117,6 +4116,7 @@ void rcu_report_dead(unsigned int cpu) rdp->cpu_started = false; } +#ifdef CONFIG_HOTPLUG_CPU /* * The outgoing CPU has just passed through the dying-idle state, and we * are being invoked from the CPU that was IPIed to continue the offline -- cgit v1.2.3 From 1a8cfa24e21c2f154791f0cdd85fc28496918722 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 26 Oct 2020 22:51:57 +0100 Subject: perf/x86/intel/uncore: Fix Add BW copypasta gcc -Wextra points out a duplicate initialization of one array member: arch/x86/events/intel/uncore_snb.c:478:37: warning: initialized field overwritten [-Woverride-init] 478 | [SNB_PCI_UNCORE_IMC_DATA_READS] = { SNB_UNCORE_PCI_IMC_DATA_WRITES_BASE, The only sensible explanation is that a duplicate 'READS' was used instead of the correct 'WRITES', so change it back. Fixes: 24633d901ea4 ("perf/x86/intel/uncore: Add BW counters for GT, IA and IO breakdown") Signed-off-by: Arnd Bergmann Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20201026215203.3893972-1-arnd@kernel.org --- arch/x86/events/intel/uncore_snb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/events/intel/uncore_snb.c b/arch/x86/events/intel/uncore_snb.c index 39e632ed6ca9..bbd1120ae161 100644 --- a/arch/x86/events/intel/uncore_snb.c +++ b/arch/x86/events/intel/uncore_snb.c @@ -475,7 +475,7 @@ enum perf_snb_uncore_imc_freerunning_types { static struct freerunning_counters snb_uncore_imc_freerunning[] = { [SNB_PCI_UNCORE_IMC_DATA_READS] = { SNB_UNCORE_PCI_IMC_DATA_READS_BASE, 0x0, 0x0, 1, 32 }, - [SNB_PCI_UNCORE_IMC_DATA_READS] = { SNB_UNCORE_PCI_IMC_DATA_WRITES_BASE, + [SNB_PCI_UNCORE_IMC_DATA_WRITES] = { SNB_UNCORE_PCI_IMC_DATA_WRITES_BASE, 0x0, 0x0, 1, 32 }, [SNB_PCI_UNCORE_IMC_GT_REQUESTS] = { SNB_UNCORE_PCI_IMC_GT_REQUESTS_BASE, 0x0, 0x0, 1, 32 }, -- cgit v1.2.3 From c2fe61d8be491ff8188edaf22e838f819999146b Mon Sep 17 00:00:00 2001 From: Arvind Sankar Date: Tue, 10 Nov 2020 11:39:19 -0500 Subject: efi/x86: Free efi_pgd with free_pages() Commit d9e9a6418065 ("x86/mm/pti: Allocate a separate user PGD") changed the PGD allocation to allocate PGD_ALLOCATION_ORDER pages, so in the error path it should be freed using free_pages() rather than free_page(). Commit 06ace26f4e6f ("x86/efi: Free efi_pgd with free_pages()") fixed one instance of this, but missed another. Move the freeing out-of-line to avoid code duplication and fix this bug. Fixes: d9e9a6418065 ("x86/mm/pti: Allocate a separate user PGD") Link: https://lore.kernel.org/r/20201110163919.1134431-1-nivedita@alum.mit.edu Signed-off-by: Arvind Sankar Signed-off-by: Ard Biesheuvel --- arch/x86/platform/efi/efi_64.c | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) (limited to 'arch') diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index 8f5759df7776..e1e8d4e3a213 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -78,28 +78,30 @@ int __init efi_alloc_page_tables(void) gfp_mask = GFP_KERNEL | __GFP_ZERO; efi_pgd = (pgd_t *)__get_free_pages(gfp_mask, PGD_ALLOCATION_ORDER); if (!efi_pgd) - return -ENOMEM; + goto fail; pgd = efi_pgd + pgd_index(EFI_VA_END); p4d = p4d_alloc(&init_mm, pgd, EFI_VA_END); - if (!p4d) { - free_page((unsigned long)efi_pgd); - return -ENOMEM; - } + if (!p4d) + goto free_pgd; pud = pud_alloc(&init_mm, p4d, EFI_VA_END); - if (!pud) { - if (pgtable_l5_enabled()) - free_page((unsigned long) pgd_page_vaddr(*pgd)); - free_pages((unsigned long)efi_pgd, PGD_ALLOCATION_ORDER); - return -ENOMEM; - } + if (!pud) + goto free_p4d; efi_mm.pgd = efi_pgd; mm_init_cpumask(&efi_mm); init_new_context(NULL, &efi_mm); return 0; + +free_p4d: + if (pgtable_l5_enabled()) + free_page((unsigned long)pgd_page_vaddr(*pgd)); +free_pgd: + free_pages((unsigned long)efi_pgd, PGD_ALLOCATION_ORDER); +fail: + return -ENOMEM; } /* -- cgit v1.2.3 From 9a5085b3fad5d5d6019a3d160cdd70357d35c8b1 Mon Sep 17 00:00:00 2001 From: Richard Weinberger Date: Mon, 19 Oct 2020 23:10:49 +0200 Subject: um: Call pgtable_pmd_page_dtor() in __pmd_free_tlb() Commit b2b29d6d0119 ("mm: account PMD tables like PTE tables") uncovered a bug in uml, we forgot to call the destructor. While we are here, give x a sane name. Reported-by: Anton Ivanov Co-developed-by: Matthew Wilcox (Oracle) Signed-off-by: Richard Weinberger Tested-by: Christopher Obbard --- arch/um/include/asm/pgalloc.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/um/include/asm/pgalloc.h b/arch/um/include/asm/pgalloc.h index 5393e13e07e0..2bbf28cf3aa9 100644 --- a/arch/um/include/asm/pgalloc.h +++ b/arch/um/include/asm/pgalloc.h @@ -33,7 +33,13 @@ do { \ } while (0) #ifdef CONFIG_3_LEVEL_PGTABLES -#define __pmd_free_tlb(tlb,x, address) tlb_remove_page((tlb),virt_to_page(x)) + +#define __pmd_free_tlb(tlb, pmd, address) \ +do { \ + pgtable_pmd_page_dtor(virt_to_page(pmd)); \ + tlb_remove_page((tlb),virt_to_page(pmd)); \ +} while (0) \ + #endif #endif -- cgit v1.2.3 From 99fba3205cd499255a36fd87f1d6064adc622a5b Mon Sep 17 00:00:00 2001 From: Grygorii Strashko Date: Thu, 1 Oct 2020 22:20:23 +0300 Subject: ARM: dts: am437x-l4: fix compatible for cpsw switch dt node Fix compatible the new CPSW switchdev DT node to avoid probing of legacy CPSW driver which fails: [ 2.781009] cpsw 4a100000.switch: invalid resource Fixes: 7bf8f37aea82 ("ARM: dts: am437x-l4: add dt node for new cpsw switchdev driver") Signed-off-by: Grygorii Strashko Signed-off-by: Tony Lindgren --- arch/arm/boot/dts/am437x-l4.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/boot/dts/am437x-l4.dtsi b/arch/arm/boot/dts/am437x-l4.dtsi index c220dc3c4e0f..243e35f7a56c 100644 --- a/arch/arm/boot/dts/am437x-l4.dtsi +++ b/arch/arm/boot/dts/am437x-l4.dtsi @@ -521,7 +521,7 @@ ranges = <0x0 0x100000 0x8000>; mac_sw: switch@0 { - compatible = "ti,am4372-cpsw","ti,cpsw-switch"; + compatible = "ti,am4372-cpsw-switch", "ti,cpsw-switch"; reg = <0x0 0x4000>; ranges = <0 0 0x4000>; clocks = <&cpsw_125mhz_gclk>, <&dpll_clksel_mac_clk>; -- cgit v1.2.3 From 1ed576a20cd5c93295f57d6b7400357bd8d01b21 Mon Sep 17 00:00:00 2001 From: Janosch Frank Date: Tue, 20 Oct 2020 06:12:07 -0400 Subject: KVM: s390: pv: Mark mm as protected after the set secure parameters and improve cleanup We can only have protected guest pages after a successful set secure parameters call as only then the UV allows imports and unpacks. By moving the test we can now also check for it in s390_reset_acc() and do an early return if it is 0. Signed-off-by: Janosch Frank Fixes: 29b40f105ec8 ("KVM: s390: protvirt: Add initial vm and cpu lifecycle handling") Reviewed-by: Cornelia Huck Signed-off-by: Christian Borntraeger --- arch/s390/kvm/kvm-s390.c | 2 +- arch/s390/kvm/pv.c | 3 ++- arch/s390/mm/gmap.c | 2 ++ 3 files changed, 5 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 6b74b92c1a58..08ea6c4735cd 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -2312,7 +2312,7 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd) struct kvm_s390_pv_unp unp = {}; r = -EINVAL; - if (!kvm_s390_pv_is_protected(kvm)) + if (!kvm_s390_pv_is_protected(kvm) || !mm_is_protected(kvm->mm)) break; r = -EFAULT; diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c index eb99e2f95ebe..f5847f9dec7c 100644 --- a/arch/s390/kvm/pv.c +++ b/arch/s390/kvm/pv.c @@ -208,7 +208,6 @@ int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *rrc) return -EIO; } kvm->arch.gmap->guest_handle = uvcb.guest_handle; - atomic_set(&kvm->mm->context.is_protected, 1); return 0; } @@ -228,6 +227,8 @@ int kvm_s390_pv_set_sec_parms(struct kvm *kvm, void *hdr, u64 length, u16 *rc, *rrc = uvcb.header.rrc; KVM_UV_EVENT(kvm, 3, "PROTVIRT VM SET PARMS: rc %x rrc %x", *rc, *rrc); + if (!cc) + atomic_set(&kvm->mm->context.is_protected, 1); return cc ? -EINVAL : 0; } diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c index cfb0017f33a7..64795d034926 100644 --- a/arch/s390/mm/gmap.c +++ b/arch/s390/mm/gmap.c @@ -2690,6 +2690,8 @@ static const struct mm_walk_ops reset_acc_walk_ops = { #include void s390_reset_acc(struct mm_struct *mm) { + if (!mm_is_protected(mm)) + return; /* * we might be called during * reset: we walk the pages and clear -- cgit v1.2.3 From 6cbf1e960fa52e4c63a6dfa4cda8736375b34ccc Mon Sep 17 00:00:00 2001 From: Collin Walling Date: Wed, 4 Nov 2020 13:10:32 -0500 Subject: KVM: s390: remove diag318 reset code The diag318 data must be set to 0 by VM-wide reset events triggered by diag308. As such, KVM should not handle resetting this data via the VCPU ioctls. Fixes: 23a60f834406 ("s390/kvm: diagnose 0x318 sync and reset") Signed-off-by: Collin Walling Reviewed-by: Christian Borntraeger Reviewed-by: Janosch Frank Acked-by: Cornelia Huck Signed-off-by: Christian Borntraeger Link: https://lore.kernel.org/r/20201104181032.109800-1-walling@linux.ibm.com --- arch/s390/kvm/kvm-s390.c | 2 -- 1 file changed, 2 deletions(-) (limited to 'arch') diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 08ea6c4735cd..425d3d75320b 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -3564,7 +3564,6 @@ static void kvm_arch_vcpu_ioctl_initial_reset(struct kvm_vcpu *vcpu) vcpu->arch.sie_block->pp = 0; vcpu->arch.sie_block->fpf &= ~FPF_BPBC; vcpu->arch.sie_block->todpr = 0; - vcpu->arch.sie_block->cpnc = 0; } } @@ -3582,7 +3581,6 @@ static void kvm_arch_vcpu_ioctl_clear_reset(struct kvm_vcpu *vcpu) regs->etoken = 0; regs->etoken_extension = 0; - regs->diag318 = 0; } int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) -- cgit v1.2.3 From b2896458b850ec7cb69b054b195b4b399f7e1f22 Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Mon, 9 Nov 2020 10:36:53 +0100 Subject: x86/platform/uv: Drop last traces of uv_flush_tlb_others Commit 39297dde7390 ("x86/platform/uv: Remove UV BAU TLB Shootdown Handler") removed uv_flush_tlb_others. Its declaration was removed also from asm/uv/uv.h. But only for the CONFIG_X86_UV=y case. The inline definition (!X86_UV case) is still in place. So remove this implementation with everything what was added to support uv_flush_tlb_others: * include of asm/tlbflush.h * forward declarations of struct cpumask, mm_struct, and flush_tlb_info Signed-off-by: Jiri Slaby Signed-off-by: Thomas Gleixner Acked-by: Mike Travis Acked-by: Steve Wahl Link: https://lore.kernel.org/r/20201109093653.2042-1-jslaby@suse.cz --- arch/x86/include/asm/uv/uv.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'arch') diff --git a/arch/x86/include/asm/uv/uv.h b/arch/x86/include/asm/uv/uv.h index 172d3e4a9e4b..648eb23fe7f0 100644 --- a/arch/x86/include/asm/uv/uv.h +++ b/arch/x86/include/asm/uv/uv.h @@ -2,14 +2,8 @@ #ifndef _ASM_X86_UV_UV_H #define _ASM_X86_UV_UV_H -#include - enum uv_system_type {UV_NONE, UV_LEGACY_APIC, UV_X2APIC}; -struct cpumask; -struct mm_struct; -struct flush_tlb_info; - #ifdef CONFIG_X86_UV #include @@ -44,10 +38,6 @@ static inline int is_uv_system(void) { return 0; } static inline int is_uv_hubbed(int uv) { return 0; } static inline void uv_cpu_init(void) { } static inline void uv_system_init(void) { } -static inline const struct cpumask * -uv_flush_tlb_others(const struct cpumask *cpumask, - const struct flush_tlb_info *info) -{ return cpumask; } #endif /* X86_UV */ -- cgit v1.2.3 From df392aefe96b9f94efb01ef298b617bab346a9be Mon Sep 17 00:00:00 2001 From: Michael Walle Date: Mon, 9 Nov 2020 12:04:36 +0100 Subject: arm64: dts: fsl-ls1028a-kontron-sl28: specify in-band mode for ENETC Since commit 71b77a7a27a3 ("enetc: Migrate to PHYLINK and PCS_LYNX") the network port of the Kontron sl28 board is broken. After the migration to phylink the device tree has to specify the in-band-mode property. Add it. Fixes: 71b77a7a27a3 ("enetc: Migrate to PHYLINK and PCS_LYNX") Suggested-by: Vladimir Oltean Signed-off-by: Michael Walle Reviewed-by: Vladimir Oltean Link: https://lore.kernel.org/r/20201109110436.5906-1-michael@walle.cc Signed-off-by: Jakub Kicinski --- arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28.dts | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28.dts b/arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28.dts index f46eb47cfa4d..8161dd237971 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28.dts +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28.dts @@ -75,6 +75,7 @@ &enetc_port0 { phy-handle = <&phy0>; phy-connection-type = "sgmii"; + managed = "in-band-status"; status = "okay"; mdio { -- cgit v1.2.3 From 966e7ea434484a006700c144bca629a14f93530c Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Tue, 10 Nov 2020 14:46:55 +0100 Subject: s390: update defconfigs Signed-off-by: Heiko Carstens --- arch/s390/configs/debug_defconfig | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/s390/configs/debug_defconfig b/arch/s390/configs/debug_defconfig index a4d3c578fbd8..fe6f529ac82c 100644 --- a/arch/s390/configs/debug_defconfig +++ b/arch/s390/configs/debug_defconfig @@ -1,3 +1,4 @@ +CONFIG_UAPI_HEADER_TEST=y CONFIG_SYSVIPC=y CONFIG_POSIX_MQUEUE=y CONFIG_WATCH_QUEUE=y -- cgit v1.2.3 From 78d732e1f326f74f240d416af9484928303d9951 Mon Sep 17 00:00:00 2001 From: Thomas Richter Date: Wed, 11 Nov 2020 16:26:25 +0100 Subject: s390/cpum_sf.c: fix file permission for cpum_sfb_size This file is installed by the s390 CPU Measurement sampling facility device driver to export supported minimum and maximum sample buffer sizes. This file is read by lscpumf tool to display the details of the device driver capabilities. The lscpumf tool might be invoked by a non-root user. In this case it does not print anything because the file contents can not be read. Fix this by allowing read access for all users. Reading the file contents is ok, changing the file contents is left to the root user only. For further reference and details see: [1] https://github.com/ibm-s390-tools/s390-tools/issues/97 Fixes: 69f239ed335a ("s390/cpum_sf: Dynamically extend the sampling buffer if overflows occur") Cc: # 3.14 Signed-off-by: Thomas Richter Acked-by: Sumanth Korikkar Signed-off-by: Heiko Carstens --- arch/s390/kernel/perf_cpum_sf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/s390/kernel/perf_cpum_sf.c b/arch/s390/kernel/perf_cpum_sf.c index 4f9e4626df55..f100c9209743 100644 --- a/arch/s390/kernel/perf_cpum_sf.c +++ b/arch/s390/kernel/perf_cpum_sf.c @@ -2228,4 +2228,4 @@ out: } arch_initcall(init_cpum_sampling_pmu); -core_param(cpum_sfb_size, CPUM_SF_MAX_SDB, sfb_size, 0640); +core_param(cpum_sfb_size, CPUM_SF_MAX_SDB, sfb_size, 0644); -- cgit v1.2.3 From 23711a5e662c1a66e14cb9288e7dfd2b840efcd5 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Tue, 10 Nov 2020 14:13:06 +0000 Subject: KVM: arm64: Allow setting of ID_AA64PFR0_EL1.CSV2 from userspace We now expose ID_AA64PFR0_EL1.CSV2=1 to guests running on hosts that are immune to Spectre-v2, but that don't have this field set, most likely because they predate the specification. However, this prevents the migration of guests that have started on a host the doesn't fake this CSV2 setting to one that does, as KVM rejects the write to ID_AA64PFR0_EL2 on the grounds that it isn't what is already there. In order to fix this, allow userspace to set this field as long as this doesn't result in a promising more than what is already there (setting CSV2 to 0 is acceptable, but setting it to 1 when it is already set to 0 isn't). Fixes: e1026237f9067 ("KVM: arm64: Set CSV2 for guests on hardware unaffected by Spectre-v2") Reported-by: Peng Liang Signed-off-by: Marc Zyngier Acked-by: Will Deacon Link: https://lore.kernel.org/r/20201110141308.451654-2-maz@kernel.org --- arch/arm64/include/asm/kvm_host.h | 2 ++ arch/arm64/kvm/arm.c | 16 +++++++++++++++ arch/arm64/kvm/sys_regs.c | 42 +++++++++++++++++++++++++++++++++++---- 3 files changed, 56 insertions(+), 4 deletions(-) (limited to 'arch') diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 781d029b8aa8..0cd9f0f75c13 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -118,6 +118,8 @@ struct kvm_arch { */ unsigned long *pmu_filter; unsigned int pmuver; + + u8 pfr0_csv2; }; struct kvm_vcpu_fault_info { diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index a3b32df1afb0..a6e25a4b4dc5 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -102,6 +102,20 @@ static int kvm_arm_default_max_vcpus(void) return vgic_present ? kvm_vgic_get_max_vcpus() : KVM_MAX_VCPUS; } +static void set_default_csv2(struct kvm *kvm) +{ + /* + * The default is to expose CSV2 == 1 if the HW isn't affected. + * Although this is a per-CPU feature, we make it global because + * asymmetric systems are just a nuisance. + * + * Userspace can override this as long as it doesn't promise + * the impossible. + */ + if (arm64_get_spectre_v2_state() == SPECTRE_UNAFFECTED) + kvm->arch.pfr0_csv2 = 1; +} + /** * kvm_arch_init_vm - initializes a VM data structure * @kvm: pointer to the KVM struct @@ -127,6 +141,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) /* The maximum number of VCPUs is limited by the host's GIC model */ kvm->arch.max_vcpus = kvm_arm_default_max_vcpus(); + set_default_csv2(kvm); + return ret; out_free_stage2_pgd: kvm_free_stage2_pgd(&kvm->arch.mmu); diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 6d287eefd9f1..0aa86250e354 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1128,9 +1128,8 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu, if (!vcpu_has_sve(vcpu)) val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT); val &= ~(0xfUL << ID_AA64PFR0_AMU_SHIFT); - if (!(val & (0xfUL << ID_AA64PFR0_CSV2_SHIFT)) && - arm64_get_spectre_v2_state() == SPECTRE_UNAFFECTED) - val |= (1UL << ID_AA64PFR0_CSV2_SHIFT); + val &= ~(0xfUL << ID_AA64PFR0_CSV2_SHIFT); + val |= ((u64)vcpu->kvm->arch.pfr0_csv2 << ID_AA64PFR0_CSV2_SHIFT); } else if (id == SYS_ID_AA64PFR1_EL1) { val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT); } else if (id == SYS_ID_AA64ISAR1_EL1 && !vcpu_has_ptrauth(vcpu)) { @@ -1213,6 +1212,40 @@ static unsigned int sve_visibility(const struct kvm_vcpu *vcpu, return REG_HIDDEN; } +static int set_id_aa64pfr0_el1(struct kvm_vcpu *vcpu, + const struct sys_reg_desc *rd, + const struct kvm_one_reg *reg, void __user *uaddr) +{ + const u64 id = sys_reg_to_index(rd); + int err; + u64 val; + u8 csv2; + + err = reg_from_user(&val, uaddr, id); + if (err) + return err; + + /* + * Allow AA64PFR0_EL1.CSV2 to be set from userspace as long as + * it doesn't promise more than what is actually provided (the + * guest could otherwise be covered in ectoplasmic residue). + */ + csv2 = cpuid_feature_extract_unsigned_field(val, ID_AA64PFR0_CSV2_SHIFT); + if (csv2 > 1 || + (csv2 && arm64_get_spectre_v2_state() != SPECTRE_UNAFFECTED)) + return -EINVAL; + + /* We can only differ with CSV2, and anything else is an error */ + val ^= read_id_reg(vcpu, rd, false); + val &= ~(0xFUL << ID_AA64PFR0_CSV2_SHIFT); + if (val) + return -EINVAL; + + vcpu->kvm->arch.pfr0_csv2 = csv2; + + return 0; +} + /* * cpufeature ID register user accessors * @@ -1472,7 +1505,8 @@ static const struct sys_reg_desc sys_reg_descs[] = { /* AArch64 ID registers */ /* CRm=4 */ - ID_SANITISED(ID_AA64PFR0_EL1), + { SYS_DESC(SYS_ID_AA64PFR0_EL1), .access = access_id_reg, + .get_user = get_id_reg, .set_user = set_id_aa64pfr0_el1, }, ID_SANITISED(ID_AA64PFR1_EL1), ID_UNALLOCATED(4,2), ID_UNALLOCATED(4,3), -- cgit v1.2.3 From 338b17933a6077bb5406b33d8b9fb9616fffc1af Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Tue, 10 Nov 2020 14:13:07 +0000 Subject: KVM: arm64: Unify trap handlers injecting an UNDEF A large number of system register trap handlers only inject an UNDEF exeption, and yet each class of sysreg seems to provide its own, identical function. Let's unify them all, saving us introducing yet another one later. Signed-off-by: Marc Zyngier Acked-by: Will Deacon Link: https://lore.kernel.org/r/20201110141308.451654-3-maz@kernel.org --- arch/arm64/kvm/sys_regs.c | 65 ++++++++++++++++++----------------------------- 1 file changed, 25 insertions(+), 40 deletions(-) (limited to 'arch') diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 0aa86250e354..b0022f37c8f1 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1038,8 +1038,8 @@ static bool access_pmuserenr(struct kvm_vcpu *vcpu, struct sys_reg_params *p, { SYS_DESC(SYS_PMEVTYPERn_EL0(n)), \ access_pmu_evtyper, reset_unknown, (PMEVTYPER0_EL0 + n), } -static bool access_amu(struct kvm_vcpu *vcpu, struct sys_reg_params *p, - const struct sys_reg_desc *r) +static bool undef_access(struct kvm_vcpu *vcpu, struct sys_reg_params *p, + const struct sys_reg_desc *r) { kvm_inject_undefined(vcpu); @@ -1047,24 +1047,10 @@ static bool access_amu(struct kvm_vcpu *vcpu, struct sys_reg_params *p, } /* Macro to expand the AMU counter and type registers*/ -#define AMU_AMEVCNTR0_EL0(n) { SYS_DESC(SYS_AMEVCNTR0_EL0(n)), access_amu } -#define AMU_AMEVTYPER0_EL0(n) { SYS_DESC(SYS_AMEVTYPER0_EL0(n)), access_amu } -#define AMU_AMEVCNTR1_EL0(n) { SYS_DESC(SYS_AMEVCNTR1_EL0(n)), access_amu } -#define AMU_AMEVTYPER1_EL0(n) { SYS_DESC(SYS_AMEVTYPER1_EL0(n)), access_amu } - -static bool trap_ptrauth(struct kvm_vcpu *vcpu, - struct sys_reg_params *p, - const struct sys_reg_desc *rd) -{ - /* - * If we land here, that is because we didn't fixup the access on exit - * by allowing the PtrAuth sysregs. The only way this happens is when - * the guest does not have PtrAuth support enabled. - */ - kvm_inject_undefined(vcpu); - - return false; -} +#define AMU_AMEVCNTR0_EL0(n) { SYS_DESC(SYS_AMEVCNTR0_EL0(n)), undef_access } +#define AMU_AMEVTYPER0_EL0(n) { SYS_DESC(SYS_AMEVTYPER0_EL0(n)), undef_access } +#define AMU_AMEVCNTR1_EL0(n) { SYS_DESC(SYS_AMEVCNTR1_EL0(n)), undef_access } +#define AMU_AMEVTYPER1_EL0(n) { SYS_DESC(SYS_AMEVTYPER1_EL0(n)), undef_access } static unsigned int ptrauth_visibility(const struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd) @@ -1072,8 +1058,14 @@ static unsigned int ptrauth_visibility(const struct kvm_vcpu *vcpu, return vcpu_has_ptrauth(vcpu) ? 0 : REG_HIDDEN; } +/* + * If we land here on a PtrAuth access, that is because we didn't + * fixup the access on exit by allowing the PtrAuth sysregs. The only + * way this happens is when the guest does not have PtrAuth support + * enabled. + */ #define __PTRAUTH_KEY(k) \ - { SYS_DESC(SYS_## k), trap_ptrauth, reset_unknown, k, \ + { SYS_DESC(SYS_## k), undef_access, reset_unknown, k, \ .visibility = ptrauth_visibility} #define PTRAUTH_KEY(k) \ @@ -1374,13 +1366,6 @@ static bool access_ccsidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p, return true; } -static bool access_mte_regs(struct kvm_vcpu *vcpu, struct sys_reg_params *p, - const struct sys_reg_desc *r) -{ - kvm_inject_undefined(vcpu); - return false; -} - /* sys_reg_desc initialiser for known cpufeature ID registers */ #define ID_SANITISED(name) { \ SYS_DESC(SYS_##name), \ @@ -1549,8 +1534,8 @@ static const struct sys_reg_desc sys_reg_descs[] = { { SYS_DESC(SYS_ACTLR_EL1), access_actlr, reset_actlr, ACTLR_EL1 }, { SYS_DESC(SYS_CPACR_EL1), NULL, reset_val, CPACR_EL1, 0 }, - { SYS_DESC(SYS_RGSR_EL1), access_mte_regs }, - { SYS_DESC(SYS_GCR_EL1), access_mte_regs }, + { SYS_DESC(SYS_RGSR_EL1), undef_access }, + { SYS_DESC(SYS_GCR_EL1), undef_access }, { SYS_DESC(SYS_ZCR_EL1), NULL, reset_val, ZCR_EL1, 0, .visibility = sve_visibility }, { SYS_DESC(SYS_TTBR0_EL1), access_vm_reg, reset_unknown, TTBR0_EL1 }, @@ -1576,8 +1561,8 @@ static const struct sys_reg_desc sys_reg_descs[] = { { SYS_DESC(SYS_ERXMISC0_EL1), trap_raz_wi }, { SYS_DESC(SYS_ERXMISC1_EL1), trap_raz_wi }, - { SYS_DESC(SYS_TFSR_EL1), access_mte_regs }, - { SYS_DESC(SYS_TFSRE0_EL1), access_mte_regs }, + { SYS_DESC(SYS_TFSR_EL1), undef_access }, + { SYS_DESC(SYS_TFSRE0_EL1), undef_access }, { SYS_DESC(SYS_FAR_EL1), access_vm_reg, reset_unknown, FAR_EL1 }, { SYS_DESC(SYS_PAR_EL1), NULL, reset_unknown, PAR_EL1 }, @@ -1641,14 +1626,14 @@ static const struct sys_reg_desc sys_reg_descs[] = { { SYS_DESC(SYS_TPIDR_EL0), NULL, reset_unknown, TPIDR_EL0 }, { SYS_DESC(SYS_TPIDRRO_EL0), NULL, reset_unknown, TPIDRRO_EL0 }, - { SYS_DESC(SYS_AMCR_EL0), access_amu }, - { SYS_DESC(SYS_AMCFGR_EL0), access_amu }, - { SYS_DESC(SYS_AMCGCR_EL0), access_amu }, - { SYS_DESC(SYS_AMUSERENR_EL0), access_amu }, - { SYS_DESC(SYS_AMCNTENCLR0_EL0), access_amu }, - { SYS_DESC(SYS_AMCNTENSET0_EL0), access_amu }, - { SYS_DESC(SYS_AMCNTENCLR1_EL0), access_amu }, - { SYS_DESC(SYS_AMCNTENSET1_EL0), access_amu }, + { SYS_DESC(SYS_AMCR_EL0), undef_access }, + { SYS_DESC(SYS_AMCFGR_EL0), undef_access }, + { SYS_DESC(SYS_AMCGCR_EL0), undef_access }, + { SYS_DESC(SYS_AMUSERENR_EL0), undef_access }, + { SYS_DESC(SYS_AMCNTENCLR0_EL0), undef_access }, + { SYS_DESC(SYS_AMCNTENSET0_EL0), undef_access }, + { SYS_DESC(SYS_AMCNTENCLR1_EL0), undef_access }, + { SYS_DESC(SYS_AMCNTENSET1_EL0), undef_access }, AMU_AMEVCNTR0_EL0(0), AMU_AMEVCNTR0_EL0(1), AMU_AMEVCNTR0_EL0(2), -- cgit v1.2.3 From ed4ffaf49bf9ce1002b516d8c6aa04937b7950bc Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Tue, 10 Nov 2020 14:13:08 +0000 Subject: KVM: arm64: Handle SCXTNUM_ELx traps As the kernel never sets HCR_EL2.EnSCXT, accesses to SCXTNUM_ELx will trap to EL2. Let's handle that as gracefully as possible by injecting an UNDEF exception into the guest. This is consistent with the guest's view of ID_AA64PFR0_EL1.CSV2 being at most 1. Signed-off-by: Marc Zyngier Acked-by: Will Deacon Link: https://lore.kernel.org/r/20201110141308.451654-4-maz@kernel.org --- arch/arm64/include/asm/sysreg.h | 4 ++++ arch/arm64/kvm/sys_regs.c | 4 ++++ 2 files changed, 8 insertions(+) (limited to 'arch') diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index d52c1b3ce589..a427a5653369 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -372,6 +372,8 @@ #define SYS_CONTEXTIDR_EL1 sys_reg(3, 0, 13, 0, 1) #define SYS_TPIDR_EL1 sys_reg(3, 0, 13, 0, 4) +#define SYS_SCXTNUM_EL1 sys_reg(3, 0, 13, 0, 7) + #define SYS_CNTKCTL_EL1 sys_reg(3, 0, 14, 1, 0) #define SYS_CCSIDR_EL1 sys_reg(3, 1, 0, 0, 0) @@ -404,6 +406,8 @@ #define SYS_TPIDR_EL0 sys_reg(3, 3, 13, 0, 2) #define SYS_TPIDRRO_EL0 sys_reg(3, 3, 13, 0, 3) +#define SYS_SCXTNUM_EL0 sys_reg(3, 3, 13, 0, 7) + /* Definitions for system register interface to AMU for ARMv8.4 onwards */ #define SYS_AM_EL0(crm, op2) sys_reg(3, 3, 13, (crm), (op2)) #define SYS_AMCR_EL0 SYS_AM_EL0(2, 0) diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index b0022f37c8f1..c1726fb7f3d9 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1598,6 +1598,8 @@ static const struct sys_reg_desc sys_reg_descs[] = { { SYS_DESC(SYS_CONTEXTIDR_EL1), access_vm_reg, reset_val, CONTEXTIDR_EL1, 0 }, { SYS_DESC(SYS_TPIDR_EL1), NULL, reset_unknown, TPIDR_EL1 }, + { SYS_DESC(SYS_SCXTNUM_EL1), undef_access }, + { SYS_DESC(SYS_CNTKCTL_EL1), NULL, reset_val, CNTKCTL_EL1, 0}, { SYS_DESC(SYS_CCSIDR_EL1), access_ccsidr }, @@ -1626,6 +1628,8 @@ static const struct sys_reg_desc sys_reg_descs[] = { { SYS_DESC(SYS_TPIDR_EL0), NULL, reset_unknown, TPIDR_EL0 }, { SYS_DESC(SYS_TPIDRRO_EL0), NULL, reset_unknown, TPIDRRO_EL0 }, + { SYS_DESC(SYS_SCXTNUM_EL0), undef_access }, + { SYS_DESC(SYS_AMCR_EL0), undef_access }, { SYS_DESC(SYS_AMCFGR_EL0), undef_access }, { SYS_DESC(SYS_AMCGCR_EL0), undef_access }, -- cgit v1.2.3 From 77c7e1bc060deab6430f1dff5922ccd3093d9776 Mon Sep 17 00:00:00 2001 From: Mike Travis Date: Tue, 10 Nov 2020 19:04:18 -0600 Subject: x86/platform/uv: Fix copied UV5 output archtype A test shows that the output contains a space: # cat /proc/sgi_uv/archtype NSGI4 U/UVX Remove that embedded space by copying the "trimmed" buffer instead of the untrimmed input character list. Use sizeof to remove size dependency on copy out length. Increase output buffer size by one character just in case BIOS sends an 8 character string for archtype. Fixes: 1e61f5a95f19 ("Add and decode Arch Type in UVsystab") Signed-off-by: Mike Travis Signed-off-by: Thomas Gleixner Reviewed-by: Steve Wahl Link: https://lore.kernel.org/r/20201111010418.82133-1-mike.travis@hpe.com --- arch/x86/kernel/apic/x2apic_uv_x.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c index 3115caa7d7d0..1b98f8c12b96 100644 --- a/arch/x86/kernel/apic/x2apic_uv_x.c +++ b/arch/x86/kernel/apic/x2apic_uv_x.c @@ -33,7 +33,7 @@ static union uvh_apicid uvh_apicid; static int uv_node_id; /* Unpack AT/OEM/TABLE ID's to be NULL terminated strings */ -static u8 uv_archtype[UV_AT_SIZE]; +static u8 uv_archtype[UV_AT_SIZE + 1]; static u8 oem_id[ACPI_OEM_ID_SIZE + 1]; static u8 oem_table_id[ACPI_OEM_TABLE_ID_SIZE + 1]; @@ -320,7 +320,7 @@ static int __init decode_arch_type(unsigned long ptr) if (n > 0 && n < sizeof(uv_ate->archtype)) { pr_info("UV: UVarchtype received from BIOS\n"); - uv_stringify(UV_AT_SIZE, uv_archtype, uv_ate->archtype); + uv_stringify(sizeof(uv_archtype), uv_archtype, uv_ate->archtype); return 1; } return 0; @@ -378,7 +378,7 @@ static int __init uv_set_system_type(char *_oem_id, char *_oem_table_id) if (!early_get_arch_type()) /* If not use OEM ID for UVarchtype */ - uv_stringify(UV_AT_SIZE, uv_archtype, _oem_id); + uv_stringify(sizeof(uv_archtype), uv_archtype, oem_id); /* Check if not hubbed */ if (strncmp(uv_archtype, "SGI", 3) != 0) { -- cgit v1.2.3 From 58284a901b426e6130672e9f14c30dfd5a9dbde0 Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Fri, 13 Nov 2020 13:00:14 +0530 Subject: arm64/mm: Validate hotplug range before creating linear mapping During memory hotplug process, the linear mapping should not be created for a given memory range if that would fall outside the maximum allowed linear range. Else it might cause memory corruption in the kernel virtual space. Maximum linear mapping region is [PAGE_OFFSET..(PAGE_END -1)] accommodating both its ends but excluding PAGE_END. Max physical range that can be mapped inside this linear mapping range, must also be derived from its end points. This ensures that arch_add_memory() validates memory hot add range for its potential linear mapping requirements, before creating it with __create_pgd_mapping(). Fixes: 4ab215061554 ("arm64: Add memory hotplug support") Signed-off-by: Anshuman Khandual Reviewed-by: Ard Biesheuvel Cc: Catalin Marinas Cc: Will Deacon Cc: Mark Rutland Cc: Ard Biesheuvel Cc: Steven Price Cc: Robin Murphy Cc: David Hildenbrand Cc: Andrew Morton Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/1605252614-761-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Will Deacon --- arch/arm64/mm/mmu.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) (limited to 'arch') diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 1c0f3e02f731..ca692a815731 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -1444,11 +1444,28 @@ static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size) free_empty_tables(start, end, PAGE_OFFSET, PAGE_END); } +static bool inside_linear_region(u64 start, u64 size) +{ + /* + * Linear mapping region is the range [PAGE_OFFSET..(PAGE_END - 1)] + * accommodating both its ends but excluding PAGE_END. Max physical + * range which can be mapped inside this linear mapping range, must + * also be derived from its end points. + */ + return start >= __pa(_PAGE_OFFSET(vabits_actual)) && + (start + size - 1) <= __pa(PAGE_END - 1); +} + int arch_add_memory(int nid, u64 start, u64 size, struct mhp_params *params) { int ret, flags = 0; + if (!inside_linear_region(start, size)) { + pr_err("[%llx %llx] is outside linear mapping region\n", start, start + size); + return -EINVAL; + } + if (rodata_full || debug_pagealloc_enabled()) flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS; -- cgit v1.2.3 From 77473cffef21611b4423f613fe32836afb26405e Mon Sep 17 00:00:00 2001 From: Konrad Dybcio Date: Thu, 5 Nov 2020 00:22:10 +0100 Subject: arm64: Add MIDR value for KRYO2XX gold/silver CPU cores Add MIDR value for KRYO2XX gold (big) and silver (LITTLE) CPU cores which are used in Qualcomm Technologies, Inc. SoCs. This will be used to identify and apply errata which are applicable for these CPU cores. Signed-off-by: Konrad Dybcio Link: https://lore.kernel.org/r/20201104232218.198800-2-konrad.dybcio@somainline.org Signed-off-by: Will Deacon --- arch/arm64/include/asm/cputype.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'arch') diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h index 9e2e9a63c7b6..ef5b040dee44 100644 --- a/arch/arm64/include/asm/cputype.h +++ b/arch/arm64/include/asm/cputype.h @@ -86,6 +86,8 @@ #define QCOM_CPU_PART_FALKOR_V1 0x800 #define QCOM_CPU_PART_FALKOR 0xC00 #define QCOM_CPU_PART_KRYO 0x200 +#define QCOM_CPU_PART_KRYO_2XX_GOLD 0x800 +#define QCOM_CPU_PART_KRYO_2XX_SILVER 0x801 #define QCOM_CPU_PART_KRYO_3XX_SILVER 0x803 #define QCOM_CPU_PART_KRYO_4XX_GOLD 0x804 #define QCOM_CPU_PART_KRYO_4XX_SILVER 0x805 @@ -116,6 +118,8 @@ #define MIDR_QCOM_FALKOR_V1 MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR_V1) #define MIDR_QCOM_FALKOR MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR) #define MIDR_QCOM_KRYO MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO) +#define MIDR_QCOM_KRYO_2XX_GOLD MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_2XX_GOLD) +#define MIDR_QCOM_KRYO_2XX_SILVER MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_2XX_SILVER) #define MIDR_QCOM_KRYO_3XX_SILVER MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_3XX_SILVER) #define MIDR_QCOM_KRYO_4XX_GOLD MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_4XX_GOLD) #define MIDR_QCOM_KRYO_4XX_SILVER MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_4XX_SILVER) -- cgit v1.2.3 From e3dd11a9f2521cecbcf30c2fd17ecc5a445dfb94 Mon Sep 17 00:00:00 2001 From: Konrad Dybcio Date: Thu, 5 Nov 2020 00:22:11 +0100 Subject: arm64: kpti: Add KRYO2XX gold/silver CPU cores to kpti safelist QCOM KRYO2XX gold (big) silver (LITTLE) CPU cores are based on Cortex-A73 and Cortex-A53 respectively and are meltdown safe, hence add them to kpti_safe_list[]. Signed-off-by: Konrad Dybcio Link: https://lore.kernel.org/r/20201104232218.198800-3-konrad.dybcio@somainline.org Signed-off-by: Will Deacon --- arch/arm64/kernel/cpufeature.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch') diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index dcc165b3fc04..6f36c4f62f69 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -1337,6 +1337,8 @@ static bool unmap_kernel_at_el0(const struct arm64_cpu_capabilities *entry, MIDR_ALL_VERSIONS(MIDR_CORTEX_A73), MIDR_ALL_VERSIONS(MIDR_HISI_TSV110), MIDR_ALL_VERSIONS(MIDR_NVIDIA_CARMEL), + MIDR_ALL_VERSIONS(MIDR_QCOM_KRYO_2XX_GOLD), + MIDR_ALL_VERSIONS(MIDR_QCOM_KRYO_2XX_SILVER), MIDR_ALL_VERSIONS(MIDR_QCOM_KRYO_3XX_SILVER), MIDR_ALL_VERSIONS(MIDR_QCOM_KRYO_4XX_SILVER), { /* sentinel */ } -- cgit v1.2.3 From 38328d40116739af0692748427bedda35b286c33 Mon Sep 17 00:00:00 2001 From: Konrad Dybcio Date: Thu, 5 Nov 2020 00:22:12 +0100 Subject: arm64: proton-pack: Add KRYO2XX silver CPUs to spectre-v2 safe-list KRYO2XX silver (LITTLE) CPUs are based on Cortex-A53 and they are not affected by spectre-v2. Signed-off-by: Konrad Dybcio Link: https://lore.kernel.org/r/20201104232218.198800-4-konrad.dybcio@somainline.org Signed-off-by: Will Deacon --- arch/arm64/kernel/proton-pack.c | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/arm64/kernel/proton-pack.c b/arch/arm64/kernel/proton-pack.c index c18eb7d41274..f6e4e3737405 100644 --- a/arch/arm64/kernel/proton-pack.c +++ b/arch/arm64/kernel/proton-pack.c @@ -118,6 +118,7 @@ static enum mitigation_state spectre_v2_get_cpu_hw_mitigation_state(void) MIDR_ALL_VERSIONS(MIDR_CORTEX_A55), MIDR_ALL_VERSIONS(MIDR_BRAHMA_B53), MIDR_ALL_VERSIONS(MIDR_HISI_TSV110), + MIDR_ALL_VERSIONS(MIDR_QCOM_KRYO_2XX_SILVER), MIDR_ALL_VERSIONS(MIDR_QCOM_KRYO_3XX_SILVER), MIDR_ALL_VERSIONS(MIDR_QCOM_KRYO_4XX_SILVER), { /* sentinel */ } -- cgit v1.2.3 From 23c216416056148136bdaf0cdd18caf4904bb6e1 Mon Sep 17 00:00:00 2001 From: Konrad Dybcio Date: Thu, 5 Nov 2020 00:22:13 +0100 Subject: arm64: cpu_errata: Apply Erratum 845719 to KRYO2XX Silver QCOM KRYO2XX Silver cores are Cortex-A53 based and are susceptible to the 845719 erratum. Add them to the lookup list to apply the erratum. Signed-off-by: Konrad Dybcio Link: https://lore.kernel.org/r/20201104232218.198800-5-konrad.dybcio@somainline.org Signed-off-by: Will Deacon --- arch/arm64/kernel/cpu_errata.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch') diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c index 61314fd70f13..cafaf0da05b7 100644 --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -299,6 +299,8 @@ static const struct midr_range erratum_845719_list[] = { MIDR_REV_RANGE(MIDR_CORTEX_A53, 0, 0, 4), /* Brahma-B53 r0p[0] */ MIDR_REV(MIDR_BRAHMA_B53, 0, 0), + /* Kryo2XX Silver rAp4 */ + MIDR_REV(MIDR_QCOM_KRYO_2XX_SILVER, 0xa, 0x4), {}, }; #endif -- cgit v1.2.3 From 51b958e5aeb1e18c00332e0b37c5d4e95a3eff84 Mon Sep 17 00:00:00 2001 From: David Edmondson Date: Tue, 3 Nov 2020 12:04:00 +0000 Subject: KVM: x86: clflushopt should be treated as a no-op by emulation The instruction emulator ignores clflush instructions, yet fails to support clflushopt. Treat both similarly. Fixes: 13e457e0eebf ("KVM: x86: Emulator does not decode clflush well") Signed-off-by: David Edmondson Message-Id: <20201103120400.240882-1-david.edmondson@oracle.com> Reviewed-by: Joao Martins Signed-off-by: Paolo Bonzini --- arch/x86/kvm/emulate.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 0d917eb70319..56cae1ff9e3f 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -4046,6 +4046,12 @@ static int em_clflush(struct x86_emulate_ctxt *ctxt) return X86EMUL_CONTINUE; } +static int em_clflushopt(struct x86_emulate_ctxt *ctxt) +{ + /* emulating clflushopt regardless of cpuid */ + return X86EMUL_CONTINUE; +} + static int em_movsxd(struct x86_emulate_ctxt *ctxt) { ctxt->dst.val = (s32) ctxt->src.val; @@ -4585,7 +4591,7 @@ static const struct opcode group11[] = { }; static const struct gprefix pfx_0f_ae_7 = { - I(SrcMem | ByteOp, em_clflush), N, N, N, + I(SrcMem | ByteOp, em_clflush), I(SrcMem | ByteOp, em_clflushopt), N, N, }; static const struct group_dual group15 = { { -- cgit v1.2.3 From 0107973a80adad5b73232d3fbcd26f710ab1f851 Mon Sep 17 00:00:00 2001 From: Babu Moger Date: Thu, 12 Nov 2020 16:17:56 -0600 Subject: KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch SEV guests fail to boot on a system that supports the PCID feature. While emulating the RSM instruction, KVM reads the guest CR3 and calls kvm_set_cr3(). If the vCPU is in the long mode, kvm_set_cr3() does a sanity check for the CR3 value. In this case, it validates whether the value has any reserved bits set. The reserved bit range is 63:cpuid_maxphysaddr(). When AMD memory encryption is enabled, the memory encryption bit is set in the CR3 value. The memory encryption bit may fall within the KVM reserved bit range, causing the KVM emulation failure. Introduce a new field cr3_lm_rsvd_bits in kvm_vcpu_arch which will cache the reserved bits in the CR3 value. This will be initialized to rsvd_bits(cpuid_maxphyaddr(vcpu), 63). If the architecture has any special bits(like AMD SEV encryption bit) that needs to be masked from the reserved bits, should be cleared in vendor specific kvm_x86_ops.vcpu_after_set_cpuid handler. Fixes: a780a3ea628268b2 ("KVM: X86: Fix reserved bits check for MOV to CR3") Signed-off-by: Babu Moger Message-Id: <160521947657.32054.3264016688005356563.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/cpuid.c | 2 ++ arch/x86/kvm/x86.c | 2 +- 3 files changed, 4 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index d44858b69353..324ddd7fd0aa 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -639,6 +639,7 @@ struct kvm_vcpu_arch { int cpuid_nent; struct kvm_cpuid_entry2 *cpuid_entries; + unsigned long cr3_lm_rsvd_bits; int maxphyaddr; int max_tdp_level; diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index d50041f570e8..f87b5dfbaba4 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -178,6 +178,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) vcpu->arch.cr4_guest_rsvd_bits = __cr4_reserved_bits(guest_cpuid_has, vcpu); + vcpu->arch.cr3_lm_rsvd_bits = rsvd_bits(cpuid_maxphyaddr(vcpu), 63); + /* Invoke the vendor callback only after the above state is updated. */ kvm_x86_ops.vcpu_after_set_cpuid(vcpu); } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 447edc0d1d5a..078a39d489fe 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1041,7 +1041,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3) } if (is_long_mode(vcpu) && - (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 63))) + (cr3 & vcpu->arch.cr3_lm_rsvd_bits)) return 1; else if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) -- cgit v1.2.3 From 96308b066184d6dcdb677890e620e68290ae98ae Mon Sep 17 00:00:00 2001 From: Babu Moger Date: Thu, 12 Nov 2020 16:18:03 -0600 Subject: KVM: SVM: Update cr3_lm_rsvd_bits for AMD SEV guests For AMD SEV guests, update the cr3_lm_rsvd_bits to mask the memory encryption bit in reserved bits. Signed-off-by: Babu Moger Message-Id: <160521948301.32054.5783800787423231162.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/svm/svm.c | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'arch') diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 2f32fd09e259..1e81cfebd491 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -3741,6 +3741,7 @@ static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); + struct kvm_cpuid_entry2 *best; vcpu->arch.xsaves_enabled = guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) && boot_cpu_has(X86_FEATURE_XSAVE) && @@ -3753,6 +3754,13 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) /* Check again if INVPCID interception if required */ svm_check_invpcid(svm); + /* For sev guests, the memory encryption bit is not reserved in CR3. */ + if (sev_guest(vcpu->kvm)) { + best = kvm_find_cpuid_entry(vcpu, 0x8000001F, 0); + if (best) + vcpu->arch.cr3_lm_rsvd_bits &= ~(1UL << (best->ebx & 0x3f)); + } + if (!kvm_vcpu_apicv_active(vcpu)) return; -- cgit v1.2.3 From 53bf2776e31376f0b6a1fd7c9e1abc61241825a2 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Thu, 5 Nov 2020 22:47:02 +0100 Subject: ARM: dts: exynos: revert "add input clock to CMU in Exynos4412 Odroid" This reverts commit eaf2d2f6895d676dda6c95a652b58594f2887720. The commit eaf2d2f6895d ("ARM: dts: exynos: add input clock to CMU in Exynos4412 Odroid") breaks probing of usb3503 USB hub on Odroid U3. It changes the order of clock drivers probe: the clkout (Exynos PMU) driver is probed before the main clk-exynos4 driver. The clkout driver on Exynos4412 depends on clk-exynos4 but it does not support deferred probe, therefore this dependency and changed probe order causes probe failure. The usb3503 USB hub on Odroid U3 on the other hand requires clkout clock. This can be seen in logs: [ 5.007442] usb3503 0-0008: unable to request refclk (-517) Reported-by: Marek Szyprowski Signed-off-by: Krzysztof Kozlowski Link: https://lore.kernel.org/r/20200921174818.15525-1-krzk@kernel.org Signed-off-by: Arnd Bergmann --- arch/arm/boot/dts/exynos4412-odroid-common.dtsi | 1 - 1 file changed, 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/boot/dts/exynos4412-odroid-common.dtsi b/arch/arm/boot/dts/exynos4412-odroid-common.dtsi index ab291cec650a..2983e91bc7dd 100644 --- a/arch/arm/boot/dts/exynos4412-odroid-common.dtsi +++ b/arch/arm/boot/dts/exynos4412-odroid-common.dtsi @@ -122,7 +122,6 @@ }; &clock { - clocks = <&clock CLK_XUSBXTI>; assigned-clocks = <&clock CLK_FOUT_EPLL>; assigned-clock-rates = <45158401>; }; -- cgit v1.2.3 From c887c9b9ca62c051d339b1c7b796edf2724029ed Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Sun, 15 Nov 2020 08:55:43 -0500 Subject: kvm: mmu: fix is_tdp_mmu_check when the TDP MMU is not in use In some cases where shadow paging is in use, the root page will be either mmu->pae_root or vcpu->arch.mmu->lm_root. Then it will not have an associated struct kvm_mmu_page, because it is allocated with alloc_page instead of kvm_mmu_alloc_page. Just return false quickly from is_tdp_mmu_root if the TDP MMU is not in use, which also includes the case where shadow paging is enabled. Signed-off-by: Paolo Bonzini --- arch/x86/kvm/mmu/tdp_mmu.c | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'arch') diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 27e381c9da6c..ff28a5c6abd6 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -49,7 +49,14 @@ bool is_tdp_mmu_root(struct kvm *kvm, hpa_t hpa) { struct kvm_mmu_page *sp; + if (!kvm->arch.tdp_mmu_enabled) + return false; + if (WARN_ON(!VALID_PAGE(hpa))) + return false; + sp = to_shadow_page(hpa); + if (WARN_ON(!sp)) + return false; return sp->tdp_mmu_page && sp->root_count; } -- cgit v1.2.3 From 575cba20c421ecb6b563ae352e4e0468e4ca8b3c Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Sat, 14 Nov 2020 21:47:43 +1000 Subject: powerpc/64s: Fix KVM system reset handling when CONFIG_PPC_PSERIES=y pseries guest kernels have a FWNMI handler for SRESET and MCE NMIs, which is basically the same as the regular handlers for those interrupts. The system reset FWNMI handler did not have a KVM guest test in it, although it probably should have because the guest can itself run guests. Commit 4f50541f6703b ("powerpc/64s/exception: Move all interrupt handlers to new style code gen macros") convert the handler faithfully to avoid a KVM test with a "clever" trick to modify the IKVM_REAL setting to 0 when the fwnmi handler is to be generated (PPC_PSERIES=y). This worked when the KVM test was generated in the interrupt entry handlers, but a later patch moved the KVM test to the common handler, and the common handler macro is expanded below the fwnmi entry. This prevents the KVM test from being generated even for the 0x100 entry point as well. The result is NMI IPIs in the host kernel when a guest is running will use gest registers. This goes particularly badly when an HPT guest is running and the MMU is set to guest mode. Remove this trickery and just generate the test always. Fixes: 9600f261acaa ("powerpc/64s/exception: Move KVM test to common code") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20201114114743.3306283-1-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 2 -- 1 file changed, 2 deletions(-) (limited to 'arch') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index f7d748b88705..07d64883c0b5 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1000,8 +1000,6 @@ TRAMP_REAL_BEGIN(system_reset_idle_wake) * Vectors for the FWNMI option. Share common code. */ TRAMP_REAL_BEGIN(system_reset_fwnmi) - /* XXX: fwnmi guest could run a nested/PR guest, so why no test? */ - __IKVM_REAL(system_reset)=0 GEN_INT_ENTRY system_reset, virt=0 #endif /* CONFIG_PPC_PSERIES */ -- cgit v1.2.3 From 7bab16a6075b7b94999666355ab532c3dabb94f9 Mon Sep 17 00:00:00 2001 From: Jamie Iles Date: Fri, 13 Nov 2020 15:04:06 +0000 Subject: KVM: arm64: Correctly align nVHE percpu data The nVHE percpu data is partially linked but the nVHE linker script did not align the percpu section. The PERCPU_INPUT macro would then align the data to a page boundary: #define PERCPU_INPUT(cacheline) \ __per_cpu_start = .; \ *(.data..percpu..first) \ . = ALIGN(PAGE_SIZE); \ *(.data..percpu..page_aligned) \ . = ALIGN(cacheline); \ *(.data..percpu..read_mostly) \ . = ALIGN(cacheline); \ *(.data..percpu) \ *(.data..percpu..shared_aligned) \ PERCPU_DECRYPTED_SECTION \ __per_cpu_end = .; but then when the final vmlinux linking happens the hypervisor percpu data is included after page alignment and so the offsets potentially don't match. On my build I saw that the .hyp.data..percpu section was at address 0x20 and then the percpu data would begin at 0x1000 (because of the page alignment in PERCPU_INPUT), but when linked into vmlinux, everything would be shifted down by 0x20 bytes. This manifests as one of the CPUs getting lost when running kvm-unit-tests or starting any VM and subsequent soft lockup on a Cortex A72 device. Fixes: 30c953911c43 ("kvm: arm64: Set up hyp percpu data for nVHE") Signed-off-by: Jamie Iles Signed-off-by: Marc Zyngier Acked-by: David Brazdil Cc: David Brazdil Cc: Marc Zyngier Cc: Will Deacon Link: https://lore.kernel.org/r/20201113150406.14314-1-jamie@nuviainc.com --- arch/arm64/kvm/hyp/nvhe/hyp.lds.S | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'arch') diff --git a/arch/arm64/kvm/hyp/nvhe/hyp.lds.S b/arch/arm64/kvm/hyp/nvhe/hyp.lds.S index bb2d986ff696..a797abace13f 100644 --- a/arch/arm64/kvm/hyp/nvhe/hyp.lds.S +++ b/arch/arm64/kvm/hyp/nvhe/hyp.lds.S @@ -13,6 +13,11 @@ SECTIONS { HYP_SECTION(.text) + /* + * .hyp..data..percpu needs to be page aligned to maintain the same + * alignment for when linking into vmlinux. + */ + . = ALIGN(PAGE_SIZE); HYP_SECTION_NAME(.data..percpu) : { PERCPU_INPUT(L1_CACHE_BYTES) } -- cgit v1.2.3 From 481535c5b41d191b22775a6873de5ec0e1cdced1 Mon Sep 17 00:00:00 2001 From: Max Filippov Date: Mon, 16 Nov 2020 01:25:56 -0800 Subject: xtensa: fix TLBTEMP area placement fast_second_level_miss handler for the TLBTEMP area has an assumption that page table directory entry for the TLBTEMP address range is 0. For it to be true the TLBTEMP area must be aligned to 4MB boundary and not share its 4MB region with anything that may use a page table. This is not true currently: TLBTEMP shares space with vmalloc space which results in the following kinds of runtime errors when fast_second_level_miss loads page table directory entry for the vmalloc space instead of fixing up the TLBTEMP area: Unable to handle kernel paging request at virtual address c7ff0e00 pc = d0009275, ra = 90009478 Oops: sig: 9 [#1] PREEMPT CPU: 1 PID: 61 Comm: kworker/u9:2 Not tainted 5.10.0-rc3-next-20201110-00007-g1fe4962fa983-dirty #58 Workqueue: xprtiod xs_stream_data_receive_workfn a00: 90009478 d11e1dc0 c7ff0e00 00000020 c7ff0000 00000001 7f8b8107 00000000 a08: 900c5992 d11e1d90 d0cc88b8 5506e97c 00000000 5506e97c d06c8074 d11e1d90 pc: d0009275, ps: 00060310, depc: 00000014, excvaddr: c7ff0e00 lbeg: d0009275, lend: d0009287 lcount: 00000003, sar: 00000010 Call Trace: xs_stream_data_receive_workfn+0x43c/0x770 process_one_work+0x1a1/0x324 worker_thread+0x1cc/0x3c0 kthread+0x10d/0x124 ret_from_kernel_thread+0xc/0x18 Cc: stable@vger.kernel.org Signed-off-by: Max Filippov --- Documentation/xtensa/mmu.rst | 9 ++++++--- arch/xtensa/include/asm/pgtable.h | 2 +- 2 files changed, 7 insertions(+), 4 deletions(-) (limited to 'arch') diff --git a/Documentation/xtensa/mmu.rst b/Documentation/xtensa/mmu.rst index e52a12960fdc..450573afa31a 100644 --- a/Documentation/xtensa/mmu.rst +++ b/Documentation/xtensa/mmu.rst @@ -82,7 +82,8 @@ Default MMUv2-compatible layout:: +------------------+ | VMALLOC area | VMALLOC_START 0xc0000000 128MB - 64KB +------------------+ VMALLOC_END - | Cache aliasing | TLBTEMP_BASE_1 0xc7ff0000 DCACHE_WAY_SIZE + +------------------+ + | Cache aliasing | TLBTEMP_BASE_1 0xc8000000 DCACHE_WAY_SIZE | remap area 1 | +------------------+ | Cache aliasing | TLBTEMP_BASE_2 DCACHE_WAY_SIZE @@ -124,7 +125,8 @@ Default MMUv2-compatible layout:: +------------------+ | VMALLOC area | VMALLOC_START 0xa0000000 128MB - 64KB +------------------+ VMALLOC_END - | Cache aliasing | TLBTEMP_BASE_1 0xa7ff0000 DCACHE_WAY_SIZE + +------------------+ + | Cache aliasing | TLBTEMP_BASE_1 0xa8000000 DCACHE_WAY_SIZE | remap area 1 | +------------------+ | Cache aliasing | TLBTEMP_BASE_2 DCACHE_WAY_SIZE @@ -167,7 +169,8 @@ Default MMUv2-compatible layout:: +------------------+ | VMALLOC area | VMALLOC_START 0x90000000 128MB - 64KB +------------------+ VMALLOC_END - | Cache aliasing | TLBTEMP_BASE_1 0x97ff0000 DCACHE_WAY_SIZE + +------------------+ + | Cache aliasing | TLBTEMP_BASE_1 0x98000000 DCACHE_WAY_SIZE | remap area 1 | +------------------+ | Cache aliasing | TLBTEMP_BASE_2 DCACHE_WAY_SIZE diff --git a/arch/xtensa/include/asm/pgtable.h b/arch/xtensa/include/asm/pgtable.h index fa054a1772e1..4dc04e6c01d7 100644 --- a/arch/xtensa/include/asm/pgtable.h +++ b/arch/xtensa/include/asm/pgtable.h @@ -69,7 +69,7 @@ */ #define VMALLOC_START (XCHAL_KSEG_CACHED_VADDR - 0x10000000) #define VMALLOC_END (VMALLOC_START + 0x07FEFFFF) -#define TLBTEMP_BASE_1 (VMALLOC_END + 1) +#define TLBTEMP_BASE_1 (VMALLOC_START + 0x08000000) #define TLBTEMP_BASE_2 (TLBTEMP_BASE_1 + DCACHE_WAY_SIZE) #if 2 * DCACHE_WAY_SIZE > ICACHE_WAY_SIZE #define TLBTEMP_SIZE (2 * DCACHE_WAY_SIZE) -- cgit v1.2.3 From 3a860d165eb5f4d7cf0bf81ef6a5b5c5e1754422 Mon Sep 17 00:00:00 2001 From: Max Filippov Date: Mon, 16 Nov 2020 01:38:59 -0800 Subject: xtensa: disable preemption around cache alias management calls Although cache alias management calls set up and tear down TLB entries and fast_second_level_miss is able to restore TLB entry should it be evicted they absolutely cannot preempt each other because they use the same TLBTEMP area for different purposes. Disable preemption around all cache alias management calls to enforce that. Cc: stable@vger.kernel.org Signed-off-by: Max Filippov --- arch/xtensa/mm/cache.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'arch') diff --git a/arch/xtensa/mm/cache.c b/arch/xtensa/mm/cache.c index 5835406b3cec..085b8c77b9d9 100644 --- a/arch/xtensa/mm/cache.c +++ b/arch/xtensa/mm/cache.c @@ -70,8 +70,10 @@ static inline void kmap_invalidate_coherent(struct page *page, kvaddr = TLBTEMP_BASE_1 + (page_to_phys(page) & DCACHE_ALIAS_MASK); + preempt_disable(); __invalidate_dcache_page_alias(kvaddr, page_to_phys(page)); + preempt_enable(); } } } @@ -156,6 +158,7 @@ void flush_dcache_page(struct page *page) if (!alias && !mapping) return; + preempt_disable(); virt = TLBTEMP_BASE_1 + (phys & DCACHE_ALIAS_MASK); __flush_invalidate_dcache_page_alias(virt, phys); @@ -166,6 +169,7 @@ void flush_dcache_page(struct page *page) if (mapping) __invalidate_icache_page_alias(virt, phys); + preempt_enable(); } /* There shouldn't be an entry in the cache for this page anymore. */ @@ -199,8 +203,10 @@ void local_flush_cache_page(struct vm_area_struct *vma, unsigned long address, unsigned long phys = page_to_phys(pfn_to_page(pfn)); unsigned long virt = TLBTEMP_BASE_1 + (address & DCACHE_ALIAS_MASK); + preempt_disable(); __flush_invalidate_dcache_page_alias(virt, phys); __invalidate_icache_page_alias(virt, phys); + preempt_enable(); } EXPORT_SYMBOL(local_flush_cache_page); @@ -227,11 +233,13 @@ update_mmu_cache(struct vm_area_struct * vma, unsigned long addr, pte_t *ptep) unsigned long phys = page_to_phys(page); unsigned long tmp; + preempt_disable(); tmp = TLBTEMP_BASE_1 + (phys & DCACHE_ALIAS_MASK); __flush_invalidate_dcache_page_alias(tmp, phys); tmp = TLBTEMP_BASE_1 + (addr & DCACHE_ALIAS_MASK); __flush_invalidate_dcache_page_alias(tmp, phys); __invalidate_icache_page_alias(tmp, phys); + preempt_enable(); clear_bit(PG_arch_1, &page->flags); } @@ -265,7 +273,9 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page, if (alias) { unsigned long t = TLBTEMP_BASE_1 + (vaddr & DCACHE_ALIAS_MASK); + preempt_disable(); __flush_invalidate_dcache_page_alias(t, phys); + preempt_enable(); } /* Copy data */ @@ -280,9 +290,11 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page, if (alias) { unsigned long t = TLBTEMP_BASE_1 + (vaddr & DCACHE_ALIAS_MASK); + preempt_disable(); __flush_invalidate_dcache_range((unsigned long) dst, len); if ((vma->vm_flags & VM_EXEC) != 0) __invalidate_icache_page_alias(t, phys); + preempt_enable(); } else if ((vma->vm_flags & VM_EXEC) != 0) { __flush_dcache_range((unsigned long)dst,len); @@ -304,7 +316,9 @@ extern void copy_from_user_page(struct vm_area_struct *vma, struct page *page, if (alias) { unsigned long t = TLBTEMP_BASE_1 + (vaddr & DCACHE_ALIAS_MASK); + preempt_disable(); __flush_invalidate_dcache_page_alias(t, phys); + preempt_enable(); } memcpy(dst, src, len); -- cgit v1.2.3 From 75b49620267c700f0a07fec7f27f69852db70e46 Mon Sep 17 00:00:00 2001 From: Cédric Le Goater Date: Thu, 5 Nov 2020 14:47:13 +0100 Subject: KVM: PPC: Book3S HV: XIVE: Fix possible oops when accessing ESB page MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When accessing the ESB page of a source interrupt, the fault handler will retrieve the page address from the XIVE interrupt 'xive_irq_data' structure. If the associated KVM XIVE interrupt is not valid, that is not allocated at the HW level for some reason, the fault handler will dereference a NULL pointer leading to the oops below : WARNING: CPU: 40 PID: 59101 at arch/powerpc/kvm/book3s_xive_native.c:259 xive_native_esb_fault+0xe4/0x240 [kvm] CPU: 40 PID: 59101 Comm: qemu-system-ppc Kdump: loaded Tainted: G W --------- - - 4.18.0-240.el8.ppc64le #1 NIP: c00800000e949fac LR: c00000000044b164 CTR: c00800000e949ec8 REGS: c000001f69617840 TRAP: 0700 Tainted: G W --------- - - (4.18.0-240.el8.ppc64le) MSR: 9000000000029033 CR: 44044282 XER: 00000000 CFAR: c00000000044b160 IRQMASK: 0 GPR00: c00000000044b164 c000001f69617ac0 c00800000e96e000 c000001f69617c10 GPR04: 05faa2b21e000080 0000000000000000 0000000000000005 ffffffffffffffff GPR08: 0000000000000000 0000000000000001 0000000000000000 0000000000000001 GPR12: c00800000e949ec8 c000001ffffd3400 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 c000001f5c065160 c000000001c76f90 GPR24: c000001f06f20000 c000001f5c065100 0000000000000008 c000001f0eb98c78 GPR28: c000001dcab40000 c000001dcab403d8 c000001f69617c10 0000000000000011 NIP [c00800000e949fac] xive_native_esb_fault+0xe4/0x240 [kvm] LR [c00000000044b164] __do_fault+0x64/0x220 Call Trace: [c000001f69617ac0] [0000000137a5dc20] 0x137a5dc20 (unreliable) [c000001f69617b50] [c00000000044b164] __do_fault+0x64/0x220 [c000001f69617b90] [c000000000453838] do_fault+0x218/0x930 [c000001f69617bf0] [c000000000456f50] __handle_mm_fault+0x350/0xdf0 [c000001f69617cd0] [c000000000457b1c] handle_mm_fault+0x12c/0x310 [c000001f69617d10] [c00000000007ef44] __do_page_fault+0x264/0xbb0 [c000001f69617df0] [c00000000007f8c8] do_page_fault+0x38/0xd0 [c000001f69617e30] [c00000000000a714] handle_page_fault+0x18/0x38 Instruction dump: 40c2fff0 7c2004ac 2fa90000 409e0118 73e90001 41820080 e8bd0008 7c2004ac 7ca90074 39400000 915c0000 7929d182 <0b090000> 2fa50000 419e0080 e89e0018 ---[ end trace 66c6ff034c53f64f ]--- xive-kvm: xive_native_esb_fault: accessing invalid ESB page for source 8 ! Fix that by checking the validity of the KVM XIVE interrupt structure. Fixes: 6520ca64cde7 ("KVM: PPC: Book3S HV: XIVE: Add a mapping for the source ESB pages") Cc: stable@vger.kernel.org # v5.2+ Reported-by: Greg Kurz Signed-off-by: Cédric Le Goater Tested-by: Greg Kurz Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20201105134713.656160-1-clg@kaod.org --- arch/powerpc/kvm/book3s_xive_native.c | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'arch') diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c index d0c2db0e07fa..a59a94f02733 100644 --- a/arch/powerpc/kvm/book3s_xive_native.c +++ b/arch/powerpc/kvm/book3s_xive_native.c @@ -251,6 +251,13 @@ static vm_fault_t xive_native_esb_fault(struct vm_fault *vmf) } state = &sb->irq_state[src]; + + /* Some sanity checking */ + if (!state->valid) { + pr_devel("%s: source %lx invalid !\n", __func__, irq); + return VM_FAULT_SIGBUS; + } + kvmppc_xive_select_irq(state, &hw_num, &xd); arch_spin_lock(&sb->lock); -- cgit v1.2.3 From cef397038167ac15d085914493d6c86385773709 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 11 Nov 2020 17:52:58 +0100 Subject: arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed Stefan Agner reported a bug when using zsram on 32-bit Arm machines with RAM above the 4GB address boundary: Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = a27bd01c [00000000] *pgd=236a0003, *pmd=1ffa64003 Internal error: Oops: 207 [#1] SMP ARM Modules linked in: mdio_bcm_unimac(+) brcmfmac cfg80211 brcmutil raspberrypi_hwmon hci_uart crc32_arm_ce bcm2711_thermal phy_generic genet CPU: 0 PID: 123 Comm: mkfs.ext4 Not tainted 5.9.6 #1 Hardware name: BCM2711 PC is at zs_map_object+0x94/0x338 LR is at zram_bvec_rw.constprop.0+0x330/0xa64 pc : [] lr : [] psr: 60000013 sp : e376bbe0 ip : 00000000 fp : c1e2921c r10: 00000002 r9 : c1dda730 r8 : 00000000 r7 : e8ff7a00 r6 : 00000000 r5 : 02f9ffa0 r4 : e3710000 r3 : 000fdffe r2 : c1e0ce80 r1 : ebf979a0 r0 : 00000000 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 30c5383d Table: 235c2a80 DAC: fffffffd Process mkfs.ext4 (pid: 123, stack limit = 0x495a22e6) Stack: (0xe376bbe0 to 0xe376c000) As it turns out, zsram needs to know the maximum memory size, which is defined in MAX_PHYSMEM_BITS when CONFIG_SPARSEMEM is set, or in MAX_POSSIBLE_PHYSMEM_BITS on the x86 architecture. The same problem will be hit on all 32-bit architectures that have a physical address space larger than 4GB and happen to not enable sparsemem and include asm/sparsemem.h from asm/pgtable.h. After the initial discussion, I suggested just always defining MAX_POSSIBLE_PHYSMEM_BITS whenever CONFIG_PHYS_ADDR_T_64BIT is set, or provoking a build error otherwise. This addresses all configurations that can currently have this runtime bug, but leaves all other configurations unchanged. I looked up the possible number of bits in source code and datasheets, here is what I found: - on ARC, CONFIG_ARC_HAS_PAE40 controls whether 32 or 40 bits are used - on ARM, CONFIG_LPAE enables 40 bit addressing, without it we never support more than 32 bits, even though supersections in theory allow up to 40 bits as well. - on MIPS, some MIPS32r1 or later chips support 36 bits, and MIPS32r5 XPA supports up to 60 bits in theory, but 40 bits are more than anyone will ever ship - On PowerPC, there are three different implementations of 36 bit addressing, but 32-bit is used without CONFIG_PTE_64BIT - On RISC-V, the normal page table format can support 34 bit addressing. There is no highmem support on RISC-V, so anything above 2GB is unused, but it might be useful to eventually support CONFIG_ZRAM for high pages. Fixes: 61989a80fb3a ("staging: zsmalloc: zsmalloc memory allocation library") Fixes: 02390b87a945 ("mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS") Acked-by: Thomas Bogendoerfer Reviewed-by: Stefan Agner Tested-by: Stefan Agner Acked-by: Mike Rapoport Link: https://lore.kernel.org/linux-mm/bdfa44bf1c570b05d6c70898e2bbb0acf234ecdf.1604762181.git.stefan@agner.ch/ Signed-off-by: Arnd Bergmann --- arch/arc/include/asm/pgtable.h | 2 ++ arch/arm/include/asm/pgtable-2level.h | 2 ++ arch/arm/include/asm/pgtable-3level.h | 2 ++ arch/mips/include/asm/pgtable-32.h | 3 +++ arch/powerpc/include/asm/book3s/32/pgtable.h | 2 ++ arch/powerpc/include/asm/nohash/32/pgtable.h | 2 ++ arch/riscv/include/asm/pgtable-32.h | 2 ++ include/linux/pgtable.h | 13 +++++++++++++ 8 files changed, 28 insertions(+) (limited to 'arch') diff --git a/arch/arc/include/asm/pgtable.h b/arch/arc/include/asm/pgtable.h index f1ed17edb085..163641726a2b 100644 --- a/arch/arc/include/asm/pgtable.h +++ b/arch/arc/include/asm/pgtable.h @@ -134,8 +134,10 @@ #ifdef CONFIG_ARC_HAS_PAE40 #define PTE_BITS_NON_RWX_IN_PD1 (0xff00000000 | PAGE_MASK | _PAGE_CACHEABLE) +#define MAX_POSSIBLE_PHYSMEM_BITS 40 #else #define PTE_BITS_NON_RWX_IN_PD1 (PAGE_MASK | _PAGE_CACHEABLE) +#define MAX_POSSIBLE_PHYSMEM_BITS 32 #endif /************************************************************************** diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h index 3502c2f746ca..baf7d0204eb5 100644 --- a/arch/arm/include/asm/pgtable-2level.h +++ b/arch/arm/include/asm/pgtable-2level.h @@ -75,6 +75,8 @@ #define PTE_HWTABLE_OFF (PTE_HWTABLE_PTRS * sizeof(pte_t)) #define PTE_HWTABLE_SIZE (PTRS_PER_PTE * sizeof(u32)) +#define MAX_POSSIBLE_PHYSMEM_BITS 32 + /* * PMD_SHIFT determines the size of the area a second-level page table can map * PGDIR_SHIFT determines what a third-level page table entry can map diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h index fbb6693c3352..2b85d175e999 100644 --- a/arch/arm/include/asm/pgtable-3level.h +++ b/arch/arm/include/asm/pgtable-3level.h @@ -25,6 +25,8 @@ #define PTE_HWTABLE_OFF (0) #define PTE_HWTABLE_SIZE (PTRS_PER_PTE * sizeof(u64)) +#define MAX_POSSIBLE_PHYSMEM_BITS 40 + /* * PGDIR_SHIFT determines the size a top-level page table entry can map. */ diff --git a/arch/mips/include/asm/pgtable-32.h b/arch/mips/include/asm/pgtable-32.h index a950fc1ddb4d..6c0532d7b211 100644 --- a/arch/mips/include/asm/pgtable-32.h +++ b/arch/mips/include/asm/pgtable-32.h @@ -154,6 +154,7 @@ static inline void pmd_clear(pmd_t *pmdp) #if defined(CONFIG_XPA) +#define MAX_POSSIBLE_PHYSMEM_BITS 40 #define pte_pfn(x) (((unsigned long)((x).pte_high >> _PFN_SHIFT)) | (unsigned long)((x).pte_low << _PAGE_PRESENT_SHIFT)) static inline pte_t pfn_pte(unsigned long pfn, pgprot_t prot) @@ -169,6 +170,7 @@ pfn_pte(unsigned long pfn, pgprot_t prot) #elif defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32) +#define MAX_POSSIBLE_PHYSMEM_BITS 36 #define pte_pfn(x) ((unsigned long)((x).pte_high >> 6)) static inline pte_t pfn_pte(unsigned long pfn, pgprot_t prot) @@ -183,6 +185,7 @@ static inline pte_t pfn_pte(unsigned long pfn, pgprot_t prot) #else +#define MAX_POSSIBLE_PHYSMEM_BITS 32 #ifdef CONFIG_CPU_VR41XX #define pte_pfn(x) ((unsigned long)((x).pte >> (PAGE_SHIFT + 2))) #define pfn_pte(pfn, prot) __pte(((pfn) << (PAGE_SHIFT + 2)) | pgprot_val(prot)) diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h index 36443cda8dcf..1376be95e975 100644 --- a/arch/powerpc/include/asm/book3s/32/pgtable.h +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h @@ -36,8 +36,10 @@ static inline bool pte_user(pte_t pte) */ #ifdef CONFIG_PTE_64BIT #define PTE_RPN_MASK (~((1ULL << PTE_RPN_SHIFT) - 1)) +#define MAX_POSSIBLE_PHYSMEM_BITS 36 #else #define PTE_RPN_MASK (~((1UL << PTE_RPN_SHIFT) - 1)) +#define MAX_POSSIBLE_PHYSMEM_BITS 32 #endif /* diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h index ee2243ba96cf..96522f7f0618 100644 --- a/arch/powerpc/include/asm/nohash/32/pgtable.h +++ b/arch/powerpc/include/asm/nohash/32/pgtable.h @@ -153,8 +153,10 @@ int map_kernel_page(unsigned long va, phys_addr_t pa, pgprot_t prot); */ #if defined(CONFIG_PPC32) && defined(CONFIG_PTE_64BIT) #define PTE_RPN_MASK (~((1ULL << PTE_RPN_SHIFT) - 1)) +#define MAX_POSSIBLE_PHYSMEM_BITS 36 #else #define PTE_RPN_MASK (~((1UL << PTE_RPN_SHIFT) - 1)) +#define MAX_POSSIBLE_PHYSMEM_BITS 32 #endif /* diff --git a/arch/riscv/include/asm/pgtable-32.h b/arch/riscv/include/asm/pgtable-32.h index b0ab66e5fdb1..5b2e79e5bfa5 100644 --- a/arch/riscv/include/asm/pgtable-32.h +++ b/arch/riscv/include/asm/pgtable-32.h @@ -14,4 +14,6 @@ #define PGDIR_SIZE (_AC(1, UL) << PGDIR_SHIFT) #define PGDIR_MASK (~(PGDIR_SIZE - 1)) +#define MAX_POSSIBLE_PHYSMEM_BITS 34 + #endif /* _ASM_RISCV_PGTABLE_32_H */ diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 71125a4676c4..e237004d498d 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1427,6 +1427,19 @@ typedef unsigned int pgtbl_mod_mask; #endif /* !__ASSEMBLY__ */ +#if !defined(MAX_POSSIBLE_PHYSMEM_BITS) && !defined(CONFIG_64BIT) +#ifdef CONFIG_PHYS_ADDR_T_64BIT +/* + * ZSMALLOC needs to know the highest PFN on 32-bit architectures + * with physical address space extension, but falls back to + * BITS_PER_LONG otherwise. + */ +#error Missing MAX_POSSIBLE_PHYSMEM_BITS definition +#else +#define MAX_POSSIBLE_PHYSMEM_BITS 32 +#endif +#endif + #ifndef has_transparent_hugepage #ifdef CONFIG_TRANSPARENT_HUGEPAGE #define has_transparent_hugepage() 1 -- cgit v1.2.3 From 854c57f02bc718b0653bc467073b4541b8155a36 Mon Sep 17 00:00:00 2001 From: Ashish Kalra Date: Tue, 10 Nov 2020 22:42:05 +0000 Subject: KVM: SVM: Fix offset computation bug in __sev_dbg_decrypt(). Fix offset computation in __sev_dbg_decrypt() to include the source paddr before it is rounded down to be aligned to 16 bytes as required by SEV API. This fixes incorrect guest memory dumps observed when using qemu monitor. Signed-off-by: Ashish Kalra Message-Id: <20201110224205.29444-1-Ashish.Kalra@amd.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/svm/sev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index c0b14106258a..566f4d18185b 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -642,8 +642,8 @@ static int __sev_dbg_decrypt(struct kvm *kvm, unsigned long src_paddr, * Its safe to read more than we are asked, caller should ensure that * destination has enough space. */ - src_paddr = round_down(src_paddr, 16); offset = src_paddr & 15; + src_paddr = round_down(src_paddr, 16); sz = round_up(sz + offset, 16); return __sev_issue_dbg_cmd(kvm, src_paddr, dst_paddr, sz, err, false); -- cgit v1.2.3 From 054409ab253d9f31bec5760105144166b4b71e22 Mon Sep 17 00:00:00 2001 From: Chen Zhou Date: Tue, 17 Nov 2020 10:54:26 +0800 Subject: KVM: SVM: fix error return code in svm_create_vcpu() Fix to return a negative error code from the error handling case instead of 0 in function svm_create_vcpu(), as done elsewhere in this function. Fixes: f4c847a95654 ("KVM: SVM: refactor msr permission bitmap allocation") Reported-by: Hulk Robot Signed-off-by: Chen Zhou Message-Id: <20201117025426.167824-1-chenzhou10@huawei.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/svm/svm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 1e81cfebd491..79b3a564f1c9 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -1309,8 +1309,10 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu) svm->avic_is_running = true; svm->msrpm = svm_vcpu_alloc_msrpm(); - if (!svm->msrpm) + if (!svm->msrpm) { + err = -ENOMEM; goto error_free_vmcb_page; + } svm_vcpu_init_msrpm(vcpu, svm->msrpm); -- cgit v1.2.3 From 61a2f1aecf6052f7bcf900829ca2b9d74437ec07 Mon Sep 17 00:00:00 2001 From: Thomas Bogendoerfer Date: Mon, 16 Nov 2020 18:45:15 +0100 Subject: MIPS: kernel: Fix for_each_memblock conversion The loop over all memblocks works with PFNs and not physical addresses, so we need for_each_mem_pfn_range(). Fixes: b10d6bca8720 ("arch, drivers: replace for_each_membock() with for_each_mem_range()") Signed-off-by: Thomas Bogendoerfer Reviewed-by: Mike Rapoport Reviewed-by: Serge Semin --- arch/mips/kernel/setup.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c index 0d4253208bde..ca579deef939 100644 --- a/arch/mips/kernel/setup.c +++ b/arch/mips/kernel/setup.c @@ -262,8 +262,8 @@ static void __init bootmem_init(void) static void __init bootmem_init(void) { phys_addr_t ramstart, ramend; - phys_addr_t start, end; - u64 i; + unsigned long start, end; + int i; ramstart = memblock_start_of_DRAM(); ramend = memblock_end_of_DRAM(); @@ -300,7 +300,7 @@ static void __init bootmem_init(void) min_low_pfn = ARCH_PFN_OFFSET; max_pfn = PFN_DOWN(ramend); - for_each_mem_range(i, &start, &end) { + for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, NULL) { /* * Skip highmem here so we get an accurate max_low_pfn if low * memory stops short of high memory. -- cgit v1.2.3 From 1a371e67dc77125736cc56d3a0893f06b75855b6 Mon Sep 17 00:00:00 2001 From: Chen Yu Date: Fri, 13 Nov 2020 09:59:23 +0800 Subject: x86/microcode/intel: Check patch signature before saving microcode for early loading Currently, scan_microcode() leverages microcode_matches() to check if the microcode matches the CPU by comparing the family and model. However, the processor stepping and flags of the microcode signature should also be considered when saving a microcode patch for early update. Use find_matching_signature() in scan_microcode() and get rid of the now-unused microcode_matches() which is a good cleanup in itself. Complete the verification of the patch being saved for early loading in save_microcode_patch() directly. This needs to be done there too because save_mc_for_early() will call save_microcode_patch() too. The second reason why this needs to be done is because the loader still tries to support, at least hypothetically, mixed-steppings systems and thus adds all patches to the cache that belong to the same CPU model albeit with different steppings. For example: microcode: CPU: sig=0x906ec, pf=0x2, rev=0xd6 microcode: mc_saved[0]: sig=0x906e9, pf=0x2a, rev=0xd6, total size=0x19400, date = 2020-04-23 microcode: mc_saved[1]: sig=0x906ea, pf=0x22, rev=0xd6, total size=0x19000, date = 2020-04-27 microcode: mc_saved[2]: sig=0x906eb, pf=0x2, rev=0xd6, total size=0x19400, date = 2020-04-23 microcode: mc_saved[3]: sig=0x906ec, pf=0x22, rev=0xd6, total size=0x19000, date = 2020-04-27 microcode: mc_saved[4]: sig=0x906ed, pf=0x22, rev=0xd6, total size=0x19400, date = 2020-04-23 The patch which is being saved for early loading, however, can only be the one which fits the CPU this runs on so do the signature verification before saving. [ bp: Do signature verification in save_microcode_patch() and rewrite commit message. ] Fixes: ec400ddeff20 ("x86/microcode_intel_early.c: Early update ucode on Intel's CPU") Signed-off-by: Chen Yu Signed-off-by: Borislav Petkov Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=208535 Link: https://lkml.kernel.org/r/20201113015923.13960-1-yu.c.chen@intel.com --- arch/x86/kernel/cpu/microcode/intel.c | 63 ++++++----------------------------- 1 file changed, 10 insertions(+), 53 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c index 6a99535d7f37..7e8e07bddd5f 100644 --- a/arch/x86/kernel/cpu/microcode/intel.c +++ b/arch/x86/kernel/cpu/microcode/intel.c @@ -100,53 +100,6 @@ static int has_newer_microcode(void *mc, unsigned int csig, int cpf, int new_rev return find_matching_signature(mc, csig, cpf); } -/* - * Given CPU signature and a microcode patch, this function finds if the - * microcode patch has matching family and model with the CPU. - * - * %true - if there's a match - * %false - otherwise - */ -static bool microcode_matches(struct microcode_header_intel *mc_header, - unsigned long sig) -{ - unsigned long total_size = get_totalsize(mc_header); - unsigned long data_size = get_datasize(mc_header); - struct extended_sigtable *ext_header; - unsigned int fam_ucode, model_ucode; - struct extended_signature *ext_sig; - unsigned int fam, model; - int ext_sigcount, i; - - fam = x86_family(sig); - model = x86_model(sig); - - fam_ucode = x86_family(mc_header->sig); - model_ucode = x86_model(mc_header->sig); - - if (fam == fam_ucode && model == model_ucode) - return true; - - /* Look for ext. headers: */ - if (total_size <= data_size + MC_HEADER_SIZE) - return false; - - ext_header = (void *) mc_header + data_size + MC_HEADER_SIZE; - ext_sig = (void *)ext_header + EXT_HEADER_SIZE; - ext_sigcount = ext_header->count; - - for (i = 0; i < ext_sigcount; i++) { - fam_ucode = x86_family(ext_sig->sig); - model_ucode = x86_model(ext_sig->sig); - - if (fam == fam_ucode && model == model_ucode) - return true; - - ext_sig++; - } - return false; -} - static struct ucode_patch *memdup_patch(void *data, unsigned int size) { struct ucode_patch *p; @@ -164,7 +117,7 @@ static struct ucode_patch *memdup_patch(void *data, unsigned int size) return p; } -static void save_microcode_patch(void *data, unsigned int size) +static void save_microcode_patch(struct ucode_cpu_info *uci, void *data, unsigned int size) { struct microcode_header_intel *mc_hdr, *mc_saved_hdr; struct ucode_patch *iter, *tmp, *p = NULL; @@ -210,6 +163,9 @@ static void save_microcode_patch(void *data, unsigned int size) if (!p) return; + if (!find_matching_signature(p->data, uci->cpu_sig.sig, uci->cpu_sig.pf)) + return; + /* * Save for early loading. On 32-bit, that needs to be a physical * address as the APs are running from physical addresses, before @@ -344,13 +300,14 @@ scan_microcode(void *data, size_t size, struct ucode_cpu_info *uci, bool save) size -= mc_size; - if (!microcode_matches(mc_header, uci->cpu_sig.sig)) { + if (!find_matching_signature(data, uci->cpu_sig.sig, + uci->cpu_sig.pf)) { data += mc_size; continue; } if (save) { - save_microcode_patch(data, mc_size); + save_microcode_patch(uci, data, mc_size); goto next; } @@ -483,14 +440,14 @@ static void show_saved_mc(void) * Save this microcode patch. It will be loaded early when a CPU is * hot-added or resumes. */ -static void save_mc_for_early(u8 *mc, unsigned int size) +static void save_mc_for_early(struct ucode_cpu_info *uci, u8 *mc, unsigned int size) { /* Synchronization during CPU hotplug. */ static DEFINE_MUTEX(x86_cpu_microcode_mutex); mutex_lock(&x86_cpu_microcode_mutex); - save_microcode_patch(mc, size); + save_microcode_patch(uci, mc, size); show_saved_mc(); mutex_unlock(&x86_cpu_microcode_mutex); @@ -935,7 +892,7 @@ static enum ucode_state generic_load_microcode(int cpu, struct iov_iter *iter) * permanent memory. So it will be loaded early when a CPU is hot added * or resumes. */ - save_mc_for_early(new_mc, new_mc_size); + save_mc_for_early(uci, new_mc, new_mc_size); pr_debug("CPU%d found a matching microcode update with version 0x%x (current=0x%x)\n", cpu, new_rev, uci->cpu_sig.rev); -- cgit v1.2.3 From e02152ba2810f7c88cb54e71cda096268dfa9241 Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Mon, 16 Nov 2020 23:09:13 +1100 Subject: powerpc: Drop -me200 addition to build flags MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Currently a build with CONFIG_E200=y will fail with: Error: invalid switch -me200 Error: unrecognized option -me200 Upstream binutils has never supported an -me200 option. Presumably it was supported at some point by either a fork or Freescale internal binutils. We can't support code that we can't even build test, so drop the addition of -me200 to the build flags, so we can at least build with CONFIG_E200=y. Reported-by: Németh Márton Reported-by: kernel test robot Signed-off-by: Michael Ellerman Reviewed-by: Nick Desaulniers Acked-by: Scott Wood Link: https://lore.kernel.org/r/20201116120913.165317-1-mpe@ellerman.id.au --- arch/powerpc/Makefile | 1 - 1 file changed, 1 deletion(-) (limited to 'arch') diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index a4d56f0a41d9..16b8336f91dd 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -248,7 +248,6 @@ KBUILD_CFLAGS += $(call cc-option,-mno-string) cpu-as-$(CONFIG_40x) += -Wa,-m405 cpu-as-$(CONFIG_44x) += -Wa,-m440 cpu-as-$(CONFIG_ALTIVEC) += $(call as-option,-Wa$(comma)-maltivec) -cpu-as-$(CONFIG_E200) += -Wa,-me200 cpu-as-$(CONFIG_E500) += -Wa,-me500 # When using '-many -mpower4' gas will first try and find a matching power4 -- cgit v1.2.3 From ac3b57adf87ad9bac7e33ca26bbbb13fae1ed62b Mon Sep 17 00:00:00 2001 From: Zhang Qilong Date: Fri, 13 Nov 2020 21:18:56 +0800 Subject: MIPS: Alchemy: Fix memleak in alchemy_clk_setup_cpu If the clk_register fails, we should free h before function returns to prevent memleak. Fixes: 474402291a0ad ("MIPS: Alchemy: clock framework integration of onchip clocks") Reported-by: Hulk Robot Signed-off-by: Zhang Qilong Signed-off-by: Thomas Bogendoerfer --- arch/mips/alchemy/common/clock.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/mips/alchemy/common/clock.c b/arch/mips/alchemy/common/clock.c index a95a894aceaf..f0c830337104 100644 --- a/arch/mips/alchemy/common/clock.c +++ b/arch/mips/alchemy/common/clock.c @@ -152,6 +152,7 @@ static struct clk __init *alchemy_clk_setup_cpu(const char *parent_name, { struct clk_init_data id; struct clk_hw *h; + struct clk *clk; h = kzalloc(sizeof(*h), GFP_KERNEL); if (!h) @@ -164,7 +165,13 @@ static struct clk __init *alchemy_clk_setup_cpu(const char *parent_name, id.ops = &alchemy_clkops_cpu; h->init = &id; - return clk_register(NULL, h); + clk = clk_register(NULL, h); + if (IS_ERR(clk)) { + pr_err("failed to register clock\n"); + kfree(h); + } + + return clk; } /* AUXPLLs ************************************************************/ -- cgit v1.2.3 From ebd19fc372e3e78bf165f230e7c084e304441c08 Mon Sep 17 00:00:00 2001 From: Sami Tolvanen Date: Fri, 13 Nov 2020 10:31:26 -0800 Subject: perf/x86: fix sysfs type mismatches This change switches rapl to use PMU_FORMAT_ATTR, and fixes two other macros to use device_attribute instead of kobj_attribute to avoid callback type mismatches that trip indirect call checking with Clang's Control-Flow Integrity (CFI). Reported-by: Sedat Dilek Signed-off-by: Sami Tolvanen Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Kees Cook Link: https://lkml.kernel.org/r/20201113183126.1239404-1-samitolvanen@google.com --- arch/x86/events/intel/cstate.c | 6 +++--- arch/x86/events/intel/uncore.c | 4 ++-- arch/x86/events/intel/uncore.h | 12 ++++++------ arch/x86/events/rapl.c | 14 +------------- 4 files changed, 12 insertions(+), 24 deletions(-) (limited to 'arch') diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c index 442e1ed4acd4..4eb7ee5fed72 100644 --- a/arch/x86/events/intel/cstate.c +++ b/arch/x86/events/intel/cstate.c @@ -107,14 +107,14 @@ MODULE_LICENSE("GPL"); #define DEFINE_CSTATE_FORMAT_ATTR(_var, _name, _format) \ -static ssize_t __cstate_##_var##_show(struct kobject *kobj, \ - struct kobj_attribute *attr, \ +static ssize_t __cstate_##_var##_show(struct device *dev, \ + struct device_attribute *attr, \ char *page) \ { \ BUILD_BUG_ON(sizeof(_format) >= PAGE_SIZE); \ return sprintf(page, _format "\n"); \ } \ -static struct kobj_attribute format_attr_##_var = \ +static struct device_attribute format_attr_##_var = \ __ATTR(_name, 0444, __cstate_##_var##_show, NULL) static ssize_t cstate_get_attr_cpumask(struct device *dev, diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c index 86d012b3e0b4..80d52cbe2fde 100644 --- a/arch/x86/events/intel/uncore.c +++ b/arch/x86/events/intel/uncore.c @@ -94,8 +94,8 @@ end: return map; } -ssize_t uncore_event_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) +ssize_t uncore_event_show(struct device *dev, + struct device_attribute *attr, char *buf) { struct uncore_event_desc *event = container_of(attr, struct uncore_event_desc, attr); diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h index 83d2a7d490e0..9efea154349d 100644 --- a/arch/x86/events/intel/uncore.h +++ b/arch/x86/events/intel/uncore.h @@ -157,7 +157,7 @@ struct intel_uncore_box { #define UNCORE_BOX_FLAG_CFL8_CBOX_MSR_OFFS 2 struct uncore_event_desc { - struct kobj_attribute attr; + struct device_attribute attr; const char *config; }; @@ -179,8 +179,8 @@ struct pci2phy_map { struct pci2phy_map *__find_pci2phy_map(int segment); int uncore_pcibus_to_physid(struct pci_bus *bus); -ssize_t uncore_event_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf); +ssize_t uncore_event_show(struct device *dev, + struct device_attribute *attr, char *buf); static inline struct intel_uncore_pmu *dev_to_uncore_pmu(struct device *dev) { @@ -201,14 +201,14 @@ extern int __uncore_max_dies; } #define DEFINE_UNCORE_FORMAT_ATTR(_var, _name, _format) \ -static ssize_t __uncore_##_var##_show(struct kobject *kobj, \ - struct kobj_attribute *attr, \ +static ssize_t __uncore_##_var##_show(struct device *dev, \ + struct device_attribute *attr, \ char *page) \ { \ BUILD_BUG_ON(sizeof(_format) >= PAGE_SIZE); \ return sprintf(page, _format "\n"); \ } \ -static struct kobj_attribute format_attr_##_var = \ +static struct device_attribute format_attr_##_var = \ __ATTR(_name, 0444, __uncore_##_var##_show, NULL) static inline bool uncore_pmc_fixed(int idx) diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c index 7c0120e2e957..7dbbeaacd995 100644 --- a/arch/x86/events/rapl.c +++ b/arch/x86/events/rapl.c @@ -93,18 +93,6 @@ static const char *const rapl_domain_names[NR_RAPL_DOMAINS] __initconst = { * any other bit is reserved */ #define RAPL_EVENT_MASK 0xFFULL - -#define DEFINE_RAPL_FORMAT_ATTR(_var, _name, _format) \ -static ssize_t __rapl_##_var##_show(struct kobject *kobj, \ - struct kobj_attribute *attr, \ - char *page) \ -{ \ - BUILD_BUG_ON(sizeof(_format) >= PAGE_SIZE); \ - return sprintf(page, _format "\n"); \ -} \ -static struct kobj_attribute format_attr_##_var = \ - __ATTR(_name, 0444, __rapl_##_var##_show, NULL) - #define RAPL_CNTR_WIDTH 32 #define RAPL_EVENT_ATTR_STR(_name, v, str) \ @@ -441,7 +429,7 @@ static struct attribute_group rapl_pmu_events_group = { .attrs = attrs_empty, }; -DEFINE_RAPL_FORMAT_ATTR(event, event, "config:0-7"); +PMU_FORMAT_ATTR(event, "config:0-7"); static struct attribute *rapl_formats_attr[] = { &format_attr_event.attr, NULL, -- cgit v1.2.3 From dc293f2106903ab9c24e9cea18c276e32c394c33 Mon Sep 17 00:00:00 2001 From: Laurent Pinchart Date: Tue, 1 Sep 2020 00:09:37 +0300 Subject: xtensa: uaccess: Add missing __user to strncpy_from_user() prototype When adding __user annotations in commit 2adf5352a34a, the strncpy_from_user() function declaration for the CONFIG_GENERIC_STRNCPY_FROM_USER case was missed. Fix it. Reported-by: kernel test robot Signed-off-by: Laurent Pinchart Message-Id: <20200831210937.17938-1-laurent.pinchart@ideasonboard.com> Signed-off-by: Max Filippov --- arch/xtensa/include/asm/uaccess.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/xtensa/include/asm/uaccess.h b/arch/xtensa/include/asm/uaccess.h index b9758119feca..5c9fb8005aa8 100644 --- a/arch/xtensa/include/asm/uaccess.h +++ b/arch/xtensa/include/asm/uaccess.h @@ -302,7 +302,7 @@ strncpy_from_user(char *dst, const char __user *src, long count) return -EFAULT; } #else -long strncpy_from_user(char *dst, const char *src, long count); +long strncpy_from_user(char *dst, const char __user *src, long count); #endif /* -- cgit v1.2.3 From 2013a4b684b6eb614ee5c9a3c07b0ae6f5ca96d9 Mon Sep 17 00:00:00 2001 From: Zhen Lei Date: Fri, 16 Oct 2020 17:08:32 +0800 Subject: arm64: dts: broadcom: clear the warnings caused by empty dma-ranges The scripts/dtc/checks.c requires that the node have empty "dma-ranges" property must have the same "#address-cells" and "#size-cells" values as the parent node. Otherwise, the following warnings is reported: arch/arm64/boot/dts/broadcom/stingray/stingray-usb.dtsi:7.3-14: Warning \ (dma_ranges_format): /usb:dma-ranges: empty "dma-ranges" property but \ its #address-cells (1) differs from / (2) arch/arm64/boot/dts/broadcom/stingray/stingray-usb.dtsi:7.3-14: Warning \ (dma_ranges_format): /usb:dma-ranges: empty "dma-ranges" property but \ its #size-cells (1) differs from / (2) Arnd Bergmann figured out why it's necessary: Also note that the #address-cells=<1> means that any device under this bus is assumed to only support 32-bit addressing, and DMA will have to go through a slow swiotlb in the absence of an IOMMU. Suggested-by: Arnd Bergmann Signed-off-by: Zhen Lei Link: https://lore.kernel.org/r/20201016090833.1892-2-thunder.leizhen@huawei.com' Acked-by: Florian Fainelli Signed-off-by: Arnd Bergmann --- .../boot/dts/broadcom/stingray/stingray-usb.dtsi | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/broadcom/stingray/stingray-usb.dtsi b/arch/arm64/boot/dts/broadcom/stingray/stingray-usb.dtsi index 55259f973b5a..aef8f2b00778 100644 --- a/arch/arm64/boot/dts/broadcom/stingray/stingray-usb.dtsi +++ b/arch/arm64/boot/dts/broadcom/stingray/stingray-usb.dtsi @@ -5,20 +5,20 @@ usb { compatible = "simple-bus"; dma-ranges; - #address-cells = <1>; - #size-cells = <1>; - ranges = <0x0 0x0 0x68500000 0x00400000>; + #address-cells = <2>; + #size-cells = <2>; + ranges = <0x0 0x0 0x0 0x68500000 0x0 0x00400000>; usbphy0: usb-phy@0 { compatible = "brcm,sr-usb-combo-phy"; - reg = <0x00000000 0x100>; + reg = <0x0 0x00000000 0x0 0x100>; #phy-cells = <1>; status = "disabled"; }; xhci0: usb@1000 { compatible = "generic-xhci"; - reg = <0x00001000 0x1000>; + reg = <0x0 0x00001000 0x0 0x1000>; interrupts = ; phys = <&usbphy0 1>, <&usbphy0 0>; phy-names = "phy0", "phy1"; @@ -28,7 +28,7 @@ bdc0: usb@2000 { compatible = "brcm,bdc-v0.16"; - reg = <0x00002000 0x1000>; + reg = <0x0 0x00002000 0x0 0x1000>; interrupts = ; phys = <&usbphy0 0>, <&usbphy0 1>; phy-names = "phy0", "phy1"; @@ -38,21 +38,21 @@ usbphy1: usb-phy@10000 { compatible = "brcm,sr-usb-combo-phy"; - reg = <0x00010000 0x100>; + reg = <0x0 0x00010000 0x0 0x100>; #phy-cells = <1>; status = "disabled"; }; usbphy2: usb-phy@20000 { compatible = "brcm,sr-usb-hs-phy"; - reg = <0x00020000 0x100>; + reg = <0x0 0x00020000 0x0 0x100>; #phy-cells = <0>; status = "disabled"; }; xhci1: usb@11000 { compatible = "generic-xhci"; - reg = <0x00011000 0x1000>; + reg = <0x0 0x00011000 0x0 0x1000>; interrupts = ; phys = <&usbphy1 1>, <&usbphy2>, <&usbphy1 0>; phy-names = "phy0", "phy1", "phy2"; @@ -62,7 +62,7 @@ bdc1: usb@21000 { compatible = "brcm,bdc-v0.16"; - reg = <0x00021000 0x1000>; + reg = <0x0 0x00021000 0x0 0x1000>; interrupts = ; phys = <&usbphy2>; phy-names = "phy0"; -- cgit v1.2.3 From e3389b0a14952aac7f2998bb98f633afb21eaa92 Mon Sep 17 00:00:00 2001 From: Zhen Lei Date: Fri, 16 Oct 2020 17:08:33 +0800 Subject: arm64: dts: qcom: clear the warnings caused by empty dma-ranges The scripts/dtc/checks.c requires that the node have empty "dma-ranges" property must have the same "#address-cells" and "#size-cells" values as the parent node. Otherwise, the following warnings is reported: arch/arm64/boot/dts/qcom/ipq6018.dtsi:185.3-14: Warning \ (dma_ranges_format): /soc:dma-ranges: empty "dma-ranges" property but \ its #address-cells (1) differs from / (2) arch/arm64/boot/dts/qcom/ipq6018.dtsi:185.3-14: Warning \ (dma_ranges_format): /soc:dma-ranges: empty "dma-ranges" property but \ its #size-cells (1) differs from / (2) Arnd Bergmann figured out why it's necessary: Also note that the #address-cells=<1> means that any device under this bus is assumed to only support 32-bit addressing, and DMA will have to go through a slow swiotlb in the absence of an IOMMU. Suggested-by: Arnd Bergmann Signed-off-by: Zhen Lei Reviewed-by: Bjorn Andersson Link: https://lore.kernel.org/r/20201016090833.1892-3-thunder.leizhen@huawei.com' Signed-off-by: Arnd Bergmann --- arch/arm64/boot/dts/qcom/ipq6018.dtsi | 72 +++++++++++++++++------------------ 1 file changed, 36 insertions(+), 36 deletions(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/qcom/ipq6018.dtsi b/arch/arm64/boot/dts/qcom/ipq6018.dtsi index a94dac76bf3f..59e0cbfa2214 100644 --- a/arch/arm64/boot/dts/qcom/ipq6018.dtsi +++ b/arch/arm64/boot/dts/qcom/ipq6018.dtsi @@ -179,22 +179,22 @@ }; soc: soc { - #address-cells = <1>; - #size-cells = <1>; - ranges = <0 0 0 0xffffffff>; + #address-cells = <2>; + #size-cells = <2>; + ranges = <0 0 0 0 0x0 0xffffffff>; dma-ranges; compatible = "simple-bus"; prng: qrng@e1000 { compatible = "qcom,prng-ee"; - reg = <0xe3000 0x1000>; + reg = <0x0 0xe3000 0x0 0x1000>; clocks = <&gcc GCC_PRNG_AHB_CLK>; clock-names = "core"; }; cryptobam: dma@704000 { compatible = "qcom,bam-v1.7.0"; - reg = <0x00704000 0x20000>; + reg = <0x0 0x00704000 0x0 0x20000>; interrupts = ; clocks = <&gcc GCC_CRYPTO_AHB_CLK>; clock-names = "bam_clk"; @@ -206,7 +206,7 @@ crypto: crypto@73a000 { compatible = "qcom,crypto-v5.1"; - reg = <0x0073a000 0x6000>; + reg = <0x0 0x0073a000 0x0 0x6000>; clocks = <&gcc GCC_CRYPTO_AHB_CLK>, <&gcc GCC_CRYPTO_AXI_CLK>, <&gcc GCC_CRYPTO_CLK>; @@ -217,7 +217,7 @@ tlmm: pinctrl@1000000 { compatible = "qcom,ipq6018-pinctrl"; - reg = <0x01000000 0x300000>; + reg = <0x0 0x01000000 0x0 0x300000>; interrupts = ; gpio-controller; #gpio-cells = <2>; @@ -235,7 +235,7 @@ gcc: gcc@1800000 { compatible = "qcom,gcc-ipq6018"; - reg = <0x01800000 0x80000>; + reg = <0x0 0x01800000 0x0 0x80000>; clocks = <&xo>, <&sleep_clk>; clock-names = "xo", "sleep_clk"; #clock-cells = <1>; @@ -244,17 +244,17 @@ tcsr_mutex_regs: syscon@1905000 { compatible = "syscon"; - reg = <0x01905000 0x8000>; + reg = <0x0 0x01905000 0x0 0x8000>; }; tcsr_q6: syscon@1945000 { compatible = "syscon"; - reg = <0x01945000 0xe000>; + reg = <0x0 0x01945000 0x0 0xe000>; }; blsp_dma: dma@7884000 { compatible = "qcom,bam-v1.7.0"; - reg = <0x07884000 0x2b000>; + reg = <0x0 0x07884000 0x0 0x2b000>; interrupts = ; clocks = <&gcc GCC_BLSP1_AHB_CLK>; clock-names = "bam_clk"; @@ -264,7 +264,7 @@ blsp1_uart3: serial@78b1000 { compatible = "qcom,msm-uartdm-v1.4", "qcom,msm-uartdm"; - reg = <0x078b1000 0x200>; + reg = <0x0 0x078b1000 0x0 0x200>; interrupts = ; clocks = <&gcc GCC_BLSP1_UART3_APPS_CLK>, <&gcc GCC_BLSP1_AHB_CLK>; @@ -276,7 +276,7 @@ compatible = "qcom,spi-qup-v2.2.1"; #address-cells = <1>; #size-cells = <0>; - reg = <0x078b5000 0x600>; + reg = <0x0 0x078b5000 0x0 0x600>; interrupts = ; spi-max-frequency = <50000000>; clocks = <&gcc GCC_BLSP1_QUP1_SPI_APPS_CLK>, @@ -291,7 +291,7 @@ compatible = "qcom,spi-qup-v2.2.1"; #address-cells = <1>; #size-cells = <0>; - reg = <0x078b6000 0x600>; + reg = <0x0 0x078b6000 0x0 0x600>; interrupts = ; spi-max-frequency = <50000000>; clocks = <&gcc GCC_BLSP1_QUP2_SPI_APPS_CLK>, @@ -306,7 +306,7 @@ compatible = "qcom,i2c-qup-v2.2.1"; #address-cells = <1>; #size-cells = <0>; - reg = <0x078b6000 0x600>; + reg = <0x0 0x078b6000 0x0 0x600>; interrupts = ; clocks = <&gcc GCC_BLSP1_AHB_CLK>, <&gcc GCC_BLSP1_QUP2_I2C_APPS_CLK>; @@ -321,7 +321,7 @@ compatible = "qcom,i2c-qup-v2.2.1"; #address-cells = <1>; #size-cells = <0>; - reg = <0x078b7000 0x600>; + reg = <0x0 0x078b7000 0x0 0x600>; interrupts = ; clocks = <&gcc GCC_BLSP1_AHB_CLK>, <&gcc GCC_BLSP1_QUP3_I2C_APPS_CLK>; @@ -336,24 +336,24 @@ compatible = "qcom,msm-qgic2"; interrupt-controller; #interrupt-cells = <0x3>; - reg = <0x0b000000 0x1000>, /*GICD*/ - <0x0b002000 0x1000>, /*GICC*/ - <0x0b001000 0x1000>, /*GICH*/ - <0x0b004000 0x1000>; /*GICV*/ + reg = <0x0 0x0b000000 0x0 0x1000>, /*GICD*/ + <0x0 0x0b002000 0x0 0x1000>, /*GICC*/ + <0x0 0x0b001000 0x0 0x1000>, /*GICH*/ + <0x0 0x0b004000 0x0 0x1000>; /*GICV*/ interrupts = ; }; watchdog@b017000 { compatible = "qcom,kpss-wdt"; interrupts = ; - reg = <0x0b017000 0x40>; + reg = <0x0 0x0b017000 0x0 0x40>; clocks = <&sleep_clk>; timeout-sec = <10>; }; apcs_glb: mailbox@b111000 { compatible = "qcom,ipq6018-apcs-apps-global"; - reg = <0x0b111000 0x1000>; + reg = <0x0 0x0b111000 0x0 0x1000>; #clock-cells = <1>; clocks = <&a53pll>, <&xo>; clock-names = "pll", "xo"; @@ -362,7 +362,7 @@ a53pll: clock@b116000 { compatible = "qcom,ipq6018-a53pll"; - reg = <0x0b116000 0x40>; + reg = <0x0 0x0b116000 0x0 0x40>; #clock-cells = <0>; clocks = <&xo>; clock-names = "xo"; @@ -377,68 +377,68 @@ }; timer@b120000 { - #address-cells = <1>; - #size-cells = <1>; + #address-cells = <2>; + #size-cells = <2>; ranges; compatible = "arm,armv7-timer-mem"; - reg = <0x0b120000 0x1000>; + reg = <0x0 0x0b120000 0x0 0x1000>; clock-frequency = <19200000>; frame@b120000 { frame-number = <0>; interrupts = , ; - reg = <0x0b121000 0x1000>, - <0x0b122000 0x1000>; + reg = <0x0 0x0b121000 0x0 0x1000>, + <0x0 0x0b122000 0x0 0x1000>; }; frame@b123000 { frame-number = <1>; interrupts = ; - reg = <0xb123000 0x1000>; + reg = <0x0 0xb123000 0x0 0x1000>; status = "disabled"; }; frame@b124000 { frame-number = <2>; interrupts = ; - reg = <0x0b124000 0x1000>; + reg = <0x0 0x0b124000 0x0 0x1000>; status = "disabled"; }; frame@b125000 { frame-number = <3>; interrupts = ; - reg = <0x0b125000 0x1000>; + reg = <0x0 0x0b125000 0x0 0x1000>; status = "disabled"; }; frame@b126000 { frame-number = <4>; interrupts = ; - reg = <0x0b126000 0x1000>; + reg = <0x0 0x0b126000 0x0 0x1000>; status = "disabled"; }; frame@b127000 { frame-number = <5>; interrupts = ; - reg = <0x0b127000 0x1000>; + reg = <0x0 0x0b127000 0x0 0x1000>; status = "disabled"; }; frame@b128000 { frame-number = <6>; interrupts = ; - reg = <0x0b128000 0x1000>; + reg = <0x0 0x0b128000 0x0 0x1000>; status = "disabled"; }; }; q6v5_wcss: remoteproc@cd00000 { compatible = "qcom,ipq8074-wcss-pil"; - reg = <0x0cd00000 0x4040>, - <0x004ab000 0x20>; + reg = <0x0 0x0cd00000 0x0 0x4040>, + <0x0 0x004ab000 0x0 0x20>; reg-names = "qdsp6", "rmb"; interrupts-extended = <&intc GIC_SPI 325 IRQ_TYPE_EDGE_RISING>, -- cgit v1.2.3 From 23bde34771f1ea92fb5e6682c0d8c04304d34b3b Mon Sep 17 00:00:00 2001 From: Zenghui Yu Date: Tue, 17 Nov 2020 23:16:29 +0800 Subject: KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace It was recently reported that if GICR_TYPER is accessed before the RD base address is set, we'll suffer from the unset @rdreg dereferencing. Oops... gpa_t last_rdist_typer = rdreg->base + GICR_TYPER + (rdreg->free_index - 1) * KVM_VGIC_V3_REDIST_SIZE; It's "expected" that users will access registers in the redistributor if the RD has been properly configured (e.g., the RD base address is set). But it hasn't yet been covered by the existing documentation. Per discussion on the list [1], the reporting of the GICR_TYPER.Last bit for userspace never actually worked. And it's difficult for us to emulate it correctly given that userspace has the flexibility to access it any time. Let's just drop the reporting of the Last bit for userspace for now (userspace should have full knowledge about it anyway) and it at least prevents kernel from panic ;-) [1] https://lore.kernel.org/kvmarm/c20865a267e44d1e2c0d52ce4e012263@kernel.org/ Fixes: ba7b3f1275fd ("KVM: arm/arm64: Revisit Redistributor TYPER last bit computation") Reported-by: Keqian Zhu Signed-off-by: Zenghui Yu Signed-off-by: Marc Zyngier Reviewed-by: Eric Auger Link: https://lore.kernel.org/r/20201117151629.1738-1-yuzenghui@huawei.com Cc: stable@vger.kernel.org --- arch/arm64/kvm/vgic/vgic-mmio-v3.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c index 52d6f24f65dc..15a6c98ee92f 100644 --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c @@ -273,6 +273,23 @@ static unsigned long vgic_mmio_read_v3r_typer(struct kvm_vcpu *vcpu, return extract_bytes(value, addr & 7, len); } +static unsigned long vgic_uaccess_read_v3r_typer(struct kvm_vcpu *vcpu, + gpa_t addr, unsigned int len) +{ + unsigned long mpidr = kvm_vcpu_get_mpidr_aff(vcpu); + int target_vcpu_id = vcpu->vcpu_id; + u64 value; + + value = (u64)(mpidr & GENMASK(23, 0)) << 32; + value |= ((target_vcpu_id & 0xffff) << 8); + + if (vgic_has_its(vcpu->kvm)) + value |= GICR_TYPER_PLPIS; + + /* reporting of the Last bit is not supported for userspace */ + return extract_bytes(value, addr & 7, len); +} + static unsigned long vgic_mmio_read_v3r_iidr(struct kvm_vcpu *vcpu, gpa_t addr, unsigned int len) { @@ -593,8 +610,9 @@ static const struct vgic_register_region vgic_v3_rd_registers[] = { REGISTER_DESC_WITH_LENGTH(GICR_IIDR, vgic_mmio_read_v3r_iidr, vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit), - REGISTER_DESC_WITH_LENGTH(GICR_TYPER, - vgic_mmio_read_v3r_typer, vgic_mmio_write_wi, 8, + REGISTER_DESC_WITH_LENGTH_UACCESS(GICR_TYPER, + vgic_mmio_read_v3r_typer, vgic_mmio_write_wi, + vgic_uaccess_read_v3r_typer, vgic_mmio_uaccess_write_wi, 8, VGIC_ACCESS_64bit | VGIC_ACCESS_32bit), REGISTER_DESC_WITH_LENGTH(GICR_WAKER, vgic_mmio_read_raz, vgic_mmio_write_wi, 4, -- cgit v1.2.3 From cd81acc600a9684ea4b4d25a47900d38a3890eab Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Tue, 17 Nov 2020 23:56:17 +1000 Subject: powerpc/64s/exception: KVM Fix for host DSI being taken in HPT guest MMU context Commit 2284ffea8f0c ("powerpc/64s/exception: Only test KVM in SRR interrupts when PR KVM is supported") removed KVM guest tests from interrupts that do not set HV=1, when PR-KVM is not configured. This is wrong for HV-KVM HPT guest MMIO emulation case which attempts to load the faulting instruction word with MSR[DR]=1 and MSR[HV]=1 with the guest MMU context loaded. This can cause host DSI, DSLB interrupts which must test for KVM guest. Restore this and add a comment. Fixes: 2284ffea8f0c ("powerpc/64s/exception: Only test KVM in SRR interrupts when PR KVM is supported") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20201117135617.3521127-1-npiggin@gmail.com --- arch/powerpc/kernel/exceptions-64s.S | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) (limited to 'arch') diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 07d64883c0b5..8e6b2cc8db67 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1410,6 +1410,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) * If none is found, do a Linux page fault. Linux page faults can happen in * kernel mode due to user copy operations of course. * + * KVM: The KVM HDSI handler may perform a load with MSR[DR]=1 in guest + * MMU context, which may cause a DSI in the host, which must go to the + * KVM handler. MSR[IR] is not enabled, so the real-mode handler will + * always be used regardless of AIL setting. + * * - Radix MMU * The hardware loads from the Linux page table directly, so a fault goes * immediately to Linux page fault. @@ -1420,10 +1425,8 @@ INT_DEFINE_BEGIN(data_access) IVEC=0x300 IDAR=1 IDSISR=1 -#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE IKVM_SKIP=1 IKVM_REAL=1 -#endif INT_DEFINE_END(data_access) EXC_REAL_BEGIN(data_access, 0x300, 0x80) @@ -1462,6 +1465,8 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) * ppc64_bolted_size (first segment). The kernel handler must avoid stomping * on user-handler data structures. * + * KVM: Same as 0x300, DSLB must test for KVM guest. + * * A dedicated save area EXSLB is used (XXX: but it actually need not be * these days, we could use EXGEN). */ @@ -1470,10 +1475,8 @@ INT_DEFINE_BEGIN(data_access_slb) IAREA=PACA_EXSLB IRECONCILE=0 IDAR=1 -#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE IKVM_SKIP=1 IKVM_REAL=1 -#endif INT_DEFINE_END(data_access_slb) EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80) -- cgit v1.2.3 From 78aec9bb1f3c79e4570eb50260d6320063f823a2 Mon Sep 17 00:00:00 2001 From: Gustavo Pimentel Date: Wed, 21 Oct 2020 23:12:20 +0200 Subject: ARC: bitops: Remove unecessary operation and value The 1-bit shift rotation to the left on x variable located on 4 last if statement can be removed because the computed value is will not be used afront. Signed-off-by: Gustavo Pimentel Signed-off-by: Vineet Gupta --- arch/arc/include/asm/bitops.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/arc/include/asm/bitops.h b/arch/arc/include/asm/bitops.h index c6606f4d20d6..fb98440c0bd4 100644 --- a/arch/arc/include/asm/bitops.h +++ b/arch/arc/include/asm/bitops.h @@ -243,10 +243,8 @@ static inline int constant_fls(unsigned int x) x <<= 2; r -= 2; } - if (!(x & 0x80000000u)) { - x <<= 1; + if (!(x & 0x80000000u)) r -= 1; - } return r; } -- cgit v1.2.3 From 5f840df591a9554e4e1355ef1f8946bc2120ca9f Mon Sep 17 00:00:00 2001 From: Flavio Suligoi Date: Mon, 9 Nov 2020 14:21:30 +0100 Subject: ARC: mm: fix spelling mistakes Signed-off-by: Flavio Suligoi Signed-off-by: Vineet Gupta --- arch/arc/mm/tlb.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) (limited to 'arch') diff --git a/arch/arc/mm/tlb.c b/arch/arc/mm/tlb.c index c340acd989a0..9bb3c24f3677 100644 --- a/arch/arc/mm/tlb.c +++ b/arch/arc/mm/tlb.c @@ -30,14 +30,14 @@ * -Changes related to MMU v2 (Rel 4.8) * * Vineetg: Aug 29th 2008 - * -In TLB Flush operations (Metal Fix MMU) there is a explict command to + * -In TLB Flush operations (Metal Fix MMU) there is a explicit command to * flush Micro-TLBS. If TLB Index Reg is invalid prior to TLBIVUTLB cmd, * it fails. Thus need to load it with ANY valid value before invoking * TLBIVUTLB cmd * * Vineetg: Aug 21th 2008: * -Reduced the duration of IRQ lockouts in TLB Flush routines - * -Multiple copies of TLB erase code seperated into a "single" function + * -Multiple copies of TLB erase code separated into a "single" function * -In TLB Flush routines, interrupt disabling moved UP to retrieve ASID * in interrupt-safe region. * @@ -66,7 +66,7 @@ * * Although J-TLB is 2 way set assoc, ARC700 caches J-TLB into uTLBS which has * much higher associativity. u-D-TLB is 8 ways, u-I-TLB is 4 ways. - * Given this, the thrasing problem should never happen because once the 3 + * Given this, the thrashing problem should never happen because once the 3 * J-TLB entries are created (even though 3rd will knock out one of the prev * two), the u-D-TLB and u-I-TLB will have what is required to accomplish memcpy * @@ -127,7 +127,7 @@ static void utlb_invalidate(void) * There was however an obscure hardware bug, where uTLB flush would * fail when a prior probe for J-TLB (both totally unrelated) would * return lkup err - because the entry didn't exist in MMU. - * The Workround was to set Index reg with some valid value, prior to + * The Workaround was to set Index reg with some valid value, prior to * flush. This was fixed in MMU v3 */ unsigned int idx; @@ -272,7 +272,7 @@ noinline void local_flush_tlb_all(void) } /* - * Flush the entrie MM for userland. The fastest way is to move to Next ASID + * Flush the entire MM for userland. The fastest way is to move to Next ASID */ noinline void local_flush_tlb_mm(struct mm_struct *mm) { @@ -303,7 +303,7 @@ noinline void local_flush_tlb_mm(struct mm_struct *mm) * Difference between this and Kernel Range Flush is * -Here the fastest way (if range is too large) is to move to next ASID * without doing any explicit Shootdown - * -In case of kernel Flush, entry has to be shot down explictly + * -In case of kernel Flush, entry has to be shot down explicitly */ void local_flush_tlb_range(struct vm_area_struct *vma, unsigned long start, unsigned long end) @@ -620,7 +620,7 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long vaddr_unaligned, * Super Page size is configurable in hardware (4K to 16M), but fixed once * RTL builds. * - * The exact THP size a Linx configuration will support is a function of: + * The exact THP size a Linux configuration will support is a function of: * - MMU page size (typical 8K, RTL fixed) * - software page walker address split between PGD:PTE:PFN (typical * 11:8:13, but can be changed with 1 line) @@ -698,7 +698,7 @@ void local_flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start, #endif -/* Read the Cache Build Confuration Registers, Decode them and save into +/* Read the Cache Build Configuration Registers, Decode them and save into * the cpuinfo structure for later use. * No Validation is done here, simply read/convert the BCRs */ @@ -803,13 +803,13 @@ void arc_mmu_init(void) pr_info("%s", arc_mmu_mumbojumbo(0, str, sizeof(str))); /* - * Can't be done in processor.h due to header include depenedencies + * Can't be done in processor.h due to header include dependencies */ BUILD_BUG_ON(!IS_ALIGNED((CONFIG_ARC_KVADDR_SIZE << 20), PMD_SIZE)); /* * stack top size sanity check, - * Can't be done in processor.h due to header include depenedencies + * Can't be done in processor.h due to header include dependencies */ BUILD_BUG_ON(!IS_ALIGNED(STACK_TOP, PMD_SIZE)); @@ -881,7 +881,7 @@ void arc_mmu_init(void) * the duplicate one. * -Knob to be verbose abt it.(TODO: hook them up to debugfs) */ -volatile int dup_pd_silent; /* Be slient abt it or complain (default) */ +volatile int dup_pd_silent; /* Be silent abt it or complain (default) */ void do_tlb_overlap_fault(unsigned long cause, unsigned long address, struct pt_regs *regs) @@ -948,7 +948,7 @@ void do_tlb_overlap_fault(unsigned long cause, unsigned long address, /*********************************************************************** * Diagnostic Routines - * -Called from Low Level TLB Hanlders if things don;t look good + * -Called from Low Level TLB Handlers if things don;t look good **********************************************************************/ #ifdef CONFIG_ARC_DBG_TLB_PARANOIA -- cgit v1.2.3 From e42404fa10fd11fe72d0a0e149a321d10e577715 Mon Sep 17 00:00:00 2001 From: Vineet Gupta Date: Fri, 6 Nov 2020 16:59:27 -0800 Subject: ARC: stack unwinding: don't assume non-current task is sleeping To start stack unwinding (SP, PC and BLINK) are needed. When the explicit execution context (pt_regs etc) is not available, unwinder assumes the task is sleeping (in __switch_to()) and fetches SP and BLINK from kernel mode stack. But this assumption is not true, specially in a SMP system, when top runs on 1 core, there may be active running processes on all cores. So when unwinding non courrent tasks, ensure they are NOT running. And while at it, handle the self unwinding case explicitly. This came out of investigation of a customer reported hang with rcutorture+top Link: https://github.com/foss-for-synopsys-dwc-arc-processors/linux/issues/31 Signed-off-by: Vineet Gupta --- arch/arc/kernel/stacktrace.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) (limited to 'arch') diff --git a/arch/arc/kernel/stacktrace.c b/arch/arc/kernel/stacktrace.c index b23986f98450..b2557f581ea8 100644 --- a/arch/arc/kernel/stacktrace.c +++ b/arch/arc/kernel/stacktrace.c @@ -38,15 +38,15 @@ #ifdef CONFIG_ARC_DW2_UNWIND -static void seed_unwind_frame_info(struct task_struct *tsk, - struct pt_regs *regs, - struct unwind_frame_info *frame_info) +static int +seed_unwind_frame_info(struct task_struct *tsk, struct pt_regs *regs, + struct unwind_frame_info *frame_info) { /* * synchronous unwinding (e.g. dump_stack) * - uses current values of SP and friends */ - if (tsk == NULL && regs == NULL) { + if (regs == NULL && (tsk == NULL || tsk == current)) { unsigned long fp, sp, blink, ret; frame_info->task = current; @@ -65,11 +65,15 @@ static void seed_unwind_frame_info(struct task_struct *tsk, frame_info->call_frame = 0; } else if (regs == NULL) { /* - * Asynchronous unwinding of sleeping task - * - Gets SP etc from task's pt_regs (saved bottom of kernel - * mode stack of task) + * Asynchronous unwinding of a likely sleeping task + * - first ensure it is actually sleeping + * - if so, it will be in __switch_to, kernel mode SP of task + * is safe-kept and BLINK at a well known location in there */ + if (tsk->state == TASK_RUNNING) + return -1; + frame_info->task = tsk; frame_info->regs.r27 = TSK_K_FP(tsk); @@ -103,6 +107,8 @@ static void seed_unwind_frame_info(struct task_struct *tsk, frame_info->regs.r63 = regs->ret; frame_info->call_frame = 0; } + + return 0; } #endif @@ -116,7 +122,8 @@ arc_unwind_core(struct task_struct *tsk, struct pt_regs *regs, unsigned int address; struct unwind_frame_info frame_info; - seed_unwind_frame_info(tsk, regs, &frame_info); + if (seed_unwind_frame_info(tsk, regs, &frame_info)) + return 0; while (1) { address = UNW_PC(&frame_info); -- cgit v1.2.3 From f737561c709667013d832316dd3198a7fe3d1260 Mon Sep 17 00:00:00 2001 From: Vineet Gupta Date: Fri, 6 Nov 2020 17:37:34 -0800 Subject: ARC: stack unwinding: reorganize how initial register state setup This is a non-functional change, if anything a better fall-back handling. Signed-off-by: Vineet Gupta --- arch/arc/kernel/stacktrace.c | 37 ++++++++++++++++++------------------- 1 file changed, 18 insertions(+), 19 deletions(-) (limited to 'arch') diff --git a/arch/arc/kernel/stacktrace.c b/arch/arc/kernel/stacktrace.c index b2557f581ea8..f73da203b170 100644 --- a/arch/arc/kernel/stacktrace.c +++ b/arch/arc/kernel/stacktrace.c @@ -42,11 +42,23 @@ static int seed_unwind_frame_info(struct task_struct *tsk, struct pt_regs *regs, struct unwind_frame_info *frame_info) { - /* - * synchronous unwinding (e.g. dump_stack) - * - uses current values of SP and friends - */ - if (regs == NULL && (tsk == NULL || tsk == current)) { + if (regs) { + /* + * Asynchronous unwinding of intr/exception + * - Just uses the pt_regs passed + */ + frame_info->task = tsk; + + frame_info->regs.r27 = regs->fp; + frame_info->regs.r28 = regs->sp; + frame_info->regs.r31 = regs->blink; + frame_info->regs.r63 = regs->ret; + frame_info->call_frame = 0; + } else if (tsk == NULL || tsk == current) { + /* + * synchronous unwinding (e.g. dump_stack) + * - uses current values of SP and friends + */ unsigned long fp, sp, blink, ret; frame_info->task = current; @@ -63,7 +75,7 @@ seed_unwind_frame_info(struct task_struct *tsk, struct pt_regs *regs, frame_info->regs.r31 = blink; frame_info->regs.r63 = ret; frame_info->call_frame = 0; - } else if (regs == NULL) { + } else { /* * Asynchronous unwinding of a likely sleeping task * - first ensure it is actually sleeping @@ -94,20 +106,7 @@ seed_unwind_frame_info(struct task_struct *tsk, struct pt_regs *regs, frame_info->regs.r28 += 60; frame_info->call_frame = 0; - } else { - /* - * Asynchronous unwinding of intr/exception - * - Just uses the pt_regs passed - */ - frame_info->task = tsk; - - frame_info->regs.r27 = regs->fp; - frame_info->regs.r28 = regs->sp; - frame_info->regs.r31 = regs->blink; - frame_info->regs.r63 = regs->ret; - frame_info->call_frame = 0; } - return 0; } -- cgit v1.2.3 From 860aaabac8235cfde10fe556aa82abbbe3117888 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Tue, 17 Nov 2020 21:23:34 +0100 Subject: x86/dumpstack: Do not try to access user space code of other tasks sysrq-t ends up invoking show_opcodes() for each task which tries to access the user space code of other processes, which is obviously bogus. It either manages to dump where the foreign task's regs->ip points to in a valid mapping of the current task or triggers a pagefault and prints "Code: Bad RIP value.". Both is just wrong. Add a safeguard in copy_code() and check whether the @regs pointer matches currents pt_regs. If not, do not even try to access it. While at it, add commentary why using copy_from_user_nmi() is safe in copy_code() even if the function name suggests otherwise. Reported-by: Oleg Nesterov Signed-off-by: Thomas Gleixner Signed-off-by: Borislav Petkov Reviewed-by: Borislav Petkov Acked-by: Oleg Nesterov Tested-by: Borislav Petkov Link: https://lkml.kernel.org/r/20201117202753.667274723@linutronix.de --- arch/x86/kernel/dumpstack.c | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c index 25c06b67e7e0..97aa900386cb 100644 --- a/arch/x86/kernel/dumpstack.c +++ b/arch/x86/kernel/dumpstack.c @@ -78,6 +78,9 @@ static int copy_code(struct pt_regs *regs, u8 *buf, unsigned long src, if (!user_mode(regs)) return copy_from_kernel_nofault(buf, (u8 *)src, nbytes); + /* The user space code from other tasks cannot be accessed. */ + if (regs != task_pt_regs(current)) + return -EPERM; /* * Make sure userspace isn't trying to trick us into dumping kernel * memory by pointing the userspace instruction pointer at it. @@ -85,6 +88,12 @@ static int copy_code(struct pt_regs *regs, u8 *buf, unsigned long src, if (__chk_range_not_ok(src, nbytes, TASK_SIZE_MAX)) return -EINVAL; + /* + * Even if named copy_from_user_nmi() this can be invoked from + * other contexts and will not try to resolve a pagefault, which is + * the correct thing to do here as this code can be called from any + * context. + */ return copy_from_user_nmi(buf, (void __user *)src, nbytes); } @@ -115,13 +124,19 @@ void show_opcodes(struct pt_regs *regs, const char *loglvl) u8 opcodes[OPCODE_BUFSIZE]; unsigned long prologue = regs->ip - PROLOGUE_SIZE; - if (copy_code(regs, opcodes, prologue, sizeof(opcodes))) { - printk("%sCode: Unable to access opcode bytes at RIP 0x%lx.\n", - loglvl, prologue); - } else { + switch (copy_code(regs, opcodes, prologue, sizeof(opcodes))) { + case 0: printk("%sCode: %" __stringify(PROLOGUE_SIZE) "ph <%02x> %" __stringify(EPILOGUE_SIZE) "ph\n", loglvl, opcodes, opcodes[PROLOGUE_SIZE], opcodes + PROLOGUE_SIZE + 1); + break; + case -EPERM: + /* No access to the user space stack of other tasks. Ignore. */ + break; + default: + printk("%sCode: Unable to access opcode bytes at RIP 0x%lx.\n", + loglvl, prologue); + break; } } -- cgit v1.2.3 From 4c80d05714d347405865802b7098f1c97362cbef Mon Sep 17 00:00:00 2001 From: Christian Borntraeger Date: Tue, 17 Nov 2020 20:00:33 +0100 Subject: s390/uv: handle destroy page legacy interface Older firmware can return rc=0x107 rrc=0xd for destroy page if the page is already non-secure. This should be handled like a success as already done by newer firmware. Signed-off-by: Christian Borntraeger Fixes: 1a80b54d1ce1 ("s390/uv: add destroy page call") Reviewed-by: David Hildenbrand Acked-by: Cornelia Huck Reviewed-by: Janosch Frank --- arch/s390/kernel/uv.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c index 14bd9d58edc9..883bfed9f5c2 100644 --- a/arch/s390/kernel/uv.c +++ b/arch/s390/kernel/uv.c @@ -129,8 +129,15 @@ int uv_destroy_page(unsigned long paddr) .paddr = paddr }; - if (uv_call(0, (u64)&uvcb)) + if (uv_call(0, (u64)&uvcb)) { + /* + * Older firmware uses 107/d as an indication of a non secure + * page. Let us emulate the newer variant (no-op). + */ + if (uvcb.header.rc == 0x107 && uvcb.header.rrc == 0xd) + return 0; return -EINVAL; + } return 0; } -- cgit v1.2.3 From 4d213e76a359e540ca786ee937da7f35faa8e5f8 Mon Sep 17 00:00:00 2001 From: Zhenzhong Duan Date: Tue, 10 Nov 2020 15:19:08 +0800 Subject: iommu/vt-d: Avoid panic if iommu init fails in tboot system "intel_iommu=off" command line is used to disable iommu but iommu is force enabled in a tboot system for security reason. However for better performance on high speed network device, a new option "intel_iommu=tboot_noforce" is introduced to disable the force on. By default kernel should panic if iommu init fail in tboot for security reason, but it's unnecessory if we use "intel_iommu=tboot_noforce,off". Fix the code setting force_on and move intel_iommu_tboot_noforce from tboot code to intel iommu code. Fixes: 7304e8f28bb2 ("iommu/vt-d: Correctly disable Intel IOMMU force on") Signed-off-by: Zhenzhong Duan Tested-by: Lukasz Hawrylko Acked-by: Lu Baolu Link: https://lore.kernel.org/r/20201110071908.3133-1-zhenzhong.duan@gmail.com Signed-off-by: Will Deacon --- arch/x86/kernel/tboot.c | 3 --- drivers/iommu/intel/iommu.c | 5 +++-- include/linux/intel-iommu.h | 1 - 3 files changed, 3 insertions(+), 6 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c index 992fb1415c0f..420be871d9d4 100644 --- a/arch/x86/kernel/tboot.c +++ b/arch/x86/kernel/tboot.c @@ -514,9 +514,6 @@ int tboot_force_iommu(void) if (!tboot_enabled()) return 0; - if (intel_iommu_tboot_noforce) - return 1; - if (no_iommu || swiotlb || dmar_disabled) pr_warn("Forcing Intel-IOMMU to enabled\n"); diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 1b1ca63e6bbe..4d9b298002f0 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -179,7 +179,7 @@ static int rwbf_quirk; * (used when kernel is launched w/ TXT) */ static int force_on = 0; -int intel_iommu_tboot_noforce; +static int intel_iommu_tboot_noforce; static int no_platform_optin; #define ROOT_ENTRY_NR (VTD_PAGE_SIZE/sizeof(struct root_entry)) @@ -4885,7 +4885,8 @@ int __init intel_iommu_init(void) * Intel IOMMU is required for a TXT/tboot launch or platform * opt in, so enforce that. */ - force_on = tboot_force_iommu() || platform_optin_force_iommu(); + force_on = (!intel_iommu_tboot_noforce && tboot_force_iommu()) || + platform_optin_force_iommu(); if (iommu_init_mempool()) { if (force_on) diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index fbf5b3e7707e..d956987ed032 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -798,7 +798,6 @@ extern int iommu_calculate_agaw(struct intel_iommu *iommu); extern int iommu_calculate_max_sagaw(struct intel_iommu *iommu); extern int dmar_disabled; extern int intel_iommu_enabled; -extern int intel_iommu_tboot_noforce; extern int intel_iommu_gfx_mapped; #else static inline int iommu_calculate_agaw(struct intel_iommu *iommu) -- cgit v1.2.3 From 05d5de6ba7dbe490dd413b5ca11d0875bd2bc006 Mon Sep 17 00:00:00 2001 From: Marc Kleine-Budde Date: Wed, 11 Nov 2020 15:12:11 +0100 Subject: ARM: dts: dra76x: m_can: fix order of clocks According to the bosch,m_can.yaml bindings the first clock shall be the "hclk", while the second clock "cclk". This patch fixes the order accordingly. Fixes: 0adbe832f21a ("ARM: dts: dra76x: Add MCAN node") Cc: Faiz Abbas Cc: Tony Lindgren Cc: linux-omap@vger.kernel.org Signed-off-by: Marc Kleine-Budde Signed-off-by: Tony Lindgren --- arch/arm/boot/dts/dra76x.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/arm/boot/dts/dra76x.dtsi b/arch/arm/boot/dts/dra76x.dtsi index b69c7d40f5d8..2f326151116b 100644 --- a/arch/arm/boot/dts/dra76x.dtsi +++ b/arch/arm/boot/dts/dra76x.dtsi @@ -32,8 +32,8 @@ interrupts = , ; interrupt-names = "int0", "int1"; - clocks = <&mcan_clk>, <&l3_iclk_div>; - clock-names = "cclk", "hclk"; + clocks = <&l3_iclk_div>, <&mcan_clk>; + clock-names = "hclk", "cclk"; bosch,mram-cfg = <0x0 0 0 32 0 0 1 1>; }; }; -- cgit v1.2.3 From f79643787e0a0762d2409b7b8334e83f22d85695 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Tue, 17 Nov 2020 16:59:12 +1100 Subject: powerpc/64s: flush L1D on kernel entry IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache on kernel entry. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin Signed-off-by: Daniel Axtens Signed-off-by: Michael Ellerman --- Documentation/admin-guide/kernel-parameters.txt | 3 ++ arch/powerpc/include/asm/exception-64s.h | 9 +++- arch/powerpc/include/asm/feature-fixups.h | 10 +++++ arch/powerpc/include/asm/security_features.h | 4 ++ arch/powerpc/include/asm/setup.h | 3 ++ arch/powerpc/kernel/exceptions-64s.S | 37 +++++++++++++++ arch/powerpc/kernel/setup_64.c | 60 ++++++++++++++++++++++++- arch/powerpc/kernel/vmlinux.lds.S | 7 +++ arch/powerpc/lib/feature-fixups.c | 54 ++++++++++++++++++++++ arch/powerpc/platforms/powernv/setup.c | 11 +++++ arch/powerpc/platforms/pseries/setup.c | 4 ++ 11 files changed, 200 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 526d65d8573a..383889fe771f 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2858,6 +2858,7 @@ mds=off [X86] tsx_async_abort=off [X86] kvm.nx_huge_pages=off [X86] + no_entry_flush [PPC] Exceptions: This does not have any effect on @@ -3186,6 +3187,8 @@ noefi Disable EFI runtime services support. + no_entry_flush [PPC] Don't flush the L1-D cache when entering the kernel. + noexec [IA-64] noexec [X86] diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index ebe95aa04d53..83fa88bc9935 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -57,11 +57,18 @@ nop; \ nop +#define ENTRY_FLUSH_SLOT \ + ENTRY_FLUSH_FIXUP_SECTION; \ + nop; \ + nop; \ + nop; + /* * r10 must be free to use, r13 must be paca */ #define INTERRUPT_TO_KERNEL \ - STF_ENTRY_BARRIER_SLOT + STF_ENTRY_BARRIER_SLOT; \ + ENTRY_FLUSH_SLOT /* * Macros for annotating the expected destination of (h)rfid diff --git a/arch/powerpc/include/asm/feature-fixups.h b/arch/powerpc/include/asm/feature-fixups.h index b0af97add751..06a48219bbf2 100644 --- a/arch/powerpc/include/asm/feature-fixups.h +++ b/arch/powerpc/include/asm/feature-fixups.h @@ -205,6 +205,14 @@ label##3: \ FTR_ENTRY_OFFSET 955b-956b; \ .popsection; +#define ENTRY_FLUSH_FIXUP_SECTION \ +957: \ + .pushsection __entry_flush_fixup,"a"; \ + .align 2; \ +958: \ + FTR_ENTRY_OFFSET 957b-958b; \ + .popsection; + #define RFI_FLUSH_FIXUP_SECTION \ 951: \ .pushsection __rfi_flush_fixup,"a"; \ @@ -237,8 +245,10 @@ label##3: \ #include extern long stf_barrier_fallback; +extern long entry_flush_fallback; extern long __start___stf_entry_barrier_fixup, __stop___stf_entry_barrier_fixup; extern long __start___stf_exit_barrier_fixup, __stop___stf_exit_barrier_fixup; +extern long __start___entry_flush_fixup, __stop___entry_flush_fixup; extern long __start___rfi_flush_fixup, __stop___rfi_flush_fixup; extern long __start___barrier_nospec_fixup, __stop___barrier_nospec_fixup; extern long __start__btb_flush_fixup, __stop__btb_flush_fixup; diff --git a/arch/powerpc/include/asm/security_features.h b/arch/powerpc/include/asm/security_features.h index fbb8fa32150f..9e7459d2edca 100644 --- a/arch/powerpc/include/asm/security_features.h +++ b/arch/powerpc/include/asm/security_features.h @@ -86,12 +86,16 @@ static inline bool security_ftr_enabled(u64 feature) // Software required to flush link stack on context switch #define SEC_FTR_FLUSH_LINK_STACK 0x0000000000001000ull +// The L1-D cache should be flushed when entering the kernel +#define SEC_FTR_L1D_FLUSH_ENTRY 0x0000000000004000ull + // Features enabled by default #define SEC_FTR_DEFAULT \ (SEC_FTR_L1D_FLUSH_HV | \ SEC_FTR_L1D_FLUSH_PR | \ SEC_FTR_BNDS_CHK_SPEC_BAR | \ + SEC_FTR_L1D_FLUSH_ENTRY | \ SEC_FTR_FAVOUR_SECURITY) #endif /* _ASM_POWERPC_SECURITY_FEATURES_H */ diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h index 9efbddee2bca..e2fcd3e874f8 100644 --- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -52,12 +52,15 @@ enum l1d_flush_type { }; void setup_rfi_flush(enum l1d_flush_type, bool enable); +void setup_entry_flush(bool enable); +void setup_uaccess_flush(bool enable); void do_rfi_flush_fixups(enum l1d_flush_type types); #ifdef CONFIG_PPC_BARRIER_NOSPEC void setup_barrier_nospec(void); #else static inline void setup_barrier_nospec(void) { }; #endif +void do_entry_flush_fixups(enum l1d_flush_type types); void do_barrier_nospec_fixups(bool enable); extern bool barrier_nospec_enabled; diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index f7d748b88705..5577dd887d37 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -2951,6 +2951,43 @@ TRAMP_REAL_BEGIN(stf_barrier_fallback) .endr blr +TRAMP_REAL_BEGIN(entry_flush_fallback) + std r9,PACA_EXRFI+EX_R9(r13) + std r10,PACA_EXRFI+EX_R10(r13) + std r11,PACA_EXRFI+EX_R11(r13) + mfctr r9 + ld r10,PACA_RFI_FLUSH_FALLBACK_AREA(r13) + ld r11,PACA_L1D_FLUSH_SIZE(r13) + srdi r11,r11,(7 + 3) /* 128 byte lines, unrolled 8x */ + mtctr r11 + DCBT_BOOK3S_STOP_ALL_STREAM_IDS(r11) /* Stop prefetch streams */ + + /* order ld/st prior to dcbt stop all streams with flushing */ + sync + + /* + * The load addresses are at staggered offsets within cachelines, + * which suits some pipelines better (on others it should not + * hurt). + */ +1: + ld r11,(0x80 + 8)*0(r10) + ld r11,(0x80 + 8)*1(r10) + ld r11,(0x80 + 8)*2(r10) + ld r11,(0x80 + 8)*3(r10) + ld r11,(0x80 + 8)*4(r10) + ld r11,(0x80 + 8)*5(r10) + ld r11,(0x80 + 8)*6(r10) + ld r11,(0x80 + 8)*7(r10) + addi r10,r10,0x80*8 + bdnz 1b + + mtctr r9 + ld r9,PACA_EXRFI+EX_R9(r13) + ld r10,PACA_EXRFI+EX_R10(r13) + ld r11,PACA_EXRFI+EX_R11(r13) + blr + TRAMP_REAL_BEGIN(rfi_flush_fallback) SET_SCRATCH0(r13); GET_PACA(r13); diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index bb9cab3641d7..7d447b43ad13 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -945,7 +945,9 @@ early_initcall(disable_hardlockup_detector); static enum l1d_flush_type enabled_flush_types; static void *l1d_flush_fallback_area; static bool no_rfi_flush; +static bool no_entry_flush; bool rfi_flush; +bool entry_flush; static int __init handle_no_rfi_flush(char *p) { @@ -955,6 +957,14 @@ static int __init handle_no_rfi_flush(char *p) } early_param("no_rfi_flush", handle_no_rfi_flush); +static int __init handle_no_entry_flush(char *p) +{ + pr_info("entry-flush: disabled on command line."); + no_entry_flush = true; + return 0; +} +early_param("no_entry_flush", handle_no_entry_flush); + /* * The RFI flush is not KPTI, but because users will see doco that says to use * nopti we hijack that option here to also disable the RFI flush. @@ -986,6 +996,18 @@ void rfi_flush_enable(bool enable) rfi_flush = enable; } +void entry_flush_enable(bool enable) +{ + if (enable) { + do_entry_flush_fixups(enabled_flush_types); + on_each_cpu(do_nothing, NULL, 1); + } else { + do_entry_flush_fixups(L1D_FLUSH_NONE); + } + + entry_flush = enable; +} + static void __ref init_fallback_flush(void) { u64 l1d_size, limit; @@ -1044,10 +1066,19 @@ void setup_rfi_flush(enum l1d_flush_type types, bool enable) enabled_flush_types = types; - if (!no_rfi_flush && !cpu_mitigations_off()) + if (!cpu_mitigations_off() && !no_rfi_flush) rfi_flush_enable(enable); } +void setup_entry_flush(bool enable) +{ + if (cpu_mitigations_off()) + return; + + if (!no_entry_flush) + entry_flush_enable(enable); +} + #ifdef CONFIG_DEBUG_FS static int rfi_flush_set(void *data, u64 val) { @@ -1075,9 +1106,36 @@ static int rfi_flush_get(void *data, u64 *val) DEFINE_SIMPLE_ATTRIBUTE(fops_rfi_flush, rfi_flush_get, rfi_flush_set, "%llu\n"); +static int entry_flush_set(void *data, u64 val) +{ + bool enable; + + if (val == 1) + enable = true; + else if (val == 0) + enable = false; + else + return -EINVAL; + + /* Only do anything if we're changing state */ + if (enable != entry_flush) + entry_flush_enable(enable); + + return 0; +} + +static int entry_flush_get(void *data, u64 *val) +{ + *val = entry_flush ? 1 : 0; + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(fops_entry_flush, entry_flush_get, entry_flush_set, "%llu\n"); + static __init int rfi_flush_debugfs_init(void) { debugfs_create_file("rfi_flush", 0600, powerpc_debugfs_root, NULL, &fops_rfi_flush); + debugfs_create_file("entry_flush", 0600, powerpc_debugfs_root, NULL, &fops_entry_flush); return 0; } device_initcall(rfi_flush_debugfs_init); diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index e0548b4950de..715ec4d4614a 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -131,6 +131,13 @@ SECTIONS __stop___stf_entry_barrier_fixup = .; } + . = ALIGN(8); + __entry_flush_fixup : AT(ADDR(__entry_flush_fixup) - LOAD_OFFSET) { + __start___entry_flush_fixup = .; + *(__entry_flush_fixup) + __stop___entry_flush_fixup = .; + } + . = ALIGN(8); __stf_exit_barrier_fixup : AT(ADDR(__stf_exit_barrier_fixup) - LOAD_OFFSET) { __start___stf_exit_barrier_fixup = .; diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c index 4c0a7ee9fa00..70e83cfd74aa 100644 --- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -234,6 +234,60 @@ void do_stf_barrier_fixups(enum stf_barrier_type types) do_stf_exit_barrier_fixups(types); } +void do_entry_flush_fixups(enum l1d_flush_type types) +{ + unsigned int instrs[3], *dest; + long *start, *end; + int i; + + start = PTRRELOC(&__start___entry_flush_fixup); + end = PTRRELOC(&__stop___entry_flush_fixup); + + instrs[0] = 0x60000000; /* nop */ + instrs[1] = 0x60000000; /* nop */ + instrs[2] = 0x60000000; /* nop */ + + i = 0; + if (types == L1D_FLUSH_FALLBACK) { + instrs[i++] = 0x7d4802a6; /* mflr r10 */ + instrs[i++] = 0x60000000; /* branch patched below */ + instrs[i++] = 0x7d4803a6; /* mtlr r10 */ + } + + if (types & L1D_FLUSH_ORI) { + instrs[i++] = 0x63ff0000; /* ori 31,31,0 speculation barrier */ + instrs[i++] = 0x63de0000; /* ori 30,30,0 L1d flush*/ + } + + if (types & L1D_FLUSH_MTTRIG) + instrs[i++] = 0x7c12dba6; /* mtspr TRIG2,r0 (SPR #882) */ + + for (i = 0; start < end; start++, i++) { + dest = (void *)start + *start; + + pr_devel("patching dest %lx\n", (unsigned long)dest); + + patch_instruction((struct ppc_inst *)dest, ppc_inst(instrs[0])); + + if (types == L1D_FLUSH_FALLBACK) + patch_branch((struct ppc_inst *)(dest + 1), (unsigned long)&entry_flush_fallback, + BRANCH_SET_LINK); + else + patch_instruction((struct ppc_inst *)(dest + 1), ppc_inst(instrs[1])); + + patch_instruction((struct ppc_inst *)(dest + 2), ppc_inst(instrs[2])); + } + + printk(KERN_DEBUG "entry-flush: patched %d locations (%s flush)\n", i, + (types == L1D_FLUSH_NONE) ? "no" : + (types == L1D_FLUSH_FALLBACK) ? "fallback displacement" : + (types & L1D_FLUSH_ORI) ? (types & L1D_FLUSH_MTTRIG) + ? "ori+mttrig type" + : "ori type" : + (types & L1D_FLUSH_MTTRIG) ? "mttrig type" + : "unknown"); +} + void do_rfi_flush_fixups(enum l1d_flush_type types) { unsigned int instrs[3], *dest; diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c index 9acaa0f131b9..d04a085c423d 100644 --- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -122,12 +122,23 @@ static void pnv_setup_rfi_flush(void) type = L1D_FLUSH_ORI; } + /* + * If we are non-Power9 bare metal, we don't need to flush on kernel + * entry: it fixes a P9 specific vulnerability. + */ + if (!pvr_version_is(PVR_POWER9)) + security_ftr_clear(SEC_FTR_L1D_FLUSH_ENTRY); + enable = security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) && \ (security_ftr_enabled(SEC_FTR_L1D_FLUSH_PR) || \ security_ftr_enabled(SEC_FTR_L1D_FLUSH_HV)); setup_rfi_flush(type, enable); setup_count_cache_flush(); + + enable = security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) && + security_ftr_enabled(SEC_FTR_L1D_FLUSH_ENTRY); + setup_entry_flush(enable); } static void __init pnv_check_guarded_cores(void) diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 633c45ec406d..8136c5368ee4 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -579,6 +579,10 @@ void pseries_setup_rfi_flush(void) setup_rfi_flush(types, enable); setup_count_cache_flush(); + + enable = security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) && + security_ftr_enabled(SEC_FTR_L1D_FLUSH_ENTRY); + setup_entry_flush(enable); } #ifdef CONFIG_PCI_IOV -- cgit v1.2.3 From 9a32a7e78bd0cd9a9b6332cbdc345ee5ffd0c5de Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Tue, 17 Nov 2020 16:59:13 +1100 Subject: powerpc/64s: flush L1D after user accesses IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache after user accesses. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin Signed-off-by: Daniel Axtens Signed-off-by: Michael Ellerman --- Documentation/admin-guide/kernel-parameters.txt | 4 ++ arch/powerpc/include/asm/book3s/64/kup-radix.h | 66 ++++++++++++------- arch/powerpc/include/asm/exception-64s.h | 3 + arch/powerpc/include/asm/feature-fixups.h | 9 +++ arch/powerpc/include/asm/kup.h | 19 ++++-- arch/powerpc/include/asm/security_features.h | 3 + arch/powerpc/include/asm/setup.h | 1 + arch/powerpc/kernel/exceptions-64s.S | 85 ++++++++----------------- arch/powerpc/kernel/setup_64.c | 62 ++++++++++++++++++ arch/powerpc/kernel/vmlinux.lds.S | 7 ++ arch/powerpc/lib/feature-fixups.c | 50 +++++++++++++++ arch/powerpc/platforms/powernv/setup.c | 10 ++- arch/powerpc/platforms/pseries/setup.c | 4 ++ 13 files changed, 233 insertions(+), 90 deletions(-) (limited to 'arch') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 383889fe771f..44fde25bb221 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2859,6 +2859,7 @@ tsx_async_abort=off [X86] kvm.nx_huge_pages=off [X86] no_entry_flush [PPC] + no_uaccess_flush [PPC] Exceptions: This does not have any effect on @@ -3238,6 +3239,9 @@ nospec_store_bypass_disable [HW] Disable all mitigations for the Speculative Store Bypass vulnerability + no_uaccess_flush + [PPC] Don't flush the L1-D cache after accessing user data. + noxsave [BUGS=X86] Disables x86 extended register state save and restore using xsave. The kernel will fallback to enabling legacy floating-point and sse state. diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h index 3ee1ec60be84..97c2394e7dea 100644 --- a/arch/powerpc/include/asm/book3s/64/kup-radix.h +++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h @@ -61,6 +61,8 @@ #else /* !__ASSEMBLY__ */ +DECLARE_STATIC_KEY_FALSE(uaccess_flush_key); + #ifdef CONFIG_PPC_KUAP #include @@ -103,8 +105,16 @@ static inline void kuap_check_amr(void) static inline unsigned long get_kuap(void) { + /* + * We return AMR_KUAP_BLOCKED when we don't support KUAP because + * prevent_user_access_return needs to return AMR_KUAP_BLOCKED to + * cause restore_user_access to do a flush. + * + * This has no effect in terms of actually blocking things on hash, + * so it doesn't break anything. + */ if (!early_mmu_has_feature(MMU_FTR_RADIX_KUAP)) - return 0; + return AMR_KUAP_BLOCKED; return mfspr(SPRN_AMR); } @@ -123,6 +133,31 @@ static inline void set_kuap(unsigned long value) isync(); } +static inline bool +bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) +{ + return WARN(mmu_has_feature(MMU_FTR_RADIX_KUAP) && + (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : AMR_KUAP_BLOCK_READ)), + "Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read"); +} +#else /* CONFIG_PPC_KUAP */ +static inline void kuap_restore_amr(struct pt_regs *regs, unsigned long amr) { } + +static inline unsigned long kuap_get_and_check_amr(void) +{ + return 0UL; +} + +static inline void kuap_check_amr(void) { } + +static inline unsigned long get_kuap(void) +{ + return AMR_KUAP_BLOCKED; +} + +static inline void set_kuap(unsigned long value) { } +#endif /* !CONFIG_PPC_KUAP */ + static __always_inline void allow_user_access(void __user *to, const void __user *from, unsigned long size, unsigned long dir) { @@ -142,6 +177,8 @@ static inline void prevent_user_access(void __user *to, const void __user *from, unsigned long size, unsigned long dir) { set_kuap(AMR_KUAP_BLOCKED); + if (static_branch_unlikely(&uaccess_flush_key)) + do_uaccess_flush(); } static inline unsigned long prevent_user_access_return(void) @@ -149,6 +186,8 @@ static inline unsigned long prevent_user_access_return(void) unsigned long flags = get_kuap(); set_kuap(AMR_KUAP_BLOCKED); + if (static_branch_unlikely(&uaccess_flush_key)) + do_uaccess_flush(); return flags; } @@ -156,30 +195,9 @@ static inline unsigned long prevent_user_access_return(void) static inline void restore_user_access(unsigned long flags) { set_kuap(flags); + if (static_branch_unlikely(&uaccess_flush_key) && flags == AMR_KUAP_BLOCKED) + do_uaccess_flush(); } - -static inline bool -bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) -{ - return WARN(mmu_has_feature(MMU_FTR_RADIX_KUAP) && - (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : AMR_KUAP_BLOCK_READ)), - "Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read"); -} -#else /* CONFIG_PPC_KUAP */ -static inline void kuap_restore_amr(struct pt_regs *regs, unsigned long amr) -{ -} - -static inline void kuap_check_amr(void) -{ -} - -static inline unsigned long kuap_get_and_check_amr(void) -{ - return 0; -} -#endif /* CONFIG_PPC_KUAP */ - #endif /* __ASSEMBLY__ */ #endif /* _ASM_POWERPC_BOOK3S_64_KUP_RADIX_H */ diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 83fa88bc9935..1d32b174ab6a 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -144,6 +144,9 @@ RFSCV; \ b rfscv_flush_fallback +#else /* __ASSEMBLY__ */ +/* Prototype for function defined in exceptions-64s.S */ +void do_uaccess_flush(void); #endif /* __ASSEMBLY__ */ #endif /* _ASM_POWERPC_EXCEPTION_H */ diff --git a/arch/powerpc/include/asm/feature-fixups.h b/arch/powerpc/include/asm/feature-fixups.h index 06a48219bbf2..fbd406cd6916 100644 --- a/arch/powerpc/include/asm/feature-fixups.h +++ b/arch/powerpc/include/asm/feature-fixups.h @@ -205,6 +205,14 @@ label##3: \ FTR_ENTRY_OFFSET 955b-956b; \ .popsection; +#define UACCESS_FLUSH_FIXUP_SECTION \ +959: \ + .pushsection __uaccess_flush_fixup,"a"; \ + .align 2; \ +960: \ + FTR_ENTRY_OFFSET 959b-960b; \ + .popsection; + #define ENTRY_FLUSH_FIXUP_SECTION \ 957: \ .pushsection __entry_flush_fixup,"a"; \ @@ -248,6 +256,7 @@ extern long stf_barrier_fallback; extern long entry_flush_fallback; extern long __start___stf_entry_barrier_fixup, __stop___stf_entry_barrier_fixup; extern long __start___stf_exit_barrier_fixup, __stop___stf_exit_barrier_fixup; +extern long __start___uaccess_flush_fixup, __stop___uaccess_flush_fixup; extern long __start___entry_flush_fixup, __stop___entry_flush_fixup; extern long __start___rfi_flush_fixup, __stop___rfi_flush_fixup; extern long __start___barrier_nospec_fixup, __stop___barrier_nospec_fixup; diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h index 1d0f7d838b2e..0f5c606ae057 100644 --- a/arch/powerpc/include/asm/kup.h +++ b/arch/powerpc/include/asm/kup.h @@ -53,17 +53,26 @@ static inline void setup_kuep(bool disabled) { } void setup_kuap(bool disabled); #else static inline void setup_kuap(bool disabled) { } + +static inline bool +bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) +{ + return false; +} + +/* + * book3s/64/kup-radix.h defines these functions for the !KUAP case to flush + * the L1D cache after user accesses. Only include the empty stubs for other + * platforms. + */ +#ifndef CONFIG_PPC64 static inline void allow_user_access(void __user *to, const void __user *from, unsigned long size, unsigned long dir) { } static inline void prevent_user_access(void __user *to, const void __user *from, unsigned long size, unsigned long dir) { } static inline unsigned long prevent_user_access_return(void) { return 0UL; } static inline void restore_user_access(unsigned long flags) { } -static inline bool -bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) -{ - return false; -} +#endif /* CONFIG_PPC64 */ #endif /* CONFIG_PPC_KUAP */ static inline void allow_read_from_user(const void __user *from, unsigned long size) diff --git a/arch/powerpc/include/asm/security_features.h b/arch/powerpc/include/asm/security_features.h index 9e7459d2edca..b774a4477d5f 100644 --- a/arch/powerpc/include/asm/security_features.h +++ b/arch/powerpc/include/asm/security_features.h @@ -89,6 +89,8 @@ static inline bool security_ftr_enabled(u64 feature) // The L1-D cache should be flushed when entering the kernel #define SEC_FTR_L1D_FLUSH_ENTRY 0x0000000000004000ull +// The L1-D cache should be flushed after user accesses from the kernel +#define SEC_FTR_L1D_FLUSH_UACCESS 0x0000000000008000ull // Features enabled by default #define SEC_FTR_DEFAULT \ @@ -96,6 +98,7 @@ static inline bool security_ftr_enabled(u64 feature) SEC_FTR_L1D_FLUSH_PR | \ SEC_FTR_BNDS_CHK_SPEC_BAR | \ SEC_FTR_L1D_FLUSH_ENTRY | \ + SEC_FTR_L1D_FLUSH_UACCESS | \ SEC_FTR_FAVOUR_SECURITY) #endif /* _ASM_POWERPC_SECURITY_FEATURES_H */ diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h index e2fcd3e874f8..a466749703f1 100644 --- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -60,6 +60,7 @@ void setup_barrier_nospec(void); #else static inline void setup_barrier_nospec(void) { }; #endif +void do_uaccess_flush_fixups(enum l1d_flush_type types); void do_entry_flush_fixups(enum l1d_flush_type types); void do_barrier_nospec_fixups(bool enable); extern bool barrier_nospec_enabled; diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 5577dd887d37..f63a3d3bca3d 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -2951,11 +2951,8 @@ TRAMP_REAL_BEGIN(stf_barrier_fallback) .endr blr -TRAMP_REAL_BEGIN(entry_flush_fallback) - std r9,PACA_EXRFI+EX_R9(r13) - std r10,PACA_EXRFI+EX_R10(r13) - std r11,PACA_EXRFI+EX_R11(r13) - mfctr r9 +/* Clobbers r10, r11, ctr */ +.macro L1D_DISPLACEMENT_FLUSH ld r10,PACA_RFI_FLUSH_FALLBACK_AREA(r13) ld r11,PACA_L1D_FLUSH_SIZE(r13) srdi r11,r11,(7 + 3) /* 128 byte lines, unrolled 8x */ @@ -2981,7 +2978,14 @@ TRAMP_REAL_BEGIN(entry_flush_fallback) ld r11,(0x80 + 8)*7(r10) addi r10,r10,0x80*8 bdnz 1b +.endm +TRAMP_REAL_BEGIN(entry_flush_fallback) + std r9,PACA_EXRFI+EX_R9(r13) + std r10,PACA_EXRFI+EX_R10(r13) + std r11,PACA_EXRFI+EX_R11(r13) + mfctr r9 + L1D_DISPLACEMENT_FLUSH mtctr r9 ld r9,PACA_EXRFI+EX_R9(r13) ld r10,PACA_EXRFI+EX_R10(r13) @@ -2997,32 +3001,7 @@ TRAMP_REAL_BEGIN(rfi_flush_fallback) std r10,PACA_EXRFI+EX_R10(r13) std r11,PACA_EXRFI+EX_R11(r13) mfctr r9 - ld r10,PACA_RFI_FLUSH_FALLBACK_AREA(r13) - ld r11,PACA_L1D_FLUSH_SIZE(r13) - srdi r11,r11,(7 + 3) /* 128 byte lines, unrolled 8x */ - mtctr r11 - DCBT_BOOK3S_STOP_ALL_STREAM_IDS(r11) /* Stop prefetch streams */ - - /* order ld/st prior to dcbt stop all streams with flushing */ - sync - - /* - * The load adresses are at staggered offsets within cachelines, - * which suits some pipelines better (on others it should not - * hurt). - */ -1: - ld r11,(0x80 + 8)*0(r10) - ld r11,(0x80 + 8)*1(r10) - ld r11,(0x80 + 8)*2(r10) - ld r11,(0x80 + 8)*3(r10) - ld r11,(0x80 + 8)*4(r10) - ld r11,(0x80 + 8)*5(r10) - ld r11,(0x80 + 8)*6(r10) - ld r11,(0x80 + 8)*7(r10) - addi r10,r10,0x80*8 - bdnz 1b - + L1D_DISPLACEMENT_FLUSH mtctr r9 ld r9,PACA_EXRFI+EX_R9(r13) ld r10,PACA_EXRFI+EX_R10(r13) @@ -3040,32 +3019,7 @@ TRAMP_REAL_BEGIN(hrfi_flush_fallback) std r10,PACA_EXRFI+EX_R10(r13) std r11,PACA_EXRFI+EX_R11(r13) mfctr r9 - ld r10,PACA_RFI_FLUSH_FALLBACK_AREA(r13) - ld r11,PACA_L1D_FLUSH_SIZE(r13) - srdi r11,r11,(7 + 3) /* 128 byte lines, unrolled 8x */ - mtctr r11 - DCBT_BOOK3S_STOP_ALL_STREAM_IDS(r11) /* Stop prefetch streams */ - - /* order ld/st prior to dcbt stop all streams with flushing */ - sync - - /* - * The load adresses are at staggered offsets within cachelines, - * which suits some pipelines better (on others it should not - * hurt). - */ -1: - ld r11,(0x80 + 8)*0(r10) - ld r11,(0x80 + 8)*1(r10) - ld r11,(0x80 + 8)*2(r10) - ld r11,(0x80 + 8)*3(r10) - ld r11,(0x80 + 8)*4(r10) - ld r11,(0x80 + 8)*5(r10) - ld r11,(0x80 + 8)*6(r10) - ld r11,(0x80 + 8)*7(r10) - addi r10,r10,0x80*8 - bdnz 1b - + L1D_DISPLACEMENT_FLUSH mtctr r9 ld r9,PACA_EXRFI+EX_R9(r13) ld r10,PACA_EXRFI+EX_R10(r13) @@ -3116,8 +3070,21 @@ TRAMP_REAL_BEGIN(rfscv_flush_fallback) RFSCV USE_TEXT_SECTION() - MASKED_INTERRUPT - MASKED_INTERRUPT hsrr=1 + +_GLOBAL(do_uaccess_flush) + UACCESS_FLUSH_FIXUP_SECTION + nop + nop + nop + blr + L1D_DISPLACEMENT_FLUSH + blr +_ASM_NOKPROBE_SYMBOL(do_uaccess_flush) +EXPORT_SYMBOL(do_uaccess_flush) + + +MASKED_INTERRUPT +MASKED_INTERRUPT hsrr=1 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER kvmppc_skip_interrupt: diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index 7d447b43ad13..74fd47f46fa5 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -946,8 +946,12 @@ static enum l1d_flush_type enabled_flush_types; static void *l1d_flush_fallback_area; static bool no_rfi_flush; static bool no_entry_flush; +static bool no_uaccess_flush; bool rfi_flush; bool entry_flush; +bool uaccess_flush; +DEFINE_STATIC_KEY_FALSE(uaccess_flush_key); +EXPORT_SYMBOL(uaccess_flush_key); static int __init handle_no_rfi_flush(char *p) { @@ -965,6 +969,14 @@ static int __init handle_no_entry_flush(char *p) } early_param("no_entry_flush", handle_no_entry_flush); +static int __init handle_no_uaccess_flush(char *p) +{ + pr_info("uaccess-flush: disabled on command line."); + no_uaccess_flush = true; + return 0; +} +early_param("no_uaccess_flush", handle_no_uaccess_flush); + /* * The RFI flush is not KPTI, but because users will see doco that says to use * nopti we hijack that option here to also disable the RFI flush. @@ -1008,6 +1020,20 @@ void entry_flush_enable(bool enable) entry_flush = enable; } +void uaccess_flush_enable(bool enable) +{ + if (enable) { + do_uaccess_flush_fixups(enabled_flush_types); + static_branch_enable(&uaccess_flush_key); + on_each_cpu(do_nothing, NULL, 1); + } else { + static_branch_disable(&uaccess_flush_key); + do_uaccess_flush_fixups(L1D_FLUSH_NONE); + } + + uaccess_flush = enable; +} + static void __ref init_fallback_flush(void) { u64 l1d_size, limit; @@ -1079,6 +1105,15 @@ void setup_entry_flush(bool enable) entry_flush_enable(enable); } +void setup_uaccess_flush(bool enable) +{ + if (cpu_mitigations_off()) + return; + + if (!no_uaccess_flush) + uaccess_flush_enable(enable); +} + #ifdef CONFIG_DEBUG_FS static int rfi_flush_set(void *data, u64 val) { @@ -1132,10 +1167,37 @@ static int entry_flush_get(void *data, u64 *val) DEFINE_SIMPLE_ATTRIBUTE(fops_entry_flush, entry_flush_get, entry_flush_set, "%llu\n"); +static int uaccess_flush_set(void *data, u64 val) +{ + bool enable; + + if (val == 1) + enable = true; + else if (val == 0) + enable = false; + else + return -EINVAL; + + /* Only do anything if we're changing state */ + if (enable != uaccess_flush) + uaccess_flush_enable(enable); + + return 0; +} + +static int uaccess_flush_get(void *data, u64 *val) +{ + *val = uaccess_flush ? 1 : 0; + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(fops_uaccess_flush, uaccess_flush_get, uaccess_flush_set, "%llu\n"); + static __init int rfi_flush_debugfs_init(void) { debugfs_create_file("rfi_flush", 0600, powerpc_debugfs_root, NULL, &fops_rfi_flush); debugfs_create_file("entry_flush", 0600, powerpc_debugfs_root, NULL, &fops_entry_flush); + debugfs_create_file("uaccess_flush", 0600, powerpc_debugfs_root, NULL, &fops_uaccess_flush); return 0; } device_initcall(rfi_flush_debugfs_init); diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index 715ec4d4614a..6db90cdf11da 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -131,6 +131,13 @@ SECTIONS __stop___stf_entry_barrier_fixup = .; } + . = ALIGN(8); + __uaccess_flush_fixup : AT(ADDR(__uaccess_flush_fixup) - LOAD_OFFSET) { + __start___uaccess_flush_fixup = .; + *(__uaccess_flush_fixup) + __stop___uaccess_flush_fixup = .; + } + . = ALIGN(8); __entry_flush_fixup : AT(ADDR(__entry_flush_fixup) - LOAD_OFFSET) { __start___entry_flush_fixup = .; diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c index 70e83cfd74aa..321c12a9ef6b 100644 --- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -234,6 +234,56 @@ void do_stf_barrier_fixups(enum stf_barrier_type types) do_stf_exit_barrier_fixups(types); } +void do_uaccess_flush_fixups(enum l1d_flush_type types) +{ + unsigned int instrs[4], *dest; + long *start, *end; + int i; + + start = PTRRELOC(&__start___uaccess_flush_fixup); + end = PTRRELOC(&__stop___uaccess_flush_fixup); + + instrs[0] = 0x60000000; /* nop */ + instrs[1] = 0x60000000; /* nop */ + instrs[2] = 0x60000000; /* nop */ + instrs[3] = 0x4e800020; /* blr */ + + i = 0; + if (types == L1D_FLUSH_FALLBACK) { + instrs[3] = 0x60000000; /* nop */ + /* fallthrough to fallback flush */ + } + + if (types & L1D_FLUSH_ORI) { + instrs[i++] = 0x63ff0000; /* ori 31,31,0 speculation barrier */ + instrs[i++] = 0x63de0000; /* ori 30,30,0 L1d flush*/ + } + + if (types & L1D_FLUSH_MTTRIG) + instrs[i++] = 0x7c12dba6; /* mtspr TRIG2,r0 (SPR #882) */ + + for (i = 0; start < end; start++, i++) { + dest = (void *)start + *start; + + pr_devel("patching dest %lx\n", (unsigned long)dest); + + patch_instruction((struct ppc_inst *)dest, ppc_inst(instrs[0])); + + patch_instruction((struct ppc_inst *)(dest + 1), ppc_inst(instrs[1])); + patch_instruction((struct ppc_inst *)(dest + 2), ppc_inst(instrs[2])); + patch_instruction((struct ppc_inst *)(dest + 3), ppc_inst(instrs[3])); + } + + printk(KERN_DEBUG "uaccess-flush: patched %d locations (%s flush)\n", i, + (types == L1D_FLUSH_NONE) ? "no" : + (types == L1D_FLUSH_FALLBACK) ? "fallback displacement" : + (types & L1D_FLUSH_ORI) ? (types & L1D_FLUSH_MTTRIG) + ? "ori+mttrig type" + : "ori type" : + (types & L1D_FLUSH_MTTRIG) ? "mttrig type" + : "unknown"); +} + void do_entry_flush_fixups(enum l1d_flush_type types) { unsigned int instrs[3], *dest; diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c index d04a085c423d..087ec92acfc4 100644 --- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -124,10 +124,12 @@ static void pnv_setup_rfi_flush(void) /* * If we are non-Power9 bare metal, we don't need to flush on kernel - * entry: it fixes a P9 specific vulnerability. + * entry or after user access: they fix a P9 specific vulnerability. */ - if (!pvr_version_is(PVR_POWER9)) + if (!pvr_version_is(PVR_POWER9)) { security_ftr_clear(SEC_FTR_L1D_FLUSH_ENTRY); + security_ftr_clear(SEC_FTR_L1D_FLUSH_UACCESS); + } enable = security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) && \ (security_ftr_enabled(SEC_FTR_L1D_FLUSH_PR) || \ @@ -139,6 +141,10 @@ static void pnv_setup_rfi_flush(void) enable = security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) && security_ftr_enabled(SEC_FTR_L1D_FLUSH_ENTRY); setup_entry_flush(enable); + + enable = security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) && + security_ftr_enabled(SEC_FTR_L1D_FLUSH_UACCESS); + setup_uaccess_flush(enable); } static void __init pnv_check_guarded_cores(void) diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 8136c5368ee4..3617cdb079f6 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -583,6 +583,10 @@ void pseries_setup_rfi_flush(void) enable = security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) && security_ftr_enabled(SEC_FTR_L1D_FLUSH_ENTRY); setup_entry_flush(enable); + + enable = security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) && + security_ftr_enabled(SEC_FTR_L1D_FLUSH_UACCESS); + setup_uaccess_flush(enable); } #ifdef CONFIG_PCI_IOV -- cgit v1.2.3 From 178d52c6e89c38d0553b0ac8b99927b11eb995b0 Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Thu, 19 Nov 2020 23:43:53 +1100 Subject: powerpc: Only include kup-radix.h for 64-bit Book3S In kup.h we currently include kup-radix.h for all 64-bit builds, which includes Book3S and Book3E. The latter doesn't make sense, Book3E never uses the Radix MMU. This has worked up until now, but almost by accident, and the recent uaccess flush changes introduced a build breakage on Book3E because of the bad structure of the code. So disentangle things so that we only use kup-radix.h for Book3S. This requires some more stubs in kup.h and fixing an include in syscall_64.c. Signed-off-by: Michael Ellerman --- arch/powerpc/include/asm/book3s/64/kup-radix.h | 4 ++-- arch/powerpc/include/asm/kup.h | 11 ++++++++--- arch/powerpc/kernel/syscall_64.c | 2 +- 3 files changed, 11 insertions(+), 6 deletions(-) (limited to 'arch') diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h index 97c2394e7dea..28716e2f13e3 100644 --- a/arch/powerpc/include/asm/book3s/64/kup-radix.h +++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h @@ -27,6 +27,7 @@ #endif .endm +#ifdef CONFIG_PPC_KUAP .macro kuap_check_amr gpr1, gpr2 #ifdef CONFIG_PPC_KUAP_DEBUG BEGIN_MMU_FTR_SECTION_NESTED(67) @@ -38,6 +39,7 @@ END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67) #endif .endm +#endif .macro kuap_save_amr_and_lock gpr1, gpr2, use_cr, msr_pr_cr #ifdef CONFIG_PPC_KUAP @@ -148,8 +150,6 @@ static inline unsigned long kuap_get_and_check_amr(void) return 0UL; } -static inline void kuap_check_amr(void) { } - static inline unsigned long get_kuap(void) { return AMR_KUAP_BLOCKED; diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h index 0f5c606ae057..0d93331d0fab 100644 --- a/arch/powerpc/include/asm/kup.h +++ b/arch/powerpc/include/asm/kup.h @@ -14,7 +14,7 @@ #define KUAP_CURRENT_WRITE 8 #define KUAP_CURRENT (KUAP_CURRENT_READ | KUAP_CURRENT_WRITE) -#ifdef CONFIG_PPC64 +#ifdef CONFIG_PPC_BOOK3S_64 #include #endif #ifdef CONFIG_PPC_8xx @@ -35,6 +35,9 @@ .macro kuap_check current, gpr .endm +.macro kuap_check_amr gpr1, gpr2 +.endm + #endif #else /* !__ASSEMBLY__ */ @@ -60,19 +63,21 @@ bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) return false; } +static inline void kuap_check_amr(void) { } + /* * book3s/64/kup-radix.h defines these functions for the !KUAP case to flush * the L1D cache after user accesses. Only include the empty stubs for other * platforms. */ -#ifndef CONFIG_PPC64 +#ifndef CONFIG_PPC_BOOK3S_64 static inline void allow_user_access(void __user *to, const void __user *from, unsigned long size, unsigned long dir) { } static inline void prevent_user_access(void __user *to, const void __user *from, unsigned long size, unsigned long dir) { } static inline unsigned long prevent_user_access_return(void) { return 0UL; } static inline void restore_user_access(unsigned long flags) { } -#endif /* CONFIG_PPC64 */ +#endif /* CONFIG_PPC_BOOK3S_64 */ #endif /* CONFIG_PPC_KUAP */ static inline void allow_read_from_user(const void __user *from, unsigned long size) diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c index 8e50818aa50b..310bcd768cd5 100644 --- a/arch/powerpc/kernel/syscall_64.c +++ b/arch/powerpc/kernel/syscall_64.c @@ -2,7 +2,7 @@ #include #include -#include +#include #include #include #include -- cgit v1.2.3 From da631f7fd623b6c180c8d93a93040d1e0d61291f Mon Sep 17 00:00:00 2001 From: Daniel Axtens Date: Tue, 17 Nov 2020 16:59:16 +1100 Subject: powerpc/64s: rename pnv|pseries_setup_rfi_flush to _setup_security_mitigations pseries|pnv_setup_rfi_flush already does the count cache flush setup, and we just added entry and uaccess flushes. So the name is not very accurate any more. In both platforms we then also immediately setup the STF flush. Rename them to _setup_security_mitigations and fold the STF flush in. Signed-off-by: Daniel Axtens Signed-off-by: Michael Ellerman --- arch/powerpc/platforms/powernv/setup.c | 7 ++++--- arch/powerpc/platforms/pseries/mobility.c | 4 ++-- arch/powerpc/platforms/pseries/pseries.h | 2 +- arch/powerpc/platforms/pseries/setup.c | 7 ++++--- 4 files changed, 11 insertions(+), 9 deletions(-) (limited to 'arch') diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c index 087ec92acfc4..46115231a3b2 100644 --- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -98,7 +98,7 @@ static void init_fw_feat_flags(struct device_node *np) security_ftr_clear(SEC_FTR_BNDS_CHK_SPEC_BAR); } -static void pnv_setup_rfi_flush(void) +static void pnv_setup_security_mitigations(void) { struct device_node *np, *fw_features; enum l1d_flush_type type; @@ -145,6 +145,8 @@ static void pnv_setup_rfi_flush(void) enable = security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) && security_ftr_enabled(SEC_FTR_L1D_FLUSH_UACCESS); setup_uaccess_flush(enable); + + setup_stf_barrier(); } static void __init pnv_check_guarded_cores(void) @@ -173,8 +175,7 @@ static void __init pnv_setup_arch(void) { set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT); - pnv_setup_rfi_flush(); - setup_stf_barrier(); + pnv_setup_security_mitigations(); /* Initialize SMP */ pnv_smp_init(); diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c index d6f4162478a5..2f73cb5bf12d 100644 --- a/arch/powerpc/platforms/pseries/mobility.c +++ b/arch/powerpc/platforms/pseries/mobility.c @@ -349,8 +349,8 @@ void post_mobility_fixup(void) cpus_read_unlock(); - /* Possibly switch to a new RFI flush type */ - pseries_setup_rfi_flush(); + /* Possibly switch to a new L1 flush type */ + pseries_setup_security_mitigations(); /* Reinitialise system information for hv-24x7 */ read_24x7_sys_info(); diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h index 13fa370a87e4..593840847cd3 100644 --- a/arch/powerpc/platforms/pseries/pseries.h +++ b/arch/powerpc/platforms/pseries/pseries.h @@ -111,7 +111,7 @@ static inline unsigned long cmo_get_page_size(void) int dlpar_workqueue_init(void); -void pseries_setup_rfi_flush(void); +void pseries_setup_security_mitigations(void); void pseries_lpar_read_hblkrm_characteristics(void); #endif /* _PSERIES_PSERIES_H */ diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 3617cdb079f6..090c13f6c881 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -542,7 +542,7 @@ static void init_cpu_char_feature_flags(struct h_cpu_char_result *result) security_ftr_clear(SEC_FTR_BNDS_CHK_SPEC_BAR); } -void pseries_setup_rfi_flush(void) +void pseries_setup_security_mitigations(void) { struct h_cpu_char_result result; enum l1d_flush_type types; @@ -587,6 +587,8 @@ void pseries_setup_rfi_flush(void) enable = security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) && security_ftr_enabled(SEC_FTR_L1D_FLUSH_UACCESS); setup_uaccess_flush(enable); + + setup_stf_barrier(); } #ifdef CONFIG_PCI_IOV @@ -776,8 +778,7 @@ static void __init pSeries_setup_arch(void) fwnmi_init(); - pseries_setup_rfi_flush(); - setup_stf_barrier(); + pseries_setup_security_mitigations(); pseries_lpar_read_hblkrm_characteristics(); /* By default, only probe PCI (can be overridden by rtas_pci) */ -- cgit v1.2.3 From a927bd6ba952d13c52b8b385030943032f659a3e Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Sat, 21 Nov 2020 22:17:05 -0800 Subject: mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports The core-mm has a default __weak implementation of phys_to_target_node() to mirror the weak definition of memory_add_physaddr_to_nid(). That symbol is exported for modules. However, while the export in mm/memory_hotplug.c exported the symbol in the configuration cases of: CONFIG_NUMA_KEEP_MEMINFO=y CONFIG_MEMORY_HOTPLUG=y ...and: CONFIG_NUMA_KEEP_MEMINFO=n CONFIG_MEMORY_HOTPLUG=y ...it failed to export the symbol in the case of: CONFIG_NUMA_KEEP_MEMINFO=y CONFIG_MEMORY_HOTPLUG=n Not only is that broken, but Christoph points out that the kernel should not be exporting any __weak symbol, which means that memory_add_physaddr_to_nid() example that phys_to_target_node() copied is broken too. Rework the definition of phys_to_target_node() and memory_add_physaddr_to_nid() to not require weak symbols. Move to the common arch override design-pattern of an asm header defining a symbol to replace the default implementation. The only common header that all memory_add_physaddr_to_nid() producing architectures implement is asm/sparsemem.h. In fact, powerpc already defines its memory_add_physaddr_to_nid() helper in sparsemem.h. Double-down on that observation and define phys_to_target_node() where necessary in asm/sparsemem.h. An alternate consideration that was discarded was to put this override in asm/numa.h, but that entangles with the definition of MAX_NUMNODES relative to the inclusion of linux/nodemask.h, and requires powerpc to grow a new header. The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid now that the symbol is properly exported / stubbed in all combinations of CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG. [dan.j.williams@intel.com: v4] Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com [dan.j.williams@intel.com: powerpc: fix create_section_mapping compile warning] Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: a035b6bf863e ("mm/memory_hotplug: introduce default phys_to_target_node() implementation") Reported-by: Randy Dunlap Reported-by: Thomas Gleixner Reported-by: kernel test robot Reported-by: Christoph Hellwig Signed-off-by: Dan Williams Signed-off-by: Andrew Morton Tested-by: Randy Dunlap Tested-by: Thomas Gleixner Reviewed-by: Thomas Gleixner Reviewed-by: Christoph Hellwig Cc: Joao Martins Cc: Tony Luck Cc: Fenghua Yu Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Vishal Verma Cc: Stephen Rothwell Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds --- arch/ia64/include/asm/sparsemem.h | 6 ++++++ arch/powerpc/include/asm/mmzone.h | 5 +++++ arch/powerpc/include/asm/sparsemem.h | 5 ++--- arch/powerpc/mm/mem.c | 1 + arch/x86/include/asm/sparsemem.h | 10 ++++++++++ arch/x86/mm/numa.c | 2 ++ drivers/dax/Kconfig | 1 - include/linux/memory_hotplug.h | 14 -------------- include/linux/numa.h | 30 +++++++++++++++++++++++++++++- mm/memory_hotplug.c | 18 ------------------ 10 files changed, 55 insertions(+), 37 deletions(-) (limited to 'arch') diff --git a/arch/ia64/include/asm/sparsemem.h b/arch/ia64/include/asm/sparsemem.h index 336d0570e1fa..dd8c166ffd7b 100644 --- a/arch/ia64/include/asm/sparsemem.h +++ b/arch/ia64/include/asm/sparsemem.h @@ -18,4 +18,10 @@ #endif #endif /* CONFIG_SPARSEMEM */ + +#ifdef CONFIG_MEMORY_HOTPLUG +int memory_add_physaddr_to_nid(u64 addr); +#define memory_add_physaddr_to_nid memory_add_physaddr_to_nid +#endif + #endif /* _ASM_IA64_SPARSEMEM_H */ diff --git a/arch/powerpc/include/asm/mmzone.h b/arch/powerpc/include/asm/mmzone.h index 91c69ff53a8a..6cda76b57c5d 100644 --- a/arch/powerpc/include/asm/mmzone.h +++ b/arch/powerpc/include/asm/mmzone.h @@ -46,5 +46,10 @@ u64 memory_hotplug_max(void); #define __HAVE_ARCH_RESERVED_KERNEL_PAGES #endif +#ifdef CONFIG_MEMORY_HOTPLUG +extern int create_section_mapping(unsigned long start, unsigned long end, + int nid, pgprot_t prot); +#endif + #endif /* __KERNEL__ */ #endif /* _ASM_MMZONE_H_ */ diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h index 1e6fa371cc38..d072866842e4 100644 --- a/arch/powerpc/include/asm/sparsemem.h +++ b/arch/powerpc/include/asm/sparsemem.h @@ -13,9 +13,9 @@ #endif /* CONFIG_SPARSEMEM */ #ifdef CONFIG_MEMORY_HOTPLUG -extern int create_section_mapping(unsigned long start, unsigned long end, - int nid, pgprot_t prot); extern int remove_section_mapping(unsigned long start, unsigned long end); +extern int memory_add_physaddr_to_nid(u64 start); +#define memory_add_physaddr_to_nid memory_add_physaddr_to_nid #ifdef CONFIG_NUMA extern int hot_add_scn_to_nid(unsigned long scn_addr); @@ -26,6 +26,5 @@ static inline int hot_add_scn_to_nid(unsigned long scn_addr) } #endif /* CONFIG_NUMA */ #endif /* CONFIG_MEMORY_HOTPLUG */ - #endif /* __KERNEL__ */ #endif /* _ASM_POWERPC_SPARSEMEM_H */ diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c index 01ec2a252f09..3fc325bebe4d 100644 --- a/arch/powerpc/mm/mem.c +++ b/arch/powerpc/mm/mem.c @@ -50,6 +50,7 @@ #include #include #include +#include #include diff --git a/arch/x86/include/asm/sparsemem.h b/arch/x86/include/asm/sparsemem.h index 6bfc878f6771..6a9ccc1b2be5 100644 --- a/arch/x86/include/asm/sparsemem.h +++ b/arch/x86/include/asm/sparsemem.h @@ -28,4 +28,14 @@ #endif #endif /* CONFIG_SPARSEMEM */ + +#ifndef __ASSEMBLY__ +#ifdef CONFIG_NUMA_KEEP_MEMINFO +extern int phys_to_target_node(phys_addr_t start); +#define phys_to_target_node phys_to_target_node +extern int memory_add_physaddr_to_nid(u64 start); +#define memory_add_physaddr_to_nid memory_add_physaddr_to_nid +#endif +#endif /* __ASSEMBLY__ */ + #endif /* _ASM_X86_SPARSEMEM_H */ diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c index 44148691d78b..5eb4dc2b97da 100644 --- a/arch/x86/mm/numa.c +++ b/arch/x86/mm/numa.c @@ -938,6 +938,7 @@ int phys_to_target_node(phys_addr_t start) return meminfo_to_nid(&numa_reserved_meminfo, start); } +EXPORT_SYMBOL_GPL(phys_to_target_node); int memory_add_physaddr_to_nid(u64 start) { @@ -947,4 +948,5 @@ int memory_add_physaddr_to_nid(u64 start) nid = numa_meminfo.blk[0].nid; return nid; } +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); #endif diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig index 567428e10b7b..d2834c2cfa10 100644 --- a/drivers/dax/Kconfig +++ b/drivers/dax/Kconfig @@ -50,7 +50,6 @@ config DEV_DAX_HMEM Say M if unsure. config DEV_DAX_HMEM_DEVICES - depends on NUMA_KEEP_MEMINFO # for phys_to_target_node() depends on DEV_DAX_HMEM && DAX=y def_bool y diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index d65c6fdc5cfc..551093b74596 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -281,20 +281,6 @@ static inline bool movable_node_is_enabled(void) } #endif /* ! CONFIG_MEMORY_HOTPLUG */ -#ifdef CONFIG_NUMA -extern int memory_add_physaddr_to_nid(u64 start); -extern int phys_to_target_node(u64 start); -#else -static inline int memory_add_physaddr_to_nid(u64 start) -{ - return 0; -} -static inline int phys_to_target_node(u64 start) -{ - return 0; -} -#endif - #if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_DEFERRED_STRUCT_PAGE_INIT) /* * pgdat resizing functions diff --git a/include/linux/numa.h b/include/linux/numa.h index 8cb33ccfb671..cb44cfe2b725 100644 --- a/include/linux/numa.h +++ b/include/linux/numa.h @@ -21,13 +21,41 @@ #endif #ifdef CONFIG_NUMA +#include +#include + /* Generic implementation available */ int numa_map_to_online_node(int node); -#else + +#ifndef memory_add_physaddr_to_nid +static inline int memory_add_physaddr_to_nid(u64 start) +{ + pr_info_once("Unknown online node for memory at 0x%llx, assuming node 0\n", + start); + return 0; +} +#endif +#ifndef phys_to_target_node +static inline int phys_to_target_node(u64 start) +{ + pr_info_once("Unknown target node for memory at 0x%llx, assuming node 0\n", + start); + return 0; +} +#endif +#else /* !CONFIG_NUMA */ static inline int numa_map_to_online_node(int node) { return NUMA_NO_NODE; } +static inline int memory_add_physaddr_to_nid(u64 start) +{ + return 0; +} +static inline int phys_to_target_node(u64 start) +{ + return 0; +} #endif #endif /* _LINUX_NUMA_H */ diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index b44d4c7ba73b..63b2e46b6555 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -350,24 +350,6 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages, return err; } -#ifdef CONFIG_NUMA -int __weak memory_add_physaddr_to_nid(u64 start) -{ - pr_info_once("Unknown online node for memory at 0x%llx, assuming node 0\n", - start); - return 0; -} -EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); - -int __weak phys_to_target_node(u64 start) -{ - pr_info_once("Unknown target node for memory at 0x%llx, assuming node 0\n", - start); - return 0; -} -EXPORT_SYMBOL_GPL(phys_to_target_node); -#endif - /* find the smallest valid pfn in the range [start_pfn, end_pfn) */ static unsigned long find_smallest_section_pfn(int nid, struct zone *zone, unsigned long start_pfn, -- cgit v1.2.3 From b6b79dd53082db11070b4368d85dd6699ff0b063 Mon Sep 17 00:00:00 2001 From: Stephen Rothwell Date: Mon, 23 Nov 2020 18:40:16 +1100 Subject: powerpc/64s: Fix allnoconfig build since uaccess flush Using DECLARE_STATIC_KEY_FALSE needs linux/jump_table.h. Otherwise the build fails with eg: arch/powerpc/include/asm/book3s/64/kup-radix.h:66:1: warning: data definition has no type or storage class 66 | DECLARE_STATIC_KEY_FALSE(uaccess_flush_key); Fixes: 9a32a7e78bd0 ("powerpc/64s: flush L1D after user accesses") Signed-off-by: Stephen Rothwell [mpe: Massage change log] Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20201123184016.693fe464@canb.auug.org.au --- arch/powerpc/include/asm/book3s/64/kup-radix.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch') diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h index 28716e2f13e3..a39e2d193fdc 100644 --- a/arch/powerpc/include/asm/book3s/64/kup-radix.h +++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h @@ -63,6 +63,8 @@ #else /* !__ASSEMBLY__ */ +#include + DECLARE_STATIC_KEY_FALSE(uaccess_flush_key); #ifdef CONFIG_PPC_KUAP -- cgit v1.2.3 From 1179f170b6f0af7bb0b3b7628136eaac450ddf31 Mon Sep 17 00:00:00 2001 From: Sven Schnelle Date: Fri, 20 Nov 2020 14:17:52 +0100 Subject: s390: fix fpu restore in entry.S We need to disable interrupts in load_fpu_regs(). Otherwise an interrupt might come in after the registers are loaded, but before CIF_FPU is cleared in load_fpu_regs(). When the interrupt returns, CIF_FPU will be cleared and the registers will never be restored. The entry.S code usually saves the interrupt state in __SF_EMPTY on the stack when disabling/restoring interrupts. sie64a however saves the pointer to the sie control block in __SF_SIE_CONTROL, which references the same location. This is non-obvious to the reader. To avoid thrashing the sie control block pointer in load_fpu_regs(), move the __SIE_* offsets eight bytes after __SF_EMPTY on the stack. Cc: # 5.8 Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S") Reported-by: Pierre Morel Signed-off-by: Sven Schnelle Acked-by: Christian Borntraeger Reviewed-by: Heiko Carstens Signed-off-by: Heiko Carstens --- arch/s390/kernel/asm-offsets.c | 10 +++++----- arch/s390/kernel/entry.S | 2 ++ 2 files changed, 7 insertions(+), 5 deletions(-) (limited to 'arch') diff --git a/arch/s390/kernel/asm-offsets.c b/arch/s390/kernel/asm-offsets.c index 2012c1cf0853..483051e10db3 100644 --- a/arch/s390/kernel/asm-offsets.c +++ b/arch/s390/kernel/asm-offsets.c @@ -53,11 +53,11 @@ int main(void) /* stack_frame offsets */ OFFSET(__SF_BACKCHAIN, stack_frame, back_chain); OFFSET(__SF_GPRS, stack_frame, gprs); - OFFSET(__SF_EMPTY, stack_frame, empty1); - OFFSET(__SF_SIE_CONTROL, stack_frame, empty1[0]); - OFFSET(__SF_SIE_SAVEAREA, stack_frame, empty1[1]); - OFFSET(__SF_SIE_REASON, stack_frame, empty1[2]); - OFFSET(__SF_SIE_FLAGS, stack_frame, empty1[3]); + OFFSET(__SF_EMPTY, stack_frame, empty1[0]); + OFFSET(__SF_SIE_CONTROL, stack_frame, empty1[1]); + OFFSET(__SF_SIE_SAVEAREA, stack_frame, empty1[2]); + OFFSET(__SF_SIE_REASON, stack_frame, empty1[3]); + OFFSET(__SF_SIE_FLAGS, stack_frame, empty1[4]); BLANK(); OFFSET(__VDSO_GETCPU_VAL, vdso_per_cpu_data, getcpu_val); BLANK(); diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S index 5346545b9860..26bb0603c5a1 100644 --- a/arch/s390/kernel/entry.S +++ b/arch/s390/kernel/entry.S @@ -1068,6 +1068,7 @@ EXPORT_SYMBOL(save_fpu_regs) * %r4 */ load_fpu_regs: + stnsm __SF_EMPTY(%r15),0xfc lg %r4,__LC_CURRENT aghi %r4,__TASK_thread TSTMSK __LC_CPU_FLAGS,_CIF_FPU @@ -1099,6 +1100,7 @@ load_fpu_regs: .Lload_fpu_regs_done: ni __LC_CPU_FLAGS+7,255-_CIF_FPU .Lload_fpu_regs_exit: + ssm __SF_EMPTY(%r15) BR_EX %r14 .Lload_fpu_regs_end: ENDPROC(load_fpu_regs) -- cgit v1.2.3 From 03659efe4287230b1d65b31c993708f335c8de82 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sun, 22 Nov 2020 20:45:10 -0800 Subject: arm64/fpsimd: add to to fix fpsimd build Adding brought in which uses , which uses 'pstate_check_t' so the latter needs to #include for this typedef. Fixes this build error: In file included from arch/arm64/include/asm/kprobes.h:24, from arch/arm64/include/asm/exception.h:11, from arch/arm64/kernel/fpsimd.c:35: arch/arm64/include/asm/probes.h:16:2: error: unknown type name 'pstate_check_t' 16 | pstate_check_t *pstate_cc; Fixes: c6b90d5cf637 ("arm64/fpsimd: Fix missing-prototypes in fpsimd.c") Reported-by: kernel test robot Signed-off-by: Randy Dunlap Cc: Will Deacon Cc: Tian Tao Link: https://lore.kernel.org/r/20201123044510.9942-1-rdunlap@infradead.org Signed-off-by: Will Deacon --- arch/arm64/include/asm/probes.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch') diff --git a/arch/arm64/include/asm/probes.h b/arch/arm64/include/asm/probes.h index 4266262101fe..006946745352 100644 --- a/arch/arm64/include/asm/probes.h +++ b/arch/arm64/include/asm/probes.h @@ -7,6 +7,8 @@ #ifndef _ARM_PROBES_H #define _ARM_PROBES_H +#include + typedef u32 probe_opcode_t; typedef void (probes_handler_t) (u32 opcode, long addr, struct pt_regs *); -- cgit v1.2.3 From 07509e10dcc77627f8b6a57381e878fe269958d3 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Fri, 20 Nov 2020 13:28:01 +0000 Subject: arm64: pgtable: Fix pte_accessible() pte_accessible() is used by ptep_clear_flush() to figure out whether TLB invalidation is necessary when unmapping pages for reclaim. Although our implementation is correct according to the architecture, returning true only for valid, young ptes in the absence of racing page-table modifications, this is in fact flawed due to lazy invalidation of old ptes in ptep_clear_flush_young() where we elide the expensive DSB instruction for completing the TLB invalidation. Rather than penalise the aging path, adjust pte_accessible() to return true for any valid pte, even if the access flag is cleared. Cc: Fixes: 76c714be0e5e ("arm64: pgtable: implement pte_accessible()") Reported-by: Yu Zhao Acked-by: Yu Zhao Reviewed-by: Minchan Kim Reviewed-by: Catalin Marinas Link: https://lore.kernel.org/r/20201120143557.6715-2-will@kernel.org Signed-off-by: Will Deacon --- arch/arm64/include/asm/pgtable.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 4ff12a7adcfd..326c34677e86 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -115,8 +115,6 @@ extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)]; #define pte_valid(pte) (!!(pte_val(pte) & PTE_VALID)) #define pte_valid_not_user(pte) \ ((pte_val(pte) & (PTE_VALID | PTE_USER)) == PTE_VALID) -#define pte_valid_young(pte) \ - ((pte_val(pte) & (PTE_VALID | PTE_AF)) == (PTE_VALID | PTE_AF)) #define pte_valid_user(pte) \ ((pte_val(pte) & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER)) @@ -124,9 +122,12 @@ extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)]; * Could the pte be present in the TLB? We must check mm_tlb_flush_pending * so that we don't erroneously return false for pages that have been * remapped as PROT_NONE but are yet to be flushed from the TLB. + * Note that we can't make any assumptions based on the state of the access + * flag, since ptep_clear_flush_young() elides a DSB when invalidating the + * TLB. */ #define pte_accessible(mm, pte) \ - (mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid_young(pte)) + (mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid(pte)) /* * p??_access_permitted() is true for valid user mappings (subject to the -- cgit v1.2.3 From ff1712f953e27f0b0718762ec17d0adb15c9fd0b Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Fri, 20 Nov 2020 13:57:48 +0000 Subject: arm64: pgtable: Ensure dirty bit is preserved across pte_wrprotect() With hardware dirty bit management, calling pte_wrprotect() on a writable, dirty PTE will lose the dirty state and return a read-only, clean entry. Move the logic from ptep_set_wrprotect() into pte_wrprotect() to ensure that the dirty bit is preserved for writable entries, as this is required for soft-dirty bit management if we enable it in the future. Cc: Fixes: 2f4b829c625e ("arm64: Add support for hardware updates of the access and dirty pte bits") Reviewed-by: Catalin Marinas Link: https://lore.kernel.org/r/20201120143557.6715-3-will@kernel.org Signed-off-by: Will Deacon --- arch/arm64/include/asm/pgtable.h | 27 ++++++++++++++------------- 1 file changed, 14 insertions(+), 13 deletions(-) (limited to 'arch') diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 326c34677e86..5628289b9d5e 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -165,13 +165,6 @@ static inline pmd_t set_pmd_bit(pmd_t pmd, pgprot_t prot) return pmd; } -static inline pte_t pte_wrprotect(pte_t pte) -{ - pte = clear_pte_bit(pte, __pgprot(PTE_WRITE)); - pte = set_pte_bit(pte, __pgprot(PTE_RDONLY)); - return pte; -} - static inline pte_t pte_mkwrite(pte_t pte) { pte = set_pte_bit(pte, __pgprot(PTE_WRITE)); @@ -197,6 +190,20 @@ static inline pte_t pte_mkdirty(pte_t pte) return pte; } +static inline pte_t pte_wrprotect(pte_t pte) +{ + /* + * If hardware-dirty (PTE_WRITE/DBM bit set and PTE_RDONLY + * clear), set the PTE_DIRTY bit. + */ + if (pte_hw_dirty(pte)) + pte = pte_mkdirty(pte); + + pte = clear_pte_bit(pte, __pgprot(PTE_WRITE)); + pte = set_pte_bit(pte, __pgprot(PTE_RDONLY)); + return pte; +} + static inline pte_t pte_mkold(pte_t pte) { return clear_pte_bit(pte, __pgprot(PTE_AF)); @@ -846,12 +853,6 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addres pte = READ_ONCE(*ptep); do { old_pte = pte; - /* - * If hardware-dirty (PTE_WRITE/DBM bit set and PTE_RDONLY - * clear), set the PTE_DIRTY bit. - */ - if (pte_hw_dirty(pte)) - pte = pte_mkdirty(pte); pte = pte_wrprotect(pte); pte_val(pte) = cmpxchg_relaxed(&pte_val(*ptep), pte_val(old_pte), pte_val(pte)); -- cgit v1.2.3 From fd8d9db3559a29fd737bcdb7c4fcbe1940caae34 Mon Sep 17 00:00:00 2001 From: Xiaochen Shen Date: Sat, 31 Oct 2020 03:10:53 +0800 Subject: x86/resctrl: Remove superfluous kernfs_get() calls to prevent refcount leak Willem reported growing of kernfs_node_cache entries in slabtop when repeatedly creating and removing resctrl subdirectories as well as when repeatedly mounting and unmounting the resctrl filesystem. On resource group (control as well as monitoring) creation via a mkdir an extra kernfs_node reference is obtained to ensure that the rdtgroup structure remains accessible for the rdtgroup_kn_unlock() calls where it is removed on deletion. The kernfs_node reference count is dropped by kernfs_put() in rdtgroup_kn_unlock(). With the above explaining the need for one kernfs_get()/kernfs_put() pair in resctrl there are more places where a kernfs_node reference is obtained without a corresponding release. The excessive amount of reference count on kernfs nodes will never be dropped to 0 and the kernfs nodes will never be freed in the call paths of rmdir and umount. It leads to reference count leak and kernfs_node_cache memory leak. Remove the superfluous kernfs_get() calls and expand the existing comments surrounding the remaining kernfs_get()/kernfs_put() pair that remains in use. Superfluous kernfs_get() calls are removed from two areas: (1) In call paths of mount and mkdir, when kernfs nodes for "info", "mon_groups" and "mon_data" directories and sub-directories are created, the reference count of newly created kernfs node is set to 1. But after kernfs_create_dir() returns, superfluous kernfs_get() are called to take an additional reference. (2) kernfs_get() calls in rmdir call paths. Fixes: 17eafd076291 ("x86/intel_rdt: Split resource group removal in two") Fixes: 4af4a88e0c92 ("x86/intel_rdt/cqm: Add mount,umount support") Fixes: f3cbeacaa06e ("x86/intel_rdt/cqm: Add rmdir support") Fixes: d89b7379015f ("x86/intel_rdt/cqm: Add mon_data") Fixes: c7d9aac61311 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring") Fixes: 5dc1d5c6bac2 ("x86/intel_rdt: Simplify info and base file lists") Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system") Fixes: 4e978d06dedb ("x86/intel_rdt: Add "info" files to resctrl file system") Reported-by: Willem de Bruijn Signed-off-by: Xiaochen Shen Signed-off-by: Borislav Petkov Reviewed-by: Reinette Chatre Tested-by: Willem de Bruijn Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1604085053-31639-1-git-send-email-xiaochen.shen@intel.com --- arch/x86/kernel/cpu/resctrl/rdtgroup.c | 35 ++-------------------------------- 1 file changed, 2 insertions(+), 33 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c index af323e2e3100..2ab1266a5f14 100644 --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c @@ -1769,7 +1769,6 @@ static int rdtgroup_mkdir_info_resdir(struct rdt_resource *r, char *name, if (IS_ERR(kn_subdir)) return PTR_ERR(kn_subdir); - kernfs_get(kn_subdir); ret = rdtgroup_kn_set_ugid(kn_subdir); if (ret) return ret; @@ -1792,7 +1791,6 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn) kn_info = kernfs_create_dir(parent_kn, "info", parent_kn->mode, NULL); if (IS_ERR(kn_info)) return PTR_ERR(kn_info); - kernfs_get(kn_info); ret = rdtgroup_add_files(kn_info, RF_TOP_INFO); if (ret) @@ -1813,12 +1811,6 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn) goto out_destroy; } - /* - * This extra ref will be put in kernfs_remove() and guarantees - * that @rdtgrp->kn is always accessible. - */ - kernfs_get(kn_info); - ret = rdtgroup_kn_set_ugid(kn_info); if (ret) goto out_destroy; @@ -1847,12 +1839,6 @@ mongroup_create_dir(struct kernfs_node *parent_kn, struct rdtgroup *prgrp, if (dest_kn) *dest_kn = kn; - /* - * This extra ref will be put in kernfs_remove() and guarantees - * that @rdtgrp->kn is always accessible. - */ - kernfs_get(kn); - ret = rdtgroup_kn_set_ugid(kn); if (ret) goto out_destroy; @@ -2139,13 +2125,11 @@ static int rdt_get_tree(struct fs_context *fc) &kn_mongrp); if (ret < 0) goto out_info; - kernfs_get(kn_mongrp); ret = mkdir_mondata_all(rdtgroup_default.kn, &rdtgroup_default, &kn_mondata); if (ret < 0) goto out_mongrp; - kernfs_get(kn_mondata); rdtgroup_default.mon.mon_data_kn = kn_mondata; } @@ -2499,11 +2483,6 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn, if (IS_ERR(kn)) return PTR_ERR(kn); - /* - * This extra ref will be put in kernfs_remove() and guarantees - * that kn is always accessible. - */ - kernfs_get(kn); ret = rdtgroup_kn_set_ugid(kn); if (ret) goto out_destroy; @@ -2838,8 +2817,8 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn, /* * kernfs_remove() will drop the reference count on "kn" which * will free it. But we still need it to stick around for the - * rdtgroup_kn_unlock(kn} call below. Take one extra reference - * here, which will be dropped inside rdtgroup_kn_unlock(). + * rdtgroup_kn_unlock(kn) call. Take one extra reference here, + * which will be dropped inside rdtgroup_kn_unlock(). */ kernfs_get(kn); @@ -3049,11 +3028,6 @@ static int rdtgroup_rmdir_mon(struct kernfs_node *kn, struct rdtgroup *rdtgrp, WARN_ON(list_empty(&prdtgrp->mon.crdtgrp_list)); list_del(&rdtgrp->mon.crdtgrp_list); - /* - * one extra hold on this, will drop when we kfree(rdtgrp) - * in rdtgroup_kn_unlock() - */ - kernfs_get(kn); kernfs_remove(rdtgrp->kn); return 0; @@ -3065,11 +3039,6 @@ static int rdtgroup_ctrl_remove(struct kernfs_node *kn, rdtgrp->flags = RDT_DELETED; list_del(&rdtgrp->rdtgroup_list); - /* - * one extra hold on this, will drop when we kfree(rdtgrp) - * in rdtgroup_kn_unlock() - */ - kernfs_get(kn); kernfs_remove(rdtgrp->kn); return 0; } -- cgit v1.2.3 From 758999246965eeb8b253d47e72f7bfe508804b16 Mon Sep 17 00:00:00 2001 From: Xiaochen Shen Date: Sat, 31 Oct 2020 03:11:28 +0800 Subject: x86/resctrl: Add necessary kernfs_put() calls to prevent refcount leak On resource group creation via a mkdir an extra kernfs_node reference is obtained by kernfs_get() to ensure that the rdtgroup structure remains accessible for the rdtgroup_kn_unlock() calls where it is removed on deletion. Currently the extra kernfs_node reference count is only dropped by kernfs_put() in rdtgroup_kn_unlock() while the rdtgroup structure is removed in a few other locations that lack the matching reference drop. In call paths of rmdir and umount, when a control group is removed, kernfs_remove() is called to remove the whole kernfs nodes tree of the control group (including the kernfs nodes trees of all child monitoring groups), and then rdtgroup structure is freed by kfree(). The rdtgroup structures of all child monitoring groups under the control group are freed by kfree() in free_all_child_rdtgrp(). Before calling kfree() to free the rdtgroup structures, the kernfs node of the control group itself as well as the kernfs nodes of all child monitoring groups still take the extra references which will never be dropped to 0 and the kernfs nodes will never be freed. It leads to reference count leak and kernfs_node_cache memory leak. For example, reference count leak is observed in these two cases: (1) mount -t resctrl resctrl /sys/fs/resctrl mkdir /sys/fs/resctrl/c1 mkdir /sys/fs/resctrl/c1/mon_groups/m1 umount /sys/fs/resctrl (2) mkdir /sys/fs/resctrl/c1 mkdir /sys/fs/resctrl/c1/mon_groups/m1 rmdir /sys/fs/resctrl/c1 The same reference count leak issue also exists in the error exit paths of mkdir in mkdir_rdt_prepare() and rdtgroup_mkdir_ctrl_mon(). Fix this issue by following changes to make sure the extra kernfs_node reference on rdtgroup is dropped before freeing the rdtgroup structure. (1) Introduce rdtgroup removal helper rdtgroup_remove() to wrap up kernfs_put() and kfree(). (2) Call rdtgroup_remove() in rdtgroup removal path where the rdtgroup structure is about to be freed by kfree(). (3) Call rdtgroup_remove() or kernfs_put() as appropriate in the error exit paths of mkdir where an extra reference is taken by kernfs_get(). Fixes: f3cbeacaa06e ("x86/intel_rdt/cqm: Add rmdir support") Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files") Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system") Reported-by: Willem de Bruijn Signed-off-by: Xiaochen Shen Signed-off-by: Borislav Petkov Reviewed-by: Reinette Chatre Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1604085088-31707-1-git-send-email-xiaochen.shen@intel.com --- arch/x86/kernel/cpu/resctrl/rdtgroup.c | 32 +++++++++++++++++++++++++------- 1 file changed, 25 insertions(+), 7 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c index 2ab1266a5f14..6f4ca4bea625 100644 --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c @@ -507,6 +507,24 @@ unlock: return ret ?: nbytes; } +/** + * rdtgroup_remove - the helper to remove resource group safely + * @rdtgrp: resource group to remove + * + * On resource group creation via a mkdir, an extra kernfs_node reference is + * taken to ensure that the rdtgroup structure remains accessible for the + * rdtgroup_kn_unlock() calls where it is removed. + * + * Drop the extra reference here, then free the rdtgroup structure. + * + * Return: void + */ +static void rdtgroup_remove(struct rdtgroup *rdtgrp) +{ + kernfs_put(rdtgrp->kn); + kfree(rdtgrp); +} + struct task_move_callback { struct callback_head work; struct rdtgroup *rdtgrp; @@ -529,7 +547,7 @@ static void move_myself(struct callback_head *head) (rdtgrp->flags & RDT_DELETED)) { current->closid = 0; current->rmid = 0; - kfree(rdtgrp); + rdtgroup_remove(rdtgrp); } if (unlikely(current->flags & PF_EXITING)) @@ -2065,8 +2083,7 @@ void rdtgroup_kn_unlock(struct kernfs_node *kn) rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) rdtgroup_pseudo_lock_remove(rdtgrp); kernfs_unbreak_active_protection(kn); - kernfs_put(rdtgrp->kn); - kfree(rdtgrp); + rdtgroup_remove(rdtgrp); } else { kernfs_unbreak_active_protection(kn); } @@ -2341,7 +2358,7 @@ static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp) if (atomic_read(&sentry->waitcount) != 0) sentry->flags = RDT_DELETED; else - kfree(sentry); + rdtgroup_remove(sentry); } } @@ -2383,7 +2400,7 @@ static void rmdir_all_sub(void) if (atomic_read(&rdtgrp->waitcount) != 0) rdtgrp->flags = RDT_DELETED; else - kfree(rdtgrp); + rdtgroup_remove(rdtgrp); } /* Notify online CPUs to update per cpu storage and PQR_ASSOC MSR */ update_closid_rmid(cpu_online_mask, &rdtgroup_default); @@ -2818,7 +2835,7 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn, * kernfs_remove() will drop the reference count on "kn" which * will free it. But we still need it to stick around for the * rdtgroup_kn_unlock(kn) call. Take one extra reference here, - * which will be dropped inside rdtgroup_kn_unlock(). + * which will be dropped by kernfs_put() in rdtgroup_remove(). */ kernfs_get(kn); @@ -2859,6 +2876,7 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn, out_idfree: free_rmid(rdtgrp->mon.rmid); out_destroy: + kernfs_put(rdtgrp->kn); kernfs_remove(rdtgrp->kn); out_free_rgrp: kfree(rdtgrp); @@ -2871,7 +2889,7 @@ static void mkdir_rdt_prepare_clean(struct rdtgroup *rgrp) { kernfs_remove(rgrp->kn); free_rmid(rgrp->mon.rmid); - kfree(rgrp); + rdtgroup_remove(rgrp); } /* -- cgit v1.2.3 From 58c644ba512cfbc2e39b758dd979edd1d6d00e27 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Fri, 20 Nov 2020 11:50:35 +0100 Subject: sched/idle: Fix arch_cpu_idle() vs tracing We call arch_cpu_idle() with RCU disabled, but then use local_irq_{en,dis}able(), which invokes tracing, which relies on RCU. Switch all arch_cpu_idle() implementations to use raw_local_irq_{en,dis}able() and carefully manage the lockdep,rcu,tracing state like we do in entry. (XXX: we really should change arch_cpu_idle() to not return with interrupts enabled) Reported-by: Sven Schnelle Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Mark Rutland Tested-by: Mark Rutland Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org --- arch/alpha/kernel/process.c | 2 +- arch/arm/kernel/process.c | 2 +- arch/arm64/kernel/process.c | 2 +- arch/csky/kernel/process.c | 2 +- arch/h8300/kernel/process.c | 2 +- arch/hexagon/kernel/process.c | 2 +- arch/ia64/kernel/process.c | 2 +- arch/microblaze/kernel/process.c | 2 +- arch/mips/kernel/idle.c | 12 ++++++------ arch/nios2/kernel/process.c | 2 +- arch/openrisc/kernel/process.c | 2 +- arch/parisc/kernel/process.c | 2 +- arch/powerpc/kernel/idle.c | 4 ++-- arch/riscv/kernel/process.c | 2 +- arch/s390/kernel/idle.c | 6 +++--- arch/sh/kernel/idle.c | 2 +- arch/sparc/kernel/leon_pmc.c | 4 ++-- arch/sparc/kernel/process_32.c | 2 +- arch/sparc/kernel/process_64.c | 4 ++-- arch/um/kernel/process.c | 2 +- arch/x86/include/asm/mwait.h | 2 -- arch/x86/kernel/process.c | 12 +++++++----- kernel/sched/idle.c | 28 +++++++++++++++++++++++++++- 23 files changed, 64 insertions(+), 38 deletions(-) (limited to 'arch') diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c index 7462a7911002..4c7b0414a3ff 100644 --- a/arch/alpha/kernel/process.c +++ b/arch/alpha/kernel/process.c @@ -57,7 +57,7 @@ EXPORT_SYMBOL(pm_power_off); void arch_cpu_idle(void) { wtint(0); - local_irq_enable(); + raw_local_irq_enable(); } void arch_cpu_idle_dead(void) diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c index 8e6ace03e960..9f199b1e8383 100644 --- a/arch/arm/kernel/process.c +++ b/arch/arm/kernel/process.c @@ -71,7 +71,7 @@ void arch_cpu_idle(void) arm_pm_idle(); else cpu_do_idle(); - local_irq_enable(); + raw_local_irq_enable(); } void arch_cpu_idle_prepare(void) diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index 4784011cecac..9ebe02574127 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -126,7 +126,7 @@ void arch_cpu_idle(void) * tricks */ cpu_do_idle(); - local_irq_enable(); + raw_local_irq_enable(); } #ifdef CONFIG_HOTPLUG_CPU diff --git a/arch/csky/kernel/process.c b/arch/csky/kernel/process.c index f730869e21ee..69af6bc87e64 100644 --- a/arch/csky/kernel/process.c +++ b/arch/csky/kernel/process.c @@ -102,6 +102,6 @@ void arch_cpu_idle(void) #ifdef CONFIG_CPU_PM_STOP asm volatile("stop\n"); #endif - local_irq_enable(); + raw_local_irq_enable(); } #endif diff --git a/arch/h8300/kernel/process.c b/arch/h8300/kernel/process.c index aea0a40b77a9..bc1364db58fe 100644 --- a/arch/h8300/kernel/process.c +++ b/arch/h8300/kernel/process.c @@ -57,7 +57,7 @@ asmlinkage void ret_from_kernel_thread(void); */ void arch_cpu_idle(void) { - local_irq_enable(); + raw_local_irq_enable(); __asm__("sleep"); } diff --git a/arch/hexagon/kernel/process.c b/arch/hexagon/kernel/process.c index 5a0a95d93ddb..67767c5ed98c 100644 --- a/arch/hexagon/kernel/process.c +++ b/arch/hexagon/kernel/process.c @@ -44,7 +44,7 @@ void arch_cpu_idle(void) { __vmwait(); /* interrupts wake us up, but irqs are still disabled */ - local_irq_enable(); + raw_local_irq_enable(); } /* diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c index 6b61a703bcf5..c9ff8796b509 100644 --- a/arch/ia64/kernel/process.c +++ b/arch/ia64/kernel/process.c @@ -239,7 +239,7 @@ void arch_cpu_idle(void) if (mark_idle) (*mark_idle)(1); - safe_halt(); + raw_safe_halt(); if (mark_idle) (*mark_idle)(0); diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c index a9e46e525cd0..f99860771ff4 100644 --- a/arch/microblaze/kernel/process.c +++ b/arch/microblaze/kernel/process.c @@ -149,5 +149,5 @@ int dump_fpu(struct pt_regs *regs, elf_fpregset_t *fpregs) void arch_cpu_idle(void) { - local_irq_enable(); + raw_local_irq_enable(); } diff --git a/arch/mips/kernel/idle.c b/arch/mips/kernel/idle.c index 5bc3b04693c7..18e69ebf5691 100644 --- a/arch/mips/kernel/idle.c +++ b/arch/mips/kernel/idle.c @@ -33,19 +33,19 @@ static void __cpuidle r3081_wait(void) { unsigned long cfg = read_c0_conf(); write_c0_conf(cfg | R30XX_CONF_HALT); - local_irq_enable(); + raw_local_irq_enable(); } static void __cpuidle r39xx_wait(void) { if (!need_resched()) write_c0_conf(read_c0_conf() | TX39_CONF_HALT); - local_irq_enable(); + raw_local_irq_enable(); } void __cpuidle r4k_wait(void) { - local_irq_enable(); + raw_local_irq_enable(); __r4k_wait(); } @@ -64,7 +64,7 @@ void __cpuidle r4k_wait_irqoff(void) " .set arch=r4000 \n" " wait \n" " .set pop \n"); - local_irq_enable(); + raw_local_irq_enable(); } /* @@ -84,7 +84,7 @@ static void __cpuidle rm7k_wait_irqoff(void) " wait \n" " mtc0 $1, $12 # stalls until W stage \n" " .set pop \n"); - local_irq_enable(); + raw_local_irq_enable(); } /* @@ -257,7 +257,7 @@ void arch_cpu_idle(void) if (cpu_wait) cpu_wait(); else - local_irq_enable(); + raw_local_irq_enable(); } #ifdef CONFIG_CPU_IDLE diff --git a/arch/nios2/kernel/process.c b/arch/nios2/kernel/process.c index 4ffe857e6ada..50b4eb19a6cc 100644 --- a/arch/nios2/kernel/process.c +++ b/arch/nios2/kernel/process.c @@ -33,7 +33,7 @@ EXPORT_SYMBOL(pm_power_off); void arch_cpu_idle(void) { - local_irq_enable(); + raw_local_irq_enable(); } /* diff --git a/arch/openrisc/kernel/process.c b/arch/openrisc/kernel/process.c index 0ff391f00334..3c98728cce24 100644 --- a/arch/openrisc/kernel/process.c +++ b/arch/openrisc/kernel/process.c @@ -79,7 +79,7 @@ void machine_power_off(void) */ void arch_cpu_idle(void) { - local_irq_enable(); + raw_local_irq_enable(); if (mfspr(SPR_UPR) & SPR_UPR_PMP) mtspr(SPR_PMR, mfspr(SPR_PMR) | SPR_PMR_DME); } diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c index f196d96e2f9f..a92a23d6acd9 100644 --- a/arch/parisc/kernel/process.c +++ b/arch/parisc/kernel/process.c @@ -169,7 +169,7 @@ void __cpuidle arch_cpu_idle_dead(void) void __cpuidle arch_cpu_idle(void) { - local_irq_enable(); + raw_local_irq_enable(); /* nop on real hardware, qemu will idle sleep. */ asm volatile("or %%r10,%%r10,%%r10\n":::); diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c index ae0e2632393d..1f835539fda4 100644 --- a/arch/powerpc/kernel/idle.c +++ b/arch/powerpc/kernel/idle.c @@ -52,9 +52,9 @@ void arch_cpu_idle(void) * interrupts enabled, some don't. */ if (irqs_disabled()) - local_irq_enable(); + raw_local_irq_enable(); } else { - local_irq_enable(); + raw_local_irq_enable(); /* * Go into low thread priority and possibly * low power mode. diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c index 19225ec65db6..dd5f985b1f40 100644 --- a/arch/riscv/kernel/process.c +++ b/arch/riscv/kernel/process.c @@ -36,7 +36,7 @@ extern asmlinkage void ret_from_kernel_thread(void); void arch_cpu_idle(void) { wait_for_interrupt(); - local_irq_enable(); + raw_local_irq_enable(); } void show_regs(struct pt_regs *regs) diff --git a/arch/s390/kernel/idle.c b/arch/s390/kernel/idle.c index f7f1e64e0d98..2b85096964f8 100644 --- a/arch/s390/kernel/idle.c +++ b/arch/s390/kernel/idle.c @@ -33,10 +33,10 @@ void enabled_wait(void) PSW_MASK_IO | PSW_MASK_EXT | PSW_MASK_MCHECK; clear_cpu_flag(CIF_NOHZ_DELAY); - local_irq_save(flags); + raw_local_irq_save(flags); /* Call the assembler magic in entry.S */ psw_idle(idle, psw_mask); - local_irq_restore(flags); + raw_local_irq_restore(flags); /* Account time spent with enabled wait psw loaded as idle time. */ raw_write_seqcount_begin(&idle->seqcount); @@ -123,7 +123,7 @@ void arch_cpu_idle_enter(void) void arch_cpu_idle(void) { enabled_wait(); - local_irq_enable(); + raw_local_irq_enable(); } void arch_cpu_idle_exit(void) diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c index 0dc0f52f9bb8..f59814983bd5 100644 --- a/arch/sh/kernel/idle.c +++ b/arch/sh/kernel/idle.c @@ -22,7 +22,7 @@ static void (*sh_idle)(void); void default_idle(void) { set_bl_bit(); - local_irq_enable(); + raw_local_irq_enable(); /* Isn't this racy ? */ cpu_sleep(); clear_bl_bit(); diff --git a/arch/sparc/kernel/leon_pmc.c b/arch/sparc/kernel/leon_pmc.c index 065e2d4b7290..396f46bca52e 100644 --- a/arch/sparc/kernel/leon_pmc.c +++ b/arch/sparc/kernel/leon_pmc.c @@ -50,7 +50,7 @@ static void pmc_leon_idle_fixup(void) register unsigned int address = (unsigned int)leon3_irqctrl_regs; /* Interrupts need to be enabled to not hang the CPU */ - local_irq_enable(); + raw_local_irq_enable(); __asm__ __volatile__ ( "wr %%g0, %%asr19\n" @@ -66,7 +66,7 @@ static void pmc_leon_idle_fixup(void) static void pmc_leon_idle(void) { /* Interrupts need to be enabled to not hang the CPU */ - local_irq_enable(); + raw_local_irq_enable(); /* For systems without power-down, this will be no-op */ __asm__ __volatile__ ("wr %g0, %asr19\n\t"); diff --git a/arch/sparc/kernel/process_32.c b/arch/sparc/kernel/process_32.c index adfcaeab3ddc..a02363735915 100644 --- a/arch/sparc/kernel/process_32.c +++ b/arch/sparc/kernel/process_32.c @@ -74,7 +74,7 @@ void arch_cpu_idle(void) { if (sparc_idle) (*sparc_idle)(); - local_irq_enable(); + raw_local_irq_enable(); } /* XXX cli/sti -> local_irq_xxx here, check this works once SMP is fixed. */ diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c index a75093b993f9..6f8c7822fc06 100644 --- a/arch/sparc/kernel/process_64.c +++ b/arch/sparc/kernel/process_64.c @@ -62,11 +62,11 @@ void arch_cpu_idle(void) { if (tlb_type != hypervisor) { touch_nmi_watchdog(); - local_irq_enable(); + raw_local_irq_enable(); } else { unsigned long pstate; - local_irq_enable(); + raw_local_irq_enable(); /* The sun4v sleeping code requires that we have PSTATE.IE cleared over * the cpu sleep hypervisor call. diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c index 3bed09538dd9..9505a7e87396 100644 --- a/arch/um/kernel/process.c +++ b/arch/um/kernel/process.c @@ -217,7 +217,7 @@ void arch_cpu_idle(void) { cpu_tasks[current_thread_info()->cpu].pid = os_getpid(); um_idle_sleep(); - local_irq_enable(); + raw_local_irq_enable(); } int __cant_sleep(void) { diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h index e039a933aca3..29dd27b5a339 100644 --- a/arch/x86/include/asm/mwait.h +++ b/arch/x86/include/asm/mwait.h @@ -88,8 +88,6 @@ static inline void __mwaitx(unsigned long eax, unsigned long ebx, static inline void __sti_mwait(unsigned long eax, unsigned long ecx) { - trace_hardirqs_on(); - mds_idle_clear_cpu_buffers(); /* "mwait %eax, %ecx;" */ asm volatile("sti; .byte 0x0f, 0x01, 0xc9;" diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index ba4593a913fa..145a7ac0c19a 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -685,7 +685,7 @@ void arch_cpu_idle(void) */ void __cpuidle default_idle(void) { - safe_halt(); + raw_safe_halt(); } #if defined(CONFIG_APM_MODULE) || defined(CONFIG_HALTPOLL_CPUIDLE_MODULE) EXPORT_SYMBOL(default_idle); @@ -736,6 +736,8 @@ void stop_this_cpu(void *dummy) /* * AMD Erratum 400 aware idle routine. We handle it the same way as C3 power * states (local apic timer and TSC stop). + * + * XXX this function is completely buggered vs RCU and tracing. */ static void amd_e400_idle(void) { @@ -757,9 +759,9 @@ static void amd_e400_idle(void) * The switch back from broadcast mode needs to be called with * interrupts disabled. */ - local_irq_disable(); + raw_local_irq_disable(); tick_broadcast_exit(); - local_irq_enable(); + raw_local_irq_enable(); } /* @@ -801,9 +803,9 @@ static __cpuidle void mwait_idle(void) if (!need_resched()) __sti_mwait(0, 0); else - local_irq_enable(); + raw_local_irq_enable(); } else { - local_irq_enable(); + raw_local_irq_enable(); } __current_clr_polling(); } diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c index 24d0ee26377d..c6932b8f4467 100644 --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -78,7 +78,7 @@ void __weak arch_cpu_idle_dead(void) { } void __weak arch_cpu_idle(void) { cpu_idle_force_poll = 1; - local_irq_enable(); + raw_local_irq_enable(); } /** @@ -94,9 +94,35 @@ void __cpuidle default_idle_call(void) trace_cpu_idle(1, smp_processor_id()); stop_critical_timings(); + + /* + * arch_cpu_idle() is supposed to enable IRQs, however + * we can't do that because of RCU and tracing. + * + * Trace IRQs enable here, then switch off RCU, and have + * arch_cpu_idle() use raw_local_irq_enable(). Note that + * rcu_idle_enter() relies on lockdep IRQ state, so switch that + * last -- this is very similar to the entry code. + */ + trace_hardirqs_on_prepare(); + lockdep_hardirqs_on_prepare(_THIS_IP_); rcu_idle_enter(); + lockdep_hardirqs_on(_THIS_IP_); + arch_cpu_idle(); + + /* + * OK, so IRQs are enabled here, but RCU needs them disabled to + * turn itself back on.. funny thing is that disabling IRQs + * will cause tracing, which needs RCU. Jump through hoops to + * make it 'work'. + */ + raw_local_irq_disable(); + lockdep_hardirqs_off(_THIS_IP_); rcu_idle_exit(); + lockdep_hardirqs_on(_THIS_IP_); + raw_local_irq_enable(); + start_critical_timings(); trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id()); } -- cgit v1.2.3 From e2be2a833ab5338fa5b8b99ba622b911d96f1795 Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Wed, 25 Nov 2020 09:41:24 +0800 Subject: x86/tboot: Don't disable swiotlb when iommu is forced on After commit 327d5b2fee91c ("iommu/vt-d: Allow 32bit devices to uses DMA domain"), swiotlb could also be used for direct memory access if IOMMU is enabled but a device is configured to pass through the DMA translation. Keep swiotlb when IOMMU is forced on, otherwise, some devices won't work if "iommu=pt" kernel parameter is used. Fixes: 327d5b2fee91 ("iommu/vt-d: Allow 32bit devices to uses DMA domain") Reported-and-tested-by: Adrian Huang Signed-off-by: Lu Baolu Link: https://lore.kernel.org/r/20201125014124.4070776-1-baolu.lu@linux.intel.com Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=210237 Signed-off-by: Will Deacon --- arch/x86/kernel/tboot.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c index 420be871d9d4..ae64f98ec2ab 100644 --- a/arch/x86/kernel/tboot.c +++ b/arch/x86/kernel/tboot.c @@ -514,13 +514,10 @@ int tboot_force_iommu(void) if (!tboot_enabled()) return 0; - if (no_iommu || swiotlb || dmar_disabled) + if (no_iommu || dmar_disabled) pr_warn("Forcing Intel-IOMMU to enabled\n"); dmar_disabled = 0; -#ifdef CONFIG_SWIOTLB - swiotlb = 0; -#endif no_iommu = 0; return 1; -- cgit v1.2.3 From fb319496935b7475a863a00c76895e8bb3216704 Mon Sep 17 00:00:00 2001 From: Jon Hunter Date: Mon, 16 Nov 2020 16:20:26 +0000 Subject: arm64: tegra: Disable the ACONNECT for Jetson TX2 Commit ff4c371d2bc0 ("arm64: defconfig: Build ADMA and ACONNECT driver") enable the Tegra ADMA and ACONNECT drivers and this is causing resume from system suspend to fail on Jetson TX2. Resume is failing because the ACONNECT driver is being resumed before the BPMP driver, and the ACONNECT driver is attempting to power on a power-domain that is provided by the BPMP. While a proper fix for the resume sequencing problem is identified, disable the ACONNECT for Jetson TX2 temporarily to avoid breaking system suspend. Please note that ACONNECT driver is used by the Audio Processing Engine (APE) on Tegra, but because there is no mainline support for APE on Jetson TX2 currently, disabling the ACONNECT does not disable any useful feature at the moment. Signed-off-by: Jon Hunter Signed-off-by: Thierry Reding --- arch/arm64/boot/dts/nvidia/tegra186-p2771-0000.dts | 12 ------------ 1 file changed, 12 deletions(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/nvidia/tegra186-p2771-0000.dts b/arch/arm64/boot/dts/nvidia/tegra186-p2771-0000.dts index 381a84912ba8..c28d51cc5797 100644 --- a/arch/arm64/boot/dts/nvidia/tegra186-p2771-0000.dts +++ b/arch/arm64/boot/dts/nvidia/tegra186-p2771-0000.dts @@ -10,18 +10,6 @@ model = "NVIDIA Jetson TX2 Developer Kit"; compatible = "nvidia,p2771-0000", "nvidia,tegra186"; - aconnect { - status = "okay"; - - dma-controller@2930000 { - status = "okay"; - }; - - interrupt-controller@2a40000 { - status = "okay"; - }; - }; - i2c@3160000 { power-monitor@42 { compatible = "ti,ina3221"; -- cgit v1.2.3 From 476e23f4c540949ac5ea4fad4f6f6fa0e2d41f42 Mon Sep 17 00:00:00 2001 From: Jon Hunter Date: Wed, 11 Nov 2020 10:41:17 +0000 Subject: arm64: tegra: Correct the UART for Jetson Xavier NX The Jetson Xavier NX board routes UARTA to the 40-pin header and UARTC to a 12-pin debug header. The UARTs can be used by either the Tegra Combined UART (TCU) driver or the Tegra 8250 driver. By default, the TCU will use UARTC on Jetson Xavier NX. Currently, device-tree for Xavier NX enables the TCU and the Tegra 8250 node for UARTC. Fix this by disabling the Tegra 8250 node for UARTC and enabling the Tegra 8250 node for UARTA. Fixes: 3f9efbbe57bc ("arm64: tegra: Add support for Jetson Xavier NX") Cc: stable@vger.kernel.org Signed-off-by: Jon Hunter Signed-off-by: Thierry Reding --- arch/arm64/boot/dts/nvidia/tegra194-p3668-0000.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/nvidia/tegra194-p3668-0000.dtsi b/arch/arm64/boot/dts/nvidia/tegra194-p3668-0000.dtsi index a2893be80507..0dc8304a2edd 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194-p3668-0000.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194-p3668-0000.dtsi @@ -54,7 +54,7 @@ status = "okay"; }; - serial@c280000 { + serial@3100000 { status = "okay"; }; -- cgit v1.2.3 From f24a2acc15bcc7bbd295f9759efc873b88fbe429 Mon Sep 17 00:00:00 2001 From: JC Kuo Date: Thu, 19 Nov 2020 15:23:45 +0800 Subject: arm64: tegra: Fix USB_VBUS_EN0 regulator on Jetson TX1 USB host mode is broken on the OTG port of Jetson TX1 platform because the USB_VBUS_EN0 regulator (regulator@11) is being overwritten by the vdd-cam-1v2 regulator. This commit rearranges USB_VBUS_EN0 to be regulator@14. Fixes: 257c8047be44 ("arm64: tegra: jetson-tx1: Add camera supplies") Cc: stable@vger.kernel.org Signed-off-by: JC Kuo Reviewed-by: Jon Hunter Signed-off-by: Thierry Reding --- arch/arm64/boot/dts/nvidia/tegra210-p2597.dtsi | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/nvidia/tegra210-p2597.dtsi b/arch/arm64/boot/dts/nvidia/tegra210-p2597.dtsi index e18e1a9a3011..a9caaf7c0d67 100644 --- a/arch/arm64/boot/dts/nvidia/tegra210-p2597.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra210-p2597.dtsi @@ -1663,16 +1663,6 @@ vin-supply = <&vdd_5v0_sys>; }; - vdd_usb_vbus_otg: regulator@11 { - compatible = "regulator-fixed"; - regulator-name = "USB_VBUS_EN0"; - regulator-min-microvolt = <5000000>; - regulator-max-microvolt = <5000000>; - gpio = <&gpio TEGRA_GPIO(CC, 4) GPIO_ACTIVE_HIGH>; - enable-active-high; - vin-supply = <&vdd_5v0_sys>; - }; - vdd_hdmi: regulator@10 { compatible = "regulator-fixed"; regulator-name = "VDD_HDMI_5V0"; @@ -1712,4 +1702,14 @@ enable-active-high; vin-supply = <&vdd_3v3_sys>; }; + + vdd_usb_vbus_otg: regulator@14 { + compatible = "regulator-fixed"; + regulator-name = "USB_VBUS_EN0"; + regulator-min-microvolt = <5000000>; + regulator-max-microvolt = <5000000>; + gpio = <&gpio TEGRA_GPIO(CC, 4) GPIO_ACTIVE_HIGH>; + enable-active-high; + vin-supply = <&vdd_5v0_sys>; + }; }; -- cgit v1.2.3 From 1741e18737948c140ccc4cc643e8126d95ee6e79 Mon Sep 17 00:00:00 2001 From: Dipen Patel Date: Fri, 11 Sep 2020 19:26:45 -0700 Subject: arm64: tegra: Wrong AON HSP reg property size The AON HSP node's "reg" property size 0xa0000 will overlap with other resources. This patch fixes that wrong value with correct size 0x90000. Reviewed-by: Mikko Perttunen Signed-off-by: Dipen Patel Fixes: a38570c22e9d ("arm64: tegra: Add nodes for TCU on Tegra194") Signed-off-by: Thierry Reding --- arch/arm64/boot/dts/nvidia/tegra194.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi index e9c90f0f44ff..93438d2b9469 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi @@ -1161,7 +1161,7 @@ hsp_aon: hsp@c150000 { compatible = "nvidia,tegra194-hsp", "nvidia,tegra186-hsp"; - reg = <0x0c150000 0xa0000>; + reg = <0x0c150000 0x90000>; interrupts = , , , -- cgit v1.2.3 From d98bccf10dd0f36cabee71a425381fce0908de3b Mon Sep 17 00:00:00 2001 From: Jon Hunter Date: Wed, 18 Nov 2020 16:04:58 +0000 Subject: arm64: tegra: Fix Tegra234 VDK node names When the device-tree board file was added for the Tegra234 VDK simulator it incorrectly used the names 'cbb' and 'sdhci' instead of 'bus' and 'mmc', respectively. The names 'bus' and 'mmc' are required by the device-tree json-schema validation tools. Therefore, fix this by renaming these nodes accordingly. Fixes: 639448912ba1 ("arm64: tegra: Initial Tegra234 VDK support") Reported-by: Ashish Singhal Signed-off-by: Jon Hunter Signed-off-by: Thierry Reding --- arch/arm64/boot/dts/nvidia/tegra234-sim-vdk.dts | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/arm64/boot/dts/nvidia/tegra234-sim-vdk.dts b/arch/arm64/boot/dts/nvidia/tegra234-sim-vdk.dts index f6e6a24829af..b5d9a5526272 100644 --- a/arch/arm64/boot/dts/nvidia/tegra234-sim-vdk.dts +++ b/arch/arm64/boot/dts/nvidia/tegra234-sim-vdk.dts @@ -8,7 +8,7 @@ compatible = "nvidia,tegra234-vdk", "nvidia,tegra234"; aliases { - sdhci3 = "/cbb@0/sdhci@3460000"; + mmc3 = "/bus@0/mmc@3460000"; serial0 = &uarta; }; @@ -17,12 +17,12 @@ stdout-path = "serial0:115200n8"; }; - cbb@0 { + bus@0 { serial@3100000 { status = "okay"; }; - sdhci@3460000 { + mmc@3460000 { status = "okay"; bus-width = <8>; non-removable; -- cgit v1.2.3 From e553fdc8105ac2ef3f321739da3908bb6673f7de Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Sun, 8 Nov 2020 13:37:37 -0700 Subject: riscv: Explicitly specify the build id style in vDSO Makefile again Commit a96843372331 ("kbuild: explicitly specify the build id style") explicitly set the build ID style to SHA1. Commit c2c81bb2f691 ("RISC-V: Fix the VDSO symbol generaton for binutils-2.35+") undid this change, likely unintentionally. Restore it so that the build ID style stays consistent across the tree regardless of linker. Fixes: c2c81bb2f691 ("RISC-V: Fix the VDSO symbol generaton for binutils-2.35+") Signed-off-by: Nathan Chancellor Reviewed-by: Nick Desaulniers Reviewed-by: Bill Wendling Signed-off-by: Palmer Dabbelt --- arch/riscv/kernel/vdso/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile index cb8f9e4cfcbf..0cfd6da784f8 100644 --- a/arch/riscv/kernel/vdso/Makefile +++ b/arch/riscv/kernel/vdso/Makefile @@ -44,7 +44,7 @@ SYSCFLAGS_vdso.so.dbg = $(c_flags) $(obj)/vdso.so.dbg: $(src)/vdso.lds $(obj-vdso) FORCE $(call if_changed,vdsold) SYSCFLAGS_vdso.so.dbg = -shared -s -Wl,-soname=linux-vdso.so.1 \ - -Wl,--build-id -Wl,--hash-style=both + -Wl,--build-id=sha1 -Wl,--hash-style=both # We also create a special relocatable object that should mirror the symbol # table and layout of the linked DSO. With ld --just-symbols we can then -- cgit v1.2.3 From 6134b110f97178d6919441a82dc91a7f3664b4e0 Mon Sep 17 00:00:00 2001 From: Anup Patel Date: Fri, 6 Nov 2020 13:23:59 +0530 Subject: RISC-V: Add missing jump label initialization The jump_label_init() should be called from setup_arch() very early for proper functioning of jump label support. Fixes: ebc00dde8a97 ("riscv: Add jump-label implementation") Signed-off-by: Anup Patel Signed-off-by: Palmer Dabbelt --- arch/riscv/kernel/setup.c | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c index c424cc6dd833..117f3212a8e4 100644 --- a/arch/riscv/kernel/setup.c +++ b/arch/riscv/kernel/setup.c @@ -75,6 +75,7 @@ void __init setup_arch(char **cmdline_p) *cmdline_p = boot_command_line; early_ioremap_setup(); + jump_label_init(); parse_early_param(); efi_init(); -- cgit v1.2.3 From 30aca1bacb398dec6c1ed5eeca33f355bd7b6203 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Mon, 16 Nov 2020 17:39:51 -0800 Subject: RISC-V: fix barrier() use in riscv's uses barrier() so it should include Fixes this build error: CC [M] drivers/net/ethernet/emulex/benet/be_main.o In file included from ./include/vdso/processor.h:10, from ./arch/riscv/include/asm/processor.h:11, from ./include/linux/prefetch.h:15, from drivers/net/ethernet/emulex/benet/be_main.c:14: ./arch/riscv/include/asm/vdso/processor.h: In function 'cpu_relax': ./arch/riscv/include/asm/vdso/processor.h:14:2: error: implicit declaration of function 'barrier' [-Werror=implicit-function-declaration] 14 | barrier(); This happens with a total of 5 networking drivers -- they all use . rv64 allmodconfig now builds cleanly after this patch. Fixes fallout from: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive") Fixes: ad5d1122b82f ("riscv: use vDSO common flow to reduce the latency of the time-related functions") Reported-by: Andreas Schwab Signed-off-by: Randy Dunlap Acked-by: Arvind Sankar Signed-off-by: Palmer Dabbelt Reviewed-by: Nick Desaulniers Signed-off-by: Palmer Dabbelt --- arch/riscv/include/asm/vdso/processor.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch') diff --git a/arch/riscv/include/asm/vdso/processor.h b/arch/riscv/include/asm/vdso/processor.h index 82a5693b1861..134388cbaaa1 100644 --- a/arch/riscv/include/asm/vdso/processor.h +++ b/arch/riscv/include/asm/vdso/processor.h @@ -4,6 +4,8 @@ #ifndef __ASSEMBLY__ +#include + static inline void cpu_relax(void) { #ifdef __riscv_muldiv -- cgit v1.2.3 From 33fc379df76b4991e5ae312f07bcd6820811971e Mon Sep 17 00:00:00 2001 From: Anand K Mistry Date: Tue, 10 Nov 2020 12:33:53 +1100 Subject: x86/speculation: Fix prctl() when spectre_v2_user={seccomp,prctl},ibpb When spectre_v2_user={seccomp,prctl},ibpb is specified on the command line, IBPB is force-enabled and STIPB is conditionally-enabled (or not available). However, since 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.") the spectre_v2_user_ibpb variable is set to SPECTRE_V2_USER_{PRCTL,SECCOMP} instead of SPECTRE_V2_USER_STRICT, which is the actual behaviour. Because the issuing of IBPB relies on the switch_mm_*_ibpb static branches, the mitigations behave as expected. Since 1978b3a53a74 ("x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP") this discrepency caused the misreporting of IB speculation via prctl(). On CPUs with STIBP always-on and spectre_v2_user=seccomp,ibpb, prctl(PR_GET_SPECULATION_CTRL) would return PR_SPEC_PRCTL | PR_SPEC_ENABLE instead of PR_SPEC_DISABLE since both IBPB and STIPB are always on. It also allowed prctl(PR_SET_SPECULATION_CTRL) to set the IB speculation mode, even though the flag is ignored. Similarly, for CPUs without SMT, prctl(PR_GET_SPECULATION_CTRL) should also return PR_SPEC_DISABLE since IBPB is always on and STIBP is not available. [ bp: Massage commit message. ] Fixes: 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.") Fixes: 1978b3a53a74 ("x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP") Signed-off-by: Anand K Mistry Signed-off-by: Borislav Petkov Cc: Link: https://lkml.kernel.org/r/20201110123349.1.Id0cbf996d2151f4c143c90f9028651a5b49a5908@changeid --- arch/x86/kernel/cpu/bugs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 581fb7223ad0..d41b70fe4918 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -739,11 +739,13 @@ spectre_v2_user_select_mitigation(enum spectre_v2_mitigation_cmd v2_cmd) if (boot_cpu_has(X86_FEATURE_IBPB)) { setup_force_cpu_cap(X86_FEATURE_USE_IBPB); + spectre_v2_user_ibpb = mode; switch (cmd) { case SPECTRE_V2_USER_CMD_FORCE: case SPECTRE_V2_USER_CMD_PRCTL_IBPB: case SPECTRE_V2_USER_CMD_SECCOMP_IBPB: static_branch_enable(&switch_mm_always_ibpb); + spectre_v2_user_ibpb = SPECTRE_V2_USER_STRICT; break; case SPECTRE_V2_USER_CMD_PRCTL: case SPECTRE_V2_USER_CMD_AUTO: @@ -757,8 +759,6 @@ spectre_v2_user_select_mitigation(enum spectre_v2_mitigation_cmd v2_cmd) pr_info("mitigation: Enabling %s Indirect Branch Prediction Barrier\n", static_key_enabled(&switch_mm_always_ibpb) ? "always-on" : "conditional"); - - spectre_v2_user_ibpb = mode; } /* -- cgit v1.2.3 From 72c3bcdcda494cbd600712a32e67702cdee60c07 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Fri, 27 Nov 2020 08:53:52 +0100 Subject: KVM: x86: handle !lapic_in_kernel case in kvm_cpu_*_extint Centralize handling of interrupts from the userspace APIC in kvm_cpu_has_extint and kvm_cpu_get_extint, since userspace APIC interrupts are handled more or less the same as ExtINTs are with split irqchip. This removes duplicated code from kvm_cpu_has_injectable_intr and kvm_cpu_has_interrupt, and makes the code more similar between kvm_cpu_has_{extint,interrupt} on one side and kvm_cpu_get_{extint,interrupt} on the other. Cc: stable@vger.kernel.org Reviewed-by: Filippo Sironi Reviewed-by: David Woodhouse Tested-by: David Woodhouse Signed-off-by: Paolo Bonzini --- arch/x86/kvm/irq.c | 83 +++++++++++++++++++++------------------------------- arch/x86/kvm/lapic.c | 2 +- 2 files changed, 34 insertions(+), 51 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c index 99d118ffc67d..ee94671272e3 100644 --- a/arch/x86/kvm/irq.c +++ b/arch/x86/kvm/irq.c @@ -41,28 +41,9 @@ static int pending_userspace_extint(struct kvm_vcpu *v) * non-APIC source without intack. */ static int kvm_cpu_has_extint(struct kvm_vcpu *v) -{ - u8 accept = kvm_apic_accept_pic_intr(v); - - if (accept) { - if (irqchip_split(v->kvm)) - return pending_userspace_extint(v); - else - return v->kvm->arch.vpic->output; - } else - return 0; -} - -/* - * check if there is injectable interrupt: - * when virtual interrupt delivery enabled, - * interrupt from apic will handled by hardware, - * we don't need to check it here. - */ -int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v) { /* - * FIXME: interrupt.injected represents an interrupt that it's + * FIXME: interrupt.injected represents an interrupt whose * side-effects have already been applied (e.g. bit from IRR * already moved to ISR). Therefore, it is incorrect to rely * on interrupt.injected to know if there is a pending @@ -75,6 +56,23 @@ int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v) if (!lapic_in_kernel(v)) return v->arch.interrupt.injected; + if (!kvm_apic_accept_pic_intr(v)) + return 0; + + if (irqchip_split(v->kvm)) + return pending_userspace_extint(v); + else + return v->kvm->arch.vpic->output; +} + +/* + * check if there is injectable interrupt: + * when virtual interrupt delivery enabled, + * interrupt from apic will handled by hardware, + * we don't need to check it here. + */ +int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v) +{ if (kvm_cpu_has_extint(v)) return 1; @@ -91,20 +89,6 @@ EXPORT_SYMBOL_GPL(kvm_cpu_has_injectable_intr); */ int kvm_cpu_has_interrupt(struct kvm_vcpu *v) { - /* - * FIXME: interrupt.injected represents an interrupt that it's - * side-effects have already been applied (e.g. bit from IRR - * already moved to ISR). Therefore, it is incorrect to rely - * on interrupt.injected to know if there is a pending - * interrupt in the user-mode LAPIC. - * This leads to nVMX/nSVM not be able to distinguish - * if it should exit from L2 to L1 on EXTERNAL_INTERRUPT on - * pending interrupt or should re-inject an injected - * interrupt. - */ - if (!lapic_in_kernel(v)) - return v->arch.interrupt.injected; - if (kvm_cpu_has_extint(v)) return 1; @@ -118,16 +102,21 @@ EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt); */ static int kvm_cpu_get_extint(struct kvm_vcpu *v) { - if (kvm_cpu_has_extint(v)) { - if (irqchip_split(v->kvm)) { - int vector = v->arch.pending_external_vector; - - v->arch.pending_external_vector = -1; - return vector; - } else - return kvm_pic_read_irq(v->kvm); /* PIC */ - } else + if (!kvm_cpu_has_extint(v)) { + WARN_ON(!lapic_in_kernel(v)); return -1; + } + + if (!lapic_in_kernel(v)) + return v->arch.interrupt.nr; + + if (irqchip_split(v->kvm)) { + int vector = v->arch.pending_external_vector; + + v->arch.pending_external_vector = -1; + return vector; + } else + return kvm_pic_read_irq(v->kvm); /* PIC */ } /* @@ -135,13 +124,7 @@ static int kvm_cpu_get_extint(struct kvm_vcpu *v) */ int kvm_cpu_get_interrupt(struct kvm_vcpu *v) { - int vector; - - if (!lapic_in_kernel(v)) - return v->arch.interrupt.nr; - - vector = kvm_cpu_get_extint(v); - + int vector = kvm_cpu_get_extint(v); if (vector != -1) return vector; /* PIC */ diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 105e7859d1f2..86c33d53c90a 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2465,7 +2465,7 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu) struct kvm_lapic *apic = vcpu->arch.apic; u32 ppr; - if (!kvm_apic_hw_enabled(apic)) + if (!kvm_apic_present(vcpu)) return -1; __apic_update_ppr(apic, &ppr); -- cgit v1.2.3 From 71cc849b7093bb83af966c0e60cb11b7f35cd746 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Fri, 27 Nov 2020 09:18:20 +0100 Subject: KVM: x86: Fix split-irqchip vs interrupt injection window request kvm_cpu_accept_dm_intr and kvm_vcpu_ready_for_interrupt_injection are a hodge-podge of conditions, hacked together to get something that more or less works. But what is actually needed is much simpler; in both cases the fundamental question is, do we have a place to stash an interrupt if userspace does KVM_INTERRUPT? In userspace irqchip mode, that is !vcpu->arch.interrupt.injected. Currently kvm_event_needs_reinjection(vcpu) covers it, but it is unnecessarily restrictive. In split irqchip mode it's a bit more complicated, we need to check kvm_apic_accept_pic_intr(vcpu) (the IRQ window exit is basically an INTACK cycle and thus requires ExtINTs not to be masked) as well as !pending_userspace_extint(vcpu). However, there is no need to check kvm_event_needs_reinjection(vcpu), since split irqchip keeps pending ExtINT state separate from event injection state, and checking kvm_cpu_has_interrupt(vcpu) is wrong too since ExtINT has higher priority than APIC interrupts. In fact the latter fixes a bug: when userspace requests an IRQ window vmexit, an interrupt in the local APIC can cause kvm_cpu_has_interrupt() to be true and thus kvm_vcpu_ready_for_interrupt_injection() to return false. When this happens, vcpu_run does not exit to userspace but the interrupt window vmexits keep occurring. The VM loops without any hope of making progress. Once we try to fix these with something like return kvm_arch_interrupt_allowed(vcpu) && - !kvm_cpu_has_interrupt(vcpu) && - !kvm_event_needs_reinjection(vcpu) && - kvm_cpu_accept_dm_intr(vcpu); + (!lapic_in_kernel(vcpu) + ? !vcpu->arch.interrupt.injected + : (kvm_apic_accept_pic_intr(vcpu) + && !pending_userspace_extint(v))); we realize two things. First, thanks to the previous patch the complex conditional can reuse !kvm_cpu_has_extint(vcpu). Second, the interrupt window request in vcpu_enter_guest() bool req_int_win = dm_request_for_irq_injection(vcpu) && kvm_cpu_accept_dm_intr(vcpu); should be kept in sync with kvm_vcpu_ready_for_interrupt_injection(): it is unnecessary to ask the processor for an interrupt window if we would not be able to return to userspace. Therefore, kvm_cpu_accept_dm_intr(vcpu) is basically !kvm_cpu_has_extint(vcpu) ANDed with the existing check for masked ExtINT. It all makes sense: - we can accept an interrupt from userspace if there is a place to stash it (and, for irqchip split, ExtINTs are not masked). Interrupts from userspace _can_ be accepted even if right now EFLAGS.IF=0. - in order to tell userspace we will inject its interrupt ("IRQ window open" i.e. kvm_vcpu_ready_for_interrupt_injection), both KVM and the vCPU need to be ready to accept the interrupt. ... and this is what the patch implements. Reported-by: David Woodhouse Analyzed-by: David Woodhouse Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini Reviewed-by: Nikos Tsironis Reviewed-by: David Woodhouse Tested-by: David Woodhouse --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/irq.c | 2 +- arch/x86/kvm/x86.c | 18 ++++++++++-------- 3 files changed, 12 insertions(+), 9 deletions(-) (limited to 'arch') diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 324ddd7fd0aa..7e5f33a0d0e2 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1656,6 +1656,7 @@ int kvm_test_age_hva(struct kvm *kvm, unsigned long hva); int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte); int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v); int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu); +int kvm_cpu_has_extint(struct kvm_vcpu *v); int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu); int kvm_cpu_get_interrupt(struct kvm_vcpu *v); void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event); diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c index ee94671272e3..814698e5b152 100644 --- a/arch/x86/kvm/irq.c +++ b/arch/x86/kvm/irq.c @@ -40,7 +40,7 @@ static int pending_userspace_extint(struct kvm_vcpu *v) * check if there is pending interrupt from * non-APIC source without intack. */ -static int kvm_cpu_has_extint(struct kvm_vcpu *v) +int kvm_cpu_has_extint(struct kvm_vcpu *v) { /* * FIXME: interrupt.injected represents an interrupt whose diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 078a39d489fe..e545a8a613b1 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4051,21 +4051,23 @@ static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu, static int kvm_cpu_accept_dm_intr(struct kvm_vcpu *vcpu) { + /* + * We can accept userspace's request for interrupt injection + * as long as we have a place to store the interrupt number. + * The actual injection will happen when the CPU is able to + * deliver the interrupt. + */ + if (kvm_cpu_has_extint(vcpu)) + return false; + + /* Acknowledging ExtINT does not happen if LINT0 is masked. */ return (!lapic_in_kernel(vcpu) || kvm_apic_accept_pic_intr(vcpu)); } -/* - * if userspace requested an interrupt window, check that the - * interrupt window is open. - * - * No need to exit to userspace if we already have an interrupt queued. - */ static int kvm_vcpu_ready_for_interrupt_injection(struct kvm_vcpu *vcpu) { return kvm_arch_interrupt_allowed(vcpu) && - !kvm_cpu_has_interrupt(vcpu) && - !kvm_event_needs_reinjection(vcpu) && kvm_cpu_accept_dm_intr(vcpu); } -- cgit v1.2.3 From 9a2a0d3ca163fc645991804b8b032f7d59326bb5 Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Thu, 26 Nov 2020 12:02:06 +0100 Subject: kvm: x86/mmu: Fix get_mmio_spte() on CPUs supporting 5-level PT Commit 95fb5b0258b7 ("kvm: x86/mmu: Support MMIO in the TDP MMU") caused the following WARNING on an Intel Ice Lake CPU: get_mmio_spte: detect reserved bits on spte, addr 0xb80a0, dump hierarchy: ------ spte 0xb80a0 level 5. ------ spte 0xfcd210107 level 4. ------ spte 0x1004c40107 level 3. ------ spte 0x1004c41107 level 2. ------ spte 0x1db00000000b83b6 level 1. WARNING: CPU: 109 PID: 10254 at arch/x86/kvm/mmu/mmu.c:3569 kvm_mmu_page_fault.cold.150+0x54/0x22f [kvm] ... Call Trace: ? kvm_io_bus_get_first_dev+0x55/0x110 [kvm] vcpu_enter_guest+0xaa1/0x16a0 [kvm] ? vmx_get_cs_db_l_bits+0x17/0x30 [kvm_intel] ? skip_emulated_instruction+0xaa/0x150 [kvm_intel] kvm_arch_vcpu_ioctl_run+0xca/0x520 [kvm] The guest triggering this crashes. Note, this happens with the traditional MMU and EPT enabled, not with the newly introduced TDP MMU. Turns out, there was a subtle change in the above mentioned commit. Previously, walk_shadow_page_get_mmio_spte() was setting 'root' to 'iterator.level' which is returned by shadow_walk_init() and this equals to 'vcpu->arch.mmu->shadow_root_level'. Now, get_mmio_spte() sets it to 'int root = vcpu->arch.mmu->root_level'. The difference between 'root_level' and 'shadow_root_level' on CPUs supporting 5-level page tables is that in some case we don't want to use 5-level, in particular when 'cpuid_maxphyaddr(vcpu) <= 48' kvm_mmu_get_tdp_level() returns '4'. In case upper layer is not used, the corresponding SPTE will fail '__is_rsvd_bits_set()' check. Revert to using 'shadow_root_level'. Fixes: 95fb5b0258b7 ("kvm: x86/mmu: Support MMIO in the TDP MMU") Signed-off-by: Vitaly Kuznetsov Message-Id: <20201126110206.2118959-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/mmu/mmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 5bb1939b65d8..7a6ae9e90bd7 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3517,7 +3517,7 @@ static bool get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep) { u64 sptes[PT64_ROOT_MAX_LEVEL]; struct rsvd_bits_validate *rsvd_check; - int root = vcpu->arch.mmu->root_level; + int root = vcpu->arch.mmu->shadow_root_level; int leaf; int level; bool reserved = false; -- cgit v1.2.3 From 25bc65d8ddfc17cc1d7a45bd48e9bdc0e729ced3 Mon Sep 17 00:00:00 2001 From: Gabriele Paoloni Date: Fri, 27 Nov 2020 16:18:15 +0000 Subject: x86/mce: Do not overwrite no_way_out if mce_end() fails Currently, if mce_end() fails, no_way_out - the variable denoting whether the machine can recover from this MCE - is determined by whether the worst severity that was found across the MCA banks associated with the current CPU, is of panic severity. However, at this point no_way_out could have been already set by mca_start() after looking at all severities of all CPUs that entered the MCE handler. If mce_end() fails, check first if no_way_out is already set and, if so, stick to it, otherwise use the local worst value. [ bp: Massage. ] Signed-off-by: Gabriele Paoloni Signed-off-by: Borislav Petkov Reviewed-by: Tony Luck Cc: Link: https://lkml.kernel.org/r/20201127161819.3106432-2-gabriele.paoloni@intel.com --- arch/x86/kernel/cpu/mce/core.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 4102b866e7c0..32b7099e3511 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1384,8 +1384,10 @@ noinstr void do_machine_check(struct pt_regs *regs) * When there's any problem use only local no_way_out state. */ if (!lmce) { - if (mce_end(order) < 0) - no_way_out = worst >= MCE_PANIC_SEVERITY; + if (mce_end(order) < 0) { + if (!no_way_out) + no_way_out = worst >= MCE_PANIC_SEVERITY; + } } else { /* * If there was a fatal machine check we should have -- cgit v1.2.3