summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-05-28Merge tag 'cxl-for-5.19' of ↵Linus Torvalds48-863/+1266
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl updates from Dan Williams: "Compute Express Link (CXL) updates for this cycle. The highlight is new driver-core infrastructure and CXL subsystem changes for allowing lockdep to validate device_lock() usage. Thanks to PeterZ for setting me straight on the current capabilities of the lockdep API, and Greg acked it as well. On the CXL ACPI side this update adds support for CXL _OSC so that platform firmware knows that it is safe to still grant Linux native control of PCIe hotplug and error handling in the presence of CXL devices. A circular dependency problem was discovered between suspend and CXL memory for cases where the suspend image might be stored in CXL memory where that image also contains the PCI register state to restore to re-enable the device. Disable suspend for now until an architecture is defined to clarify that conflict. Lastly a collection of reworks, fixes, and cleanups to the CXL subsystem where support for snooping mailbox commands and properly handling the "mem_enable" flow are the highlights. Summary: - Add driver-core infrastructure for lockdep validation of device_lock(), and fixup a deadlock report that was previously hidden behind the 'lockdep no validate' policy. - Add CXL _OSC support for claiming native control of CXL hotplug and error handling. - Disable suspend in the presence of CXL memory unless and until a protocol is identified for restoring PCI device context from memory hosted on CXL PCI devices. - Add support for snooping CXL mailbox commands to protect against inopportune changes, like set-partition with the 'immediate' flag set. - Rework how the driver detects legacy CXL 1.1 configurations (CXL DVSEC / 'mem_enable') before enabling new CXL 2.0 decode configurations (CXL HDM Capability). - Miscellaneous cleanups and fixes from -next exposure" * tag 'cxl-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (47 commits) cxl/port: Enable HDM Capability after validating DVSEC Ranges cxl/port: Reuse 'struct cxl_hdm' context for hdm init cxl/port: Move endpoint HDM Decoder Capability init to port driver cxl/pci: Drop @info argument to cxl_hdm_decode_init() cxl/mem: Merge cxl_dvsec_ranges() and cxl_hdm_decode_init() cxl/mem: Skip range enumeration if mem_enable clear cxl/mem: Consolidate CXL DVSEC Range enumeration in the core cxl/pci: Move cxl_await_media_ready() to the core cxl/mem: Validate port connectivity before dvsec ranges cxl/mem: Fix cxl_mem_probe() error exit cxl/pci: Drop wait_for_valid() from cxl_await_media_ready() cxl/pci: Consolidate wait_for_media() and wait_for_media_ready() cxl/mem: Drop mem_enabled check from wait_for_media() nvdimm: Fix firmware activation deadlock scenarios device-core: Kill the lockdep_mutex nvdimm: Drop nd_device_lock() ACPI: NFIT: Drop nfit_device_lock() nvdimm: Replace lockdep_mutex with local lock classes cxl: Drop cxl_device_lock() cxl/acpi: Add root device lockdep validation ...
2022-05-28Merge tag 'clang-format-for-linus-v5.19-rc1' of https://github.com/ojeda/linuxLinus Torvalds1-48/+170
Pull clang-format updates from Miguel Ojeda: "clang-format modernization and cleanups. A few changes from Brian Norris and Mickaël Salaün to start taking advantage of some clang-format 11 features, plus a few cleanups and the usual update of the macro list" * tag 'clang-format-for-linus-v5.19-rc1' of https://github.com/ojeda/linux: clang-format: Fix space after for_each macros clang-format: Fix goto labels indentation clang-format: Update to clang-format >= 6 clang-format: Extend the for_each list with tools/ clang-format: Simplify command with `sort -u` clang-format: Use POSIX locale for `sort` clang-format: Update with v5.18-rc7's `for_each` macro list
2022-05-28Merge tag 'v5.19-p1' of ↵Linus Torvalds118-1058/+5534
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Test in-place en/decryption with two sglists in testmgr - Fix process vs softirq race in cryptd Algorithms: - Add arm64 acceleration for sm4 - Add s390 acceleration for chacha20 Drivers: - Add polarfire soc hwrng support in mpsf - Add support for TI SoC AM62x in sa2ul - Add support for ATSHA204 cryptochip in atmel-sha204a - Add support for PRNG in caam - Restore support for storage encryption in qat - Restore support for storage encryption in hisilicon/sec" * tag 'v5.19-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits) hwrng: omap3-rom - fix using wrong clk_disable() in omap_rom_rng_runtime_resume() crypto: hisilicon/sec - delete the flag CRYPTO_ALG_ALLOCATES_MEMORY crypto: qat - add support for 401xx devices crypto: qat - re-enable registration of algorithms crypto: qat - honor CRYPTO_TFM_REQ_MAY_SLEEP flag crypto: qat - add param check for DH crypto: qat - add param check for RSA crypto: qat - remove dma_free_coherent() for DH crypto: qat - remove dma_free_coherent() for RSA crypto: qat - fix memory leak in RSA crypto: qat - add backlog mechanism crypto: qat - refactor submission logic crypto: qat - use pre-allocated buffers in datapath crypto: qat - set to zero DH parameters before free crypto: s390 - add crypto library interface for ChaCha20 crypto: talitos - Uniform coding style with defined variable crypto: octeontx2 - simplify the return expression of otx2_cpt_aead_cbc_aes_sha_setkey() crypto: cryptd - Protect per-CPU resource by disabling BH. crypto: sun8i-ce - do not fallback if cryptlen is less than sg length crypto: sun8i-ce - rework debugging ...
2022-05-28Merge tag '5.19-rc-smb3-client-fixes-updated' of ↵Linus Torvalds24-189/+559
git://git.samba.org/sfrench/cifs-2.6 Pull cifs client updates from Steve French: - multichannel fixes to improve reconnect after network failure - improved caching of root directory contents (extending benefit of directory leases) - two DFS fixes - three fixes for improved debugging - an NTLMSSP fix for mounts t0 older servers - new mount parm to allow disabling creating sparse files - various cleanup fixes and minor fixes pointed out by coverity * tag '5.19-rc-smb3-client-fixes-updated' of git://git.samba.org/sfrench/cifs-2.6: (24 commits) smb3: remove unneeded null check in cifs_readdir cifs: fix ntlmssp on old servers cifs: cache the dirents for entries in a cached directory cifs: avoid parallel session setups on same channel cifs: use new enum for ses_status cifs: do not use tcpStatus after negotiate completes smb3: add mount parm nosparse smb3: don't set rc when used and unneeded in query_info_compound smb3: check for null tcon cifs: fix minor compile warning Add various fsctl structs Add defines for various newer FSCTLs smb3: add trace point for oplock not found cifs: return the more nuanced writeback error on close() smb3: add trace point for lease not found issue cifs: smbd: fix typo in comment cifs: set the CREATE_NOT_FILE when opening the directory in use_cached_dir() cifs: check for smb1 in open_cached_dir() cifs: move definition of cifs_fattr earlier in cifsglob.h cifs: print TIDs as hex ...
2022-05-28Merge tag 'jfs-5.19' of https://github.com/kleikamp/linux-shaggyLinus Torvalds10-1652/+3
Pull jfs updates from David Kleikamp: "One bug fix and some code cleanup" * tag 'jfs-5.19' of https://github.com/kleikamp/linux-shaggy: fs/jfs: Remove dead code fs: jfs: fix possible NULL pointer dereference in dbFree()
2022-05-28Merge tag 'libnvdimm-for-5.19' of ↵Linus Torvalds24-171/+359
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm and DAX updates from Dan Williams: "New support for clearing memory errors when a file is in DAX mode, alongside with some other fixes and cleanups. Previously it was only possible to clear these errors using a truncate or hole-punch operation to trigger the filesystem to reallocate the block, now, any page aligned write can opportunistically clear errors as well. This change spans x86/mm, nvdimm, and fs/dax, and has received the appropriate sign-offs. Thanks to Jane for her work on this. Summary: - Add support for clearing memory error via pwrite(2) on DAX - Fix 'security overwrite' support in the presence of media errors - Miscellaneous cleanups and fixes for nfit_test (nvdimm unit tests)" * tag 'libnvdimm-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: pmem: implement pmem_recovery_write() pmem: refactor pmem_clear_poison() dax: add .recovery_write dax_operation dax: introduce DAX_RECOVERY_WRITE dax access mode mce: fix set_mce_nospec to always unmap the whole page x86/mce: relocate set{clear}_mce_nospec() functions acpi/nfit: rely on mce->misc to determine poison granularity testing: nvdimm: asm/mce.h is not needed in nfit.c testing: nvdimm: iomap: make __nfit_test_ioremap a macro nvdimm: Allow overwrite in the presence of disabled dimms tools/testing/nvdimm: remove unneeded flush_workqueue
2022-05-28Merge tag 'mfd-next-5.19' of ↵Linus Torvalds26-520/+1442
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull MFD updates from Lee Jones: "New Device Support - Add support for {Power,Home} Keys to MediaTek MT6359 - Add support for SC2730 to Spreadtrum SPRD SC27XX SPI - Add support for additional Alder Lake-P I2C Controllers to Intel LPSS PCI Fix-ups: - Convert GPIO to GPIOD (hi655x-pmic) - Only register devices that exist (cros_ec_dev) - Remove unused code (syscon, reg-mux) - Rework .remove() API to return void (twl-core, rt4831) - Trivial - whitespace, spelling, coding style (tps65218, sprd-sc27xx-spi, google,cros-ec) - DT binding changes (samsung,exynos5433-lpass, rockchip,rk805, rockchip,rk808, rockchip,rk809, rockchip,rk817, rockchip,rk818, wlf,arizona) Bug Fixes: - Fix error handling bugs (ipaq-micro, davinci_voicecodec)" * tag 'mfd-next-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: dt-bindings: cros-ec: Fix a typo in description dt-bindings: mfd: wlf,arizona: Add spi-max-frequency mfd: rt4831: Improve error reporting for problems during .remove() mfd: davinci_voicecodec: Fix possible null-ptr-deref davinci_vc_probe() mfd: intel-lpss: Add support for ADL-P i2c6 and i2c7 dt-bindings: mfd: rk808: Convert bindings to yaml mfd: twl4030: Make twl4030_exit_irq() return void mfd: twl6030: Make twl6030_exit_irq() return void dt-bindings: mfd: samsung,exynos5433-lpass: Fix 'dma-channels/requests' properties mfd: sprd: Jugle {of,spi}_device_id tables into numerical order mfd: sprd: Add SC2730 PMIC to SPI device ID table dt-bindings: Drop undocumented i.MX iomuxc-gpr bindings in examples mfd: cros_ec_dev: Only register PCHG device if present mfd: mt6397-core: Add resources for PMIC keys for MT6359 mfd: mt6359: Add missing defines necessary for mtk-pmic-keys support mfd: ipaq-micro: Fix error check return value of platform_get_irq() mfd: hi655x-pmic: Replace legacy gpio interface for gpiod interface mfd: tps65218: Fix trivial typo in comment
2022-05-28Merge tag 'clk-for-linus' of ↵Linus Torvalds249-2050/+22752
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk updates from Stephen Boyd: "Mainly driver updates this time around. There's a single patch to the core clk framework that simplifies a runtime PM call. Otherwise the majority of the diff falls to a few SoC drivers: Qualcomm, STM32 and MediaTek. Those SoCs gain some new hardware support and what comes along with that is quite a few lines of data and some clk_ops code. Beyond the new hardware support we have the usual pile of driver updates that add missing clks on already supported SoCs or fix up problems like bad clk tree descriptions. It's nice to see that more drivers are moving to clk_hw based APIs too. New Drivers: - Add STM32MP13 RCC driver (Reset Clock Controller) - MediaTek MT8186 SoC clk support - Airoha EN7523 SoC system clocks - Clock driver for exynosautov9 SoC - Renesas R-Car V4H and RZ/V2M SoCs - Renesas RZ/G2UL SoC - LPASS clk driver for Qualcomm sc7280 SoC - GCC clk driver for Qualcomm SC8280XP SoC Updates: - SDCC uses floor clk ops on Qualcomm MSM8976 - Add modem reset and fix RPM clks on Qualcomm MSM8976 - Add the two missing CLKOUT clocks for U8500/DB8500 SoC - Mark some clks critical on Ingenic X1000 - Convert ux500 to clk_hw - Move MediaTek driver to clk_hw provider APIs - Use i2c driver probe_new to avoid id scans - Convert a number of Rockchip dt bindings to YAML - Mark hclk_vo critical on Rockchip rk3568 - Use pm_runtime_resume_and_get to fix pm_runtime_get_sync() usage - Various cleanups like memory allocation error checks and plugged leaks - Allwinner H6 RTC clock support - Allwinner H616 32 kHz clock support - Add the Universal Flash Storage clock on Renesas R-Car S4-8 - Add I2C, SSIF-2 (sound), USB, CANFD, OSTM (timer), WDT, SPI Multi I/O Bus, RSPI, TSU (thermal), and ADC clocks and resets on Renesas RZ/G2UL - Add display clock support on Renesas RZ/G2L - Add RPC (QSPI/HyperFlash) clocks on Renesas R-Car E3 and D3 - Add 27 MHz phy PLL ref clock on i.MX - Add mcore_booted module parameter to tell kernel M core has already booted for i.MX - Remove snvs clock on i.MX because it was for secure world only - Add dt bindings for i.MX8MN GPT - Add DISP2 pixel clock for i.MX8MP - Add clkout1/2 for i.MX8MP - Fix parent clock of ubs_root_clk for i.MX8MP - Implement better RCG parking on Qualcomm SoCs using the shared RCG clk ops - Kerneldoc fixes - Switch Tegra BPMP to determine_rate clk op - Add a pointer to dt schema for generic clock bindings" * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (168 commits) Revert "clk: qcom: regmap-mux: add pipe clk implementation" Revert "clk: qcom: gcc-sc7280: use new clk_regmap_mux_safe_ops for PCIe pipe clocks" Revert "clk: qcom: gcc-sm8450: use new clk_regmap_mux_safe_ops for PCIe pipe clocks" clk: bcm: rpi: Use correct order for the parameters of devm_kcalloc() clk: stm32mp13: add safe mux management clk: stm32mp13: add multi mux function clk: stm32mp13: add all STM32MP13 kernel clocks clk: stm32mp13: add all STM32MP13 peripheral clocks clk: stm32mp13: manage secured clocks clk: stm32mp13: add composite clock clk: stm32mp13: add stm32 divider clock clk: stm32mp13: add stm32_gate management clk: stm32mp13: add stm32_mux clock management clk: stm32: Introduce STM32MP13 RCC drivers (Reset Clock Controller) dt-bindings: rcc: stm32: add new compatible for STM32MP13 SoC clk: ti: clkctrl: replace usage of found with dedicated list iterator variable clk: ti: composite: Prefer kcalloc over open coded arithmetic dt-bindings: clock: exynosautov9: correct count of NR_CLK clk: mediatek: mt8173: Switch to clk_hw provider APIs clk: mediatek: Switch to clk_hw provider APIs ...
2022-05-28Merge tag 'pci-v5.19-changes' of ↵Linus Torvalds43-748/+1646
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull pci updates from Bjorn Helgaas: "Resource management: - Restrict E820 clipping to PCI host bridge windows (Bjorn Helgaas) - Log E820 clipping better (Bjorn Helgaas) - Add kernel cmdline options to enable/disable E820 clipping (Hans de Goede) - Disable E820 reserved region clipping for IdeaPads, Yoga, Yoga Slip, Acer Spin 5, Clevo Barebone systems where clipping leaves no usable address space for touchpads, Thunderbolt devices, etc (Hans de Goede) - Disable E820 clipping by default starting in 2023 (Hans de Goede) PCI device hotplug: - Include files to remove implicit dependencies (Christophe Leroy) - Only put Root Ports in D3 if they can signal and wake from D3 so AMD Yellow Carp doesn't miss hotplug events (Mario Limonciello) Power management: - Define pci_restore_standard_config() only for CONFIG_PM_SLEEP since it's unused otherwise (Krzysztof Kozlowski) - Power up devices completely, including anything platform firmware needs to do, during runtime resume (Rafael J. Wysocki) - Move pci_resume_bus() to PM callbacks so we observe the required bridge power-up delays (Rafael J. Wysocki) - Drop unneeded runtime_d3cold device flag (Rafael J. Wysocki) - Split pci_raw_set_power_state() between pci_power_up() and a new pci_set_low_power_state() (Rafael J. Wysocki) - Set current_state to D3cold if config read returns ~0, indicating the device is not accessible (Rafael J. Wysocki) - Do not call pci_update_current_state() from pci_power_up() so BARs and ASPM config are restored correctly (Rafael J. Wysocki) - Write 0 to PMCSR in pci_power_up() in all cases (Rafael J. Wysocki) - Split pci_power_up() to pci_set_full_power_state() to avoid some redundant operations (Rafael J. Wysocki) - Skip restoring BARs if device is not in D0 (Rafael J. Wysocki) - Rearrange and clarify pci_set_power_state() (Rafael J. Wysocki) - Remove redundant BAR restores from pci_pm_thaw_noirq() (Rafael J. Wysocki) Virtualization: - Acquire device lock before config space access lock to avoid AB/BA deadlock with sriov_numvfs_store() (Yicong Yang) Error handling: - Clear MULTI_ERR_COR/UNCOR_RCV bits, which a race could previously leave permanently set (Kuppuswamy Sathyanarayanan) Peer-to-peer DMA: - Whitelist Intel Skylake-E Root Ports regardless of which devfn they are (Shlomo Pongratz) ASPM: - Override L1 acceptable latency advertised by Intel DG2 so ASPM L1 can be enabled (Mika Westerberg) Cadence PCIe controller driver: - Set up device-specific register to allow PTM Responder to be enabled by the normal architected bit (Christian Gmeiner) - Override advertised FLR support since the controller doesn't implement FLR correctly (Parshuram Thombare) Cadence PCIe endpoint driver: - Correct bitmap size for the ob_region_map of outbound window usage (Dan Carpenter) Freescale i.MX6 PCIe controller driver: - Fix PERST# assertion/deassertion so we observe the required delays before accessing device (Francesco Dolcini) Freescale Layerscape PCIe controller driver: - Add "big-endian" DT property (Hou Zhiqiang) - Update SCFG DT property (Hou Zhiqiang) - Add "aer", "pme", "intr" DT properties (Li Yang) - Add DT compatible strings for ls1028a (Xiaowei Bao) Intel VMD host bridge driver: - Assign VMD IRQ domain before enumeration to avoid IOMMU interrupt remapping errors when MSI-X remapping is disabled (Nirmal Patel) - Revert VMD workaround that kept MSI-X remapping enabled when IOMMU remapping was enabled (Nirmal Patel) Marvell MVEBU PCIe controller driver: - Add of_pci_get_slot_power_limit() to parse the 'slot-power-limit-milliwatt' DT property (Pali Rohár) - Add mvebu support for sending Set_Slot_Power_Limit message (Pali Rohár) MediaTek PCIe controller driver: - Fix refcount leak in mtk_pcie_subsys_powerup() (Miaoqian Lin) MediaTek PCIe Gen3 controller driver: - Reset PHY and MAC at probe time (AngeloGioacchino Del Regno) Microchip PolarFlare PCIe controller driver: - Add chained_irq_enter()/chained_irq_exit() calls to mc_handle_msi() and mc_handle_intx() to avoid lost interrupts (Conor Dooley) - Fix interrupt handling race (Daire McNamara) NVIDIA Tegra194 PCIe controller driver: - Drop tegra194 MSI register save/restore, which is unnecessary since the DWC core does it (Jisheng Zhang) Qualcomm PCIe controller driver: - Add SM8150 SoC DT binding and support (Bhupesh Sharma) - Fix pipe clock imbalance (Johan Hovold) - Fix runtime PM imbalance on probe errors (Johan Hovold) - Fix PHY init imbalance on probe errors (Johan Hovold) - Convert DT binding to YAML (Dmitry Baryshkov) - Update DT binding to show that resets aren't required for MSM8996/APQ8096 platforms (Dmitry Baryshkov) - Add explicit register names per chipset in DT binding (Dmitry Baryshkov) - Add sc7280-specific clock and reset definitions to DT binding (Dmitry Baryshkov) Rockchip PCIe controller driver: - Fix bitmap size when searching for free outbound region (Dan Carpenter) Rockchip DesignWare PCIe controller driver: - Remove "snps,dw-pcie" from rockchip-dwc DT "compatible" property because it's not fully compatible with rockchip (Peter Geis) - Reset rockchip-dwc controller at probe (Peter Geis) - Add rockchip-dwc INTx support (Peter Geis) Synopsys DesignWare PCIe controller driver: - Return error instead of success if DMA mapping of MSI area fails (Jiantao Zhang) Miscellaneous: - Change pci_set_dma_mask() documentation references to dma_set_mask() (Alex Williamson)" * tag 'pci-v5.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (64 commits) dt-bindings: PCI: qcom: Add schema for sc7280 chipset dt-bindings: PCI: qcom: Specify reg-names explicitly dt-bindings: PCI: qcom: Do not require resets on msm8996 platforms dt-bindings: PCI: qcom: Convert to YAML PCI: qcom: Fix unbalanced PHY init on probe errors PCI: qcom: Fix runtime PM imbalance on probe errors PCI: qcom: Fix pipe clock imbalance PCI: qcom: Add SM8150 SoC support dt-bindings: pci: qcom: Document PCIe bindings for SM8150 SoC x86/PCI: Disable E820 reserved region clipping starting in 2023 x86/PCI: Disable E820 reserved region clipping via quirks x86/PCI: Add kernel cmdline options to use/ignore E820 reserved regions PCI: microchip: Fix potential race in interrupt handling PCI/AER: Clear MULTI_ERR_COR/UNCOR_RCV bits PCI: cadence: Clear FLR in device capabilities register PCI: cadence: Allow PTM Responder to be enabled PCI: vmd: Revert 2565e5b69c44 ("PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is enabled by IOMMU.") PCI: vmd: Assign VMD IRQ domain before enumeration PCI: Avoid pci_dev_lock() AB/BA deadlock with sriov_numvfs_store() PCI: rockchip-dwc: Add legacy interrupt support ...
2022-05-27Merge tag 'mm-stable-2022-05-27' of ↵Linus Torvalds20-364/+436
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more MM updates from Andrew Morton: - Two follow-on fixes for the post-5.19 series "Use pageblock_order for cma and alloc_contig_range alignment", from Zi Yan. - A series of z3fold cleanups and fixes from Miaohe Lin. - Some memcg selftests work from Michal Koutný <mkoutny@suse.com> - Some swap fixes and cleanups from Miaohe Lin - Several individual minor fixups * tag 'mm-stable-2022-05-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (25 commits) mm/shmem.c: suppress shift warning mm: Kconfig: reorganize misplaced mm options mm: kasan: fix input of vmalloc_to_page() mm: fix is_pinnable_page against a cma page mm: filter out swapin error entry in shmem mapping mm/shmem: fix infinite loop when swap in shmem error at swapoff time mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range mm/swapfile: fix lost swap bits in unuse_pte() mm/swapfile: unuse_pte can map random data if swap read fails selftests: memcg: factor out common parts of memory.{low,min} tests selftests: memcg: remove protection from top level memcg selftests: memcg: adjust expected reclaim values of protected cgroups selftests: memcg: expect no low events in unprotected sibling selftests: memcg: fix compilation mm/z3fold: fix z3fold_page_migrate races with z3fold_map mm/z3fold: fix z3fold_reclaim_page races with z3fold_free mm/z3fold: always clear PAGE_CLAIMED under z3fold page lock mm/z3fold: put z3fold page back into unbuddied list when reclaim or migration fails revert "mm/z3fold.c: allow __GFP_HIGHMEM in z3fold_alloc" mm/z3fold: throw warning on failure of trylock_page in z3fold_alloc ...
2022-05-27Merge tag 'mm-hotfixes-stable-2022-05-27' of ↵Linus Torvalds9-51/+103
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "Six hotfixes. The page_table_check one from Miaohe Lin is considered a minor thing so it isn't marked for -stable. The remainder address pre-5.19 issues and are cc:stable" * tag 'mm-hotfixes-stable-2022-05-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/page_table_check: fix accessing unmapped ptep kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add] mm/page_alloc: always attempt to allocate at least one page during bulk allocation hugetlb: fix huge_pmd_unshare address update zsmalloc: fix races between asynchronous zspage free and page migration Revert "mm/cma.c: remove redundant cma_mutex lock"
2022-05-27Merge tag 'mm-nonmm-stable-2022-05-26' of ↵Linus Torvalds83-706/+1181
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc updates from Andrew Morton: "The non-MM patch queue for this merge window. Not a lot of material this cycle. Many singleton patches against various subsystems. Most notably some maintenance work in ocfs2 and initramfs" * tag 'mm-nonmm-stable-2022-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (65 commits) kcov: update pos before writing pc in trace function ocfs2: dlmfs: fix error handling of user_dlm_destroy_lock ocfs2: dlmfs: don't clear USER_LOCK_ATTACHED when destroying lock fs/ntfs: remove redundant variable idx fat: remove time truncations in vfat_create/vfat_mkdir fat: report creation time in statx fat: ignore ctime updates, and keep ctime identical to mtime in memory fat: split fat_truncate_time() into separate functions MAINTAINERS: add Muchun as a memcg reviewer proc/sysctl: make protected_* world readable ia64: mca: drop redundant spinlock initialization tty: fix deadlock caused by calling printk() under tty_port->lock relay: remove redundant assignment to pointer buf fs/ntfs3: validate BOOT sectors_per_clusters lib/string_helpers: fix not adding strarray to device's resource list kernel/crash_core.c: remove redundant check of ck_cmdline ELF, uapi: fixup ELF_ST_TYPE definition ipc/mqueue: use get_tree_nodev() in mqueue_get_tree() ipc: update semtimedop() to use hrtimer ipc/sem: remove redundant assignments ...
2022-05-27crypto: poly1305 - cleanup stray CRYPTO_LIB_POLY1305_RSIZEJason A. Donenfeld1-0/+1
When CRYPTO_LIB_POLY1305 is unset, CRYPTO_LIB_POLY1305_RSIZE is still set in the Kconfig, cluttering things. Fix this by making CRYPTO_LIB_POLY1305_RSIZE depend on CRYPTO_LIB_POLY1305. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-05-27arm64/hugetlb: Fix building errors in huge_ptep_clear_flush()Baolin Wang1-1/+4
Fix the arm64 build error which was caused by commit ae07562909f3 ("mm: change huge_ptep_clear_flush() to return the original pte") interacting with commit fb396bb459c1 ("arm64/hugetlb: Drop TLB flush from get_clear_flush()"): arch/arm64/mm/hugetlbpage.c: In function ‘huge_ptep_clear_flush’: arch/arm64/mm/hugetlbpage.c:515:9: error: implicit declaration of function ‘get_clear_flush’; did you mean ‘ptep_clear_flush’? [-Werror=implicit-function-declaration] 515 | return get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig); | ^~~~~~~~~~~~~~~ | ptep_clear_flush Due to the new get_clear_contig() has dropped TLB flush, we should add an explicit TLB flush in huge_ptep_clear_flush() to keep original semantics when changing to use new get_clear_contig(). Fixes: fb396bb459c1 ("arm64/hugetlb: Drop TLB flush from get_clear_flush()"). Fixes: ae07562909f3 ("mm: change huge_ptep_clear_flush() to return the original pte") Reported-and-tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-05-27pipe: Fix missing lock in pipe_resize_ring()David Howells1-13/+18
pipe_resize_ring() needs to take the pipe->rd_wait.lock spinlock to prevent post_one_notification() from trying to insert into the ring whilst the ring is being replaced. The occupancy check must be done after the lock is taken, and the lock must be taken after the new ring is allocated. The bug can lead to an oops looking something like: BUG: KASAN: use-after-free in post_one_notification.isra.0+0x62e/0x840 Read of size 4 at addr ffff88801cc72a70 by task poc/27196 ... Call Trace: post_one_notification.isra.0+0x62e/0x840 __post_watch_notification+0x3b7/0x650 key_create_or_update+0xb8b/0xd20 __do_sys_add_key+0x175/0x340 __x64_sys_add_key+0xbe/0x140 do_syscall_64+0x5c/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae Reported by Selim Enes Karaduman @Enesdex working with Trend Micro Zero Day Initiative. Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17291 Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-05-27smb3: remove unneeded null check in cifs_readdirSteve French2-4/+3
Coverity pointed out an unneeded check. Addresses-Coverity: 1518030 ("Null pointer dereferences") Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-27mm/shmem.c: suppress shift warningAndrew Morton1-1/+1
mm/shmem.c:1948 shmem_getpage_gfp() warn: should '(((1) << 12) / 512) << folio_order(folio)' be a 64 bit type? On i386, so an unsigned long is 32-bit, but i_blocks is a 64-bit blkcnt_t. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Jessica Clarke <jrtc27@jrtc27.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm: Kconfig: reorganize misplaced mm optionsVlastimil Babka4-87/+87
After commits 7b42f1041c98 ("mm: Kconfig: move swap and slab config options to the MM section") and 519bcb797907 ("mm: Kconfig: group swap, slab, hotplug and thp options into submenus") we now have nicely organized mm related config options. I have noticed some that were still misplaced, so this moves them from various places into the new structure: VM_EVENT_COUNTERS, COMPAT_BRK, MMAP_ALLOW_UNINITIALIZED to mm/Kconfig and general MM section. SLUB_STATS to mm/Kconfig and the slab submenu. DEBUG_SLAB, SLUB_DEBUG, SLUB_DEBUG_ON to mm/Kconfig.debug and the Kernel hacking / Memory Debugging submenu. Link: https://lkml.kernel.org/r/20220525112559.1139-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm: kasan: fix input of vmalloc_to_page()Kefeng Wang1-1/+1
When print virtual mapping info for vmalloc address, it should pass the addr not page, fix it. Link: https://lkml.kernel.org/r/20220525120804.38155-1-wangkefeng.wang@huawei.com Fixes: c056a364e954 ("kasan: print virtual mapping info in reports") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm: fix is_pinnable_page against a cma pageMinchan Kim2-4/+13
Pages in the CMA area could have MIGRATE_ISOLATE as well as MIGRATE_CMA so the current is_pinnable_page() could miss CMA pages which have MIGRATE_ISOLATE. It ends up pinning CMA pages as longterm for the pin_user_pages() API so CMA allocations keep failing until the pin is released. CPU 0 CPU 1 - Task B cma_alloc alloc_contig_range pin_user_pages_fast(FOLL_LONGTERM) change pageblock as MIGRATE_ISOLATE internal_get_user_pages_fast lockless_pages_from_mm gup_pte_range try_grab_folio is_pinnable_page return true; So, pinned the page successfully. page migration failure with pinned page .. .. After 30 sec unpin_user_page(page) CMA allocation succeeded after 30 sec. The CMA allocation path protects the migration type change race using zone->lock but what GUP path need to know is just whether the page is on CMA area or not rather than exact migration type. Thus, we don't need zone->lock but just checks migration type in either of (MIGRATE_ISOLATE and MIGRATE_CMA). Adding the MIGRATE_ISOLATE check in is_pinnable_page could cause rejecting of pinning pages on MIGRATE_ISOLATE pageblocks even though it's neither CMA nor movable zone if the page is temporarily unmovable. However, such a migration failure by unexpected temporal refcount holding is general issue, not only come from MIGRATE_ISOLATE and the MIGRATE_ISOLATE is also transient state like other temporal elevated refcount problem. Link: https://lkml.kernel.org/r/20220524171525.976723-1-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm: filter out swapin error entry in shmem mappingMiaohe Lin2-1/+7
There might be swapin error entries in shmem mapping. Filter them out to avoid "Bad swap file entry" complaint. Link: https://lkml.kernel.org/r/20220519125030.21486-6-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm/shmem: fix infinite loop when swap in shmem error at swapoff timeMiaohe Lin1-0/+39
When swap in shmem error at swapoff time, there would be a infinite loop in the while loop in shmem_unuse_inode(). It's because swapin error is deliberately ignored now and thus info->swapped will never reach 0. So we can't escape the loop in shmem_unuse(). In order to fix the issue, swapin_error entry is stored in the mapping when swapin error occurs. So the swapcache page can be freed and the user won't end up with a permanently mounted swap because a sector is bad. If the page is accessed later, the user process will be killed so that corrupted data is never consumed. On the other hand, if the page is never accessed, the user won't even notice it. Link: https://lkml.kernel.org/r/20220519125030.21486-5-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reported-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_rangeMiaohe Lin1-5/+8
Once the MADV_FREE operation has succeeded, callers can expect they might get zero-fill pages if accessing the memory again. Therefore it should be safe to delete the hwpoison entry and swapin error entry. There is no reason to kill the process if it has called MADV_FREE on the range. Link: https://lkml.kernel.org/r/20220519125030.21486-4-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Suggested-by: Alistair Popple <apopple@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm/swapfile: fix lost swap bits in unuse_pte()Miaohe Lin1-3/+7
This is observed by code review only but not any real report. When we turn off swapping we could have lost the bits stored in the swap ptes. The new rmap-exclusive bit is fine since that turned into a page flag, but not for soft-dirty and uffd-wp. Add them. Link: https://lkml.kernel.org/r/20220519125030.21486-3-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Suggested-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: NeilBrown <neilb@suse.de> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm/swapfile: unuse_pte can map random data if swap read failsMiaohe Lin4-2/+31
Patch series "A few fixup patches for mm", v4. This series contains a few patches to avoid mapping random data if swap read fails and fix lost swap bits in unuse_pte. Also we free hwpoison and swapin error entry in madvise_free_pte_range and so on. More details can be found in the respective changelogs. This patch (of 5): There is a bug in unuse_pte(): when swap page happens to be unreadable, page filled with random data is mapped into user address space. In case of error, a special swap entry indicating swap read fails is set to the page table. So the swapcache page can be freed and the user won't end up with a permanently mounted swap because a sector is bad. And if the page is accessed later, the user process will be killed so that corrupted data is never consumed. On the other hand, if the page is never accessed, the user won't even notice it. Link: https://lkml.kernel.org/r/20220519125030.21486-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20220519125030.21486-2-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Howells <dhowells@redhat.com> Cc: NeilBrown <neilb@suse.de> Cc: Alistair Popple <apopple@nvidia.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27selftests: memcg: factor out common parts of memory.{low,min} testsMichal Koutný1-163/+36
The memory protection test setup and runtime is almost equal for memory.low and memory.min cases. It makes modification of the common parts prone to mistakes, since the protections are similar not only in setup but also in principle, factor the common part out. Past exceptions between the tests: - missing memory.min is fine (kept), - test_memcg_low protected orphaned pagecache (adapted like test_memcg_min and we keep the processes of protected memory running). The evaluation in two tests is different (OOM of allocator vs low events of protégés), this is kept different. Link: https://lkml.kernel.org/r/20220518161859.21565-6-mkoutny@suse.com Signed-off-by: Michal Koutný <mkoutny@suse.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> CC: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Richard Palethorpe <rpalethorpe@suse.de> Cc: David Vernet <void@manifault.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27selftests: memcg: remove protection from top level memcgMichal Koutný1-7/+3
The reclaim is triggered by memory limit in a subtree, therefore the testcase does not need configured protection against external reclaim. Also, correct respective comments. Link: https://lkml.kernel.org/r/20220518161859.21565-5-mkoutny@suse.com Signed-off-by: Michal Koutný <mkoutny@suse.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: David Vernet <void@manifault.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Richard Palethorpe <rpalethorpe@suse.de> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27selftests: memcg: adjust expected reclaim values of protected cgroupsMichal Koutný3-12/+107
The numbers are not easy to derive in a closed form (certainly mere protections ratios do not apply), therefore use a simulation to obtain expected numbers. Link: https://lkml.kernel.org/r/20220518161859.21565-4-mkoutny@suse.com Signed-off-by: Michal Koutný <mkoutny@suse.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: David Vernet <void@manifault.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Richard Palethorpe <rpalethorpe@suse.de> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27selftests: memcg: expect no low events in unprotected siblingMichal Koutný1-1/+1
This is effectively a revert of commit cdc69458a5f3 ("cgroup: account for memory_recursiveprot in test_memcg_low()"). The case test_memcg_low will fail with memory_recursiveprot until resolved in reclaim code. However, this patch preserves the existing helpers and variables for later uses. Link: https://lkml.kernel.org/r/20220518161859.21565-3-mkoutny@suse.com Signed-off-by: Michal Koutný <mkoutny@suse.com> Reviewed-by: David Vernet <void@manifault.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Richard Palethorpe <rpalethorpe@suse.de> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27selftests: memcg: fix compilationMichal Koutný1-11/+14
Patch series "memcontrol selftests fixups", v2. Flushing the patches to make memcontrol selftests check the events behavior we had consensus about (test_memcg_low fails). (test_memcg_reclaim, test_memcg_swap_max fail for me now but it's present even before the refactoring.) The two bigger changes are: - adjustment of the protected values to make tests succeed with the given tolerance, - both test_memcg_low and test_memcg_min check protection of memory in populated cgroups (actually as per Documentation/admin-guide/cgroup-v2.rst memory.min should not apply to empty cgroups, which is not the case currently. Therefore I unified tests with the populated case in order to to bring more broken tests). This patch (of 5): This fixes mis-applied changes from commit 72b1e03aa725 ("cgroup: account for memory_localevents in test_memcg_oom_group_leaf_events()"). Link: https://lkml.kernel.org/r/20220518161859.21565-1-mkoutny@suse.com Link: https://lkml.kernel.org/r/20220518161859.21565-2-mkoutny@suse.com Signed-off-by: Michal Koutný <mkoutny@suse.com> Reviewed-by: David Vernet <void@manifault.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Richard Palethorpe <rpalethorpe@suse.de> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm/z3fold: fix z3fold_page_migrate races with z3fold_mapMiaohe Lin1-4/+12
Think about the below scenario: CPU1 CPU2 z3fold_page_migrate z3fold_map z3fold_page_trylock ... z3fold_page_unlock /* slots still points to old zhdr*/ get_z3fold_header get slots from handle get old zhdr from slots z3fold_page_trylock return *old* zhdr encode_handle(new_zhdr, FIRST|LAST|MIDDLE) put_page(page) /* zhdr is freed! */ but zhdr is still used by caller! z3fold_map can map freed z3fold page and lead to use-after-free bug. To fix it, we add PAGE_MIGRATED to indicate z3fold page is migrated and soon to be released. So get_z3fold_header won't return such page. Link: https://lkml.kernel.org/r/20220429064051.61552-10-linmiaohe@huawei.com Fixes: 1f862989b04a ("mm/z3fold.c: support page migration") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm/z3fold: fix z3fold_reclaim_page races with z3fold_freeMiaohe Lin1-15/+3
Think about the below scenario: CPU1 CPU2 z3fold_reclaim_page z3fold_free spin_lock(&pool->lock) get_z3fold_header -- hold page_lock kref_get_unless_zero kref_put--zhdr->refcount can be 1 now !z3fold_page_trylock kref_put -- zhdr->refcount is 0 now release_z3fold_page WARN_ON(!list_empty(&zhdr->buddy)); -- we're on buddy now! spin_lock(&pool->lock); -- deadlock here! z3fold_reclaim_page might race with z3fold_free and will lead to pool lock deadlock and zhdr buddy non-empty warning. To fix this, defer getting the refcount until page_lock is held just like what __z3fold_alloc does. Note this has the side effect that we won't break the reclaim if we meet a soon to be released z3fold page now. Link: https://lkml.kernel.org/r/20220429064051.61552-9-linmiaohe@huawei.com Fixes: dcf5aedb24f8 ("z3fold: stricter locking and more careful reclaim") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm/z3fold: always clear PAGE_CLAIMED under z3fold page lockMiaohe Lin1-3/+3
Think about the below race window: CPU1 CPU2 z3fold_reclaim_page z3fold_free test_and_set_bit PAGE_CLAIMED failed to reclaim page z3fold_page_lock(zhdr); add back to the lru list; z3fold_page_unlock(zhdr); get_z3fold_header page_claimed=test_and_set_bit PAGE_CLAIMED clear_bit(PAGE_CLAIMED, &page->private); if (!page_claimed) /* it's false true */ free_handle is not called free_handle won't be called in this case. So z3fold_buddy_slots will leak. Fix it by always clear PAGE_CLAIMED under z3fold page lock. Link: https://lkml.kernel.org/r/20220429064051.61552-8-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm/z3fold: put z3fold page back into unbuddied list when reclaim or ↵Miaohe Lin1-0/+4
migration fails When doing z3fold page reclaim or migration, the page is removed from unbuddied list. If reclaim or migration succeeds, it's fine as page is released. But in case it fails, the page is not put back into unbuddied list now. The page will be leaked until next compaction work, reclaim or migration is done. Link: https://lkml.kernel.org/r/20220429064051.61552-7-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27revert "mm/z3fold.c: allow __GFP_HIGHMEM in z3fold_alloc"Miaohe Lin1-5/+3
Revert commit f1549cb5ab2b ("mm/z3fold.c: allow __GFP_HIGHMEM in z3fold_alloc"). z3fold can't support GFP_HIGHMEM page now. page_address is used directly at all places. Moreover, z3fold_header is on per cpu unbuddied list which could be accessed anytime. So we should remove the support of GFP_HIGHMEM allocation for z3fold. Link: https://lkml.kernel.org/r/20220429064051.61552-6-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm/z3fold: throw warning on failure of trylock_page in z3fold_allocMiaohe Lin1-4/+3
If trylock_page fails, the page won't be non-lru movable page. When this page is freed via free_z3fold_page, it will trigger bug on PageMovable check in __ClearPageMovable. Throw warning on failure of trylock_page to guard against such rare case just as what zsmalloc does. Link: https://lkml.kernel.org/r/20220429064051.61552-5-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm/z3fold: remove buggy use of stale list for allocationMiaohe Lin1-22/+1
Currently if z3fold couldn't find an unbuddied page it would first try to pull a page off the stale list. But this approach is problematic. If init z3fold page fails later, the page should be freed via free_z3fold_page to clean up the relevant resource instead of using __free_page directly. And if page is successfully reused, it will BUG_ON later in __SetPageMovable because it's already non-lru movable page, i.e. PAGE_MAPPING_MOVABLE is already set in page->mapping. In order to fix all of these issues, we can simply remove the buggy use of stale list for allocation because can_sleep should always be false and we never really hit the reusing code path now. Link: https://lkml.kernel.org/r/20220429064051.61552-4-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm/z3fold: fix possible null pointer dereferencingMiaohe Lin1-1/+11
alloc_slots could fail to allocate memory under heavy memory pressure. So we should check zhdr->slots against NULL to avoid future null pointer dereferencing. Link: https://lkml.kernel.org/r/20220429064051.61552-3-linmiaohe@huawei.com Fixes: fc5488651c7d ("z3fold: simplify freeing slots") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm/z3fold: fix sheduling while atomicMiaohe Lin1-2/+1
Patch series "A few fixup patches for z3fold". This series contains a few fixup patches to fix sheduling while atomic, fix possible null pointer dereferencing, fix various race conditions and so on. More details can be found in the respective changelogs. This patch (of 9): z3fold's page_lock is always held when calling alloc_slots. So gfp should be GFP_ATOMIC to avoid "scheduling while atomic" bug. Link: https://lkml.kernel.org/r/20220429064051.61552-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20220429064051.61552-2-linmiaohe@huawei.com Fixes: fc5488651c7d ("z3fold: simplify freeing slots") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm: split free page with properly free memory accounting and without raceZi Yan3-9/+29
In isolate_single_pageblock(), free pages are checked without holding zone lock, but they can go away in split_free_page() when zone lock is held. Check the free page and its order again in split_free_page() when zone lock is held. Recheck the page if the free page is gone under zone lock. In addition, in split_free_page(), the free page was deleted from the page list without changing free page accounting. Add the missing free page accounting code. Fix the type of order parameter in split_free_page(). Link: https://lore.kernel.org/lkml/20220525103621.987185e2ca0079f7b97b856d@linux-foundation.org/ Link: https://lkml.kernel.org/r/20220526231531.2404977-2-zi.yan@sent.com Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity") Signed-off-by: Zi Yan <ziy@nvidia.com> Reported-by: Doug Berger <opendmb@gmail.com> Link: https://lore.kernel.org/linux-mm/c3932a6f-77fe-29f7-0c29-fe6b1c67ab7b@gmail.com/ Cc: David Hildenbrand <david@redhat.com> Cc: Qian Cai <quic_qiancai@quicinc.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Eric Ren <renzhengeek@gmail.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michael Walle <michael@walle.cc> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm: page-isolation: skip isolated pageblock in start_isolate_page_range()Zi Yan1-8/+18
start_isolate_page_range() first isolates the first and the last pageblocks in the range and ensure pages across range boundaries are split during isolation. But it missed the case when the range is <= a pageblock and the first and the last pageblocks are the same one, so the second isolate_single_pageblock() will always fail. To fix it, skip the pageblock isolation in second isolate_single_pageblock(). Link: https://lkml.kernel.org/r/20220526231531.2404977-1-zi.yan@sent.com Fixes: 88ee134320b8 ("mm: fix a potential infinite loop in start_isolate_page_range()") Signed-off-by: Zi Yan <ziy@nvidia.com> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/linux-mm/ac65adc0-a7e4-cdfe-a0d8-757195b86293@samsung.com/ Reported-by: Michael Walle <michael@walle.cc> Tested-by: Michael Walle <michael@walle.cc> Link: https://lore.kernel.org/linux-mm/8ca048ca8b547e0dd1c95387ee05c23d@walle.cc/ Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Doug Berger <opendmb@gmail.com> Cc: Eric Ren <renzhengeek@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Qian Cai <quic_qiancai@quicinc.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm/page_table_check: fix accessing unmapped ptepMiaohe Lin1-1/+1
ptep is unmapped too early, so ptep could theoretically be accessed while it's unmapped. This might become a problem if/when CONFIG_HIGHPTE becomes available on riscv. Fix it by deferring pte_unmap() until page table checking is done. [akpm@linux-foundation.org: account for ptep alteration, per Matthew] Link: https://lkml.kernel.org/r/20220526113350.30806-1-linmiaohe@huawei.com Fixes: 80110bbfbba6 ("mm/page_table_check: check entries at pmd levels") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add]Naveen N. Rao4-42/+56
Since commit d1bcae833b32f1 ("ELF: Don't generate unused section symbols") [1], binutils (v2.36+) started dropping section symbols that it thought were unused. This isn't an issue in general, but with kexec_file.c, gcc is placing kexec_arch_apply_relocations[_add] into a separate .text.unlikely section and the section symbol ".text.unlikely" is being dropped. Due to this, recordmcount is unable to find a non-weak symbol in .text.unlikely to generate a relocation record against. Address this by dropping the weak attribute from these functions. Instead, follow the existing pattern of having architectures #define the name of the function they want to override in their headers. [1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1 [akpm@linux-foundation.org: arch/s390/include/asm/kexec.h needs linux/module.h] Link: https://lkml.kernel.org/r/20220519091237.676736-1-naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm/page_alloc: always attempt to allocate at least one page during bulk ↵Mel Gorman1-2/+2
allocation Peter Pavlisko reported the following problem on kernel bugzilla 216007. When I try to extract an uncompressed tar archive (2.6 milion files, 760.3 GiB in size) on newly created (empty) XFS file system, after first low tens of gigabytes extracted the process hangs in iowait indefinitely. One CPU core is 100% occupied with iowait, the other CPU core is idle (on 2-core Intel Celeron G1610T). It was bisected to c9fa563072e1 ("xfs: use alloc_pages_bulk_array() for buffers") but XFS is only the messenger. The problem is that nothing is waking kswapd to reclaim some pages at a time the PCP lists cannot be refilled until some reclaim happens. The bulk allocator checks that there are some pages in the array and the original intent was that a bulk allocator did not necessarily need all the requested pages and it was best to return as quickly as possible. This was fine for the first user of the API but both NFS and XFS require the requested number of pages be available before making progress. Both could be adjusted to call the page allocator directly if a bulk allocation fails but it puts a burden on users of the API. Adjust the semantics to attempt at least one allocation via __alloc_pages() before returning so kswapd is woken if necessary. It was reported via bugzilla that the patch addressed the problem and that the tar extraction completed successfully. This may also address bug 215975 but has yet to be confirmed. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216007 BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215975 Link: https://lkml.kernel.org/r/20220526091210.GC3441@techsingularity.net Fixes: 387ba26fb1cb ("mm/page_alloc: add a bulk page allocator") Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@vger.kernel.org> [5.13+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27hugetlb: fix huge_pmd_unshare address updateMike Kravetz1-1/+8
The routine huge_pmd_unshare() is passed a pointer to an address associated with an area which may be unshared. If unshare is successful this address is updated to 'optimize' callers iterating over huge page addresses. For the optimization to work correctly, address should be updated to the last huge page in the unmapped/unshared area. However, in the common case where the passed address is PUD_SIZE aligned, the address is incorrectly updated to the address of the preceding huge page. That wastes CPU cycles as the unmapped/unshared range is scanned twice. Link: https://lkml.kernel.org/r/20220524205003.126184-1-mike.kravetz@oracle.com Fixes: 39dde65c9940 ("shared page table for hugetlb page") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27Merge tag 'for-5.19/dm-changes' of ↵Linus Torvalds15-287/+409
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Enable DM core bioset's per-cpu bio cache if QUEUE_FLAG_POLL set. This change improves DM's hipri bio polling (REQ_POLLED) performance by 7 - 20% depending on the system. - Update DM core to use jump_labels to further reduce cost of unlikely branches for zoned block devices, dm-stats and swap_bios throttling. - Various DM core changes to reduce bio-based DM overhead and simplify IO accounting. - Fundamental DM core improvements to dm_io reference counting and the elimination of using bio_split()+bio_chain() -- instead DM's bio-based IO accounting is updated to account that a split occurred. - Improve DM core's abnormal bio processing to do less work. - Improve DM core's hipri polling support to use a single list rather than an hlist. - Update DM core to pass NULL bdev to bio_alloc_clone() so that initialization that isn't useful for DM can be elided. - Add cond_resched to DM stats' various loops that loop over all entries. - Fix incorrect error code return from DM integrity's constructor. - Make DM crypt's printing of the key constant-time. - Update bio-based DM multipath to provide high-resolution timer to the Historical Service Time (HST) path selector. * tag 'for-5.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (26 commits) dm: pass NULL bdev to bio_alloc_clone dm cache metadata: remove unnecessary variable in __dump_mapping dm mpath: provide high-resolution timer to HST for bio-based dm crypt: make printing of the key constant-time dm integrity: fix error code in dm_integrity_ctr() dm stats: add cond_resched when looping over entries dm: improve abnormal bio processing dm: simplify bio-based IO accounting further dm: put all polled dm_io instances into a single list dm: improve dm_io reference counting dm: don't grab target io reference in dm_zone_map_bio dm: improve bio splitting and associated IO accounting dm: switch to bdev based IO accounting interfaces dm: pass dm_io instance to dm_io_acct directly dm: don't pass bio to __dm_start_io_acct and dm_end_io_acct dm: use bio_sectors in dm_aceept_partial_bio dm: simplify basic targets dm: conditionally enable branching for less used features dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio dm: move hot dm_io members to same cacheline as dm_target_io ...
2022-05-27Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds88-2002/+1968
Pull rdma updates from Jason Gunthorpe: "Small collection of incremental improvement patches: - Minor code cleanup patches, comment improvements, etc from static tools - Clean the some of the kernel caps, reducing the historical stealth uAPI leftovers - Bug fixes and minor changes for rdmavt, hns, rxe, irdma - Remove unimplemented cruft from rxe - Reorganize UMR QP code in mlx5 to avoid going through the IB verbs layer - flush_workqueue(system_unbound_wq) removal - Ensure rxe waits for objects to be unused before allowing the core to free them - Several rc quality bug fixes for hfi1" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (67 commits) RDMA/rtrs-clt: Fix one kernel-doc comment RDMA/hfi1: Remove all traces of diagpkt support RDMA/hfi1: Consolidate software versions RDMA/hfi1: Remove pointless driver version RDMA/hfi1: Fix potential integer multiplication overflow errors RDMA/hfi1: Prevent panic when SDMA is disabled RDMA/hfi1: Prevent use of lock before it is initialized RDMA/rxe: Fix an error handling path in rxe_get_mcg() IB/core: Fix typo in comment RDMA/core: Fix typo in comment IB/hf1: Fix typo in comment IB/qib: Fix typo in comment IB/iser: Fix typo in comment RDMA/mlx4: Avoid flush_scheduled_work() usage IB/isert: Avoid flush_scheduled_work() usage RDMA/mlx5: Remove duplicate pointer assignment in mlx5_ib_alloc_implicit_mr() RDMA/qedr: Remove unnecessary synchronize_irq() before free_irq() RDMA/hns: Use hr_reg_read() instead of remaining roce_get_xxx() RDMA/hns: Use hr_reg_xxx() instead of remaining roce_set_xxx() RDMA/irdma: Add SW mechanism to generate completions on error ...
2022-05-27Merge tag 'hardening-v5.19-rc1-fix1' of ↵Linus Torvalds6-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kernel hardening fix from Kees Cook: "This fixes an unlucky build race condition when using the GCC plugins, noticed by a few folks. - Avoid GCC plugins needing utsrelease.h build target (Masahiro Yamada)" * tag 'hardening-v5.19-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: gcc-plugins: use KERNELVERSION for plugin version
2022-05-27Merge tag 'nfsd-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds30-426/+985
Pull nfsd updates from Chuck Lever: "We introduce 'courteous server' in this release. Previously NFSD would purge open and lock state for an unresponsive client after one lease period (typically 90 seconds). Now, after one lease period, another client can open and lock those files and the unresponsive client's lease is purged; otherwise if the unresponsive client's open and lock state is uncontended, the server retains that open and lock state for up to 24 hours, allowing the client's workload to resume after a lengthy network partition. A longstanding issue with NFSv4 file creation is also addressed. Previously a file creation can fail internally, returning an error to the client, but leave the newly created file in place as an artifact. The file creation code path has been reorganized so that internal failures and race conditions are less likely to result in an unwanted file creation. A fault injector has been added to help exercise paths that are run during kernel metadata cache invalidation. These caches contain information maintained by user space about exported filesystems. Many of our test workloads do not trigger cache invalidation. There is one patch that is needed to support PREEMPT_RT and a fix for an ancient 'sleep while spin-locked' splat that seems to have become easier to hit since v5.18-rc3" * tag 'nfsd-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (36 commits) NFSD: nfsd_file_put() can sleep NFSD: Add documenting comment for nfsd4_release_lockowner() NFSD: Modernize nfsd4_release_lockowner() NFSD: Fix possible sleep during nfsd4_release_lockowner() nfsd: destroy percpu stats counters after reply cache shutdown nfsd: Fix null-ptr-deref in nfsd_fill_super() nfsd: Unregister the cld notifier when laundry_wq create failed SUNRPC: Use RMW bitops in single-threaded hot paths NFSD: Clean up the show_nf_flags() macro NFSD: Trace filecache opens NFSD: Move documenting comment for nfsd4_process_open2() NFSD: Fix whitespace NFSD: Remove dprintk call sites from tail of nfsd4_open() NFSD: Instantiate a struct file when creating a regular NFSv4 file NFSD: Clean up nfsd_open_verified() NFSD: Remove do_nfsd_create() NFSD: Refactor NFSv4 OPEN(CREATE) NFSD: Refactor NFSv3 CREATE NFSD: Refactor nfsd_create_setattr() NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create() ...
2022-05-27Merge tag 'for-linus' of https://github.com/openrisc/linuxLinus Torvalds11-315/+272
Pull OpenRISC updates from Stafford Horne: - A few sparse warning fixups and other cleanups I noticed when working on a recent TLB bug found on a new OpenRISC core bring up. - A few fixup's from me and Jason A Donenfeld to help shutdown OpenRISC platforms when running CI tests * tag 'for-linus' of https://github.com/openrisc/linux: openrisc: Allow power off handler overriding openrisc: Remove unused IMMU tlb workardound openrisc/fault: Fix symbol scope warnings openrisc/delay: Add include to fix symbol not declared warning openrisc/time: Fix symbol scope warnings openrisc/traps: Declare unhandled_exception for asmlinkage openrisc/traps: Remove die_if_kernel function openrisc/traps: Declare file scope symbols as static openrisc: Update litex defconfig to support glibc userland openrisc: Pretty print show_registers memory dumps openrisc: Add syscall details to emergency syscall debugging openrisc: Add support for liteuart emergency printing openrisc: Cleanup emergency print handling openrisc: Add gcc machine instruction flag configuration openrisc: define nop command for simulator reboot openrisc: remove bogus nops and shutdowns openrisc: fix typos in comments