summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-12-16asm-generic: force inlining of get_order() to work around gcc10 poor decisionChristophe Leroy1-1/+1
When building mpc885_ads_defconfig with gcc 10.1, the function get_order() appears 50 times in vmlinux: [linux]# ppc-linux-objdump -x vmlinux | grep get_order | wc -l 50 [linux]# size vmlinux text data bss dec hex filename 3842620 675624 135160 4653404 47015c vmlinux In the old days, marking a function 'static inline' was forcing GCC to inline, but since commit ac7c3e4ff401 ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly") GCC may decide to not inline a function. It looks like GCC 10 is taking poor decisions on this. get_order() compiles into the following tiny function, occupying 20 bytes of text. 0000007c <get_order>: 7c: 38 63 ff ff addi r3,r3,-1 80: 54 63 a3 3e rlwinm r3,r3,20,12,31 84: 7c 63 00 34 cntlzw r3,r3 88: 20 63 00 20 subfic r3,r3,32 8c: 4e 80 00 20 blr By forcing get_order() to be __always_inline, the size of text is reduced by 1940 bytes, that is almost twice the space occupied by 50 times get_order() [linux-powerpc]# size vmlinux text data bss dec hex filename 3840680 675588 135176 4651444 46f9b4 vmlinux Link: https://lkml.kernel.org/r/96c6172d619c51acc5c1c4884b80785c59af4102.1602949927.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Joel Stanley <joel@jms.id.au> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16fs/proc: make pde_get() return nothingHui Su1-2/+1
We don't need pde_get()'s return value, so make pde_get() return nothing Link: https://lkml.kernel.org/r/20201211061944.GA2387571@rlk Signed-off-by: Hui Su <sh_def@163.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16proc: fix lookup in /proc/net subdirectories after setns(2)Alexey Dobriyan4-19/+36
Commit 1fde6f21d90f ("proc: fix /proc/net/* after setns(2)") only forced revalidation of regular files under /proc/net/ However, /proc/net/ is unusual in the sense of /proc/net/foo handlers take netns pointer from parent directory which is old netns. Steps to reproduce: (void)open("/proc/net/sctp/snmp", O_RDONLY); unshare(CLONE_NEWNET); int fd = open("/proc/net/sctp/snmp", O_RDONLY); read(fd, &c, 1); Read will read wrong data from original netns. Patch forces lookup on every directory under /proc/net . Link: https://lkml.kernel.org/r/20201205160916.GA109739@localhost.localdomain Fixes: 1da4d377f943 ("proc: revalidate misc dentries") Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala@nokia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16proc: provide details on indirect branch speculationAnand K Mistry2-0/+30
Similar to speculation store bypass, show information about the indirect branch speculation mode of a task in /proc/$pid/status. For testing/benchmarking, I needed to see whether IB (Indirect Branch) speculation (see Spectre-v2) is enabled on a task, to see whether an IBPB instruction should be executed on an address space switch. Unfortunately, this information isn't available anywhere else and currently the only way to get it is to hack the kernel to expose it (like this change). It also helped expose a bug with conditional IB speculation on certain CPUs. Another place this could be useful is to audit the system when using sanboxing. With this change, I can confirm that seccomp-enabled process have IB speculation force disabled as expected when the kernel command line parameter `spectre_v2_user=seccomp`. Since there's already a 'Speculation_Store_Bypass' field, I used that as precedent for adding this one. [amistry@google.com: remove underscores from field name to workaround documentation issue] Link: https://lkml.kernel.org/r/20201106131015.v2.1.I7782b0cedb705384a634cfd8898eb7523562da99@changeid Link: https://lkml.kernel.org/r/20201030172731.1.I7782b0cedb705384a634cfd8898eb7523562da99@changeid Signed-off-by: Anand K Mistry <amistry@google.com> Cc: Anthony Steinhauser <asteinhauser@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Anand K Mistry <amistry@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alexey Gladkov <gladkov.alexey@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <keescook@chromium.org> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: NeilBrown <neilb@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16procfs: delete duplicated words + other fixesRandy Dunlap2-3/+3
Delete repeated words in fs/proc/. {the, which} where "which which" was changed to "with which". Link: https://lkml.kernel.org/r/20201028191525.13413-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16alpha: replace bogus in_interrupt()Thomas Gleixner1-1/+1
in_interrupt() is true for a variety of things including bottom half disabled regions. Deducing hard interrupt context from it is dubious at best. Use in_irq() which is true if called in hard interrupt context. Otherwise calling irq_exit() would do more harm than good. Link: https://lkml.kernel.org/r/20201113135832.2202833-1-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Serge Belyshev <belyshev@depni.sinp.msu.ru> Cc: Richard Henderson <rth@twiddle.net> Cc: Matt Turner <mattst88@gmail.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16mm/memory_hotplug: quieting offline operationLaurent Dufour1-1/+1
On PowerPC, when dymically removing memory from a system we can see in the console a lot of messages like this: [ 186.575389] Offlined Pages 4096 This message is displayed on each LMB (256MB) removed, which means that we removing 1TB of memory, this message is displayed 4096 times. Moving it to DEBUG to not flood the console. Link: https://lkml.kernel.org/r/20201211150157.91399-1-ldufour@linux.ibm.com Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: Scott Cheloha <cheloha@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16mm: fix a race on nr_swap_pagesZhaoyang Huang1-5/+6
The scenario on which "Free swap = -4kB" happens in my system, which is caused by several get_swap_pages racing with each other and show_swap_cache_info happens simutaniously. No need to add a lock on get_swap_page_of_type as we remove "Presub/PosAdd" here. ProcessA ProcessB ProcessC ngoals = 1 ngoals = 1 avail = nr_swap_pages(1) avail = nr_swap_pages(1) nr_swap_pages(1) -= ngoals nr_swap_pages(0) -= ngoals nr_swap_pages = -1 Link: https://lkml.kernel.org/r/1607050340-4535-1-git-send-email-zhaoyang.huang@unisoc.com Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16Merge tag 'pci-v5.11-changes' of ↵Linus Torvalds82-1867/+2028
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Decode PCIe 64 GT/s link speed (Gustavo Pimentel) - Remove unused HAVE_PCI_SET_MWI (Heiner Kallweit) - Reduce pci_set_cacheline_size() message to debug level (Heiner Kallweit) - Fix pci_slot_release() NULL pointer dereference (Jubin Zhong) - Unify ECAM constants in native PCI Express drivers (Krzysztof Wilczyński) - Return u8 from pci_find_capability() and similar (Puranjay Mohan) - Return u16 from pci_find_ext_capability() and similar (Bjorn Helgaas) - Fix ACPI companion lookup for device 0 on the root bus (Rafael J. Wysocki) Resource management: - Keep both device and resource name for config space remaps (Alexander Lobakin) - Bounds-check command-line resource alignment requests (Bjorn Helgaas) - Fix overflow in command-line resource alignment requests (Colin Ian King) Driver binding: - Avoid duplicate IDs in driver dynamic IDs list (Zhenzhong Duan) Power management: - Save/restore Precision Time Measurement Capability for suspend/resume (David E. Box) - Disable PTM during suspend to save power (David E. Box) - Add sysfs attribute for device power state (Maximilian Luz) - Rename pci_wakeup_bus() to pci_resume_bus() (Mika Westerberg) - Do not generate wakeup event when runtime resuming device (Mika Westerberg) - Save/restore ASPM L1SS Capability for suspend/resume (Vidya Sagar) Virtualization: - Mark AMD Raven iGPU ATS as broken in some platforms (Alex Deucher) - Add function 1 DMA alias quirk for Marvell 9215 SATA controller (Bjorn Helgaas) MSI: - Disable MSI for Pericom PCIe-USB adapter (Andy Shevchenko) - Improve warnings for 32-bit-limited MSI support (Vidya Sagar) Error handling: - Cache RCEC EA Capability offset in pci_init_capabilities() (Sean V Kelley) - Rename reset_link() to reset_subordinates() (Sean V Kelley) - Write AER Capability only when we control it (Sean V Kelley) - Clear AER status only when we control AER (Sean V Kelley) - Bind RCEC devices to the Root Port driver (Qiuxu Zhuo) - Recover from RCiEP AER errors (Qiuxu Zhuo) - Recover from RCEC AER errors (Sean V Kelley) - Add pcie_link_rcec() to associate RCiEPs (Sean V Kelley) - Add pcie_walk_rcec() to RCEC AER handling (Sean V Kelley) - Add pcie_walk_rcec() to RCEC PME handling (Sean V Kelley) - Add RCEC AER error injection support (Qiuxu Zhuo) Broadcom iProc PCIe controller driver: - Fix out-of-bound array accesses (Bharat Gooty) - Invalidate correct PAXB inbound windows (Roman Bacik) - Enhance PCIe Link information display (Srinath Mannam) Cadence PCIe controller driver: - Make "cdns,max-outbound-regions" property optional (Kishon Vijay Abraham I) Intel VMD host bridge driver: - Offset client MSI-X vectors (Jon Derrick) - Update type of __iomem pointers (Krzysztof Wilczyński) NVIDIA Tegra PCIe controller driver: - Move "dbi" accesses to post common DWC initialization (Vidya Sagar) - Read "dbi" base address to program in application logic (Vidya Sagar) - Fix ASPM-L1SS advertisement disable code (Vidya Sagar) - Set DesignWare IP version (Vidya Sagar) - Continue unconfig sequence even if parts fail (Vidya Sagar) - Check return value of tegra_pcie_init_controller() (Vidya Sagar) - Disable LTSSM during L2 entry (Vidya Sagar) Qualcomm PCIe controller driver: - Document PCIe bindings for SM8250 SoC (Manivannan Sadhasivam) - Add SM8250 SoC support (Manivannan Sadhasivam) - Add support for configuring BDF to SID mapping for SM8250 (Manivannan Sadhasivam) Renesas R-Car PCIe controller driver: - rcar: Drop unused members from struct rcar_pcie_host (Lad Prabhakar) - PCI: rcar-pci-host: Document r8a774e1 bindings (Lad Prabhakar) - PCI: rcar-pci-host: Convert bindings to json-schema (Yoshihiro Shimoda) - PCI: rcar-pci-host: Document r8a77965 bindings (Yoshihiro Shimoda) Samsung Exynos PCIe controller driver: - Rework driver to support Exynos5433 PCIe PHY (Jaehoon Chung) - Rework driver to support Exynos5433 variant (Jaehoon Chung) - Drop samsung,exynos5440-pcie binding (Marek Szyprowski) - Add the samsung,exynos-pcie binding (Marek Szyprowski) - Add the samsung,exynos-pcie-phy binding (Marek Szyprowski) Synopsys DesignWare PCIe controller driver: - Support multiple ATU memory regions (Rob Herring) - Move intel-gw ATU offset out of driver match data (Rob Herring) - Move "dbi", "dbi2", and "addr_space" resource setup into common code (Rob Herring) - Remove intel-gw unneeded function wrappers (Rob Herring) - Ensure all outbound ATU windows are reset (Rob Herring) - Use the common MSI irq_chip in dra7xx (Rob Herring) - Drop the .set_num_vectors() host op (Rob Herring) - Move MSI interrupt setup into DWC common code (Rob Herring) - Rework MSI initialization (Rob Herring) - Move link handling into common code (Rob Herring) - Move dw_pcie_msi_init() into core (Rob Herring) - Move dw_pcie_setup_rc() to DWC common code (Rob Herring) - Remove unnecessary wrappers around dw_pcie_host_init() (Rob Herring) - Drop keystone duplicated 'num-viewport'" (Rob Herring) - Move inbound and outbound windows to common struct (Rob Herring) - Detect number of iATU windows (Rob Herring) - Warn if non-prefetchable memory aperture size is > 32-bit (Vidya Sagar) - Add support to program ATU for >4GB memory (Vidya Sagar) - Set 32-bit DMA mask for MSI target address allocation (Vidya Sagar) TI J721E PCIe driver: - Fix "ti,syscon-pcie-ctrl" to take argument (Kishon Vijay Abraham I) - Add host mode dt-bindings for TI's J7200 SoC (Kishon Vijay Abraham I) - Add EP mode dt-bindings for TI's J7200 SoC (Kishon Vijay Abraham I) - Get offset within "syscon" from "ti,syscon-pcie-ctrl" phandle arg (Kishon Vijay Abraham I) TI Keystone PCIe controller driver: - Enable compile-testing on !ARM (Alex Dewar)" * tag 'pci-v5.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (100 commits) PCI: Add function 1 DMA alias quirk for Marvell 9215 SATA controller PCI/ACPI: Fix companion lookup for device 0 on the root bus PCI: Keep both device and resource name for config space remaps PCI: xgene: Removed unused ".bus_shift" initialisers from pci-xgene.c PCI: vmd: Update type of the __iomem pointers PCI: iproc: Convert to use the new ECAM constants PCI: thunder-pem: Add constant for custom ".bus_shift" initialiser PCI: Unify ECAM constants in native PCI Express drivers PCI: Disable PTM during suspend to save power PCI/PTM: Save/restore Precision Time Measurement Capability for suspend/resume PCI: Mark AMD Raven iGPU ATS as broken in some platforms PCI: j721e: Get offset within "syscon" from "ti,syscon-pcie-ctrl" phandle arg dt-bindings: PCI: Add EP mode dt-bindings for TI's J7200 SoC dt-bindings: PCI: Add host mode dt-bindings for TI's J7200 SoC dt-bindings: pci: ti,j721e: Fix "ti,syscon-pcie-ctrl" to take argument PCI: dwc: Set 32-bit DMA mask for MSI target address allocation PCI: qcom: Add support for configuring BDF to SID mapping for SM8250 PCI: Reduce pci_set_cacheline_size() message to debug level PCI: Remove unused HAVE_PCI_SET_MWI PCI: qcom: Add SM8250 SoC support ...
2020-12-16Merge tag 'acpi-5.11-rc1' of ↵Linus Torvalds34-251/+612
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These update the ACPICA code in the kernel to upstream revision 20201113, fix and clean up some resources manipulation code, extend the enumeration and gpio-line-names property documentation, clean up the handling of _DEP during device enumeration, add a new backlight DMI quirk, clean up transaction handling in the EC driver and make some assorted janitorial changes. Specifics: - Update ACPICA code in the kernel to upstream revision 20201113 with changes as follows: * Add 5 new UUIDs to the known UUID table (Bob Moore) * Remove extreaneous "the" in comments (Colin Ian King) * Add function trace macros to improve debugging (Erik Kaneda) * Fix interpreter memory leak (Erik Kaneda) * Handle "orphan" _REG for GPIO OpRegions (Hans de Goede) - Introduce resource_union() and resource_intersection() helpers and clean up some resource-manipulation code with the help of them (Andy Shevchenko) - Revert problematic commit related to the handling of resources in the ACPI core (Daniel Scally) - Extend the ACPI device enumeration documentation and the gpio-line-names _DSD property documentation, clean up the latter (Flavio Suligoi) - Clean up _DEP handling during device enumeration, modify the list of _DEP exceptions and the handling of it and fix up terminology related to _DEP (Hans de Goede, Rafael Wysocki) - Eliminate in_interrupt() usage from the ACPI EC driver (Sebastian Andrzej Siewior) - Clean up the advance_transaction() routine and related code in the ACPI EC driver (Rafael Wysocki) - Add new backlight quirk for GIGABYTE GB-BXBT-2807 (Jasper St Pierre) - Make assorted janitorial changes in several ACPI-related pieces of code (Hanjun Guo, Jason Yan, Punit Agrawal)" * tag 'acpi-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (40 commits) ACPI: scan: Fix up _DEP-related terminology with supplier/consumer ACPI: scan: Drop INT3396 from acpi_ignore_dep_ids[] ACPI: video: Add DMI quirk for GIGABYTE GB-BXBT-2807 Revert "ACPI / resources: Use AE_CTRL_TERMINATE to terminate resources walks" ACPI: scan: Add PNP0D80 to the _DEP exceptions list ACPI: scan: Call acpi_get_object_info() from acpi_add_single_object() ACPI: scan: Add acpi_info_matches_hids() helper ACPICA: Update version to 20201113 ACPICA: Interpreter: fix memory leak by using existing buffer ACPICA: Add function trace macros to improve debugging ACPICA: Also handle "orphan" _REG methods for GPIO OpRegions ACPICA: Remove extreaneous "the" in comments ACPICA: Add 5 new UUIDs to the known UUID table resource: provide meaningful MODULE_LICENSE() in test suite ASoC: Intel: catpt: Replace open coded variant of resource_intersection() ACPI: processor: Drop duplicate setting of shared_cpu_map ACPI: EC: Clean up status flags checks in advance_transaction() ACPI: EC: Untangle error handling in advance_transaction() ACPI: EC: Simplify error handling in advance_transaction() ACPI: EC: Rename acpi_ec_is_gpe_raised() ...
2020-12-16Merge tag 'pm-5.11-rc1' of ↵Linus Torvalds80-1210/+1692
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These update cpufreq (core and drivers), cpuidle (polling state implementation and the PSCI driver), the OPP (operating performance points) framework, devfreq (core and drivers), the power capping RAPL (Running Average Power Limit) driver, the Energy Model support, the generic power domains (genpd) framework, the ACPI device power management, the core system-wide suspend code and power management utilities. Specifics: - Use local_clock() instead of jiffies in the cpufreq statistics to improve accuracy (Viresh Kumar). - Fix up OPP usage in the cpufreq-dt and qcom-cpufreq-nvmem cpufreq drivers (Viresh Kumar). - Clean up the cpufreq core, the intel_pstate driver and the schedutil cpufreq governor (Rafael Wysocki). - Fix up error code paths in the sti-cpufreq and mediatek cpufreq drivers (Yangtao Li, Qinglang Miao). - Fix cpufreq_online() to return error codes instead of success (0) in all cases when it fails (Wang ShaoBo). - Add mt8167 support to the mediatek cpufreq driver and blacklist mt8516 in the cpufreq-dt-platdev driver (Fabien Parent). - Modify the tegra194 cpufreq driver to always return values from the frequency table as the current frequency and clean up that driver (Sumit Gupta, Jon Hunter). - Modify the arm_scmi cpufreq driver to allow it to discover the power scale present in the performance protocol and provide this information to the Energy Model (Lukasz Luba). - Add missing MODULE_DEVICE_TABLE to several cpufreq drivers (Pali Rohár). - Clean up the CPPC cpufreq driver (Ionela Voinescu). - Fix NVMEM_IMX_OCOTP dependency in the imx cpufreq driver (Arnd Bergmann). - Rework the poling interval selection for the polling state in cpuidle (Mel Gorman). - Enable suspend-to-idle for PSCI OSI mode in the PSCI cpuidle driver (Ulf Hansson). - Modify the OPP framework to support empty (node-less) OPP tables in DT for passing dependency information (Nicola Mazzucato). - Fix potential lockdep issue in the OPP core and clean up the OPP core (Viresh Kumar). - Modify dev_pm_opp_put_regulators() to accept a NULL argument and update its users accordingly (Viresh Kumar). - Add frequency changes tracepoint to devfreq (Matthias Kaehlcke). - Add support for governor feature flags to devfreq, make devfreq sysfs file permissions depend on the governor and clean up the devfreq core (Chanwoo Choi). - Clean up the tegra20 devfreq driver and deprecate it to allow another driver based on EMC_STAT to be used instead of it (Dmitry Osipenko). - Add interconnect support to the tegra30 devfreq driver, allow it to take the interconnect and OPP information from DT and clean it up (Dmitry Osipenko). - Add interconnect support to the exynos-bus devfreq driver along with interconnect properties documentation (Sylwester Nawrocki). - Add suport for AMD Fam17h and Fam19h processors to the RAPL power capping driver (Victor Ding, Kim Phillips). - Fix handling of overly long constraint names in the powercap framework (Lukasz Luba). - Fix the wakeup configuration handling for bridges in the ACPI device power management core (Rafael Wysocki). - Add support for using an abstract scale for power units in the Energy Model (EM) and document it (Lukasz Luba). - Add em_cpu_energy() micro-optimization to the EM (Pavankumar Kondeti). - Modify the generic power domains (genpd) framwework to support suspend-to-idle (Ulf Hansson). - Fix creation of debugfs nodes in genpd (Thierry Strudel). - Clean up genpd (Lina Iyer). - Clean up the core system-wide suspend code and make it print driver flags for devices with debug enabled (Alex Shi, Patrice Chotard, Chen Yu). - Modify the ACPI system reboot code to make it prepare for system power off to avoid confusing the platform firmware (Kai-Heng Feng). - Update the pm-graph (multiple changes, mostly usability-related) and cpupower (online and offline CPU information support) PM utilities (Todd Brandt, Brahadambal Srinivasan)" * tag 'pm-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (86 commits) cpufreq: Fix cpufreq_online() return value on errors cpufreq: Fix up several kerneldoc comments cpufreq: stats: Use local_clock() instead of jiffies cpufreq: schedutil: Simplify sugov_update_next_freq() cpufreq: intel_pstate: Simplify intel_cpufreq_update_pstate() PM: domains: create debugfs nodes when adding power domains opp: of: Allow empty opp-table with opp-shared dt-bindings: opp: Allow empty OPP tables media: venus: dev_pm_opp_put_*() accepts NULL argument drm/panfrost: dev_pm_opp_put_*() accepts NULL argument drm/lima: dev_pm_opp_put_*() accepts NULL argument PM / devfreq: exynos: dev_pm_opp_put_*() accepts NULL argument cpufreq: qcom-cpufreq-nvmem: dev_pm_opp_put_*() accepts NULL argument cpufreq: dt: dev_pm_opp_put_regulators() accepts NULL argument opp: Allow dev_pm_opp_put_*() APIs to accept NULL opp_table opp: Don't create an OPP table from dev_pm_opp_get_opp_table() cpufreq: dt: Don't (ab)use dev_pm_opp_get_opp_table() to create OPP table opp: Reduce the size of critical section in _opp_kref_release() PM / EM: Micro optimization in em_cpu_energy cpufreq: arm_scmi: Discover the power scale in performance protocol ...
2020-12-16Merge tag 'thermal-v5.11-rc1' of ↵Linus Torvalds31-799/+1278
git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux Pull thermal updates from Daniel Lezcano: - Add upper and lower limits clamps for the cooling device state in the power allocator governor (Michael Kao) - Add upper and lower limits support for the power allocator governor (Lukasz Luba) - Optimize conditions testing for the trip points (Bernard Zhao) - Replace spin_lock_irqsave by spin_lock in hard IRQ on the rcar driver (Tian Tao) - Add MT8516 dt-bindings and device reset optional support (Fabien Parent) - Add a quiescent period to cool down the PCH when entering S0iX (Sumeet Pawnikar) - Use bitmap API instead of re-inventing the wheel on sun8i (Yangtao Li) - Remove useless NULL check in the hwmon driver (Bernard Zhao) - Update the current state in the cpufreq cooling device only if the frequency change is effective (Zhuguangqing) - Improve the schema validation for the rcar DT bindings (Geert Uytterhoeven) - Fix the user time unit in the documentation (Viresh Kumar) - Add PCI ids for Lewisburg PCH (Andres Freund) - Add hwmon support on amlogic (Martin Blumenstingl) - Fix build failure for PCH entering on in S0iX (Randy Dunlap) - Improve the k_* coefficient for the power allocator governor (Lukasz Luba) - Fix missing const on a sysfs attribute (Rikard Falkeborn) - Remove broken interrupt support on rcar to be replaced by a new one (Niklas Söderlund) - Improve the error code handling at init time on imx8mm (Fabio Estevam) - Compute interval validity once instead at each temperature reading iteration on acerhdf (Daniel Lezcano) - Add r8a779a0 support (Niklas Söderlund) - Add PCI ids for AlderLake PCH and mmio refactoring (Srinivas Pandruvada) - Add RFIM and mailbox support on int340x (Srinivas Pandruvada) - Use macro for temperature calculation on PCH (Sumeet Pawnikar) - Simplify return conditions at probe time on Broadcom (Zheng Yongjun) - Fix workload name on PCH (Srinivas Pandruvada) - Migrate the devfreq cooling device code to the energy model API (Lukasz Luba) - Emit a warning if the thermal_zone_device_update is called without the .get_temp() ops (Daniel Lezcano) - Add critical and hot ops for the thermal zone (Daniel Lezcano) - Remove notification usage when critical is reached on rcar (Daniel Lezcano) - Fix devfreq build when ENERGY_MODEL is not set (Lukasz Luba) * tag 'thermal-v5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (45 commits) thermal/drivers/devfreq_cooling: Fix the build when !ENERGY_MODEL thermal/drivers/rcar: Remove notification usage thermal/core: Add critical and hot ops thermal/core: Emit a warning if the thermal zone is updated without ops drm/panfrost: Register devfreq cooling and attempt to add Energy Model thermal: devfreq_cooling: remove old power model and use EM thermal: devfreq_cooling: add new registration functions with Energy Model thermal: devfreq_cooling: use a copy of device status thermal: devfreq_cooling: change tracing function and arguments thermal: int340x: processor_thermal: Correct workload type name thermal: broadcom: simplify the return expression of bcm2711_thermal_probe() thermal: intel: pch: use macro for temperature calculation thermal: int340x: processor_thermal: Add mailbox driver thermal: int340x: processor_thermal: Add RFIM driver thermal: int340x: processor_thermal: Add AlderLake PCI device id thermal: int340x: processor_thermal: Refactor MMIO interface thermal: rcar_gen3_thermal: Add r8a779a0 support dt-bindings: thermal: rcar-gen3-thermal: Add r8a779a0 support platform/x86/drivers/acerhdf: Check the interval value when it is set platform/x86/drivers/acerhdf: Use module_param_cb to set/get polling interval ...
2020-12-16Merge branch 'for-linus' of ↵Linus Torvalds121-2381/+2989
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: - support for inhibiting input devices at request from userspace. If a device implements open/close methods, it can also put device into low power state. This is needed, for example, to disable keyboard and touchpad on convertibles when they are transitioned into tablet mode - now that ordinary input devices can be configured for polling mode, dedicated input polling device implementation has been removed - GTCO tablet driver has been removed, as it used problematic custom HID parser, devices are EOL, and there is no interest from the manufacturer - a new driver for Dialog DA7280 haptic chips has been introduced - a new driver for power button on Dell Wyse 3020 - support for eKTF2132 in ektf2127 driver - support for SC2721 and SC2730 in sc27xx-vibra driver - enhancements for Atmel touchscreens, AD7846 touchscreens, Elan touchpads, ADP5589, ST1232 touchscreen, TM2 touchkey drivers - fixes and cleanups to allow clean builds with W=1 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (86 commits) Input: da7280 - fix spelling mistake "sequemce" -> "sequence" Input: cyapa_gen6 - fix out-of-bounds stack access Input: sc27xx - add support for sc2730 and sc2721 dt-bindings: input: Add compatible string for SC2721 and SC2730 dt-bindings: input: Convert sc27xx-vibra.txt to json-schema Input: stmpe - add axis inversion and swapping capability Input: adp5589-keys - do not explicitly control IRQ for wakeup Input: adp5589-keys - do not unconditionally configure as wakeup source Input: ipx4xx-beeper - convert comma to semicolon Input: parkbd - convert comma to semicolon Input: new da7280 haptic driver dt-bindings: input: Add document bindings for DA7280 MAINTAINERS: da7280 updates to the Dialog Semiconductor search terms Input: elantech - fix protocol errors for some trackpoints in SMBus mode Input: elan_i2c - add new trackpoint report type 0x5F Input: elants - document some registers and values Input: atmel_mxt_ts - simplify the return expression of mxt_send_bootloader_cmd() Input: imx_keypad - add COMPILE_TEST support Input: applespi - use new structure for SPI transfer delays Input: synaptics-rmi4 - use new structure for SPI transfer delays ...
2020-12-16Merge tag 'platform-drivers-x86-v5.11-1' of ↵Linus Torvalds58-257/+6595
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver updates from Hans de Goede: "Highlights: - New driver for changing BIOS settings from within Linux on Dell devices. This introduces a new generic sysfs API for this. Lenovo is working on also supporting this API on their devices - New Intel PMT telemetry and crashlog drivers - Support for SW_TABLET_MODE reporting for the acer-wmi and intel-hid drivers - Preparation work for improving support for Microsoft Surface hardware - Various fixes / improvements / quirks for the panasonic-laptop and others" * tag 'platform-drivers-x86-v5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (81 commits) platform/x86: ISST: Mark mmio_range_devid_0 and mmio_range_devid_1 with static keyword platform/x86: intel-hid: add Rocket Lake ACPI device ID x86/platform: classmate-laptop: add WiFi media button platform/x86: mlx-platform: Fix item counter assignment for MSN2700/ComEx system platform/x86: mlx-platform: Fix item counter assignment for MSN2700, MSN24xx systems tools/power/x86/intel-speed-select: Update version for v5.11 tools/power/x86/intel-speed-select: Account for missing sysfs for die_id tools/power/x86/intel-speed-select: Read TRL from mailbox platform/x86: intel-hid: Do not create SW_TABLET_MODE input-dev when a KIOX010A ACPI dev is present platform/x86: intel-hid: Add alternative method to enable switches platform/x86: intel-hid: Add support for SW_TABLET_MODE platform/x86: intel-vbtn: Fix SW_TABLET_MODE always reporting 1 on some HP x360 models platform/x86: ISST: Change PCI device macros platform/x86: ISST: Allow configurable offset range platform/x86: ISST: Check for unaligned mmio address acer-wireless: send an EV_SYN/SYN_REPORT between state changes platform/x86: dell-wmi-sysman: work around for BIOS bug platform/x86: mlx-platform: remove an unused variable platform/x86: thinkpad_acpi: remove trailing semicolon in macro definition platform/x86: dell-smbios-base: Fix error return code in dell_smbios_init ...
2020-12-16Merge tag 'hwmon-for-v5.11' of ↵Linus Torvalds81-316/+3875
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon updates from Guenter Roeck: "New drivers: - SB-TSI sensors - Lineat Technology LTC2992 - Delta power supplies Q54SJ108A2 - Maxim MAX127 - Corsair PSU - STMicroelectronics PM6764 Voltage Regulator New chip support: - P10 added to fsi/occ driver - NCT6687D added to nct6883 driver - Intel-based Xserves added to applesmc driver - AMD family 19h model 01h added to amd_energy driver And various minor bug fixes and improvements" * tag 'hwmon-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (41 commits) dt-bindings: (hwmon/sbtsi_temp) Add SB-TSI hwmon driver bindings hwmon: (sbtsi) Add documentation hwmon: (sbtsi) Add basic support for SB-TSI sensors hwmon: (iio_hwmon) Drop bogus __refdata annotation hwmon: (xgene) Drop bogus __refdata annotation dt-bindings: hwmon: convert AD ADM1275 bindings to dt-schema hwmon: (occ) Add new temperature sensor type fsi: occ: Add support for P10 dt-bindings: fsi: Add P10 OCC device documentation dt-bindings: hwmon: convert TI ADS7828 bindings to dt-schema dt-bindings: hwmon: convert AD AD741x bindings to dt-schema dt-bindings: hwmon: convert TI INA2xx bindings to dt-schema hwmon: (ltc2992) Fix less than zero comparisons with an unsigned integer hwmon: (pmbus/q54sj108a2) Correct title underline length dt-bindings: hwmon: Add documentation for ltc2992 hwmon: (ltc2992) Add support for GPIOs. hwmon: (ltc2992) Add support hwmon: (pmbus) Driver for Delta power supplies Q54SJ108A2 hwmon: Add driver for STMicroelectronics PM6764 Voltage Regulator hwmon: (nct6683) Support NCT6687D. ...
2020-12-16Merge tag 'mmc-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmcLinus Torvalds52-454/+839
Pull MMC updates from Ulf Hansson: "MMC core: - Initial support for SD express card/host MMC host: - mxc: Convert the driver to DT-only - mtk-sd: Add HS400 enhanced strobe support - mtk-sd: Add support for the MT8192 SoC variant - sdhci-acpi: Allow changing HS200/HS400 driver strength for AMDI0040 - sdhci-esdhc-imx: Convert the driver to DT-only - sdhci-pci-gli: Improve performance for HS400 mode for GL9763E - sdhci-pci-gli: Reduce power consumption for GL9755 - sdhci-xenon: Introduce ACPI support - tmio: Fix command error processing - tmio: Inform the core about the max_busy_timeout - tmio/renesas_sdhi: Support custom calculation of busy-wait time - renesas_sdhi: Reset SCC only when available - rtsx_pci: Add SD Express mode support for RTS5261 - rtsx_pci: Various fixes and improvements for RTS5261 MEMSTICK: - Minor fixes/improvements" * tag 'mmc-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (72 commits) dt-bindings: mmc: eliminate yamllint warnings mmc: sdhci-xenon: introduce ACPI support mmc: sdhci-xenon: use clk only with DT mmc: sdhci-xenon: switch to device_* API mmc: sdhci-xenon: use match data for controllers variants dt-bindings: mmc: Fix xlnx,mio-bank property values for arasan driver mmc: renesas_sdhi: populate hook for longer busy_wait mmc: tmio: add hook for custom busy_wait calculation mmc: tmio: set max_busy_timeout dt-bindings: mmc: imx: fix the wrongly dropped imx8qm compatible string mmc: sdhci-pci-gli: Disable slow mode in HS400 mode for GL9763E mmc: sdhci: Use more concise device_property_read_u64 memstick: r592: Fix error return in r592_probe() mmc: mxc: Convert the driver to DT-only mmc: mxs: Remove the unused .id_table mmc: sdhci-of-arasan: Fix fall-through warnings for Clang mmc: sdhci-pci-gli: Reduce power consumption for GL9755 mmc: mediatek: depend on COMMON_CLK to fix compile tests mmc: pxamci: Fix error return code in pxamci_probe mmc: sdhci: Update firmware interface API ...
2020-12-16Merge branch 'i2c/for-5.11' of ↵Linus Torvalds25-340/+604
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c updates from Wolfram Sang: "A bit smaller this time with mostly usual driver updates. Slave support for imx stands out a little" * 'i2c/for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (30 commits) i2c: remove check that can never be true i2c: Warn when device removing fails dt-bindings: i2c: Update DT binding docs to support SiFive FU740 SoC dt-bindings: i2c: Add compatible string for AM64 SoC i2c: designware: Make register offsets all of the same width i2c: designware: Switch header to use BIT() and GENMASK() i2c: pxa: move to generic GPIO recovery i2c: sh_mobile: Mark adapter suspended during suspend i2c: owl: Add compatible for the Actions Semi S500 I2C controller dt-bindings: i2c: owl: Convert Actions Semi Owl binding to a schema i2c: imx: support slave mode for imx I2C driver i2c: ismt: Adding support for I2C_SMBUS_BLOCK_PROC_CALL i2c: ocores: Avoid false-positive error log message. Revert "i2c: qcom-geni: Disable DMA processing on the Lenovo Yoga C630" i2c: mxs: Remove unneeded platform_device_id i2c: pca-platform: drop two members from driver data that are assigned to only i2c: imx: Remove unused .id_table support i2c: nvidia-gpu: drop empty stub for runtime pm dt-bindings: i2c: mellanox,i2c-mlxbf: convert txt to YAML schema i2c: mv64xxx: Add bus error recovery ...
2020-12-16Merge tag 'spi-v5.11' of ↵Linus Torvalds49-239/+482
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi updates from Mark Brown: "The big change this release has been some excellent work from Lukas Wunner which closes a bunch of holes in the cleanup paths for drivers, mainly introduced as a result of devm conversions causing bad interactions with the support SPI has for allocating the bus and driver data together. Together with some of the other work done it feels like we've turned the corner on several long standing pain points with the API. Summary: - Many cleanups around probe/remove and error handling from Lukas Wunner and Uwe Kleine-König, and further fixes around PM from Zhang Qilong. - Provide a mask for which bits of the mode can safely be configured by drivers and use that to fix an issue with the ADS7846 driver. - Documentation of the expected interactions between SPI and GPIO level chip select polarity configuration from H. Nikolaus Schaller, hopefully we're pretty much at the end of sorting out the interactions there. Thanks to Nikolaus, Sven Van Asbroeck and Linus Walleij for this. - DMA support for Allwinner sun6i controllers. - Support for Canaan K210 Designware implementations and Intel Adler Lake" * tag 'spi-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (69 commits) spi: dt-bindings: clarify CS behavior for spi-cs-high and gpio descriptors spi: Limit the spi device max speed to controller's max speed spi: spi-geni-qcom: Use the new method of gpio CS control platform/chrome: cros_ec_spi: Drop bits_per_word assignment platform/chrome: cros_ec_spi: Don't overwrite spi::mode spi: dw: Add support for the Canaan K210 SoC SPI spi: dw: Add support for 32-bits max xfer size dt-bindings: spi: dw-apb-ssi: Add Canaan K210 SPI controller spi: Update DT binding docs to support SiFive FU740 SoC spi: atmel-quadspi: Fix use-after-free on unbind spi: npcm-fiu: Disable clock in probe error path spi: ar934x: Don't leak SPI master in probe error path spi: mt7621: Don't leak SPI master in probe error path spi: mt7621: Disable clock in probe error path media: netup_unidvb: Don't leak SPI master in probe error path spi: sc18is602: Don't leak SPI master in probe error path spi: rb4xx: Don't leak SPI master in probe error path spi: gpio: Don't leak SPI master in probe error path spi: spi-mtk-nor: Don't leak SPI master in probe error path spi: mxic: Don't leak SPI master in probe error path ...
2020-12-16Merge tag 'regulator-v5.11' of ↵Linus Torvalds37-84/+3876
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator updates from Mark Brown: "This has been a quiet release for the regulator API, a few new drivers and the usual fixes and cleanup traffic but not much else going on: - Optimisations for the handling of voltage enumeration, especially with sparse selector sets, from Claudiu Beznea. - Support for several ARM SCMI regulators, Dialog DA9121, NXP PF8x00, Qualcomm PMX55, PM8350 and PM8350c The addition of the SCMI regulator driver (which controls regulators via system firmware) means that we've pulled in the support for the underlying firmware operations from the firmware tree" * tag 'regulator-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (53 commits) regulator: mc13892-regulator: convert comma to semicolon regulator: pfuze100: Convert the driver to DT-only regulator: max14577: Add proper module aliases strings regulator: da9121: Potential Oops in da9121_assign_chip_model() regulator: da9121: Fix index used for DT property regulator: da9121: Remove uninitialised string variable regulator: axp20x: Fix DLDO2 voltage control register mask for AXP22x regulator: qcom-rpmh: Add support for PM8350/PM8350c regulator: dt-bindings: Add PM8350x compatibles regulator: da9121: include linux/gpio/consumer.h regulator: da9121: Mark some symbols with static keyword regulator: da9121: Request IRQ directly and free in release function to avoid masking race regulator: da9121: add interrupt support regulator: da9121: add mode support regulator: da9121: add current support regulator: da9121: Update registration to support multiple buck variants regulator: da9121: Add support for device variants via devicetree regulator: da9121: Add device variant descriptors regulator: da9121: Add device variant regmaps regulator: da9121: Add device variants ...
2020-12-16Merge tag 'regmap-v5.11' of ↵Linus Torvalds11-1530/+96
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap updates from Mark Brown: "This is quite a busy release for regmap with two substantial features being added: - Support for register maps Soundwire 1.2 multi-byte operations, allowing atomic support for registers larger than a single byte. - Support for relaxed I/O without barriers in MMIO regmaps, allowing them to be used efficiently on systems where default MMIO operations include barriers. There was also an addition and revert of use of the new Soundwire support for RT715 due to build issues with the driver built in, my tests only covered building it as a module, the patch wasn't just dropped as it had already been merged elsewhere" * tag 'regmap-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: ASoC: rt715: Fix build regmap: sdw: add required header files regmap: Remove duplicate `type` field from regmap `regcache_sync` trace event regmap: Fix order of regmap write log regmap: mmio: add config option to allow relaxed MMIO accesses
2020-12-16Merge tag 'irq-core-2020-12-15' of ↵Linus Torvalds58-626/+758
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Generic interrupt and irqchips subsystem updates. Unusually, there is not a single completely new irq chip driver, just new DT bindings and extensions of existing drivers to accomodate new variants! Core: - Consolidation and robustness changes for irq time accounting - Cleanup and consolidation of irq stats - Remove the fasteoi IPI flow which has been proved useless - Provide an interface for converting legacy interrupt mechanism into irqdomains Drivers: - Preliminary support for managed interrupts on platform devices - Correctly identify allocation of MSIs proxyied by another device - Generalise the Ocelot support to new SoCs - Improve GICv4.1 vcpu entry, matching the corresponding KVM optimisation - Work around spurious interrupts on Qualcomm PDC - Random fixes and cleanups" * tag 'irq-core-2020-12-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) irqchip/qcom-pdc: Fix phantom irq when changing between rising/falling driver core: platform: Add devm_platform_get_irqs_affinity() ACPI: Drop acpi_dev_irqresource_disabled() resource: Add irqresource_disabled() genirq/affinity: Add irq_update_affinity_desc() irqchip/gic-v3-its: Flag device allocation as proxied if behind a PCI bridge irqchip/gic-v3-its: Tag ITS device as shared if allocating for a proxy device platform-msi: Track shared domain allocation irqchip/ti-sci-intr: Fix freeing of irqs irqchip/ti-sci-inta: Fix printing of inta id on probe success drivers/irqchip: Remove EZChip NPS interrupt controller Revert "genirq: Add fasteoi IPI flow" irqchip/hip04: Make IPIs use handle_percpu_devid_irq() irqchip/bcm2836: Make IPIs use handle_percpu_devid_irq() irqchip/armada-370-xp: Make IPIs use handle_percpu_devid_irq() irqchip/gic, gic-v3: Make SGIs use handle_percpu_devid_irq() irqchip/ocelot: Add support for Jaguar2 platforms irqchip/ocelot: Add support for Serval platforms irqchip/ocelot: Add support for Luton platforms irqchip/ocelot: prepare to support more SoC ...
2020-12-16Merge branch 'akpm' (patches from Andrew)Linus Torvalds21-372/+536
Merge more updates from Andrew Morton: "More MM work: a memcg scalability improvememt" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/lru: revise the comments of lru_lock mm/lru: introduce relock_page_lruvec() mm/lru: replace pgdat lru_lock with lruvec lock mm/swap.c: serialize memcg changes in pagevec_lru_move_fn mm/compaction: do page isolation first in compaction mm/lru: introduce TestClearPageLRU() mm/mlock: remove __munlock_isolate_lru_page() mm/mlock: remove lru_lock on TestClearPageMlocked mm/vmscan: remove lruvec reget in move_pages_to_lru mm/lru: move lock into lru_note_cost mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fn mm/memcg: add debug checking in lock_page_memcg mm: page_idle_get_page() does not need lru_lock mm/rmap: stop store reordering issue on page->mapping mm/vmscan: remove unnecessary lruvec adding mm/thp: narrow lru locking mm/thp: simplify lru_add_page_tail() mm/thp: use head for head page in lru_add_page_tail() mm/thp: move lru_add_page_tail() to huge_memory.c
2020-12-16mm/lru: revise the comments of lru_lockHugh Dickins9-64/+50
Since we changed the pgdat->lru_lock to lruvec->lru_lock, it's time to fix the incorrect comments in code. Also fixed some zone->lru_lock comment error from ancient time. etc. I struggled to understand the comment above move_pages_to_lru() (surely it never calls page_referenced()), and eventually realized that most of it had got separated from shrink_active_list(): move that comment back. Link: https://lkml.kernel.org/r/1604566549-62481-20-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Jann Horn <jannh@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Chen, Rong A" <rong.a.chen@intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16mm/lru: introduce relock_page_lruvec()Alexander Duyck4-46/+62
Add relock_page_lruvec() to replace repeated same code, no functional change. When testing for relock we can avoid the need for RCU locking if we simply compare the page pgdat and memcg pointers versus those that the lruvec is holding. By doing this we can avoid the extra pointer walks and accesses of the memory cgroup. In addition we can avoid the checks entirely if lruvec is currently NULL. [alex.shi@linux.alibaba.com: use page_memcg()] Link: https://lkml.kernel.org/r/66d8e79d-7ec6-bfbc-1c82-bf32db3ae5b7@linux.alibaba.com Link: https://lkml.kernel.org/r/1604566549-62481-19-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Tejun Heo <tj@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Chen, Rong A" <rong.a.chen@intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16mm/lru: replace pgdat lru_lock with lruvec lockAlex Shi10-126/+275
This patch moves per node lru_lock into lruvec, thus bring a lru_lock for each of memcg per node. So on a large machine, each of memcg don't have to suffer from per node pgdat->lru_lock competition. They could go fast with their self lru_lock. After move memcg charge before lru inserting, page isolation could serialize page's memcg, then per memcg lruvec lock is stable and could replace per node lru lock. In isolate_migratepages_block(), compact_unlock_should_abort and lock_page_lruvec_irqsave are open coded to work with compact_control. Also add a debug func in locking which may give some clues if there are sth out of hands. Daniel Jordan's testing show 62% improvement on modified readtwice case on his 2P * 10 core * 2 HT broadwell box. https://lore.kernel.org/lkml/20200915165807.kpp7uhiw7l3loofu@ca-dmjordan1.us.oracle.com/ Hugh Dickins helped on the patch polish, thanks! [alex.shi@linux.alibaba.com: fix comment typo] Link: https://lkml.kernel.org/r/5b085715-292a-4b43-50b3-d73dc90d1de5@linux.alibaba.com [alex.shi@linux.alibaba.com: use page_memcg()] Link: https://lkml.kernel.org/r/5a4c2b72-7ee8-2478-fc0e-85eb83aafec4@linux.alibaba.com Link: https://lkml.kernel.org/r/1604566549-62481-18-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Rong Chen <rong.a.chen@intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16mm/swap.c: serialize memcg changes in pagevec_lru_move_fnAlex Shi1-9/+35
Hugh Dickins' found a memcg change bug on original version: If we want to change the pgdat->lru_lock to memcg's lruvec lock, we have to serialize mem_cgroup_move_account during pagevec_lru_move_fn. The possible bad scenario would like: cpu 0 cpu 1 lruvec = mem_cgroup_page_lruvec() if (!isolate_lru_page()) mem_cgroup_move_account spin_lock_irqsave(&lruvec->lru_lock <== wrong lock. So we need TestClearPageLRU to block isolate_lru_page(), that serializes the memcg change. and then removing the PageLRU check in move_fn callee as the consequence. __pagevec_lru_add_fn() is different from the others, because the pages it deals with are, by definition, not yet on the lru. TestClearPageLRU is not needed and would not work, so __pagevec_lru_add() goes its own way. Link: https://lkml.kernel.org/r/1604566549-62481-17-git-send-email-alex.shi@linux.alibaba.com Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: "Chen, Rong A" <rong.a.chen@intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16mm/compaction: do page isolation first in compactionAlex Shi3-31/+56
Currently, compaction would get the lru_lock and then do page isolation which works fine with pgdat->lru_lock, since any page isoltion would compete for the lru_lock. If we want to change to memcg lru_lock, we have to isolate the page before getting lru_lock, thus isoltion would block page's memcg change which relay on page isoltion too. Then we could safely use per memcg lru_lock later. The new page isolation use previous introduced TestClearPageLRU() + pgdat lru locking which will be changed to memcg lru lock later. Hugh Dickins <hughd@google.com> fixed following bugs in this patch's early version: Fix lots of crashes under compaction load: isolate_migratepages_block() must clean up appropriately when rejecting a page, setting PageLRU again if it had been cleared; and a put_page() after get_page_unless_zero() cannot safely be done while holding locked_lruvec - it may turn out to be the final put_page(), which will take an lruvec lock when PageLRU. And move __isolate_lru_page_prepare back after get_page_unless_zero to make trylock_page() safe: trylock_page() is not safe to use at this time: its setting PG_locked can race with the page being freed or allocated ("Bad page"), and can also erase flags being set by one of those "sole owners" of a freshly allocated page who use non-atomic __SetPageFlag(). Link: https://lkml.kernel.org/r/1604566549-62481-16-git-send-email-alex.shi@linux.alibaba.com Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: "Chen, Rong A" <rong.a.chen@intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16mm/lru: introduce TestClearPageLRU()Alex Shi3-22/+21
Currently lru_lock still guards both lru list and page's lru bit, that's ok. but if we want to use specific lruvec lock on the page, we need to pin down the page's lruvec/memcg during locking. Just taking lruvec lock first may be undermined by the page's memcg charge/migration. To fix this problem, we will clear the lru bit out of locking and use it as pin down action to block the page isolation in memcg changing. So now a standard steps of page isolation is following: 1, get_page(); #pin the page avoid to be free 2, TestClearPageLRU(); #block other isolation like memcg change 3, spin_lock on lru_lock; #serialize lru list access 4, delete page from lru list; This patch start with the first part: TestClearPageLRU, which combines PageLRU check and ClearPageLRU into a macro func TestClearPageLRU. This function will be used as page isolation precondition to prevent other isolations some where else. Then there are may !PageLRU page on lru list, need to remove BUG() checking accordingly. There 2 rules for lru bit now: 1, the lru bit still indicate if a page on lru list, just in some temporary moment(isolating), the page may have no lru bit when it's on lru list. but the page still must be on lru list when the lru bit set. 2, have to remove lru bit before delete it from lru list. As Andrew Morton mentioned this change would dirty cacheline for a page which isn't on the LRU. But the loss would be acceptable in Rong Chen <rong.a.chen@intel.com> report: https://lore.kernel.org/lkml/20200304090301.GB5972@shao2-debian/ Link: https://lkml.kernel.org/r/1604566549-62481-15-git-send-email-alex.shi@linux.alibaba.com Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16mm/mlock: remove __munlock_isolate_lru_page()Alex Shi1-22/+9
__munlock_isolate_lru_page() only has one caller, remove it to clean up and simplify code. Link: https://lkml.kernel.org/r/1604566549-62481-14-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: "Chen, Rong A" <rong.a.chen@intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16mm/mlock: remove lru_lock on TestClearPageMlockedAlex Shi1-21/+5
In the func munlock_vma_page, comments mentained lru_lock needed for serialization with split_huge_pages. But the page must be PageLocked as well as pages in split_huge_page series funcs. Thus the PageLocked is enough to serialize both funcs. Further more, Hugh Dickins pointed: before splitting in split_huge_page_to_list, the page was unmap_page() to remove pmd/ptes which protect the page from munlock. Thus, no needs to guard __split_huge_page_tail for mlock clean, just keep the lru_lock there for isolation purpose. LKP found a preempt issue on __mod_zone_page_state which need change to mod_zone_page_state. Thanks! Link: https://lkml.kernel.org/r/1604566549-62481-13-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: "Chen, Rong A" <rong.a.chen@intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16mm/vmscan: remove lruvec reget in move_pages_to_lruAlex Shi1-1/+6
Isolated page shouldn't be recharged by memcg since the memcg migration isn't possible at the time. All pages were isolated from the same lruvec (and isolation inhibits memcg migration). So remove unnecessary regetting. Thanks to Alexander Duyck for pointing this out. Link: https://lkml.kernel.org/r/1604566549-62481-12-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Michal Hocko <mhocko@kernel.org> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: "Chen, Rong A" <rong.a.chen@intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16mm/lru: move lock into lru_note_costAlex Shi3-5/+4
We have to move lru_lock into lru_note_cost, since it cycle up on memcg tree, for future per lruvec lru_lock replace. It's a bit ugly and may cost a bit more locking, but benefit from multiple memcg locking could cover the lost. Link: https://lkml.kernel.org/r/1604566549-62481-11-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: "Chen, Rong A" <rong.a.chen@intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fnAlex Shi1-42/+23
Fold the PGROTATED event collection into pagevec_move_tail_fn call back func like other funcs does in pagevec_lru_move_fn. Thus we could save func call pagevec_move_tail(). Now all usage of pagevec_lru_move_fn are same and no needs of its 3rd parameter. It's just simply the calling. No functional change. [lkp@intel.com: found a build issue in the original patch, thanks] Link: https://lkml.kernel.org/r/1604566549-62481-10-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: "Chen, Rong A" <rong.a.chen@intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16mm/memcg: add debug checking in lock_page_memcgAlex Shi1-0/+6
Add a debug checking in lock_page_memcg, then we could get alarm if anything wrong here. Link: https://lkml.kernel.org/r/1604566549-62481-9-git-send-email-alex.shi@linux.alibaba.com Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: "Chen, Rong A" <rong.a.chen@intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16mm: page_idle_get_page() does not need lru_lockHugh Dickins1-4/+0
It is necessary for page_idle_get_page() to recheck PageLRU() after get_page_unless_zero(), but holding lru_lock around that serves no useful purpose, and adds to lru_lock contention: delete it. See https://lore.kernel.org/lkml/20150504031722.GA2768@blaptop for the discussion that led to lru_lock there; but __page_set_anon_rmap() now uses WRITE_ONCE(), and I see no other risk in page_idle_clear_pte_refs() using rmap_walk() (beyond the risk of racing PageAnon->PageKsm, mostly but not entirely prevented by page_count() check in ksm.c's write_protect_page(): that risk being shared with page_referenced() and not helped by lru_lock). Link: https://lkml.kernel.org/r/1604566549-62481-8-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: "Chen, Rong A" <rong.a.chen@intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16mm/rmap: stop store reordering issue on page->mappingAlex Shi1-1/+7
Hugh Dickins and Minchan Kim observed a long time issue which discussed here, but actully the mentioned fix in https://lore.kernel.org/lkml/20150504031722.GA2768@blaptop/ was missed. The store reordering may cause problem in the scenario: CPU 0 CPU1 do_anonymous_page page_add_new_anon_rmap() page->mapping = anon_vma + PAGE_MAPPING_ANON lru_cache_add_inactive_or_unevictable() spin_lock(lruvec->lock) SetPageLRU() spin_unlock(lruvec->lock) /* idletacking judged it as LRU * page so pass the page in * page_idle_clear_pte_refs */ page_idle_clear_pte_refs rmap_walk if PageAnon(page) Johannes give detailed examples how the store reordering could cause trouble: "The concern is the SetPageLRU may get reorder before 'page->mapping' setting, That would make CPU 1 will observe at page->mapping after observing PageLRU set on the page. 1. anon_vma + PAGE_MAPPING_ANON That's the in-order scenario and is fine. 2. NULL That's possible if the page->mapping store gets reordered to occur after SetPageLRU. That's fine too because we check for it. 3. anon_vma without the PAGE_MAPPING_ANON bit That would be a problem and could lead to all kinds of undesirable behavior including crashes and data corruption. Is it possible? AFAICT the compiler is allowed to tear the store to page->mapping and I don't see anything that would prevent it. That said, I also don't see how the reader testing PageLRU under the lru_lock would prevent that in the first place. AFAICT we need that WRITE_ONCE() around the page->mapping assignment." [alex.shi@linux.alibaba.com: updated for comments change from Johannes] Link: https://lkml.kernel.org/r/e66ef2e5-c74c-6498-e8b3-56c37b9d2d15@linux.alibaba.com Link: https://lkml.kernel.org/r/1604566549-62481-7-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: "Chen, Rong A" <rong.a.chen@intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16mm/vmscan: remove unnecessary lruvec addingAlex Shi1-13/+25
We don't have to add a freeable page into lru and then remove from it. This change saves a couple of actions and makes the moving more clear. The SetPageLRU needs to be kept before put_page_testzero for list integrity, otherwise: #0 move_pages_to_lru #1 release_pages if !put_page_testzero if (put_page_testzero()) !PageLRU //skip lru_lock SetPageLRU() list_add(&page->lru,) list_add(&page->lru,) [akpm@linux-foundation.org: coding style fixes] Link: https://lkml.kernel.org/r/1604566549-62481-6-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: "Chen, Rong A" <rong.a.chen@intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16mm/thp: narrow lru lockingAlex Shi1-12/+13
lru_lock and page cache xa_lock have no obvious reason to be taken one way round or the other: until now, lru_lock has been taken before page cache xa_lock, when splitting a THP; but nothing else takes them together. Reverse that ordering: let's narrow the lru locking - but leave local_irq_disable to block interrupts throughout, like before. Hugh Dickins point: split_huge_page_to_list() was already silly, to be using the _irqsave variant: it's just been taking sleeping locks, so would already be broken if entered with interrupts enabled. So we can save passing flags argument down to __split_huge_page(). Why change the lock ordering here? That was hard to decide. One reason: when this series reaches per-memcg lru locking, it relies on the THP's memcg to be stable when taking the lru_lock: that is now done after the THP's refcount has been frozen, which ensures page memcg cannot change. Another reason: previously, lock_page_memcg()'s move_lock was presumed to nest inside lru_lock; but now lru_lock must nest inside (page cache lock inside) move_lock, so it becomes possible to use lock_page_memcg() to stabilize page memcg before taking its lru_lock. That is not the mechanism used in this series, but it is an option we want to keep open. [hughd@google.com: rewrite commit log] Link: https://lkml.kernel.org/r/1604566549-62481-5-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: "Chen, Rong A" <rong.a.chen@intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16mm/thp: simplify lru_add_page_tail()Alex Shi1-14/+6
Simplify lru_add_page_tail(), there are actually only two cases possible: split_huge_page_to_list(), with list supplied and head isolated from lru by its caller; or split_huge_page(), with NULL list and head on lru - because when head is racily isolated from lru, the isolator's reference will stop the split from getting any further than its page_ref_freeze(). So decide between the two cases by "list", but add VM_WARN_ON()s to verify that they match our lru expectations. [Hugh Dickins: rewrite commit log] Link: https://lkml.kernel.org/r/1604566549-62481-4-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: "Chen, Rong A" <rong.a.chen@intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16mm/thp: use head for head page in lru_add_page_tail()Alex Shi1-12/+11
Since the first parameter is only used by head page, it's better to make it explicit. Link: https://lkml.kernel.org/r/1604566549-62481-3-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: "Chen, Rong A" <rong.a.chen@intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16mm/thp: move lru_add_page_tail() to huge_memory.cAlex Shi3-35/+30
Patch series "per memcg lru lock", v21. This patchset includes 3 parts: 1) some code cleanup and minimum optimization as preparation 2) use TestCleanPageLRU as page isolation's precondition 3) replace per node lru_lock with per memcg per node lru_lock Current lru_lock is one for each of node, pgdat->lru_lock, that guard for lru lists, but now we had moved the lru lists into memcg for long time. Still using per node lru_lock is clearly unscalable, pages on each of memcgs have to compete each others for a whole lru_lock. This patchset try to use per lruvec/memcg lru_lock to repleace per node lru lock to guard lru lists, make it scalable for memcgs and get performance gain. Currently lru_lock still guards both lru list and page's lru bit, that's ok. but if we want to use specific lruvec lock on the page, we need to pin down the page's lruvec/memcg during locking. Just taking lruvec lock first may be undermined by the page's memcg charge/migration. To fix this problem, we could take out the page's lru bit clear and use it as pin down action to block the memcg changes. That's the reason for new atomic func TestClearPageLRU. So now isolating a page need both actions: TestClearPageLRU and hold the lru_lock. The typical usage of this is isolate_migratepages_block() in compaction.c we have to take lru bit before lru lock, that serialized the page isolation in memcg page charge/migration which will change page's lruvec and new lru_lock in it. The above solution suggested by Johannes Weiner, and based on his new memcg charge path, then have this patchset. (Hugh Dickins tested and contributed much code from compaction fix to general code polish, thanks a lot!). Daniel Jordan's testing show 62% improvement on modified readtwice case on his 2P * 10 core * 2 HT broadwell box on v18, which has no much different with this v20. https://lore.kernel.org/lkml/20200915165807.kpp7uhiw7l3loofu@ca-dmjordan1.us.oracle.com/ Thanks to Hugh Dickins and Konstantin Khlebnikov, they both brought this idea 8 years ago, and others who gave comments as well: Daniel Jordan, Mel Gorman, Shakeel Butt, Matthew Wilcox, Alexander Duyck etc. Thanks for Testing support from Intel 0day and Rong Chen, Fengguang Wu, and Yun Wang. Hugh Dickins also shared his kbuild-swap case. This patch (of 19): lru_add_page_tail() is only used in huge_memory.c, defining it in other file with a CONFIG_TRANSPARENT_HUGEPAGE macro restrict just looks weird. Let's move it THP. And make it static as Hugh Dickins suggested. Link: https://lkml.kernel.org/r/1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com Link: https://lkml.kernel.org/r/1604566549-62481-2-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Tejun Heo <tj@kernel.org> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: "Chen, Rong A" <rong.a.chen@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-16Merge tag 'staging-5.11-rc1' of ↵Linus Torvalds434-10875/+10177
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging / IIO driver updates from Greg KH: "Here is the big staging and IIO driver pull request for 5.11-rc1 Lots of different things in here: - loads of driver updates - so many coding style cleanups - new IIO drivers - Android ION code is finally removed from the tree - wimax drivers are moved to staging on their way out of the kernel Nothing really exciting, just the constant grind of kernel development :) All have been in linux-next for a while with no reported issues" * tag 'staging-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (341 commits) staging: olpc_dcon: Do not call platform_device_unregister() in dcon_probe() staging: most: Fix spelling mistake "tranceiver" -> "transceiver" staging: qlge: remove duplicate word in comment staging: comedi: mf6x4: Fix AI end-of-conversion detection staging: greybus: Add TODO item about modernizing the pwm code pinctrl: ralink: add a pinctrl driver for the rt2880 family dt-bindings: pinctrl: rt2880: add binding document staging: rtl8723bs: remove ELEMENT_ID enum staging: rtl8723bs: remove unused macros staging: rtl8723bs: replace EID_EXTCapability staging: rtl8723bs: replace EID_BSSIntolerantChlReport staging: rtl8723bs: replace EID_BSSCoexistence staging: rtl8723bs: replace _MME_IE_ staging: rtl8723bs: replace _WAPI_IE_ staging: rtl8723bs: replace _EXT_SUPPORTEDRATES_IE_ staging: rtl8723bs: replace _ERPINFO_IE_ staging: rtl8723bs: replace _CHLGETXT_IE_ staging: rtl8723bs: replace _COUNTRY_IE_ staging: rtl8723bs: replace _IBSS_PARA_IE_ staging: rtl8723bs: replace _TIM_IE_ ...
2020-12-16Merge tag 'char-misc-5.11-rc1' of ↵Linus Torvalds290-4037/+23582
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char / misc driver updates from Greg KH: "Here is the big char/misc driver update for 5.11-rc1. Continuing the tradition of previous -rc1 pulls, there seems to be more and more tiny driver subsystems flowing through this tree. Lots of different things, all of which have been in linux-next for a while with no reported issues: - extcon driver updates - habannalab driver updates - mei driver updates - uio driver updates - binder fixes and features added - soundwire driver updates - mhi bus driver updates - phy driver updates - coresight driver updates - fpga driver updates - speakup driver updates - slimbus driver updates - various small char and misc driver updates" * tag 'char-misc-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (305 commits) extcon: max77693: Fix modalias string extcon: fsa9480: Support TI TSU6111 variant extcon: fsa9480: Rewrite bindings in YAML and extend dt-bindings: extcon: add binding for TUSB320 extcon: Add driver for TI TUSB320 slimbus: qcom: fix potential NULL dereference in qcom_slim_prg_slew() siox: Make remove callback return void siox: Use bus_type functions for probe, remove and shutdown spmi: Add driver shutdown support spmi: fix some coding style issues at the spmi core spmi: get rid of a warning when built with W=1 uio: uio_hv_generic: use devm_kzalloc() for private data alloc uio: uio_fsl_elbc_gpcm: use device-managed allocators uio: uio_aec: use devm_kzalloc() for uio_info object uio: uio_cif: use devm_kzalloc() for uio_info object uio: uio_netx: use devm_kzalloc() for or uio_info object uio: uio_mf624: use devm_kzalloc() for uio_info object uio: uio_sercos3: use device-managed functions for simple allocs uio: uio_dmem_genirq: finalize conversion of probe to devm_ handlers uio: uio_dmem_genirq: convert simple allocations to device-managed ...
2020-12-16Merge tag 'driver-core-5.11-rc1' of ↵Linus Torvalds31-770/+827
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big driver core updates for 5.11-rc1 This time there was a lot of different work happening here for some reason: - redo of the fwnode link logic, speeding it up greatly - auxiliary bus added (this was a tag that will be pulled in from other trees/maintainers this merge window as well, as driver subsystems started to rely on it) - platform driver core cleanups on the way to fixing some long-time api updates in future releases - minor fixes and tweaks. All have been in linux-next with no (finally) reported issues. Testing there did helped in shaking issues out a lot :)" * tag 'driver-core-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (39 commits) driver core: platform: don't oops in platform_shutdown() on unbound devices ACPI: Use fwnode_init() to set up fwnode misc: pvpanic: Replace OF headers by mod_devicetable.h misc: pvpanic: Combine ACPI and platform drivers usb: host: sl811: Switch to use platform_get_mem_or_io() vfio: platform: Switch to use platform_get_mem_or_io() driver core: platform: Introduce platform_get_mem_or_io() dyndbg: fix use before null check soc: fix comment for freeing soc_dev_attr driver core: platform: use bus_type functions driver core: platform: change logic implementing platform_driver_probe driver core: platform: reorder functions driver core: make driver_probe_device() static driver core: Fix a couple of typos driver core: Reorder devices on successful probe driver core: Delete pointless parameter in fwnode_operations.add_links driver core: Refactor fw_devlink feature efi: Update implementation of add_links() to create fwnode links of: property: Update implementation of add_links() to create fwnode links driver core: Use device's fwnode to check if it is waiting for suppliers ...
2020-12-16Merge tag 'tty-5.11-rc1' of ↵Linus Torvalds67-15086/+464
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty / serial updates from Greg KH: "Here is the "large" set of tty and serial patches for 5.11-rc1. Nothing major at all, some cleanups and some driver removals, always a nice sign: - build warning cleanups - vt locking and logic unwinding and cleanups - tiny serial driver fixes and updates - removal of the synclink serial driver as it's no longer needed - removal of dead termiox code All of this has been in linux-next for a while with no reported issues" * tag 'tty-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (89 commits) serial: 8250_pci: Drop bogus __refdata annotation tty: serial: meson: enable console as module serial: 8250_omap: Avoid FIFO corruption caused by MDR1 access serial: imx: Move imx_uart_probe_dt() content into probe() serial: imx: Remove unneeded of_device_get_match_data() NULL check tty: Fix whitespace inconsistencies in vt_io_ioctl serial_core: Check for port state when tty is in error state dt-bindings: serial: Update DT binding docs to support SiFive FU740 SoC tty: use const parameters in port-flag accessors tty: use assign_bit() in port-flag accessors earlycon: drop semicolon from earlycon macro tty: Remove dead termiox code tty/serial/imx: Enable TXEN bit in imx_poll_init(). tty : serial: jsm: Fixed file by adding spacing tty: serial: uartlite: Support probe deferral earlycon: simplify earlycon-table implementation tty: serial: bcm63xx: lower driver dependencies serial: mxs-auart: Remove unneeded platform_device_id serial: 8250-mtk: Fix reference leak in mtk8250_probe serial: imx: Remove unused .id_table support ...
2020-12-16Merge tag 'usb-5.11-rc1' of ↵Linus Torvalds158-4702/+4292
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB / Thunderbolt updates from Greg KH: "Here is the big USB and thunderbolt pull request for 5.11-rc1. Nothing major in here, just the grind of constant development to support new hardware and fix old issues: - thunderbolt updates for new USB4 hardware - cdns3 major driver updates - lots of typec updates and additions as more hardware is available - usb serial driver updates and fixes - other tiny USB driver updates All have been in linux-next with no reported issues" * tag 'usb-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (172 commits) usb: phy: convert comma to semicolon usb: ucsi: convert comma to semicolon usb: typec: tcpm: convert comma to semicolon usb: typec: tcpm: Update vbus_vsafe0v on init usb: typec: tcpci: Enable bleed discharge when auto discharge is enabled usb: typec: Add class for plug alt mode device USB: typec: tcpci: Add Bleed discharge to POWER_CONTROL definition USB: typec: tcpm: Add a 30ms room for tPSSourceOn in PR_SWAP USB: typec: tcpm: Fix PR_SWAP error handling USB: typec: tcpm: Hard Reset after not receiving a Request USB: gadget: f_fs: remove likely/unlikely usb: gadget: f_fs: Re-use SS descriptors for SuperSpeedPlus USB: gadget: f_midi: setup SuperSpeed Plus descriptors USB: gadget: f_acm: add support for SuperSpeed Plus USB: gadget: f_rndis: fix bitrate for SuperSpeed and above usb: typec: intel_pmc_mux: Configure cable generation value for USB4 MAINTAINERS: Add myself as a reviewer for CADENCE USB3 DRD IP DRIVER usb: chipidea: ci_hdrc_imx: Use of_device_get_match_data() usb: chipidea: usbmisc_imx: Use of_device_get_match_data() usb: cdns3: fix NULL pointer dereference on no platform data ...
2020-12-16Merge tag 'sound-5.11-rc1' of ↵Linus Torvalds378-6894/+32762
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "Lots of changes (slightly more code increase than usual) at this time, while most of code changes are ASoC driver-specific. Here are some highlights: Core: - The new auxiliary bus implementation for Intel DSP, which will be used by other drivers as well - Lots of ASoC core cleanups and refactoring - UBSAN and KCSAN fixes in rawmidi, sequencer and a few others - Compress-offload API enhancement for the pause during draining HD- and USB-audio: - Enhancements of the USB-audio implicit feedback support, including better full-duplex operations - Continued CA0132 improvements and fixes - A few new quirk entries, HDMI audio fixes ASoC: - Support for boot time selection of Intel DSP firmware, which should help distros/users testing new stuff more easily; the kconfig was moved to boot time option, too - Some basic DPCM support in audio graph card - Removal of old pre-DT Freescale drivers - Support for Allwinner H6 I2S, Analog Devices ADAU1372, Intel Alderlake-S, GMediatek MT8192, NXP i.MX HDMI and XCVR, Realtek RT715, Qualcomm SM8250 and simple GPIO based muxes" * tag 'sound-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (445 commits) ALSA: pcm: oss: Fix potential out-of-bounds shift ALSA: usb-audio: Fix potential out-of-bounds shift ALSA: hda/ca0132 - Add ZxR surround DAC setup. ALSA: hda/ca0132 - Add 8051 PLL write helper functions. ALSA: hda/hdmi: packet buffer index must be set before reading value ASoC: SOF: imx: update kernel-doc description ASoC: mediatek: mt8183: delete some unreachable code ASoC: mediatek: mt8183: add PM ops to machine drivers ASoC: topology: Fix wrong size check ASoC: topology: Add missing size check ASoC: SOF: Intel: hda: fix the condition passed to sof_dev_dbg_or_err ASoC: SOF: modify the SOF_DBG flags ASoC: SOF: Intel: hda: remove duplicated status dump ASoC: rt1015p: delay 300ms after SDB pulling high for calibration ASoC: rt1015p: move SDB control from trigger to DAPM ASoC: wm_adsp: remove "ctl" from list on error in wm_adsp_create_control() ALSA: usb-audio: Fix control 'access overflow' errors from chmap ALSA: hda/hdmi: always print pin NIDs as hexadecimal ALSA: hda/realtek - Add supported for more Lenovo ALC285 Headset Button ALSA: hda/ca0132 - Remove now unnecessary DSP setup functions. ...
2020-12-16Merge tag 'net-next-5.11' of ↵Linus Torvalds1879-38756/+71917
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - support "prefer busy polling" NAPI operation mode, where we defer softirq for some time expecting applications to periodically busy poll - AF_XDP: improve efficiency by more batching and hindering the adjacency cache prefetcher - af_packet: make packet_fanout.arr size configurable up to 64K - tcp: optimize TCP zero copy receive in presence of partial or unaligned reads making zero copy a performance win for much smaller messages - XDP: add bulk APIs for returning / freeing frames - sched: support fragmenting IP packets as they come out of conntrack - net: allow virtual netdevs to forward UDP L4 and fraglist GSO skbs BPF: - BPF switch from crude rlimit-based to memcg-based memory accounting - BPF type format information for kernel modules and related tracing enhancements - BPF implement task local storage for BPF LSM - allow the FENTRY/FEXIT/RAW_TP tracing programs to use bpf_sk_storage Protocols: - mptcp: improve multiple xmit streams support, memory accounting and many smaller improvements - TLS: support CHACHA20-POLY1305 cipher - seg6: add support for SRv6 End.DT4/DT6 behavior - sctp: Implement RFC 6951: UDP Encapsulation of SCTP - ppp_generic: add ability to bridge channels directly - bridge: Connectivity Fault Management (CFM) support as is defined in IEEE 802.1Q section 12.14. Drivers: - mlx5: make use of the new auxiliary bus to organize the driver internals - mlx5: more accurate port TX timestamping support - mlxsw: - improve the efficiency of offloaded next hop updates by using the new nexthop object API - support blackhole nexthops - support IEEE 802.1ad (Q-in-Q) bridging - rtw88: major bluetooth co-existance improvements - iwlwifi: support new 6 GHz frequency band - ath11k: Fast Initial Link Setup (FILS) - mt7915: dual band concurrent (DBDC) support - net: ipa: add basic support for IPA v4.5 Refactor: - a few pieces of in_interrupt() cleanup work from Sebastian Andrzej Siewior - phy: add support for shared interrupts; get rid of multiple driver APIs and have the drivers write a full IRQ handler, slight growth of driver code should be compensated by the simpler API which also allows shared IRQs - add common code for handling netdev per-cpu counters - move TX packet re-allocation from Ethernet switch tag drivers to a central place - improve efficiency and rename nla_strlcpy - number of W=1 warning cleanups as we now catch those in a patchwork build bot Old code removal: - wan: delete the DLCI / SDLA drivers - wimax: move to staging - wifi: remove old WDS wifi bridging support" * tag 'net-next-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1922 commits) net: hns3: fix expression that is currently always true net: fix proc_fs init handling in af_packet and tls nfc: pn533: convert comma to semicolon af_vsock: Assign the vsock transport considering the vsock address flags af_vsock: Set VMADDR_FLAG_TO_HOST flag on the receive path vsock_addr: Check for supported flag values vm_sockets: Add VMADDR_FLAG_TO_HOST vsock flag vm_sockets: Add flags field in the vsock address data structure net: Disable NETIF_F_HW_TLS_TX when HW_CSUM is disabled tcp: Add logic to check for SYN w/ data in tcp_simple_retransmit net: mscc: ocelot: install MAC addresses in .ndo_set_rx_mode from process context nfc: s3fwrn5: Release the nfc firmware net: vxget: clean up sparse warnings mlxsw: spectrum_router: Use eXtended mezzanine to offload IPv4 router mlxsw: spectrum: Set KVH XLT cache mode for Spectrum2/3 mlxsw: spectrum_router_xm: Introduce basic XM cache flushing mlxsw: reg: Add Router LPM Cache Enable Register mlxsw: reg: Add Router LPM Cache ML Delete Register mlxsw: spectrum_router_xm: Implement L-value tracking for M-index mlxsw: reg: Add XM Router M Table Register ...
2020-12-16Merge branch 'remotes/lorenzo/pci/misc'Bjorn Helgaas1-0/+2
- Add PCI endpoint subsystem references to MAINTAINERS (Gustavo Pimentel) * remotes/lorenzo/pci/misc: MAINTAINERS: Add missing documentation references to PCI Endpoint Subsystem
2020-12-16Merge branch 'remotes/lorenzo/pci/vmd'Bjorn Helgaas1-11/+26
- Offset client VMD MSI-X vectors (Jon Derrick) * remotes/lorenzo/pci/vmd: PCI: vmd: Offset Client VMD MSI-X vectors