summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-10-02docs: mm: numaperf.rst Add brief description for access class 1.Jonathan Cameron1-0/+8
Try to make minimal changes to the document which already describes access class 0 in a generic fashion (including IO initiatiors that are not CPUs). Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-02node: Add access1 class to represent CPU to memory characteristicsJonathan Cameron1-19/+69
New access1 class is nearly the same as access0, but always provides characteristics for CPUs to memory. The existing access0 class provides characteristics to nearest or direct connnect initiator which may be a Generic Initiator such as a GPU or network adapter. This new class allows thread placement on CPUs to be performed so as to give optimal access characteristics to memory, even if that memory is for example attached to a GPU or similar and only accessible to the CPU via an appropriate bus. Suggested-by: Dan Willaims <dan.j.williams@intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-02ACPI: HMAT: Fix handling of changes from ACPI 6.2 to ACPI 6.3Jonathan Cameron1-1/+2
In ACPI 6.3, the Memory Proximity Domain Attributes Structure changed substantially. One of those changes was that the flag for "Memory Proximity Domain field is valid" was deprecated. This was because the field "Proximity Domain for the Memory" became a required field and hence having a validity flag makes no sense. So the correct logic is to always assume the field is there. Current code assumes it never is. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-02ACPI: Let ACPI know we support Generic Initiator Affinity StructuresJonathan Cameron2-0/+5
Until we tell ACPI that we support generic initiators, it will have to operate in fall back domain mode and all _PXM entries should be on existing non GI domains. This patch sets the relevant OSC bit to make that happen. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-02x86: Support Generic Initiator only proximity domainsJonathan Cameron3-0/+24
In common with memoryless domains only register GI domains if the proximity node is not online. If a domain is already a memory containing domain, or a memoryless domain there is nothing to do just because it also contains a Generic Initiator. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-02ACPI: Support Generic Initiator only domainsJonathan Cameron3-1/+72
Generic Initiators are a new ACPI concept that allows for the description of proximity domains that contain a device which performs memory access (such as a network card) but neither host CPU nor Memory. This patch has the parsing code and provides the infrastructure for an architecture to associate these new domains with their nearest memory processing node. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-30ACPI / NUMA: Add stub function for pxm_to_node()Nathan Chancellor1-0/+5
After commit 01feba590cd6 ("ACPI: Do not create new NUMA domains from ACPI static tables that are not SRAT"): $ scripts/config --file arch/x86/configs/x86_64_defconfig -d NUMA -e ACPI_NFIT $ make -skj"$(nproc)" distclean defconfig drivers/acpi/nfit/ drivers/acpi/nfit/core.c: In function ‘acpi_nfit_register_region’: drivers/acpi/nfit/core.c:3010:27: error: implicit declaration of function ‘pxm_to_node’; did you mean ‘xa_to_node’? [-Werror=implicit-function-declaration] 3010 | ndr_desc->target_node = pxm_to_node(spa->proximity_domain); | ^~~~~~~~~~~ | xa_to_node cc1: some warnings being treated as errors ... Add a stub function like acpi_map_pxm_to_node() had so that the build continues to work. Fixes: 01feba590cd6 ("ACPI: Do not create new NUMA domains from ACPI static tables that are not SRAT") Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-24irq-chip/gic-v3-its: Fix crash if ITS is in a proximity domain without ↵Jonathan Cameron1-1/+6
processor or memory Note this crash is present before any of the patches in this series, but as explained below it is highly unlikely anyone is shipping a firmware that causes it. Tests were done using an overriden SRAT. On ARM64, the gic-v3 driver directly parses SRAT to locate GIC Interrupt Translation Service (ITS) Affinity Structures. This is done much later in the boot than the parses of SRAT which identify proximity domains. As a result, an ITS placed in a proximity domain that is not defined by another SRAT structure will result in a NUMA node that is not completely configured and a crash. ITS [mem 0x202100000-0x20211ffff] ITS@0x0000000202100000: Using ITS number 0 Unable to handle kernel paging request at virtual address 0000000000001a08 ... Call trace: __alloc_pages_nodemask+0xe8/0x338 alloc_pages_node.constprop.0+0x34/0x40 its_probe_one+0x2f8/0xb18 gic_acpi_parse_madt_its+0x108/0x150 acpi_table_parse_entries_array+0x17c/0x264 acpi_table_parse_entries+0x48/0x6c acpi_table_parse_madt+0x30/0x3c its_init+0x1c4/0x644 gic_init_bases+0x4b8/0x4ec gic_acpi_init+0x134/0x264 acpi_match_madt+0x4c/0x84 acpi_table_parse_entries_array+0x17c/0x264 acpi_table_parse_entries+0x48/0x6c acpi_table_parse_madt+0x30/0x3c __acpi_probe_device_table+0x8c/0xe8 irqchip_init+0x3c/0x48 init_IRQ+0xcc/0x100 start_kernel+0x33c/0x548 ACPI 6.3 allows any set of Affinity Structures in SRAT to define a proximity domain. However, as we do not see this crash, we can conclude that no firmware is currently placing an ITS in a node that is separate from those containing memory and / or processors. We could modify the SRAT parsing behavior to identify the existence of Proximity Domains unique to the ITS structures, and handle them as a special case of a generic initiator (once support for those merges). This patch avoids the complexity that would be needed to handle this corner case, by not allowing the ITS entry parsing code to instantiate new NUMA Nodes. If one is encountered that does not already exist, then NO_NUMA_NODE is assigned and a warning printed just as if the value had been greater than allowed NUMA Nodes. "SRAT: Invalid NUMA node -1 in ITS affinity" Whilst this does not provide the full flexibility allowed by ACPI, it does fix the problem. We can revisit a more sophisticated solution if needed by future platforms. Change is simply to replace acpi_map_pxm_to_node with pxm_to_node reflecting the fact a new mapping is not created. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-24ACPI: Remove side effect of partly creating a node in acpi_get_node()Jonathan Cameron1-1/+1
acpi_get_node() calls acpi_get_pxm() to evaluate the _PXM AML method for entries found in DSDT/SSDT. ACPI 6.3 sec 6.2.14 states "_PXM evaluates to an integer that identifies a device as belonging to a Proximity Domain defined in the System Resource Affinity Table (SRAT)." Hence a _PXM method should not result in creation of a new NUMA node. Before this patch, _PXM could result in partial instantiation of NUMA node, missing elements such as zone lists. A call to devm_kzalloc(), for example, results in a NULL pointer dereference. This patch therefore replaces the acpi_map_pxm_to_node() with a call to pxm_to_node(). Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Reviewed-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-24ACPI: Rename acpi_map_pxm_to_online_node() to pxm_to_online_node()Jonathan Cameron3-7/+6
As this function is no longer allowed to create new mappings let us rename it to reflect this. Note all nodes should already exist before any of the users of this function are called. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-24ACPI: Remove side effect of partly creating a node in ↵Jonathan Cameron1-4/+3
acpi_map_pxm_to_online_node() While this function will only return an online node, it can have the side effect of partially creating a new node. The existing comments suggest this is intentional, but the usecases of this function are related to NFIT and HMAT parsing, neither of which should be able to define new nodes. One route by which the existing behaviour would cause a crash is to have a _PXM entry in ACPI DSDT attempt to place a device within this partly created proximity domain. A subsequent call to devm_kzalloc() or similar would result in an attempt to allocate memory on a node for which zone lists have not been set up and a NULL pointer dereference. Prevent such cases by switching to pxm_to_node() within acpi_map_pxm_to_online_node() which cannot cause a new node to be partly created. If one would previously have been created we now return NO_NUMA_NODE. Documentation updated to reflect this change. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-24ACPI: Do not create new NUMA domains from ACPI static tables that are not SRATJonathan Cameron4-5/+4
Several ACPI static tables contain references to proximity domains. ACPI 6.3 has clarified that only entries in SRAT may define a new domain (sec 5.2.16). Those tables described in the ACPI spec have additional clarifying text. NFIT: Table 5-132, "Integer that represents the proximity domain to which the memory belongs. This number must match with corresponding entry in the SRAT table." HMAT: Table 5-145, "... This number must match with the corresponding entry in the SRAT table's processor affinity structure ... if the initiator is a processor, or the Generic Initiator Affinity Structure if the initiator is a generic initiator". IORT and DMAR are defined by external specifications. Intel Virtualization Technology for Directed I/O Rev 3.1 does not make any explicit statements, but the general SRAT statement above will still apply. https://software.intel.com/sites/default/files/managed/c5/15/vt-directed-io-spec.pdf IO Remapping Table, Platform Design Document rev D, also makes not explicit statement, but refers to ACPI SRAT table for more information and again the generic SRAT statement above applies. https://developer.arm.com/documentation/den0049/d/ In conclusion, any proximity domain specified in these tables, should be a reference to a proximity domain also found in SRAT, and they should not be able to instantiate a new domain. Hence we switch to pxm_to_node() which will only return existing nodes. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Barry Song <song.bao.hua@hisilicon.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-24ACPI: Add out of bounds and numa_off protections to pxm_to_node()Jonathan Cameron1-1/+1
The function should check the validity of the pxm value before using it to index the pxm_to_node_map[] array. Whilst hardening this code may be good in general, the main intent here is to enable following patches that use this function to replace acpi_map_pxm_to_node() for non SRAT usecases which should return NO_NUMA_NODE for PXM entries not matching with those in SRAT. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Barry Song <song.bao.hua@hisilicon.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-21Linux 5.9-rc6Linus Torvalds1-1/+1
2020-09-21Merge tag 'core_urgent_for_v5.9_rc6' of ↵Linus Torvalds1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull syscall tracing fix from Borislav Petkov: "Fix the seccomp syscall rewriting so that trace and audit see the rewritten syscall number, from Kees Cook" * tag 'core_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: core/entry: Report syscall correctly for trace and audit
2020-09-21Merge tag 'objtool_urgent_for_v5.9_rc6' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fix from Borislav Petkov: "Fix noreturn detection for ignored sibling functions (Josh Poimboeuf)" * tag 'objtool_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix noreturn detection for ignored functions
2020-09-21Merge tag 'locking_urgent_for_v5.9_rc6' of ↵Linus Torvalds4-12/+37
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Borislav Petkov: "Two fixes from the locking/urgent pile: - Fix lockdep's detection of "USED" <- "IN-NMI" inversions (Peter Zijlstra) - Make percpu-rwsem operations on the semaphore's ->read_count IRQ-safe because it can be used in an IRQ context (Hou Tao)" * tag 'locking_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/percpu-rwsem: Use this_cpu_{inc,dec}() for read_count locking/lockdep: Fix "USED" <- "IN-NMI" inversions
2020-09-21Merge tag 'efi-urgent-for-v5.9-rc5' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fix from Borislav Petkov: "Ensure that the EFI bootloader control module only probes successfully on systems that support the EFI SetVariable runtime service" [ Tag and commit from Ard Biesheuvel, forwarded by Borislav ] * tag 'efi-urgent-for-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi: efibc: check for efivars write capability
2020-09-21Merge tag 'x86_urgent_for_v5.9_rc6' of ↵Linus Torvalds4-1/+24
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - A defconfig fix (Daniel Díaz) - Disable relocation relaxation for the compressed kernel when not built as -pie as in that case kernels built with clang and linked with LLD fail to boot due to the linker optimizing some instructions in non-PIE form; the gory details in the commit message (Arvind Sankar) - A fix for the "bad bp value" warning issued by the frame-pointer unwinder (Josh Poimboeuf) * tag 'x86_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/unwind/fp: Fix FP unwinding in ret_from_fork x86/boot/compressed: Disable relocation relaxation x86/defconfigs: Explicitly unset CONFIG_64BIT in i386_defconfig
2020-09-21Merge tag 'libnvdimm-fixes-5.9-rc6' of ↵Linus Torvalds4-13/+40
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "A handful of fixes to address a string of mistakes in the mechanism for device-mapper to determine if its component devices are dax capable. - Fix an original bug in device-mapper table reference counting when interrogating dax capability in the component device. This bug was hidden by the following bug. - Fix device-mapper to use the proper helper (dax_supported() instead of the leaf helper generic_fsdax_supported()) to determine dax operation of a stacked block device configuration. The original implementation is only valid for one level of dax-capable block device stacking. This bug was discovered while fixing the below regression. - Fix an infinite recursion regression introduced by broken attempts to quiet the generic_fsdax_supported() path and make it bail out before logging "dax capability not found" errors" * tag 'libnvdimm-fixes-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: Fix stack overflow when mounting fsdax pmem device dm: Call proper helper to determine dax support dm/dax: Fix table reference counts
2020-09-20Merge tag 'riscv-for-linus-5.9-rc6' of ↵Linus Torvalds8-6/+104
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - A fix for a lockdep issue to avoid an asserting triggering during early boot. There shouldn't be any incorrect behavior as the system isn't concurrent at the time. - The addition of a missing fence when installing early fixmap mappings. - A corretion to the K210 device tree's interrupt map. - A fix for M-mode timer handling on the K210. * tag 'riscv-for-linus-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: RISC-V: Resurrect the MMIO timer implementation for M-mode systems riscv: Fix Kendryte K210 device tree riscv: Add sfence.vma after early page table changes RISC-V: Take text_mutex in ftrace_init_nop()
2020-09-20Merge tag 'usb-5.9-rc6' of ↵Linus Torvalds9-17/+64
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB/Thunderbolt fixes from Greg KH: "Here are some small USB and one Thunderbolt driver fixes. Nothing major at all, just some fixes for reported issues, and a quirk addition: - typec fixes - UAS disconnect fix - usblp race fix - ehci-hcd modversions build fix - ignore wakeup quirk table addition - thunderbolt DROM read fix All of these have been in linux-next with no reported issues" * tag 'usb-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usblp: fix race between disconnect() and read() ehci-hcd: Move include to keep CRC stable usb: typec: intel_pmc_mux: Handle SCU IPC error conditions USB: quirks: Add USB_QUIRK_IGNORE_REMOTE_WAKEUP quirk for BYD zhaoxin notebook USB: UAS: fix disconnect by unplugging a hub usb: typec: ucsi: Prevent mode overrun usb: typec: ucsi: acpi: Increase command completion timeout value thunderbolt: Retry DROM read once if parsing fails
2020-09-20Merge tag 'tty-5.9-rc6' of ↵Linus Torvalds4-25/+33
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial/fbcon fixes from Greg KH: "Here are some small tty/serial and one more fbcon fix. They include: - serial core locking regression fixes - new device ids for 8250_pci driver - fbcon fix for syzbot found issue All have been in linux-next with no reported issues" * tag 'tty-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: fbcon: Fix user font detection test at fbcon_resize(). serial: 8250_pci: Add Realtek 816a and 816b serial: core: fix console port-lock regression serial: core: fix port-lock initialisation
2020-09-20Merge tag 'edac_urgent_for_v5.9_rc6' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC fixes from Borislav Petkov: "Two fixes for resulting from CONFIG_DEBUG_TEST_DRIVER_REMOVE=y experiments: - complete a previous fix to reset a local structure containing scanned system data properly so that the driver rescans, as it should, on a second load. - address a refcount underflow due to not paying attention to the driver whitelest on unregister" * tag 'edac_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/ghes: Check whether the driver is on the safe list correctly EDAC/ghes: Clear scanned data on unload
2020-09-20Merge branch 'for-linus' of ↵Linus Torvalds3-8/+28
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: "Just a couple of driver quirks" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: trackpoint - add new trackpoint variant IDs Input: i8042 - add Entroware Proteus EL07R4 to nomux and reset lists
2020-09-20mm: fix wake_page_function() comment typosLinus Torvalds1-2/+2
Sedat Dilek pointed out some silly comment typo issues. Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-20Merge tag 'kbuild-fixes-v5.9-3' of ↵Linus Torvalds3-35/+39
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: "Fix qconf warnings and revive help message" * tag 'kbuild-fixes-v5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kconfig: qconf: revive help message in the info view kconfig: qconf: fix incomplete type 'struct gstr' warning kconfig: qconf: use delete[] instead of delete to free array (again)
2020-09-20dax: Fix stack overflow when mounting fsdax pmem deviceAdrian Huang1-6/+6
When mounting fsdax pmem device, commit 6180bb446ab6 ("dax: fix detection of dax support for non-persistent memory block devices") introduces the stack overflow [1][2]. Here is the call path for mounting ext4 file system: ext4_fill_super bdev_dax_supported __bdev_dax_supported dax_supported generic_fsdax_supported __generic_fsdax_supported bdev_dax_supported The call path leads to the infinite calling loop, so we cannot call bdev_dax_supported() in __generic_fsdax_supported(). The sanity checking of the variable 'dax_dev' is moved prior to the two bdev_dax_pgoff() checks [3][4]. [1] https://lore.kernel.org/linux-nvdimm/1420999447.1004543.1600055488770.JavaMail.zimbra@redhat.com/ [2] https://lore.kernel.org/linux-nvdimm/alpine.LRH.2.02.2009141131220.30651@file01.intranet.prod.int.rdu2.redhat.com/ [3] https://lore.kernel.org/linux-nvdimm/CA+RJvhxBHriCuJhm-D8NvJRe3h2MLM+ZMFgjeJjrRPerMRLvdg@mail.gmail.com/ [4] https://lore.kernel.org/linux-nvdimm/20200903160608.GU878166@iweiny-DESK2.sc.intel.com/ Fixes: 6180bb446ab6 ("dax: fix detection of dax support for non-persistent memory block devices") Reported-by: Yi Zhang <yi.zhang@redhat.com> Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Reviewed-by: Jan Kara <jack@suse.cz> Tested-by: Ritesh Harjani <riteshh@linux.ibm.com> Cc: Coly Li <colyli@suse.de> Cc: Ira Weiny <ira.weiny@intel.com> Cc: John Pittman <jpittman@redhat.com> Link: https://lore.kernel.org/r/20200917111549.6367-1-adrianhuang0701@gmail.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-09-20dm: Call proper helper to determine dax supportJan Kara3-5/+31
DM was calling generic_fsdax_supported() to determine whether a device referenced in the DM table supports DAX. However this is a helper for "leaf" device drivers so that they don't have to duplicate common generic checks. High level code should call dax_supported() helper which that calls into appropriate helper for the particular device. This problem manifested itself as kernel messages: dm-3: error: dax access failed (-95) when lvm2-testsuite run in cases where a DM device was stacked on top of another DM device. Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span multiple devices") Cc: <stable@vger.kernel.org> Tested-by: Adrian Huang <ahuang12@lenovo.com> Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Mike Snitzer <snitzer@redhat.com> Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/160061715195.13131.5503173247632041975.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-09-20dm/dax: Fix table reference countsDan Williams1-2/+3
A recent fix to the dm_dax_supported() flow uncovered a latent bug. When dm_get_live_table() fails it is still required to drop the srcu_read_lock(). Without this change the lvm2 test-suite triggers this warning: # lvm2-testsuite --only pvmove-abort-all.sh WARNING: lock held when returning to user space! 5.9.0-rc5+ #251 Tainted: G OE ------------------------------------------------ lvm/1318 is leaving the kernel with locks still held! 1 lock held by lvm/1318: #0: ffff9372abb5a340 (&md->io_barrier){....}-{0:0}, at: dm_get_live_table+0x5/0xb0 [dm_mod] ...and later on this hang signature: INFO: task lvm:1344 blocked for more than 122 seconds. Tainted: G OE 5.9.0-rc5+ #251 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:lvm state:D stack: 0 pid: 1344 ppid: 1 flags:0x00004000 Call Trace: __schedule+0x45f/0xa80 ? finish_task_switch+0x249/0x2c0 ? wait_for_completion+0x86/0x110 schedule+0x5f/0xd0 schedule_timeout+0x212/0x2a0 ? __schedule+0x467/0xa80 ? wait_for_completion+0x86/0x110 wait_for_completion+0xb0/0x110 __synchronize_srcu+0xd1/0x160 ? __bpf_trace_rcu_utilization+0x10/0x10 __dm_suspend+0x6d/0x210 [dm_mod] dm_suspend+0xf6/0x140 [dm_mod] Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span multiple devices") Cc: <stable@vger.kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Reported-by: Adrian Huang <ahuang12@lenovo.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Adrian Huang <ahuang12@lenovo.com> Link: https://lore.kernel.org/r/160045867590.25663.7548541079217827340.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-09-20kconfig: qconf: revive help message in the info viewMasahiro Yamada1-3/+8
Since commit 68fd110b3e7e ("kconfig: qconf: remove redundant help in the info view"), the help message is no longer displayed. I intended to drop duplicated "Symbol:", "Type:", but precious info about help and reverse dependencies was lost too. Revive it now. "defined at" is contained in menu_get_ext_help(), so I made sure to not display it twice. Fixes: 68fd110b3e7e ("kconfig: qconf: remove redundant help in the info view") Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-09-20kconfig: qconf: fix incomplete type 'struct gstr' warningMasahiro Yamada2-31/+30
"make HOSTCXX=clang++ xconfig" reports the following: HOSTCXX scripts/kconfig/qconf.o In file included from scripts/kconfig/qconf.cc:23: In file included from scripts/kconfig/lkc.h:15: scripts/kconfig/lkc_proto.h:26:13: warning: 'get_relations_str' has C-linkage specified, but returns incomplete type 'struct gstr' which could be incompatible with C [-Wreturn-type-c-linkage] struct gstr get_relations_str(struct symbol **sym_arr, struct list_head *head); ^ Currently, get_relations_str() is declared before the struct gstr definition. Move all declarations of menu.c functions below. BTW, some are declared in lkc.h and some in lkc_proto.h, but the difference is unclear. I guess some refactoring is needed. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Boris Kolpackov <boris@codesynthesis.com>
2020-09-20Merge branch 'akpm' (patches from Andrew)Linus Torvalds18-50/+102
Merge fixes from Andrew Morton: "15 patches. Subsystems affected by this patch series: mailmap, mm/hotfixes, mm/thp, mm/memory-hotplug, misc, kcsan" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: kcsan: kconfig: move to menu 'Generic Kernel Debugging Instruments' fs/fs-writeback.c: adjust dirtytime_interval_handler definition to match prototype stackleak: let stack_erasing_sysctl take a kernel pointer buffer ftrace: let ftrace_enable_sysctl take a kernel pointer buffer mm/memory_hotplug: drain per-cpu pages again during memory offline selftests/vm: fix display of page size in map_hugetlb mm/thp: fix __split_huge_pmd_locked() for migration PMD kprobes: fix kill kprobe which has been marked as gone tmpfs: restore functionality of nr_inodes=0 mlock: fix unevictable_pgs event counts on THP mm: fix check_move_unevictable_pages() on THP mm: migration of hugetlbfs page skip memcg ksm: reinstate memcg charge on copied pages mailmap: add older email addresses for Kees Cook
2020-09-19Merge branch 'i2c/for-current' of ↵Linus Torvalds5-17/+29
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Another bunch of fixes for I2C. Jean's i801 patch is a cleanup on top of Volker's i801 patch, but it will make dependency handling much easier if those two go together" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: mxs: use MXS_DMA_CTRL_WAIT4END instead of DMA_CTRL_ACK i2c: mediatek: Send i2c master code at more than 1MHz i2c: mediatek: Fix generic definitions for bus frequency i2c: core: Call i2c_acpi_install_space_handler() before i2c_acpi_register_devices() i2c: i801: Simplify the suspend callback i2c: i801: Fix resume bug i2c: aspeed: Mask IRQ status to relevant bits
2020-09-19RISC-V: Resurrect the MMIO timer implementation for M-mode systemsPalmer Dabbelt4-0/+71
The K210 doesn't implement rdtime in M-mode, and since that's where Linux runs in the NOMMU systems that means we can't use rdtime. The K210 is the only system that anyone is currently running NOMMU or M-mode on, so here we're just inlining the timer read directly. This also adds the CLINT driver as an !MMU dependency, as it's currently the only timer driver availiable for these systems and without it we get a build failure for some configurations. Tested-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-09-19riscv: Fix Kendryte K210 device treeDamien Le Moal1-2/+4
The Kendryte K210 SoC CLINT is compatible with Sifive clint v0 (sifive,clint0). Fix the Kendryte K210 device tree clint entry to be inline with the sifive timer definition documented in Documentation/devicetree/bindings/timer/sifive,clint.yaml. The device tree clint entry is renamed similarly to u-boot device tree definition to improve compatibility with u-boot defined device tree. To ensure correct initialization, the interrup-cells attribute is added and the interrupt-extended attribute definition fixed. This fixes boot failures with Kendryte K210 SoC boards. Note that the clock referenced is kept as K210_CLK_ACLK, which does not necessarilly match the clint MTIME increment rate. This however does not seem to cause any problem for now. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-09-19riscv: Add sfence.vma after early page table changesGreentime Hu1-4/+3
This invalidates local TLB after modifying the page tables during early init as it's too early to handle suprious faults as we otherwise do. Fixes: f2c17aabc917 ("RISC-V: Implement compile-time fixed mappings") Reported-by: Syven Wang <syven.wang@sifive.com> Signed-off-by: Syven Wang <syven.wang@sifive.com> Signed-off-by: Greentime Hu <greentime.hu@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> [Palmer: Cleaned up the commit text] Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-09-19kcsan: kconfig: move to menu 'Generic Kernel Debugging Instruments'Changbin Du1-3/+1
This moves the KCSAN kconfig items under menu 'Generic Kernel Debugging Instruments' where UBSAN resides. Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200904152224.5570-1-changbin.du@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-19fs/fs-writeback.c: adjust dirtytime_interval_handler definition to match ↵Tobias Klauser1-1/+1
prototype Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") changed ctl_table.proc_handler to take a kernel pointer. Adjust the definition of dirtytime_interval_handler to match its prototype in linux/writeback.h which fixes the following sparse error/warning: fs/fs-writeback.c:2189:50: warning: incorrect type in argument 3 (different address spaces) fs/fs-writeback.c:2189:50: expected void * fs/fs-writeback.c:2189:50: got void [noderef] __user *buffer fs/fs-writeback.c:2184:5: error: symbol 'dirtytime_interval_handler' redeclared with different type (incompatible argument 3 (different address spaces)): fs/fs-writeback.c:2184:5: int extern [addressable] [signed] [toplevel] dirtytime_interval_handler( ... ) fs/fs-writeback.c: note: in included file: ./include/linux/writeback.h:374:5: note: previously declared as: ./include/linux/writeback.h:374:5: int extern [addressable] [signed] [toplevel] dirtytime_interval_handler( ... ) Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: https://lkml.kernel.org/r/20200907093140.13434-1-tklauser@distanz.ch Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-19stackleak: let stack_erasing_sysctl take a kernel pointer bufferTobias Klauser2-2/+2
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") changed ctl_table.proc_handler to take a kernel pointer. Adjust the signature of stack_erasing_sysctl to match ctl_table.proc_handler which fixes the following sparse warning: kernel/stackleak.c:31:50: warning: incorrect type in argument 3 (different address spaces) kernel/stackleak.c:31:50: expected void * kernel/stackleak.c:31:50: got void [noderef] __user *buffer Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: https://lkml.kernel.org/r/20200907093253.13656-1-tklauser@distanz.ch Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-19ftrace: let ftrace_enable_sysctl take a kernel pointer bufferTobias Klauser2-4/+2
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") changed ctl_table.proc_handler to take a kernel pointer. Adjust the signature of ftrace_enable_sysctl to match ctl_table.proc_handler which fixes the following sparse warning: kernel/trace/ftrace.c:7544:43: warning: incorrect type in argument 3 (different address spaces) kernel/trace/ftrace.c:7544:43: expected void * kernel/trace/ftrace.c:7544:43: got void [noderef] __user *buffer Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: https://lkml.kernel.org/r/20200907093207.13540-1-tklauser@distanz.ch Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-19mm/memory_hotplug: drain per-cpu pages again during memory offlinePavel Tatashin2-0/+22
There is a race during page offline that can lead to infinite loop: a page never ends up on a buddy list and __offline_pages() keeps retrying infinitely or until a termination signal is received. Thread#1 - a new process: load_elf_binary begin_new_exec exec_mmap mmput exit_mmap tlb_finish_mmu tlb_flush_mmu release_pages free_unref_page_list free_unref_page_prepare set_pcppage_migratetype(page, migratetype); // Set page->index migration type below MIGRATE_PCPTYPES Thread#2 - hot-removes memory __offline_pages start_isolate_page_range set_migratetype_isolate set_pageblock_migratetype(page, MIGRATE_ISOLATE); Set migration type to MIGRATE_ISOLATE-> set drain_all_pages(zone); // drain per-cpu page lists to buddy allocator. Thread#1 - continue free_unref_page_commit migratetype = get_pcppage_migratetype(page); // get old migration type list_add(&page->lru, &pcp->lists[migratetype]); // add new page to already drained pcp list Thread#2 Never drains pcp again, and therefore gets stuck in the loop. The fix is to try to drain per-cpu lists again after check_pages_isolated_cb() fails. Fixes: c52e75935f8d ("mm: remove extra drain pages on pcp list") Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200903140032.380431-1-pasha.tatashin@soleen.com Link: https://lkml.kernel.org/r/20200904151448.100489-2-pasha.tatashin@soleen.com Link: http://lkml.kernel.org/r/20200904070235.GA15277@dhcp22.suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-19selftests/vm: fix display of page size in map_hugetlbChristophe Leroy1-1/+1
The displayed size is in bytes while the text says it is in kB. Shift it by 10 to really display kBytes. Fixes: fa7b9a805c79 ("tools/selftest/vm: allow choosing mem size and page size in map_hugetlb") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/e27481224564a93d14106e750de31189deaa8bc8.1598861977.git.christophe.leroy@csgroup.eu Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-19mm/thp: fix __split_huge_pmd_locked() for migration PMDRalph Campbell1-19/+23
A migrating transparent huge page has to already be unmapped. Otherwise, the page could be modified while it is being copied to a new page and data could be lost. The function __split_huge_pmd() checks for a PMD migration entry before calling __split_huge_pmd_locked() leading one to think that __split_huge_pmd_locked() can handle splitting a migrating PMD. However, the code always increments the page->_mapcount and adjusts the memory control group accounting assuming the page is mapped. Also, if the PMD entry is a migration PMD entry, the call to is_huge_zero_pmd(*pmd) is incorrect because it calls pmd_pfn(pmd) instead of migration_entry_to_pfn(pmd_to_swp_entry(pmd)). Fix these problems by checking for a PMD migration entry. Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path") Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Bharata B Rao <bharata@linux.ibm.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> [4.14+] Link: https://lkml.kernel.org/r/20200903183140.19055-1-rcampbell@nvidia.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-19kprobes: fix kill kprobe which has been marked as goneMuchun Song1-1/+8
If a kprobe is marked as gone, we should not kill it again. Otherwise, we can disarm the kprobe more than once. In that case, the statistics of kprobe_ftrace_enabled can unbalance which can lead to that kprobe do not work. Fixes: e8386a0cb22f ("kprobes: support probing module __exit function") Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Song Liu <songliubraving@fb.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200822030055.32383-1-songmuchun@bytedance.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-19tmpfs: restore functionality of nr_inodes=0Byron Stanoszek1-4/+6
Commit e809d5f0b5c9 ("tmpfs: per-superblock i_ino support") made changes to shmem_reserve_inode() in mm/shmem.c, however the original test for (sbinfo->max_inodes) got dropped. This causes mounting tmpfs with option nr_inodes=0 to fail: # mount -ttmpfs -onr_inodes=0 none /ext0 mount: /ext0: mount(2) system call failed: Cannot allocate memory. This patch restores the nr_inodes=0 functionality. Fixes: e809d5f0b5c9 ("tmpfs: per-superblock i_ino support") Signed-off-by: Byron Stanoszek <gandalf@winds.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Chris Down <chris@chrisdown.name> Link: https://lkml.kernel.org/r/20200902035715.16414-1-gandalf@winds.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-19mlock: fix unevictable_pgs event counts on THPHugh Dickins2-12/+18
5.8 commit 5d91f31faf8e ("mm: swap: fix vmstats for huge page") has established that vm_events should count every subpage of a THP, including unevictable_pgs_culled and unevictable_pgs_rescued; but lru_cache_add_inactive_or_unevictable() was not doing so for unevictable_pgs_mlocked, and mm/mlock.c was not doing so for unevictable_pgs mlocked, munlocked, cleared and stranded. Fix them; but THPs don't go the pagevec way in mlock.c, so no fixes needed on that path. Fixes: 5d91f31faf8e ("mm: swap: fix vmstats for huge page") Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Yang Shi <shy828301@gmail.com> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Qian Cai <cai@lca.pw> Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301408230.5954@eggly.anvils Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-19mm: fix check_move_unevictable_pages() on THPHugh Dickins1-2/+8
check_move_unevictable_pages() is used in making unevictable shmem pages evictable: by shmem_unlock_mapping(), drm_gem_check_release_pagevec() and i915/gem check_release_pagevec(). Those may pass down subpages of a huge page, when /sys/kernel/mm/transparent_hugepage/shmem_enabled is "force". That does not crash or warn at present, but the accounting of vmstats unevictable_pgs_scanned and unevictable_pgs_rescued is inconsistent: scanned being incremented on each subpage, rescued only on the head (since tails already appear evictable once the head has been updated). 5.8 commit 5d91f31faf8e ("mm: swap: fix vmstats for huge page") has established that vm_events in general (and unevictable_pgs_rescued in particular) should count every subpage: so follow that precedent here. Do this in such a way that if mem_cgroup_page_lruvec() is made stricter (to check page->mem_cgroup is always set), no problem: skip the tails before calling it, and add thp_nr_pages() to vmstats on the head. Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Yang Shi <shy828301@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Qian Cai <cai@lca.pw> Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301405000.5954@eggly.anvils Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-19mm: migration of hugetlbfs page skip memcgHugh Dickins1-1/+2
hugetlbfs pages do not participate in memcg: so although they do find most of migrate_page_states() useful, it would be better if they did not call into mem_cgroup_migrate() - where Qian Cai reported that LTP's move_pages12 triggers the warning in Alex Shi's prospective commit "mm/memcg: warning on !memcg after readahead page charged". Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxch.org> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Qian Cai <cai@lca.pw> Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301359460.5954@eggly.anvils Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-19ksm: reinstate memcg charge on copied pagesHugh Dickins1-0/+4
Patch series "mm: fixes to past from future testing". Here's a set of independent fixes against 5.9-rc2: prompted by testing Alex Shi's "warning on !memcg" and lru_lock series, but I think fit for 5.9 - though maybe only the first for stable. This patch (of 5): In 5.8 some instances of memcg charging in do_swap_page() and unuse_pte() were removed, on the understanding that swap cache is now already charged at those points; but a case was missed, when ksm_might_need_to_copy() has decided it must allocate a substitute page: such pages were never charged. Fix it inside ksm_might_need_to_copy(). This was discovered by Alex Shi's prospective commit "mm/memcg: warning on !memcg after readahead page charged". But there is a another surprise: this also fixes some rarer uncharged PageAnon cases, when KSM is configured in, but has never been activated. ksm_might_need_to_copy()'s anon_vma->root and linear_page_index() check sometimes catches a case which would need to have been copied if KSM were turned on. Or that's my optimistic interpretation (of my own old code), but it leaves some doubt as to whether everything is working as intended there - might it hint at rare anon ptes which rmap cannot find? A question not easily answered: put in the fix for missed memcg charges. Cc; Matthew Wilcox <willy@infradead.org> Fixes: 4c6355b25e8b ("mm: memcontrol: charge swapin pages on instantiation") Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Qian Cai <cai@lca.pw> Cc: <stable@vger.kernel.org> [5.8] Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301343270.5954@eggly.anvils Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301358020.5954@eggly.anvils Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>