Age | Commit message (Collapse) | Author | Files | Lines |
|
Try to make minimal changes to the document which already describes
access class 0 in a generic fashion (including IO initiatiors that
are not CPUs).
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
New access1 class is nearly the same as access0, but always provides
characteristics for CPUs to memory. The existing access0 class
provides characteristics to nearest or direct connnect initiator
which may be a Generic Initiator such as a GPU or network adapter.
This new class allows thread placement on CPUs to be performed
so as to give optimal access characteristics to memory, even if that
memory is for example attached to a GPU or similar and only accessible
to the CPU via an appropriate bus.
Suggested-by: Dan Willaims <dan.j.williams@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
In ACPI 6.3, the Memory Proximity Domain Attributes Structure
changed substantially. One of those changes was that the flag
for "Memory Proximity Domain field is valid" was deprecated.
This was because the field "Proximity Domain for the Memory"
became a required field and hence having a validity flag makes
no sense.
So the correct logic is to always assume the field is there.
Current code assumes it never is.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Until we tell ACPI that we support generic initiators, it will have
to operate in fall back domain mode and all _PXM entries should
be on existing non GI domains.
This patch sets the relevant OSC bit to make that happen.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
In common with memoryless domains only register GI domains
if the proximity node is not online. If a domain is already
a memory containing domain, or a memoryless domain there is
nothing to do just because it also contains a Generic Initiator.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Generic Initiators are a new ACPI concept that allows for the
description of proximity domains that contain a device which
performs memory access (such as a network card) but neither
host CPU nor Memory.
This patch has the parsing code and provides the infrastructure
for an architecture to associate these new domains with their
nearest memory processing node.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
After commit 01feba590cd6 ("ACPI: Do not create new NUMA domains from
ACPI static tables that are not SRAT"):
$ scripts/config --file arch/x86/configs/x86_64_defconfig -d NUMA -e ACPI_NFIT
$ make -skj"$(nproc)" distclean defconfig drivers/acpi/nfit/
drivers/acpi/nfit/core.c: In function ‘acpi_nfit_register_region’:
drivers/acpi/nfit/core.c:3010:27: error: implicit declaration of
function ‘pxm_to_node’; did you mean ‘xa_to_node’?
[-Werror=implicit-function-declaration]
3010 | ndr_desc->target_node = pxm_to_node(spa->proximity_domain);
| ^~~~~~~~~~~
| xa_to_node
cc1: some warnings being treated as errors
...
Add a stub function like acpi_map_pxm_to_node() had so that the build
continues to work.
Fixes: 01feba590cd6 ("ACPI: Do not create new NUMA domains from ACPI static tables that are not SRAT")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
processor or memory
Note this crash is present before any of the patches in this series, but
as explained below it is highly unlikely anyone is shipping a firmware that
causes it. Tests were done using an overriden SRAT.
On ARM64, the gic-v3 driver directly parses SRAT to locate GIC Interrupt
Translation Service (ITS) Affinity Structures. This is done much later
in the boot than the parses of SRAT which identify proximity domains.
As a result, an ITS placed in a proximity domain that is not defined by
another SRAT structure will result in a NUMA node that is not completely
configured and a crash.
ITS [mem 0x202100000-0x20211ffff]
ITS@0x0000000202100000: Using ITS number 0
Unable to handle kernel paging request at virtual address 0000000000001a08
...
Call trace:
__alloc_pages_nodemask+0xe8/0x338
alloc_pages_node.constprop.0+0x34/0x40
its_probe_one+0x2f8/0xb18
gic_acpi_parse_madt_its+0x108/0x150
acpi_table_parse_entries_array+0x17c/0x264
acpi_table_parse_entries+0x48/0x6c
acpi_table_parse_madt+0x30/0x3c
its_init+0x1c4/0x644
gic_init_bases+0x4b8/0x4ec
gic_acpi_init+0x134/0x264
acpi_match_madt+0x4c/0x84
acpi_table_parse_entries_array+0x17c/0x264
acpi_table_parse_entries+0x48/0x6c
acpi_table_parse_madt+0x30/0x3c
__acpi_probe_device_table+0x8c/0xe8
irqchip_init+0x3c/0x48
init_IRQ+0xcc/0x100
start_kernel+0x33c/0x548
ACPI 6.3 allows any set of Affinity Structures in SRAT to define a proximity
domain. However, as we do not see this crash, we can conclude that no
firmware is currently placing an ITS in a node that is separate from
those containing memory and / or processors.
We could modify the SRAT parsing behavior to identify the existence
of Proximity Domains unique to the ITS structures, and handle them as
a special case of a generic initiator (once support for those merges).
This patch avoids the complexity that would be needed to handle this corner
case, by not allowing the ITS entry parsing code to instantiate new NUMA
Nodes. If one is encountered that does not already exist, then NO_NUMA_NODE
is assigned and a warning printed just as if the value had been greater than
allowed NUMA Nodes.
"SRAT: Invalid NUMA node -1 in ITS affinity"
Whilst this does not provide the full flexibility allowed by ACPI,
it does fix the problem. We can revisit a more sophisticated solution if
needed by future platforms.
Change is simply to replace acpi_map_pxm_to_node with pxm_to_node reflecting
the fact a new mapping is not created.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
acpi_get_node() calls acpi_get_pxm() to evaluate the _PXM AML method
for entries found in DSDT/SSDT. ACPI 6.3 sec 6.2.14 states
"_PXM evaluates to an integer that identifies a device as belonging to
a Proximity Domain defined in the System Resource Affinity Table (SRAT)."
Hence a _PXM method should not result in creation of a new NUMA node.
Before this patch, _PXM could result in partial instantiation of
NUMA node, missing elements such as zone lists. A call to
devm_kzalloc(), for example, results in a NULL pointer dereference.
This patch therefore replaces the acpi_map_pxm_to_node() with a call
to pxm_to_node().
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
As this function is no longer allowed to create new mappings
let us rename it to reflect this.
Note all nodes should already exist before any of the users
of this function are called.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
acpi_map_pxm_to_online_node()
While this function will only return an online node, it can have the side
effect of partially creating a new node. The existing comments suggest this
is intentional, but the usecases of this function are related to NFIT and
HMAT parsing, neither of which should be able to define new nodes.
One route by which the existing behaviour would cause a crash is to have a
_PXM entry in ACPI DSDT attempt to place a device within this partly
created proximity domain. A subsequent call to devm_kzalloc() or similar
would result in an attempt to allocate memory on a node for which zone
lists have not been set up and a NULL pointer dereference.
Prevent such cases by switching to pxm_to_node() within
acpi_map_pxm_to_online_node() which cannot cause a new node to be
partly created. If one would previously have been created we now
return NO_NUMA_NODE. Documentation updated to reflect this change.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Several ACPI static tables contain references to proximity domains.
ACPI 6.3 has clarified that only entries in SRAT may define a new
domain (sec 5.2.16).
Those tables described in the ACPI spec have additional clarifying text.
NFIT: Table 5-132,
"Integer that represents the proximity domain to which the memory
belongs. This number must match with corresponding entry in the
SRAT table."
HMAT: Table 5-145,
"... This number must match with the corresponding entry in the SRAT
table's processor affinity structure ... if the initiator is a processor,
or the Generic Initiator Affinity Structure if the initiator is a generic
initiator".
IORT and DMAR are defined by external specifications.
Intel Virtualization Technology for Directed I/O Rev 3.1 does not make any
explicit statements, but the general SRAT statement above will still apply.
https://software.intel.com/sites/default/files/managed/c5/15/vt-directed-io-spec.pdf
IO Remapping Table, Platform Design Document rev D, also makes not explicit
statement, but refers to ACPI SRAT table for more information and again the
generic SRAT statement above applies.
https://developer.arm.com/documentation/den0049/d/
In conclusion, any proximity domain specified in these tables, should be a
reference to a proximity domain also found in SRAT, and they should not be
able to instantiate a new domain. Hence we switch to pxm_to_node() which
will only return existing nodes.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Barry Song <song.bao.hua@hisilicon.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The function should check the validity of the pxm value before using
it to index the pxm_to_node_map[] array.
Whilst hardening this code may be good in general, the main intent
here is to enable following patches that use this function to replace
acpi_map_pxm_to_node() for non SRAT usecases which should return
NO_NUMA_NODE for PXM entries not matching with those in SRAT.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Barry Song <song.bao.hua@hisilicon.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull syscall tracing fix from Borislav Petkov:
"Fix the seccomp syscall rewriting so that trace and audit see the
rewritten syscall number, from Kees Cook"
* tag 'core_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
core/entry: Report syscall correctly for trace and audit
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fix from Borislav Petkov:
"Fix noreturn detection for ignored sibling functions (Josh Poimboeuf)"
* tag 'objtool_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix noreturn detection for ignored functions
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Borislav Petkov:
"Two fixes from the locking/urgent pile:
- Fix lockdep's detection of "USED" <- "IN-NMI" inversions (Peter
Zijlstra)
- Make percpu-rwsem operations on the semaphore's ->read_count
IRQ-safe because it can be used in an IRQ context (Hou Tao)"
* tag 'locking_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/percpu-rwsem: Use this_cpu_{inc,dec}() for read_count
locking/lockdep: Fix "USED" <- "IN-NMI" inversions
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fix from Borislav Petkov:
"Ensure that the EFI bootloader control module only probes successfully
on systems that support the EFI SetVariable runtime service"
[ Tag and commit from Ard Biesheuvel, forwarded by Borislav ]
* tag 'efi-urgent-for-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: efibc: check for efivars write capability
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- A defconfig fix (Daniel Díaz)
- Disable relocation relaxation for the compressed kernel when not
built as -pie as in that case kernels built with clang and linked
with LLD fail to boot due to the linker optimizing some instructions
in non-PIE form; the gory details in the commit message (Arvind
Sankar)
- A fix for the "bad bp value" warning issued by the frame-pointer
unwinder (Josh Poimboeuf)
* tag 'x86_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/unwind/fp: Fix FP unwinding in ret_from_fork
x86/boot/compressed: Disable relocation relaxation
x86/defconfigs: Explicitly unset CONFIG_64BIT in i386_defconfig
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"A handful of fixes to address a string of mistakes in the mechanism
for device-mapper to determine if its component devices are dax
capable.
- Fix an original bug in device-mapper table reference counting when
interrogating dax capability in the component device. This bug was
hidden by the following bug.
- Fix device-mapper to use the proper helper (dax_supported() instead
of the leaf helper generic_fsdax_supported()) to determine dax
operation of a stacked block device configuration. The original
implementation is only valid for one level of dax-capable block
device stacking. This bug was discovered while fixing the below
regression.
- Fix an infinite recursion regression introduced by broken attempts
to quiet the generic_fsdax_supported() path and make it bail out
before logging "dax capability not found" errors"
* tag 'libnvdimm-fixes-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: Fix stack overflow when mounting fsdax pmem device
dm: Call proper helper to determine dax support
dm/dax: Fix table reference counts
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for a lockdep issue to avoid an asserting triggering during
early boot. There shouldn't be any incorrect behavior as the system
isn't concurrent at the time.
- The addition of a missing fence when installing early fixmap
mappings.
- A corretion to the K210 device tree's interrupt map.
- A fix for M-mode timer handling on the K210.
* tag 'riscv-for-linus-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Resurrect the MMIO timer implementation for M-mode systems
riscv: Fix Kendryte K210 device tree
riscv: Add sfence.vma after early page table changes
RISC-V: Take text_mutex in ftrace_init_nop()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB/Thunderbolt fixes from Greg KH:
"Here are some small USB and one Thunderbolt driver fixes.
Nothing major at all, just some fixes for reported issues, and a quirk
addition:
- typec fixes
- UAS disconnect fix
- usblp race fix
- ehci-hcd modversions build fix
- ignore wakeup quirk table addition
- thunderbolt DROM read fix
All of these have been in linux-next with no reported issues"
* tag 'usb-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usblp: fix race between disconnect() and read()
ehci-hcd: Move include to keep CRC stable
usb: typec: intel_pmc_mux: Handle SCU IPC error conditions
USB: quirks: Add USB_QUIRK_IGNORE_REMOTE_WAKEUP quirk for BYD zhaoxin notebook
USB: UAS: fix disconnect by unplugging a hub
usb: typec: ucsi: Prevent mode overrun
usb: typec: ucsi: acpi: Increase command completion timeout value
thunderbolt: Retry DROM read once if parsing fails
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial/fbcon fixes from Greg KH:
"Here are some small tty/serial and one more fbcon fix.
They include:
- serial core locking regression fixes
- new device ids for 8250_pci driver
- fbcon fix for syzbot found issue
All have been in linux-next with no reported issues"
* tag 'tty-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
fbcon: Fix user font detection test at fbcon_resize().
serial: 8250_pci: Add Realtek 816a and 816b
serial: core: fix console port-lock regression
serial: core: fix port-lock initialisation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:
"Two fixes for resulting from CONFIG_DEBUG_TEST_DRIVER_REMOVE=y
experiments:
- complete a previous fix to reset a local structure containing
scanned system data properly so that the driver rescans, as it
should, on a second load.
- address a refcount underflow due to not paying attention to the
driver whitelest on unregister"
* tag 'edac_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/ghes: Check whether the driver is on the safe list correctly
EDAC/ghes: Clear scanned data on unload
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
"Just a couple of driver quirks"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: trackpoint - add new trackpoint variant IDs
Input: i8042 - add Entroware Proteus EL07R4 to nomux and reset lists
|
|
Sedat Dilek pointed out some silly comment typo issues.
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
"Fix qconf warnings and revive help message"
* tag 'kbuild-fixes-v5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: qconf: revive help message in the info view
kconfig: qconf: fix incomplete type 'struct gstr' warning
kconfig: qconf: use delete[] instead of delete to free array (again)
|
|
When mounting fsdax pmem device, commit 6180bb446ab6 ("dax: fix
detection of dax support for non-persistent memory block devices")
introduces the stack overflow [1][2]. Here is the call path for
mounting ext4 file system:
ext4_fill_super
bdev_dax_supported
__bdev_dax_supported
dax_supported
generic_fsdax_supported
__generic_fsdax_supported
bdev_dax_supported
The call path leads to the infinite calling loop, so we cannot
call bdev_dax_supported() in __generic_fsdax_supported(). The sanity
checking of the variable 'dax_dev' is moved prior to the two
bdev_dax_pgoff() checks [3][4].
[1] https://lore.kernel.org/linux-nvdimm/1420999447.1004543.1600055488770.JavaMail.zimbra@redhat.com/
[2] https://lore.kernel.org/linux-nvdimm/alpine.LRH.2.02.2009141131220.30651@file01.intranet.prod.int.rdu2.redhat.com/
[3] https://lore.kernel.org/linux-nvdimm/CA+RJvhxBHriCuJhm-D8NvJRe3h2MLM+ZMFgjeJjrRPerMRLvdg@mail.gmail.com/
[4] https://lore.kernel.org/linux-nvdimm/20200903160608.GU878166@iweiny-DESK2.sc.intel.com/
Fixes: 6180bb446ab6 ("dax: fix detection of dax support for non-persistent memory block devices")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Ritesh Harjani <riteshh@linux.ibm.com>
Cc: Coly Li <colyli@suse.de>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: John Pittman <jpittman@redhat.com>
Link: https://lore.kernel.org/r/20200917111549.6367-1-adrianhuang0701@gmail.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
DM was calling generic_fsdax_supported() to determine whether a device
referenced in the DM table supports DAX. However this is a helper for "leaf" device drivers so that
they don't have to duplicate common generic checks. High level code
should call dax_supported() helper which that calls into appropriate
helper for the particular device. This problem manifested itself as
kernel messages:
dm-3: error: dax access failed (-95)
when lvm2-testsuite run in cases where a DM device was stacked on top of
another DM device.
Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span multiple devices")
Cc: <stable@vger.kernel.org>
Tested-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/160061715195.13131.5503173247632041975.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
A recent fix to the dm_dax_supported() flow uncovered a latent bug. When
dm_get_live_table() fails it is still required to drop the
srcu_read_lock(). Without this change the lvm2 test-suite triggers this
warning:
# lvm2-testsuite --only pvmove-abort-all.sh
WARNING: lock held when returning to user space!
5.9.0-rc5+ #251 Tainted: G OE
------------------------------------------------
lvm/1318 is leaving the kernel with locks still held!
1 lock held by lvm/1318:
#0: ffff9372abb5a340 (&md->io_barrier){....}-{0:0}, at: dm_get_live_table+0x5/0xb0 [dm_mod]
...and later on this hang signature:
INFO: task lvm:1344 blocked for more than 122 seconds.
Tainted: G OE 5.9.0-rc5+ #251
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:lvm state:D stack: 0 pid: 1344 ppid: 1 flags:0x00004000
Call Trace:
__schedule+0x45f/0xa80
? finish_task_switch+0x249/0x2c0
? wait_for_completion+0x86/0x110
schedule+0x5f/0xd0
schedule_timeout+0x212/0x2a0
? __schedule+0x467/0xa80
? wait_for_completion+0x86/0x110
wait_for_completion+0xb0/0x110
__synchronize_srcu+0xd1/0x160
? __bpf_trace_rcu_utilization+0x10/0x10
__dm_suspend+0x6d/0x210 [dm_mod]
dm_suspend+0xf6/0x140 [dm_mod]
Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span multiple devices")
Cc: <stable@vger.kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Reported-by: Adrian Huang <ahuang12@lenovo.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Adrian Huang <ahuang12@lenovo.com>
Link: https://lore.kernel.org/r/160045867590.25663.7548541079217827340.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Since commit 68fd110b3e7e ("kconfig: qconf: remove redundant help in
the info view"), the help message is no longer displayed.
I intended to drop duplicated "Symbol:", "Type:", but precious info
about help and reverse dependencies was lost too.
Revive it now.
"defined at" is contained in menu_get_ext_help(), so I made sure
to not display it twice.
Fixes: 68fd110b3e7e ("kconfig: qconf: remove redundant help in the info view")
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
"make HOSTCXX=clang++ xconfig" reports the following:
HOSTCXX scripts/kconfig/qconf.o
In file included from scripts/kconfig/qconf.cc:23:
In file included from scripts/kconfig/lkc.h:15:
scripts/kconfig/lkc_proto.h:26:13: warning: 'get_relations_str' has C-linkage specified, but returns incomplete type 'struct gstr' which could be incompatible with C [-Wreturn-type-c-linkage]
struct gstr get_relations_str(struct symbol **sym_arr, struct list_head *head);
^
Currently, get_relations_str() is declared before the struct gstr
definition.
Move all declarations of menu.c functions below.
BTW, some are declared in lkc.h and some in lkc_proto.h, but the
difference is unclear. I guess some refactoring is needed.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Boris Kolpackov <boris@codesynthesis.com>
|
|
Merge fixes from Andrew Morton:
"15 patches.
Subsystems affected by this patch series: mailmap, mm/hotfixes,
mm/thp, mm/memory-hotplug, misc, kcsan"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kcsan: kconfig: move to menu 'Generic Kernel Debugging Instruments'
fs/fs-writeback.c: adjust dirtytime_interval_handler definition to match prototype
stackleak: let stack_erasing_sysctl take a kernel pointer buffer
ftrace: let ftrace_enable_sysctl take a kernel pointer buffer
mm/memory_hotplug: drain per-cpu pages again during memory offline
selftests/vm: fix display of page size in map_hugetlb
mm/thp: fix __split_huge_pmd_locked() for migration PMD
kprobes: fix kill kprobe which has been marked as gone
tmpfs: restore functionality of nr_inodes=0
mlock: fix unevictable_pgs event counts on THP
mm: fix check_move_unevictable_pages() on THP
mm: migration of hugetlbfs page skip memcg
ksm: reinstate memcg charge on copied pages
mailmap: add older email addresses for Kees Cook
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Another bunch of fixes for I2C.
Jean's i801 patch is a cleanup on top of Volker's i801 patch, but it
will make dependency handling much easier if those two go together"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mxs: use MXS_DMA_CTRL_WAIT4END instead of DMA_CTRL_ACK
i2c: mediatek: Send i2c master code at more than 1MHz
i2c: mediatek: Fix generic definitions for bus frequency
i2c: core: Call i2c_acpi_install_space_handler() before i2c_acpi_register_devices()
i2c: i801: Simplify the suspend callback
i2c: i801: Fix resume bug
i2c: aspeed: Mask IRQ status to relevant bits
|
|
The K210 doesn't implement rdtime in M-mode, and since that's where Linux runs
in the NOMMU systems that means we can't use rdtime. The K210 is the only
system that anyone is currently running NOMMU or M-mode on, so here we're just
inlining the timer read directly.
This also adds the CLINT driver as an !MMU dependency, as it's currently the
only timer driver availiable for these systems and without it we get a build
failure for some configurations.
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
The Kendryte K210 SoC CLINT is compatible with Sifive clint v0
(sifive,clint0). Fix the Kendryte K210 device tree clint entry to be
inline with the sifive timer definition documented in
Documentation/devicetree/bindings/timer/sifive,clint.yaml.
The device tree clint entry is renamed similarly to u-boot device tree
definition to improve compatibility with u-boot defined device tree.
To ensure correct initialization, the interrup-cells attribute is added
and the interrupt-extended attribute definition fixed.
This fixes boot failures with Kendryte K210 SoC boards.
Note that the clock referenced is kept as K210_CLK_ACLK, which does not
necessarilly match the clint MTIME increment rate. This however does not
seem to cause any problem for now.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
This invalidates local TLB after modifying the page tables during early init as
it's too early to handle suprious faults as we otherwise do.
Fixes: f2c17aabc917 ("RISC-V: Implement compile-time fixed mappings")
Reported-by: Syven Wang <syven.wang@sifive.com>
Signed-off-by: Syven Wang <syven.wang@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
[Palmer: Cleaned up the commit text]
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
This moves the KCSAN kconfig items under menu 'Generic Kernel Debugging
Instruments' where UBSAN resides.
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200904152224.5570-1-changbin.du@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
prototype
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
definition of dirtytime_interval_handler to match its prototype in
linux/writeback.h which fixes the following sparse error/warning:
fs/fs-writeback.c:2189:50: warning: incorrect type in argument 3 (different address spaces)
fs/fs-writeback.c:2189:50: expected void *
fs/fs-writeback.c:2189:50: got void [noderef] __user *buffer
fs/fs-writeback.c:2184:5: error: symbol 'dirtytime_interval_handler' redeclared with different type (incompatible argument 3 (different address spaces)):
fs/fs-writeback.c:2184:5: int extern [addressable] [signed] [toplevel] dirtytime_interval_handler( ... )
fs/fs-writeback.c: note: in included file:
./include/linux/writeback.h:374:5: note: previously declared as:
./include/linux/writeback.h:374:5: int extern [addressable] [signed] [toplevel] dirtytime_interval_handler( ... )
Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/20200907093140.13434-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
signature of stack_erasing_sysctl to match ctl_table.proc_handler which
fixes the following sparse warning:
kernel/stackleak.c:31:50: warning: incorrect type in argument 3 (different address spaces)
kernel/stackleak.c:31:50: expected void *
kernel/stackleak.c:31:50: got void [noderef] __user *buffer
Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/20200907093253.13656-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
signature of ftrace_enable_sysctl to match ctl_table.proc_handler which
fixes the following sparse warning:
kernel/trace/ftrace.c:7544:43: warning: incorrect type in argument 3 (different address spaces)
kernel/trace/ftrace.c:7544:43: expected void *
kernel/trace/ftrace.c:7544:43: got void [noderef] __user *buffer
Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/20200907093207.13540-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There is a race during page offline that can lead to infinite loop:
a page never ends up on a buddy list and __offline_pages() keeps
retrying infinitely or until a termination signal is received.
Thread#1 - a new process:
load_elf_binary
begin_new_exec
exec_mmap
mmput
exit_mmap
tlb_finish_mmu
tlb_flush_mmu
release_pages
free_unref_page_list
free_unref_page_prepare
set_pcppage_migratetype(page, migratetype);
// Set page->index migration type below MIGRATE_PCPTYPES
Thread#2 - hot-removes memory
__offline_pages
start_isolate_page_range
set_migratetype_isolate
set_pageblock_migratetype(page, MIGRATE_ISOLATE);
Set migration type to MIGRATE_ISOLATE-> set
drain_all_pages(zone);
// drain per-cpu page lists to buddy allocator.
Thread#1 - continue
free_unref_page_commit
migratetype = get_pcppage_migratetype(page);
// get old migration type
list_add(&page->lru, &pcp->lists[migratetype]);
// add new page to already drained pcp list
Thread#2
Never drains pcp again, and therefore gets stuck in the loop.
The fix is to try to drain per-cpu lists again after
check_pages_isolated_cb() fails.
Fixes: c52e75935f8d ("mm: remove extra drain pages on pcp list")
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200903140032.380431-1-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20200904151448.100489-2-pasha.tatashin@soleen.com
Link: http://lkml.kernel.org/r/20200904070235.GA15277@dhcp22.suse.cz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The displayed size is in bytes while the text says it is in kB.
Shift it by 10 to really display kBytes.
Fixes: fa7b9a805c79 ("tools/selftest/vm: allow choosing mem size and page size in map_hugetlb")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/e27481224564a93d14106e750de31189deaa8bc8.1598861977.git.christophe.leroy@csgroup.eu
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
A migrating transparent huge page has to already be unmapped. Otherwise,
the page could be modified while it is being copied to a new page and data
could be lost. The function __split_huge_pmd() checks for a PMD migration
entry before calling __split_huge_pmd_locked() leading one to think that
__split_huge_pmd_locked() can handle splitting a migrating PMD.
However, the code always increments the page->_mapcount and adjusts the
memory control group accounting assuming the page is mapped.
Also, if the PMD entry is a migration PMD entry, the call to
is_huge_zero_pmd(*pmd) is incorrect because it calls pmd_pfn(pmd) instead
of migration_entry_to_pfn(pmd_to_swp_entry(pmd)). Fix these problems by
checking for a PMD migration entry.
Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org> [4.14+]
Link: https://lkml.kernel.org/r/20200903183140.19055-1-rcampbell@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If a kprobe is marked as gone, we should not kill it again. Otherwise, we
can disarm the kprobe more than once. In that case, the statistics of
kprobe_ftrace_enabled can unbalance which can lead to that kprobe do not
work.
Fixes: e8386a0cb22f ("kprobes: support probing module __exit function")
Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Song Liu <songliubraving@fb.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200822030055.32383-1-songmuchun@bytedance.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit e809d5f0b5c9 ("tmpfs: per-superblock i_ino support") made changes
to shmem_reserve_inode() in mm/shmem.c, however the original test for
(sbinfo->max_inodes) got dropped. This causes mounting tmpfs with option
nr_inodes=0 to fail:
# mount -ttmpfs -onr_inodes=0 none /ext0
mount: /ext0: mount(2) system call failed: Cannot allocate memory.
This patch restores the nr_inodes=0 functionality.
Fixes: e809d5f0b5c9 ("tmpfs: per-superblock i_ino support")
Signed-off-by: Byron Stanoszek <gandalf@winds.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Chris Down <chris@chrisdown.name>
Link: https://lkml.kernel.org/r/20200902035715.16414-1-gandalf@winds.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
5.8 commit 5d91f31faf8e ("mm: swap: fix vmstats for huge page") has
established that vm_events should count every subpage of a THP, including
unevictable_pgs_culled and unevictable_pgs_rescued; but
lru_cache_add_inactive_or_unevictable() was not doing so for
unevictable_pgs_mlocked, and mm/mlock.c was not doing so for
unevictable_pgs mlocked, munlocked, cleared and stranded.
Fix them; but THPs don't go the pagevec way in mlock.c, so no fixes needed
on that path.
Fixes: 5d91f31faf8e ("mm: swap: fix vmstats for huge page")
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Yang Shi <shy828301@gmail.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Qian Cai <cai@lca.pw>
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301408230.5954@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
check_move_unevictable_pages() is used in making unevictable shmem pages
evictable: by shmem_unlock_mapping(), drm_gem_check_release_pagevec() and
i915/gem check_release_pagevec(). Those may pass down subpages of a huge
page, when /sys/kernel/mm/transparent_hugepage/shmem_enabled is "force".
That does not crash or warn at present, but the accounting of vmstats
unevictable_pgs_scanned and unevictable_pgs_rescued is inconsistent:
scanned being incremented on each subpage, rescued only on the head (since
tails already appear evictable once the head has been updated).
5.8 commit 5d91f31faf8e ("mm: swap: fix vmstats for huge page") has
established that vm_events in general (and unevictable_pgs_rescued in
particular) should count every subpage: so follow that precedent here.
Do this in such a way that if mem_cgroup_page_lruvec() is made stricter
(to check page->mem_cgroup is always set), no problem: skip the tails
before calling it, and add thp_nr_pages() to vmstats on the head.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Yang Shi <shy828301@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Qian Cai <cai@lca.pw>
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301405000.5954@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
hugetlbfs pages do not participate in memcg: so although they do find most
of migrate_page_states() useful, it would be better if they did not call
into mem_cgroup_migrate() - where Qian Cai reported that LTP's
move_pages12 triggers the warning in Alex Shi's prospective commit
"mm/memcg: warning on !memcg after readahead page charged".
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxch.org>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Qian Cai <cai@lca.pw>
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301359460.5954@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Patch series "mm: fixes to past from future testing".
Here's a set of independent fixes against 5.9-rc2: prompted by
testing Alex Shi's "warning on !memcg" and lru_lock series, but
I think fit for 5.9 - though maybe only the first for stable.
This patch (of 5):
In 5.8 some instances of memcg charging in do_swap_page() and unuse_pte()
were removed, on the understanding that swap cache is now already charged
at those points; but a case was missed, when ksm_might_need_to_copy() has
decided it must allocate a substitute page: such pages were never charged.
Fix it inside ksm_might_need_to_copy().
This was discovered by Alex Shi's prospective commit "mm/memcg: warning on
!memcg after readahead page charged".
But there is a another surprise: this also fixes some rarer uncharged
PageAnon cases, when KSM is configured in, but has never been activated.
ksm_might_need_to_copy()'s anon_vma->root and linear_page_index() check
sometimes catches a case which would need to have been copied if KSM were
turned on. Or that's my optimistic interpretation (of my own old code),
but it leaves some doubt as to whether everything is working as intended
there - might it hint at rare anon ptes which rmap cannot find? A
question not easily answered: put in the fix for missed memcg charges.
Cc; Matthew Wilcox <willy@infradead.org>
Fixes: 4c6355b25e8b ("mm: memcontrol: charge swapin pages on instantiation")
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Qian Cai <cai@lca.pw>
Cc: <stable@vger.kernel.org> [5.8]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301343270.5954@eggly.anvils
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301358020.5954@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|