Age | Commit message (Collapse) | Author | Files | Lines |
|
We need to provide all six forms of the alternative macros
(ALTERNATIVE, ALTERNATIVE_2, _ALTERNATIVE_CFG, _ALTERNATIVE_CFG_2,
__ALTERNATIVE_CFG, __ALTERNATIVE_CFG_2) for all four cases derived
from the two ifdefs (RISCV_ALTERNATIVE, __ASSEMBLY__) in order to
ensure all configs can compile. Define this missing ones and ensure
all are defined to consume all parameters passed.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202504130710.3IKz6Ibs-lkp@intel.com/
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250414120947.135173-2-ajones@ventanamicro.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
|
|
apply_r_riscv_plt32_rela() may need to emit a PLT entry for the
referenced symbol, so there must be space allocated in the PLT.
Fixes: 8fd6c5142395 ("riscv: Add remaining module relocations")
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250409171526.862481-2-samuel.holland@sifive.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
|
|
The current code allows rel[j] to access one element past the end of the
relocation section. Simplify to num_relocations which is equivalent to
the existing size expression.
Fixes: 080c4324fa5e ("riscv: optimize ELF relocation function in riscv")
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Maxim Kochetkov <fido_max@inbox.ru>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250409171526.862481-1-samuel.holland@sifive.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
|
|
The /proc/iomem represents the kernel's memory map. Regions marked
with "Reserved" tells the user that the range should not be tampered
with. Kexec-tools, when using the older kexec_load syscall relies on
the "Reserved" regions to build the memory segments, that will be the
target of the new kexec'd kernel.
The RISC-V port tries to expose all reserved regions to userland, but
some regions were not properly exposed: Regions that resided in both
the "regular" and reserved memory block, e.g. the EFI Memory Map. A
missing entry could result in reserved memory being overwritten.
It turns out, that arm64, and loongarch had a similar issue a while
back:
commit d91680e687f4 ("arm64: Fix /proc/iomem for reserved but not memory regions")
commit 50d7ba36b916 ("arm64: export memblock_reserve()d regions via /proc/iomem")
Similar to the other ports, resolve the issue by splitting the regions
in an arch initcall, since we need a working allocator.
Fixes: ffe0e5261268 ("RISC-V: Improve init_resources()")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250409182129.634415-1-bjorn@kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
|
|
Ensure we only print messages about command line parameters when
the parameters are actually in use. Also complain about the use
of the vector parameter when vector support isn't available.
Fixes: aecb09e091dc ("riscv: Add parameter for skipping access speed tests")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Closes: https://lore.kernel.org/all/CAMuHMdVEp2_ho51gkpLLJG2HimqZ1gZ0fa=JA4uNNZjFFqaPMg@mail.gmail.com/
Closes: https://lore.kernel.org/all/CAMuHMdWVMP0MYCLFq+b7H_uz-2omdFiDDUZq0t_gw0L9rrJtkQ@mail.gmail.com/
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250409153650.84433-2-ajones@ventanamicro.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
|
|
Align the backspaces vertically again, after recent cleanups.
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Ahmed S. Darwish <darwi@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250414094130.6768-1-bp@kernel.org
|
|
The dpi-default-pins overwrittes the same called node, defined in
mt8186-corsola.dtsi with the wrong set of pins, so remove
it from mt8186-corsola-starmie.dtsi as the first one is correct and
sufficient.
In addition, remove dpi-func-pins node from mt8186-corsola-starmie.dtsi,
as it is not used anywhere and also defines the same set of pins as
dpi-default-pins node already present in mt8186-corsola.dtsi.
Verifeid above with Corsola/Starmie device, by connecting external screen
with usb-c -> hdmi adapter.
Signed-off-by: Łukasz Majczak <lmajczak@google.com>
Link: https://lore.kernel.org/r/20250328121300.2612942-1-lmajczak@google.com
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
|
|
By hardware, the first and second core of the video decoder IP
need the VDEC_SOC to be powered up in order to be able to be
accessed (both internally, by firmware, and externally, by the
kernel).
Similarly, for the video encoder IP, the second core needs the
first core to be powered up in order to be accessible.
Fix that by reparenting the VDEC1/2 power domains to be children
of VDEC0 (VDEC_SOC), and the VENC1 to be a child of VENC0.
Fixes: 2b515194bf0c ("arm64: dts: mt8195: Add power domains controller")
Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
Link: https://lore.kernel.org/r/20250402090615.25871-3-angelogioacchino.delregno@collabora.com
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
|
|
Rename pcie pinctrl definition to fix the following dtbs_check error
for mt8370-genio-510-evk and mt8390-genio-700-evk devicetree files:
```
pinctrl@10005000: 'pcie-default' does not match any of the regexes:
'-pins$', 'pinctrl-[0-9]+'
```
Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20250403-mt8390-genio-common-fix-pcie-dtbs-check-error-v1-1-70d11fc1482e@collabora.com
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
|
|
Set the scp firmware name to the default location.
Fixes: f2b543a191b6 ("arm64: dts: mediatek: add device-tree for Genio 1200 EVK board")
Signed-off-by: Julien Massot <julien.massot@collabora.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20250404-mt8395-scp-fw-v1-2-bb8f20cd399d@collabora.com
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
|
|
Set the scp firmware name to the default location.
Fixes: 96564b1e2ea4 ("arm64: dts: mediatek: Introduce the MT8395 Radxa NIO 12L board")
Signed-off-by: Julien Massot <julien.massot@collabora.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20250404-mt8395-scp-fw-v1-1-bb8f20cd399d@collabora.com
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
|
|
Replace the last 2 usages of strncpy() in s390 code with strscpy().
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Add a simple sized_strscpy() implementation to allow the use of strscpy()
in the decompressor.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Select ARCH_WANT_IRQS_OFF_ACTIVATE_MM so that activate_mm() is called with
irqs disabled. This allows to call switch_mm_irqs_off() instead of
switch_mm() and saves two local_irq_save() / local_irq_restore() pairs.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Reduce system call overhead time (round trip time for invoking a
non-existent system call) by 25%.
With the removal of set_fs() [1] lazy control register handling was removed
in order to keep kernel entry and exit simple. However this made system
calls slower.
With the conversion to generic entry [2] and numerous follow up changes
which simplified the entry code significantly, adding support for lazy asce
handling doesn't add much complexity to the entry code anymore.
In particular this means:
- On kernel entry the primary asce is not modified and contains the user
asce
- Kernel accesses which require secondary-space mode (for example futex
operations) are surrounded by enable_sacf_uaccess() and
disable_sacf_uaccess() calls. enable_sacf_uaccess() sets the primary asce
to kernel asce so that the sacf instruction can be used to switch to
secondary-space mode. The primary asce is changed back to user asce with
disable_sacf_uaccess().
The state of the control register which contains the primary asce is
reflected with a new TIF_ASCE_PRIMARY bit. This is required on context
switch so that the correct asce is restored for the scheduled in process.
In result address spaces are now setup like this:
CPU running in | %cr1 ASCE | %cr7 ASCE | %cr13 ASCE
-----------------------------|-----------|-----------|-----------
user space | user | user | kernel
kernel (no sacf) | user | user | kernel
kernel (during sacf uaccess) | kernel | user | kernel
kernel (kvm guest execution) | guest | user | kernel
In result cr1 control register content is not changed except for:
- futex system calls
- legacy s390 PCI system calls
- the kvm specific cmpxchg_user_key() uaccess helper
This leads to faster system call execution.
[1] 87d598634521 ("s390/mm: remove set_fs / rework address space handling")
[2] 56e62a737028 ("s390: convert to generic entry")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Based on the comments in the MT8188 IOMMU binding header, the rdma0
device specifies the wrong IOMMU device for the IOMMU port it is
tied to:
This SoC have two MM IOMMU HWs, this is the connected information:
iommu-vdo: larb0/2/5/9/10/11A/11C/13/16B/17B/19/21
iommu-vpp: larb1/3/4/6/7/11B/12/14/15/16A/17A/23/27
rdma0's endpoint is M4U_PORT_L1_DISP_RDMA0 (on larb1), which should use
iommu-vpp, but it is currently tied to iommu-vdo.
Somehow this went undetected until recently in Linux v6.15-rc1 with some
IOMMU subsystem framework changes that caused the IOMMU to no longer
work. The IOMMU would fail to probe if any devices associated with it
could not be successfully attached. Prior to these changes, only the
end device would be left without an IOMMU attached.
Fixes: 7075b21d1a8e ("arm64: dts: mediatek: mt8188: Add display nodes for vdosys0")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: Jason-JH Lin <jason-jh.lin@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20250408092303.3563231-1-wenst@chromium.org
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
|
|
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/20250410071406.9669-6-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
|
|
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/20250410071406.9669-5-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
|
|
Add the initial device tree for the Renesas RZ/V2N EVK board, based on
the R9A09G056N48 SoC. Enable basic board functionality, including:
- Memory mapping (reserve the first 128MB for the secure area)
- Clock inputs (QEXTAL, RTXIN, AUDIO_EXTAL)
- PINCTRL configurations for peripherals
- Serial console (SCIF)
- SDHI1 with power control and UHS modes
Update the Makefile to include the new DTB.
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/20250407191628.323613-13-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
|
|
Add the initial Device Tree Source Include (DTSI) file for the Renesas
RZ/V2N (R9A09G056) SoC. Include support for the following components:
- CPU (Cortex-A55 cores with operating points)
- External clocks (audio, qextal, rtxin)
- Pin controller (GPIO support)
- Clock Pulse Generator (CPG)
- System controller (SYS)
- Serial Communication Interface (SCIF)
- Secure Digital Host Interface (SDHI 0/1/2)
- Generic Interrupt Controller (GIC)
- ARMv8 timer
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/20250407191628.323613-12-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
|
|
The keys are connected to the I2C GPIO extender which has the interrupt
pin not connected. So, we need to poll.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/20250328153134.2881-12-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
|
|
The actual sensor might differ, but all known are LM75B compatible.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/20250328153134.2881-10-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
|
|
Schematics mention a 24cs64 on the bus, but I definitely have only a
24c64. So, it is only mentioned as a comment.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/20250328153134.2881-9-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
|
|
To match the documentation and schematics, they are numbered from 1 and
not from 0.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/20250328153134.2881-8-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
|
|
The EB board (Expansion board) supports both RZ/N1D and RZ-N1S. Since this
configuration targets only the RZ/N1D, it is named r9a06g032-rzn1d400-eb.
It adds support for the 2 additional switch ports (port C and D) that are
available on that board.
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
[Thomas: move the DTS to the Renesas directory, declare the PHY LEDs]
Signed-off-by: Thomas Bonnefille <thomas.bonnefille@bootlin.com>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Tested-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://lore.kernel.org/20250324-rzn1d400-eb-v4-1-d7ebbbad1918@bootlin.com
[wsa: Correct LAN LED nodes]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/20250411095425.1842-2-wsa+renesas@sang-engineering.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
|
|
smp_text_poke_sync_each_cpu()
Missed this UML wrapper in the rename.
Fixes: 6e4955a9d73e ("x86/alternatives: Rename 'text_poke_sync()' to 'smp_text_poke_sync_each_cpu()'")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/202504141003.kc69fVoj-lkp@intel.com
|
|
Collect AMD specific platform header files in <asm/amd/*.h>.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <superm1@kernel.org>
Link: https://lore.kernel.org/r/20250413084144.3746608-7-mingo@kernel.org
|
|
- There's no need for a newline after the SPDX line
- But there's a need for one before the closing header guard.
Collect AMD specific platform header files in <asm/amd/*.h>.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Carlos Bilbao <carlos.bilbao@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <superm1@kernel.org>
Cc: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Link: https://lore.kernel.org/r/20250413084144.3746608-6-mingo@kernel.org
|
|
Collect AMD specific platform header files in <asm/amd/*.h>.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Carlos Bilbao <carlos.bilbao@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <superm1@kernel.org>
Cc: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Link: https://lore.kernel.org/r/20250413084144.3746608-5-mingo@kernel.org
|
|
Collect AMD specific platform header files in <asm/amd/*.h>.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <superm1@kernel.org>
Link: https://lore.kernel.org/r/20250413084144.3746608-4-mingo@kernel.org
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <superm1@kernel.org>
Link: https://lore.kernel.org/r/20250413084144.3746608-3-mingo@kernel.org
|
|
Collect AMD specific platform header files in <asm/amd/*.h>.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <superm1@kernel.org>
Link: https://lore.kernel.org/r/20250413084144.3746608-2-mingo@kernel.org
|
|
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/Z_ejggklB5-IWB5W@gmail.com
|
|
A few uses of 'fps' snuck in, which is rather confusing
(to me) as it suggests frames-per-second. ;-)
Rename them to the canonical 'fpstate' name.
No change in functionality.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250409211127.3544993-9-mingo@kernel.org
|
|
PF_KTHREAD tasks
init_task's FPU state initialization was a bit of a hack:
__x86_init_fpu_begin = .;
. = __x86_init_fpu_begin + 128*PAGE_SIZE;
__x86_init_fpu_end = .;
But the init task isn't supposed to be using the FPU context
in any case, so remove the hack and add in some debug warnings.
As Linus noted in the discussion, the init task (and other
PF_KTHREAD tasks) *can* use the FPU via kernel_fpu_begin()/_end(),
but they don't need the context area because their FPU use is not
preemptible or reentrant, and they don't return to user-space.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20250409211127.3544993-8-mingo@kernel.org
|
|
PF_KTHREAD|PF_USER_WORKER tasks during exit
fpu__drop() and arch_release_task_struct() calls x86_task_fpu()
unconditionally, while the FPU context area will not be present
if it's the init task, and should not be in use when it's some
other type of kthread.
Return early for PF_KTHREAD or PF_USER_WORKER tasks. The debug
warning in x86_task_fpu() will catch any kthreads attempting to
use the FPU save area.
Fixed-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250409211127.3544993-7-mingo@kernel.org
|
|
This encapsulates the fpu__drop() functionality better, and it
will also enable other changes that want to check a task for
PF_KTHREAD before calling x86_task_fpu().
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250409211127.3544993-6-mingo@kernel.org
|
|
As suggested by Oleg, remove the thread::fpu pointer, as we can
calculate it via x86_task_fpu() at compile-time.
This improves code generation a bit:
kepler:~/tip> size vmlinux.before vmlinux.after
text data bss dec hex filename
26475405 10435342 1740804 38651551 24dc69f vmlinux.before
26475339 10959630 1216516 38651485 24dc65d vmlinux.after
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20250409211127.3544993-5-mingo@kernel.org
|
|
Turn thread.fpu into a pointer. Since most FPU code internals work by passing
around the FPU pointer already, the code generation impact is small.
This allows us to remove the old kludge of task_struct being variable size:
struct task_struct {
...
/*
* New fields for task_struct should be added above here, so that
* they are included in the randomized portion of task_struct.
*/
randomized_struct_fields_end
/* CPU-specific state of this task: */
struct thread_struct thread;
/*
* WARNING: on x86, 'thread_struct' contains a variable-sized
* structure. It *MUST* be at the end of 'task_struct'.
*
* Do not put anything below here!
*/
};
... which creates a number of problems, such as requiring thread_struct to be
the last member of the struct - not allowing it to be struct-randomized, etc.
But the primary motivation is to allow the decoupling of task_struct from
hardware details (<asm/processor.h> in particular), and to eventually allow
the per-task infrastructure:
DECLARE_PER_TASK(type, name);
...
per_task(current, name) = val;
... which requires task_struct to be a constant size struct.
The fpu_thread_struct_whitelist() quirk to hardened usercopy can be removed,
now that the FPU structure is not embedded in the task struct anymore, which
reduces text footprint a bit.
Fixed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250409211127.3544993-4-mingo@kernel.org
|
|
This will make the removal of the task_struct::thread.fpu array
easier.
No change in functionality - code generated before and after this
commit is identical on x86-defconfig:
kepler:~/tip> diff -up vmlinux.before.asm vmlinux.after.asm
kepler:~/tip>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250409211127.3544993-3-mingo@kernel.org
|
|
The per-task FPU context/save area is allocated right
next to task_struct, currently in a variable-size
array via task_struct::thread.fpu[], but we plan to
fully hide it from the C type scope.
Introduce the x86_task_fpu() accessor that gets to the
FPU context pointer explicitly from the task pointer.
Right now this is a simple (task)->thread.fpu wrapper.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250409211127.3544993-2-mingo@kernel.org
|
|
== Background ==
As feature positions in the userspace XSAVE buffer do not always align
with their feature numbers, the XSAVE format conversion needs to be
reconsidered to align with the revised xstate size calculation logic.
* For signal handling, XSAVE and XRSTOR are used directly to save and
restore extended registers.
* For ptrace, KVM, and signal returns (for 32-bit frame), the kernel
copies data between its internal buffer and the userspace XSAVE buffer.
If memcpy() were used for these cases, existing offset helpers — such
as __raw_xsave_addr() or xstate_offsets[] — would be sufficient to
handle the format conversion.
== Problem ==
When copying data from the compacted in-kernel buffer to the
non-compacted userspace buffer, the function follows the
user_regset_get2_fn() prototype. This means it utilizes struct membuf
helpers for the destination buffer. As defined in regset.h, these helpers
update the memory pointer during the copy process, enforcing sequential
writes within the loop.
Since xstate components are processed sequentially, any component whose
buffer position does not align with its feature number has an issue.
== Solution ==
Replace for_each_extended_xfeature() with the newly introduced
for_each_extended_xfeature_in_order(). This macro ensures xstate
components are handled in the correct order based on their actual
positions in the destination buffer, rather than their feature numbers.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250320234301.8342-5-chang.seok.bae@intel.com
|
|
The current xstate size calculation assumes that the highest-numbered
xstate feature has the highest offset in the buffer, determining the size
based on the topmost bit in the feature mask. However, this assumption is
not architecturally guaranteed -- higher-numbered features may have lower
offsets.
With the introduction of the xfeature order table and its helper macro,
xstate components can now be traversed in their positional order. Update
the non-compacted format handling to iterate through the table to
determine the last-positioned feature. Then, set the offset accordingly.
Since size calculation primarily occurs during initialization or in
non-critical paths, looping to find the last feature is not expected to
have a meaningful performance impact.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250320234301.8342-4-chang.seok.bae@intel.com
|
|
The kernel has largely assumed that higher xstate component numbers
correspond to later offsets in the buffer. However, this assumption no
longer holds for the non-compacted format, where a newer state component
may have a lower offset.
When iterating over xstate components in offset order, using the feature
number as an index may be misleading. At the same time, the CPU exposes
each component’s size and offset based on its feature number, making it a
key for state information.
To provide flexibility in handling xstate ordering, introduce a mapping
table: feature order -> feature number. The table is dynamically
populated based on the CPU-exposed features and is sorted in offset order
at boot time.
Additionally, add an accessor macro to facilitate sequential traversal of
xstate components based on their actual buffer positions, given a feature
bitmask. This accessor macro will be particularly useful for computing
custom non-compacted format sizes and iterating over xstate offsets in
non-compacted buffers.
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250320234301.8342-3-chang.seok.bae@intel.com
|
|
Traditionally, new xstate components have been assigned sequentially,
aligning feature numbers with their offsets in the XSAVE buffer. However,
this ordering is not architecturally mandated in the non-compacted
format, where a component's offset may not correspond to its feature
number.
The kernel caches CPUID-reported xstate component details, including size
and offset in the non-compacted format. As part of this process, a sanity
check is also conducted to ensure alignment between feature numbers and
offsets.
This check was likely intended as a general guideline rather than a
strict requirement. Upcoming changes will support out-of-order offsets.
Remove the check as becoming obsolete.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250320234301.8342-2-chang.seok.bae@intel.com
|
|
Use asm_inline() to instruct the compiler that the size of asm()
is the minimum size of one instruction, ignoring how many instructions
the compiler thinks it is. ALTERNATIVE macro that expands to several
pseudo directives causes instruction length estimate to count
more than 20 instructions.
bloat-o-meter reports minimal code size increase
(x86_64 defconfig with CONFIG_ADDRESS_MASKING, gcc-14.2.1):
add/remove: 2/2 grow/shrink: 5/1 up/down: 2365/-1995 (370)
Function old new delta
-----------------------------------------------------
do_get_mempolicy - 1449 +1449
copy_nodes_to_user - 226 +226
__x64_sys_get_mempolicy 35 213 +178
syscall_user_dispatch_set_config 157 332 +175
__ia32_sys_get_mempolicy 31 206 +175
set_syscall_user_dispatch 29 181 +152
__do_sys_mremap 2073 2083 +10
sp_insert 133 117 -16
task_set_syscall_user_dispatch 172 - -172
kernel_get_mempolicy 1807 - -1807
Total: Before=21423151, After=21423521, chg +0.00%
The code size increase is due to the compiler inlining
more functions that inline untagged_addr(), e.g:
task_set_syscall_user_dispatch() is now fully inlined in
set_syscall_user_dispatch():
000000000010b7e0 <set_syscall_user_dispatch>:
10b7e0: f3 0f 1e fa endbr64
10b7e4: 49 89 c8 mov %rcx,%r8
10b7e7: 48 89 d1 mov %rdx,%rcx
10b7ea: 48 89 f2 mov %rsi,%rdx
10b7ed: 48 89 fe mov %rdi,%rsi
10b7f0: 65 48 8b 3d 00 00 00 mov %gs:0x0(%rip),%rdi
10b7f7: 00
10b7f8: e9 03 fe ff ff jmp 10b600 <task_set_syscall_user_dispatch>
that after inlining becomes:
000000000010b730 <set_syscall_user_dispatch>:
10b730: f3 0f 1e fa endbr64
10b734: 65 48 8b 05 00 00 00 mov %gs:0x0(%rip),%rax
10b73b: 00
10b73c: 48 85 ff test %rdi,%rdi
10b73f: 74 54 je 10b795 <set_syscall_user_dispatch+0x65>
10b741: 48 83 ff 01 cmp $0x1,%rdi
10b745: 74 06 je 10b74d <set_syscall_user_dispatch+0x1d>
10b747: b8 ea ff ff ff mov $0xffffffea,%eax
10b74c: c3 ret
10b74d: 48 85 f6 test %rsi,%rsi
10b750: 75 7b jne 10b7cd <set_syscall_user_dispatch+0x9d>
10b752: 48 85 c9 test %rcx,%rcx
10b755: 74 1a je 10b771 <set_syscall_user_dispatch+0x41>
10b757: 48 89 cf mov %rcx,%rdi
10b75a: 49 b8 ef cd ab 89 67 movabs $0x123456789abcdef,%r8
10b761: 45 23 01
10b764: 90 nop
10b765: 90 nop
10b766: 90 nop
10b767: 90 nop
10b768: 90 nop
10b769: 90 nop
10b76a: 90 nop
10b76b: 90 nop
10b76c: 49 39 f8 cmp %rdi,%r8
10b76f: 72 6e jb 10b7df <set_syscall_user_dispatch+0xaf>
10b771: 48 89 88 48 08 00 00 mov %rcx,0x848(%rax)
10b778: 48 89 b0 50 08 00 00 mov %rsi,0x850(%rax)
10b77f: 48 89 90 58 08 00 00 mov %rdx,0x858(%rax)
10b786: c6 80 60 08 00 00 00 movb $0x0,0x860(%rax)
10b78d: f0 80 48 08 20 lock orb $0x20,0x8(%rax)
10b792: 31 c0 xor %eax,%eax
10b794: c3 ret
10b795: 48 09 d1 or %rdx,%rcx
10b798: 48 09 f1 or %rsi,%rcx
10b79b: 75 aa jne 10b747 <set_syscall_user_dispatch+0x17>
10b79d: 48 c7 80 48 08 00 00 movq $0x0,0x848(%rax)
10b7a4: 00 00 00 00
10b7a8: 48 c7 80 50 08 00 00 movq $0x0,0x850(%rax)
10b7af: 00 00 00 00
10b7b3: 48 c7 80 58 08 00 00 movq $0x0,0x858(%rax)
10b7ba: 00 00 00 00
10b7be: c6 80 60 08 00 00 00 movb $0x0,0x860(%rax)
10b7c5: f0 80 60 08 df lock andb $0xdf,0x8(%rax)
10b7ca: 31 c0 xor %eax,%eax
10b7cc: c3 ret
10b7cd: 48 8d 3c 16 lea (%rsi,%rdx,1),%rdi
10b7d1: 48 39 fe cmp %rdi,%rsi
10b7d4: 0f 82 78 ff ff ff jb 10b752 <set_syscall_user_dispatch+0x22>
10b7da: e9 68 ff ff ff jmp 10b747 <set_syscall_user_dispatch+0x17>
10b7df: b8 f2 ff ff ff mov $0xfffffff2,%eax
10b7e4: c3 ret
Please note a series of NOPs that get replaced with an alternative:
11f0: 65 48 23 05 00 00 00 and %gs:0x0(%rip),%rax
11f7: 00
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250407072129.33440-1-ubizjak@gmail.com
|
|
Use struct_size() to calculate the number of bytes to allocate for a new
bts_buffer. Compared to offsetof(), struct_size() provides additional
compile-time checks (e.g., __must_be_array()).
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250413104108.49142-2-thorsten.blum@linux.dev
|
|
To reduce the impact of the API renames in -next, add compatibility
wrappers for the two most popular MSR access APIs: rdmsrl() and wrmsrl().
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
|
|
Add information about CPU caches in Apple A11 SoC.
Signed-off-by: Nick Chan <towinchenmi@gmail.com>
Link: https://lore.kernel.org/r/20250220-caches-v1-9-2c7011097768@gmail.com
Signed-off-by: Sven Peter <sven@svenpeter.dev>
|
|
Add information about CPU caches in the P-cluster of Apple T2 SoC. Due to
"Apple Fusion Architecture" big.LITTLE switcher, only caches from one of
the clusters can be used at any given moment.
Signed-off-by: Nick Chan <towinchenmi@gmail.com>
Link: https://lore.kernel.org/r/20250220-caches-v1-8-2c7011097768@gmail.com
Signed-off-by: Sven Peter <sven@svenpeter.dev>
|