<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/arch/arm64/lib, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-03-04T12:22:00+00:00</updated>
<entry>
<title>arm64: Fix sampling the "stable" virtual counter in preemptible section</title>
<updated>2026-03-04T12:22:00+00:00</updated>
<author>
<name>Marc Zyngier</name>
<email>maz@kernel.org</email>
</author>
<published>2026-02-26T08:22:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9f3ffd49da12d8369efeae50234530b27cc0bb1d'/>
<id>urn:sha1:9f3ffd49da12d8369efeae50234530b27cc0bb1d</id>
<content type='text'>
[ Upstream commit e5cb94ba5f96d691d8885175d4696d6ae6bc5ec9 ]

Ben reports that when running with CONFIG_DEBUG_PREEMPT, using
__arch_counter_get_cntvct_stable() results in well-deserved warnings,
as we access a per-CPU variable without preemption disabled.

Fix the issue by disabling preemption while reading the counter. We can
probably do a lot better by not disabling preemption on systems that
do not require horrible workarounds to return a valid counter value,
but this plugs the issue for the time being.
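
Concretely, the fix amounts to wrapping the workaround-afflicted
counter read in a preemption-disabled section. A minimal sketch of
the idea, modelled on the helper in
arch/arm64/include/asm/arch_timer.h (the exact upstream diff may
differ):

static __always_inline u64 __arch_counter_get_cntvct_stable(void)
{
	u64 cnt;

	/*
	 * The "stable" read dereferences a per-CPU workaround
	 * descriptor; pin the task to this CPU for the duration
	 * of the access.
	 */
	preempt_disable();
	isb();
	cnt = arch_timer_reg_read_stable(cntvct_el0);
	arch_counter_enforce_ordering(cnt);
	preempt_enable();

	return cnt;
}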

Fixes: 29cc0f3aa7c6 ("arm64: Force the use of CNTVCT_EL0 in __delay()")
Reported-by: Ben Horgan &lt;ben.horgan@arm.com&gt;
Signed-off-by: Marc Zyngier &lt;maz@kernel.org&gt;
Link: https://lore.kernel.org/r/aZw3EGs4rbQvbAzV@e134344.arm.com
Tested-by: Ben Horgan &lt;ben.horgan@arm.com&gt;
Tested-by: André Draszik &lt;andre.draszik@linaro.org&gt;
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>arm64: Force the use of CNTVCT_EL0 in __delay()</title>
<updated>2026-03-04T12:21:58+00:00</updated>
<author>
<name>Marc Zyngier</name>
<email>maz@kernel.org</email>
</author>
<published>2026-02-13T14:16:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5fc0933641a8732348439b4b7de0b4f011be35bf'/>
<id>urn:sha1:5fc0933641a8732348439b4b7de0b4f011be35bf</id>
<content type='text'>
[ Upstream commit 29cc0f3aa7c64d3b3cb9d94c0a0984ba6717bf72 ]

Quentin forwards a report from Hyesoo Yu, describing an interesting
problem with the use of WFxT in __delay() when a vcpu is loaded and
KVM is *not* in VHE mode (either nVHE or hVHE).

In this case, CNTVOFF_EL2 is set to a non-zero value to reflect the
state of the guest virtual counter. At the same time, __delay() is
using get_cycles() to read the counter value, which is indirected to
reading CNTPCT_EL0.

The core of the issue is that WFxT is using the *virtual* counter,
while the kernel is using the physical counter, and that the offset
introduces a really bad discrepancy between the two.

Fix this by forcing the use of CNTVCT_EL0, making __delay() consistent
irrespective of the value of CNTVOFF_EL2.
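
As a sketch of the idea (modelled on __delay() in
arch/arm64/lib/delay.c; the exact code may differ), the delay loop
samples the same virtual counter that WFET/WFIT compare against:

	const cycles_t start = __arch_counter_get_cntvct_stable();
	u64 end = start + cycles;

	/*
	 * WFIT/WFET wake up when CNTVCT_EL0 passes 'end'; sampling
	 * CNTVCT_EL0 here, rather than get_cycles() (which may be
	 * redirected to CNTPCT_EL0), keeps both sides on the same
	 * timeline regardless of CNTVOFF_EL2.
	 */
	wfit(end);
	while ((__arch_counter_get_cntvct_stable() - start) &lt; cycles)
		wfet(end);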

Reported-by: Hyesoo Yu &lt;hyesoo.yu@samsung.com&gt;
Reported-by: Quentin Perret &lt;qperret@google.com&gt;
Reviewed-by: Quentin Perret &lt;qperret@google.com&gt;
Fixes: 7d26b0516a0d ("arm64: Use WFxT for __delay() when possible")
Signed-off-by: Marc Zyngier &lt;maz@kernel.org&gt;
Link: https://lore.kernel.org/r/ktosachvft2cgqd5qkukn275ugmhy6xrhxur4zqpdxlfr3qh5h@o3zrfnsq63od
Cc: stable@vger.kernel.org
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>arm64: insn: Add support for encoding DSB</title>
<updated>2025-05-18T06:24:58+00:00</updated>
<author>
<name>James Morse</name>
<email>james.morse@arm.com</email>
</author>
<published>2021-12-09T15:12:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2a3915e86187ea8a48e7f899ec3fb000f8159fb6'/>
<id>urn:sha1:2a3915e86187ea8a48e7f899ec3fb000f8159fb6</id>
<content type='text'>
commit 63de8abd97ddb9b758bd8f915ecbd18e1f1a87a0 upstream.

To generate code in the eBPF epilogue that uses the DSB instruction,
insn.c needs a helper to encode the type and domain.

Re-use the crm encoding logic from the DMB instruction.
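
A sketch of the resulting shape (helper and symbol names are
illustrative, not necessarily the upstream ones):

u32 aarch64_insn_gen_dsb(enum aarch64_insn_mb_type type)
{
	/*
	 * The CRm lookup (SY, ISH, NSH, ... plus LD/ST variants)
	 * is shared with the existing DMB encoder.
	 */
	u32 opt = __get_barrier_crm_val(type);

	if (opt == AARCH64_BREAK_FAULT)
		return AARCH64_BREAK_FAULT;

	/* CRm lives in bits [11:8] of the barrier opcode. */
	return aarch64_insn_get_dsb_base_value() | (opt &lt;&lt; 8);
}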

Signed-off-by: James Morse &lt;james.morse@arm.com&gt;
Reviewed-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>arm64: crypto: use CC_FLAGS_FPU for NEON CFLAGS</title>
<updated>2024-05-19T21:36:18+00:00</updated>
<author>
<name>Samuel Holland</name>
<email>samuel.holland@sifive.com</email>
</author>
<published>2024-03-29T07:18:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7177089525d97bd8d7fefd6a9406f45d818410e0'/>
<id>urn:sha1:7177089525d97bd8d7fefd6a9406f45d818410e0</id>
<content type='text'>
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
tree, use it instead of duplicating the flags here.

Link: https://lkml.kernel.org/r/20240329072441.591471-6-samuel.holland@sifive.com
Signed-off-by: Samuel Holland &lt;samuel.holland@sifive.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt; 
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Huacai Chen &lt;chenhuacai@kernel.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Masahiro Yamada &lt;masahiroy@kernel.org&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nathan Chancellor &lt;nathan@kernel.org&gt;
Cc: Nicolas Schier &lt;nicolas@fjasle.eu&gt;
Cc: Palmer Dabbelt &lt;palmer@rivosinc.com&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: WANG Xuerui &lt;git@xen0n.name&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>arm64, bpf: add internal-only MOV instruction to resolve per-CPU addrs</title>
<updated>2024-05-12T23:54:34+00:00</updated>
<author>
<name>Puranjay Mohan</name>
<email>puranjay12@gmail.com</email>
</author>
<published>2024-05-02T15:18:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7a4c32222b0e14349a6311e72bf6ebd3e1d1064b'/>
<id>urn:sha1:7a4c32222b0e14349a6311e72bf6ebd3e1d1064b</id>
<content type='text'>
Support an instruction for resolving absolute addresses of per-CPU
data from their per-CPU offsets. This instruction is internal-only,
and users are not allowed to use it directly. For now, it will only
be used for internal inlining optimizations between the BPF verifier
and BPF JITs.

Since commit 7158627686f0 ("arm64: percpu: implement optimised pcpu
access using tpidr_el1"), the per-cpu offset for the CPU is stored in
the tpidr_el1/2 register of that CPU.

To support this BPF instruction in the ARM64 JIT, the following ARM64
instructions are emitted:

mov dst, src		// Move src to dst, if src != dst
mrs tmp, tpidr_el1/2	// Move the per-CPU offset of the current CPU into tmp.
add dst, dst, tmp	// Add the per-CPU offset to dst.
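
A rough sketch of the corresponding emission in the arm64 JIT
(macro names are illustrative; tpidr_el2 is used when the kernel
runs at EL2, i.e. with VHE):

	if (dst != src)
		emit(A64_MOV(1, dst, src), ctx);
	if (cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN))
		emit(A64_MRS_TPIDR_EL2(tmp), ctx);
	else
		emit(A64_MRS_TPIDR_EL1(tmp), ctx);
	emit(A64_ADD(1, dst, dst, tmp), ctx);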

To measure the performance improvement provided by this change, the
benchmark in [1] was used:

Before:
glob-arr-inc   :   23.597 ± 0.012M/s
arr-inc        :   23.173 ± 0.019M/s
hash-inc       :   12.186 ± 0.028M/s

After:
glob-arr-inc   :   23.819 ± 0.034M/s
arr-inc        :   23.285 ± 0.017M/s
hash-inc       :   12.419 ± 0.011M/s

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

Signed-off-by: Puranjay Mohan &lt;puranjay12@gmail.com&gt;
Acked-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/r/20240502151854.9810-4-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>arm64: Get rid of ARM64_HAS_NO_HW_PREFETCH</title>
<updated>2023-12-05T12:02:52+00:00</updated>
<author>
<name>Marc Zyngier</name>
<email>maz@kernel.org</email>
</author>
<published>2023-11-22T13:37:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=103423ad7e56d6c756738823c332c414b07899e6'/>
<id>urn:sha1:103423ad7e56d6c756738823c332c414b07899e6</id>
<content type='text'>
Back in 2016, it was argued that implementations lacking a HW
prefetcher could be helped by sprinkling a number of PRFM
instructions in strategic locations.

In 2023, the one platform that presumably needed this hack is no
longer in active use (let alone maintained), and a quick
experiment shows that dropping this hack only leads to a 0.4% drop
on a full kernel compilation (tested on a MT30-GS0 48 CPU system).

Given that this is pretty much in the noise department and that
it may give odd ideas to other implementers, drop the hack for
good.

Suggested-by: Will Deacon &lt;will@kernel.org&gt;
Suggested-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Marc Zyngier &lt;maz@kernel.org&gt;
Acked-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Link: https://lore.kernel.org/r/20231122133754.1240687-1-maz@kernel.org
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>arm64: Avoid cpus_have_const_cap() for ARM64_HAS_WFXT</title>
<updated>2023-10-16T13:17:05+00:00</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2023-10-16T10:24:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4c73056e3277f803ccdb7769e55fe7153edb07cc'/>
<id>urn:sha1:4c73056e3277f803ccdb7769e55fe7153edb07cc</id>
<content type='text'>
In __delay() we use cpus_have_const_cap() to check for ARM64_HAS_WFXT,
but this is not necessary and alternative_has_cap() would be preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap(), depending on their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

The cpus_have_const_cap() check in __delay() is an optimization to use
WFIT and WFET in preference to busy-polling the counter and/or using
regular WFE and relying upon the architected timer event stream. It is
not necessary to apply this optimization in the window between detecting
the ARM64_HAS_WFXT cpucap and patching alternatives.

This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which will avoid generating code to test
the system_cpucaps bitmap and should be better for all subsequent calls
at runtime.
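
As a before/after sketch of the callsite in __delay()
(arch/arm64/lib/delay.c):

	/*
	 * Before: may fall back to a runtime bitmap test until
	 * cpucaps are finalized.
	 */
	if (cpus_have_const_cap(ARM64_HAS_WFXT)) { ... }

	/*
	 * After: patched to a fixed branch when alternatives are
	 * applied; no runtime test of system_cpucaps.
	 */
	if (alternative_has_cap_unlikely(ARM64_HAS_WFXT)) { ... }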

Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Marc Zyngier &lt;maz@kernel.org&gt;
Cc: Suzuki K Poulose &lt;suzuki.poulose@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux</title>
<updated>2023-09-08T19:48:37+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-09-08T19:48:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ca9c7abf9502e108fae0e34181e01b1a20bc439f'/>
<id>urn:sha1:ca9c7abf9502e108fae0e34181e01b1a20bc439f</id>
<content type='text'>
Pull arm64 fixes from Will Deacon:
 "The main one is a fix for a broken strscpy() conversion that landed in
  the merge window and broke early parsing of the kernel command line.

   - Fix an incorrect mask in the CXL PMU driver

   - Fix a regression in early parsing of the kernel command line

   - Fix an IP checksum OoB access reported by syzbot"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: csum: Fix OoB access in IP checksum code for negative lengths
  arm64/sysreg: Fix broken strncpy() -&gt; strscpy() conversion
  perf: CXL: fix mismatched number of counters mask
</content>
</entry>
<entry>
<title>arm64: csum: Fix OoB access in IP checksum code for negative lengths</title>
<updated>2023-09-07T09:15:20+00:00</updated>
<author>
<name>Will Deacon</name>
<email>will@kernel.org</email>
</author>
<published>2023-09-07T08:54:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8bd795fedb8450ecbef18eeadbd23ed8fc7630f5'/>
<id>urn:sha1:8bd795fedb8450ecbef18eeadbd23ed8fc7630f5</id>
<content type='text'>
Although commit c2c24edb1d9c ("arm64: csum: Fix pathological zero-length
calls") added an early return for zero-length input, syzkaller has
popped up with an example of a _negative_ length which causes an
undefined shift and an out-of-bounds read:

 | BUG: KASAN: slab-out-of-bounds in do_csum+0x44/0x254 arch/arm64/lib/csum.c:39
 | Read of size 4294966928 at addr ffff0000d7ac0170 by task syz-executor412/5975
 |
 | CPU: 0 PID: 5975 Comm: syz-executor412 Not tainted 6.4.0-rc4-syzkaller-g908f31f2a05b #0
 | Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023
 | Call trace:
 |  dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:233
 |  show_stack+0x2c/0x44 arch/arm64/kernel/stacktrace.c:240
 |  __dump_stack lib/dump_stack.c:88 [inline]
 |  dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106
 |  print_address_description mm/kasan/report.c:351 [inline]
 |  print_report+0x174/0x514 mm/kasan/report.c:462
 |  kasan_report+0xd4/0x130 mm/kasan/report.c:572
 |  kasan_check_range+0x264/0x2a4 mm/kasan/generic.c:187
 |  __kasan_check_read+0x20/0x30 mm/kasan/shadow.c:31
 |  do_csum+0x44/0x254 arch/arm64/lib/csum.c:39
 |  csum_partial+0x30/0x58 lib/checksum.c:128
 |  gso_make_checksum include/linux/skbuff.h:4928 [inline]
 |  __udp_gso_segment+0xaf4/0x1bc4 net/ipv4/udp_offload.c:332
 |  udp6_ufo_fragment+0x540/0xca0 net/ipv6/udp_offload.c:47
 |  ipv6_gso_segment+0x5cc/0x1760 net/ipv6/ip6_offload.c:119
 |  skb_mac_gso_segment+0x2b4/0x5b0 net/core/gro.c:141
 |  __skb_gso_segment+0x250/0x3d0 net/core/dev.c:3401
 |  skb_gso_segment include/linux/netdevice.h:4859 [inline]
 |  validate_xmit_skb+0x364/0xdbc net/core/dev.c:3659
 |  validate_xmit_skb_list+0x94/0x130 net/core/dev.c:3709
 |  sch_direct_xmit+0xe8/0x548 net/sched/sch_generic.c:327
 |  __dev_xmit_skb net/core/dev.c:3805 [inline]
 |  __dev_queue_xmit+0x147c/0x3318 net/core/dev.c:4210
 |  dev_queue_xmit include/linux/netdevice.h:3085 [inline]
 |  packet_xmit+0x6c/0x318 net/packet/af_packet.c:276
 |  packet_snd net/packet/af_packet.c:3081 [inline]
 |  packet_sendmsg+0x376c/0x4c98 net/packet/af_packet.c:3113
 |  sock_sendmsg_nosec net/socket.c:724 [inline]
 |  sock_sendmsg net/socket.c:747 [inline]
 |  __sys_sendto+0x3b4/0x538 net/socket.c:2144

Extend the early return to reject negative lengths as well, aligning our
implementation with the generic code in lib/checksum.c.
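
The guard at the top of do_csum() (arch/arm64/lib/csum.c) then
becomes, roughly:

	/*
	 * len is a signed int: reject zero and negative lengths
	 * before they are reinterpreted as huge unsigned values
	 * by the shift and access logic below.
	 */
	if (unlikely(len &lt;= 0))
		return 0;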

Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Fixes: 5777eaed566a ("arm64: Implement optimised checksum routine")
Reported-by: syzbot+4a9f9820bd8d302e22f7@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000e0e94c0603f8d213@google.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>arm64: insn: Add encoders for LDRSB/LDRSH/LDRSW</title>
<updated>2023-08-18T13:45:34+00:00</updated>
<author>
<name>Xu Kuohai</name>
<email>xukuohai@huawei.com</email>
</author>
<published>2023-08-15T15:41:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6c9f86d3632c1abcbdaabeab1ee6d6de059b4ae7'/>
<id>urn:sha1:6c9f86d3632c1abcbdaabeab1ee6d6de059b4ae7</id>
<content type='text'>
To support BPF sign-extend load instructions, add encoders for
LDRSB/LDRSH/LDRSW.

LDRSB/LDRSH/LDRSW (immediate) is encoded as follows:

     3     2 2   2   2                       1         0         0
     0     7 6   4   2                       0         5         0
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | sz|1 1 1|0|0 1|opc|        imm12          |    Rn   |    Rt   |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

LDRSB/LDRSH/LDRSW (register) is encoded as follows:

     3     2 2   2   2 2         1     1 1   1         0         0
     0     7 6   4   2 1         6     3 2   0         5         0
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | sz|1 1 1|0|0 0|opc|1|    Rm   | opt |S|1 0|    Rn   |    Rt   |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

where:

   - sz
     indicates whether 8-bit, 16-bit or 32-bit data is to be loaded

   - opc
     opc[1] (bit 23) is always 1, and opc[0] == 1 indicates that
     regsize is 32-bit. Since BPF sign-extending loads always extend
     the sign bit to bit 63 regardless of whether 8-bit, 16-bit or
     32-bit data is loaded, only the 64-bit register size is needed.
     That is, it's sufficient to fix the opc field to 0x2.

   - opt
     Indicates whether to sign extend the offset register Rm and the
     effective bits of Rm. We set opt to 0x7 (SXTX) since we'll use
     Rm as a signed 64-bit value in BPF.

   - S
     Optional only when opt field is 0x3 (LSL)

In short, the above fields are encoded to the values listed below.

                   sz   opc  opt   S
LDRSB (immediate)  0x0  0x2  na    na
LDRSH (immediate)  0x1  0x2  na    na
LDRSW (immediate)  0x2  0x2  na    na
LDRSB (register)   0x0  0x2  0x7   0
LDRSH (register)   0x1  0x2  0x7   0
LDRSW (register)   0x2  0x2  0x7   0
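
With these encoders in place, a caller such as the BPF JIT could
request, say, an LDRSW (immediate) roughly as follows (a sketch;
the argument and enum names approximate the insn.c API):

	/* sz = 0x2, opc = 0x2, per the table above */
	insn = aarch64_insn_gen_load_store_imm(dst, base, off,
			AARCH64_INSN_SIZE_32,
			AARCH64_INSN_LDST_SIGNED_LOAD_IMM_OFFSET);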

Signed-off-by: Xu Kuohai &lt;xukuohai@huawei.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Tested-by: Florent Revest &lt;revest@chromium.org&gt;
Acked-by: Florent Revest &lt;revest@chromium.org&gt;
Link: https://lore.kernel.org/bpf/20230815154158.717901-2-xukuohai@huaweicloud.com
</content>
</entry>
</feed>
