| Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
"Major changes:
- Recover from BPF arena page faults using a scratch page and add
ptep_try_set() for lockless empty-slot installs on x86 and arm64.
This allows BPF kfuncs to access arena pointers directly.
The 'arena_direct_access' stable branch was created for this work
and was pulled into sched-ext and bpf-next trees (Tejun Heo, Kumar
Kartikeya Dwivedi)
- Lift old restriction and support 6+ arguments in BPF programs and
kfuncs on x86 and arm64 (Yonghong Song, Puranjay Mohan)
Other features and fixes:
- Add 24-bit BTF vlen and reclaim unused bits in the BTF UAPI to ease
addition of new BTF kinds (Alan Maguire)
- Raise the maximum BPF call chain depth from 8 to 16 frames (Alexei
Starovoitov)
- Refactor object relationship tracking in the verifier and fix a
dynptr use-after-free bug (Amery Hung)
- Harden the signed program loader and reject exclusive maps as inner
maps (Daniel Borkmann)
- Replace the verifier min/max bounds fields with a circular number
(cnum) representation and improve 32->64 bit range refinements
(Eduard Zingerman)
- Introduce the arena library and runtime (libarena) with a buddy
allocator, rbtree and SPMC queue data structures, ASAN support and
a parallel test harness. Allow subprograms to return arena pointers
and switch to a BTF type-tag based __arena annotation (Emil
Tsalapatis)
- Cache build IDs in the sleepable stackmap path and avoid faultable
build ID reads under mm locks (Ihor Solodrai)
- Introduce the tracing_multi link to attach a single BPF program to
many kernel functions at once. Allow specifying the uprobe_multi
target via FD (Jiri Olsa)
- Extend the bpf_list family of kfuncs with bpf_list_add/del(), and
bpf_list_is_first/is_last/empty() (Kaitao Cheng)
- Extend the BPF syscall with common attributes support for
prog_load, btf_load and map_create (Leon Hwang)
- Wrap rhashtable as BPF map (Mykyta Yatsenko, Herbert Xu)
- Add sleepable support for tracepoint programs and fix deadlocks in
LRU map due to NMI reentry (Mykyta Yatsenko)
- Fix OOB access in bpf_flow_keys, fix nullness analysis of inner
arrays, enforce write checks for global subprograms (Nuoqi Gui)
- Report the maximum combined stack depth and print a breakdown of
instructions processed per subprogram (Paul Chaignon)
- Add an XDP load-balancer benchmark and arm64 JIT support for stack
arguments (Puranjay Mohan)
- Add kfuncs to traverse over wakeup_sources (Samuel Wu)
- Allow sleepable BPF programs to use LPM trie maps directly (Vlad
Poenaru)
- Many more fixes and cleanups across the verifier, BTF, sockmap,
devmap, bpffs, security hooks, s390/riscv/loongarch JITs,
rqspinlock, libbpf, bpftool, selftests"
* tag 'bpf-next-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (336 commits)
selftests/bpf: Work around llvm stack overflow in crypto progs
selftests/bpf: add test for bpf_msg_pop_data() overflow
bpf, sockmap: fix integer overflow in bpf_msg_pop_data() bounds check
sockmap: Fix use-after-free in udp_bpf_recvmsg()
bpf, sockmap: keep sk_msg copy state in sync
bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data()
bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data()
selftsets/bpf: Retry map update on helper_fill_hashmap()
selftests/bpf: Add test for sleepable lsm_cgroup rejection
selftests/bpf: Add test to verify the fix for bpf_setsockopt() helper
bpf: Fix bpf_get/setsockopt to tos for ipv4-mapped ipv6 socket
selftests/bpf: Avoid static LLVM linking for cross builds
selftests/bpf: Use common CFLAGS for urandom_read
selftests/bpf: Initialize operation name before use
tools/bpf: build: Append extra cflags
libbpf: Initialize CFLAGS before including Makefile.include
bpftool: Append extra host flags
bpftool: Avoid adding EXTRA_CFLAGS to HOST_CFLAGS
bpftool: Pass host flags to bootstrap libbpf
selftests/bpf: correct CONFIG_PPC64 macro name in comment
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Work on removing rtnl_lock protection throughout the stack
continues. In this chapter:
- don't use rtnl_lock for IPv6 multicast routing configuration
- don't take rtnl_lock in ethtool for modern drivers
- prepare Qdisc dump callbacks for rtnl_lock removal
- Support dumping just ifindex + name of all interfaces, under RCU.
It's a common operation for Netlink CLI tools (when translating
names to ifindexes) and previously required full rtnl_lock.
- Support dumping qdiscs and page pools for a specific netdev. Even
tho user space wants a dump of all netdevs, most of the time, the
OOO programming model results in repeating the dump for each
netdev. Which, in absence of a cache, leads to a O(n^2) behavior.
- Flush nexthops once on multi-nexthop removal (e.g. when device goes
down), another O(n^2) -> O(n) improvement.
- Rehash locally generated traffic to a different nexthop on
retransmit timeout.
- Honor oif when choosing nexthop for locally generated IPv6 traffic.
- Convert TCP Auth Option to crypto library, and drop non-RFC algos.
- Increase subflow limits in MPTCP to 64 and endpoint limit to 256.
- Support MPTCP signaling of IPv6 address + port (ADD_ADDR). We need
to selectively skip reporting of the standard TCP Timestamp option,
because they won't fit into the header space together (12 + 30 >
40).
- Support using bridge neighbor suppression, Duplicate Address
Detection, Gratuitous ARP and unsolicited NA forwarding - in EVPN
deployments, e.g. VXLAN fabrics (IPv4 and IPv6).
- Improve link state reporting for upper netdevs (e.g. macvlan) over
tunnel devices (again, mostly for EVPN deployments).
- Support binding GENEVE tunnels to a local address.
- Speed up UDP tunnel destruction (remove one synchronize_rcu()).
- Support exponential field encoding in multicast (IGMPv3 and MLDv2).
- Support attaching PSP crypto offload to containers (veth, netkit).
- Add a new IPSec Netlink message XFRM_MSG_MIGRATE_STATE that allows
migrating individual IPsec SAs independently of their policies.
The existing XFRM_MSG_MIGRATE is tightly coupled to policy+SA
migration, lacks SPI for unique SA identification, and cannot
express reqid changes or migrate Transport mode selectors.
The new interface identifies the SA via SPI and mark, supports
reqid changes, address family changes, encap removal, and uses an
atomic create+install flow under x->lock to prevent SN/IV reuse
during AEAD SA migration.
- Implement GRO/GSO support for PPPoE.
- Convert sockopt callbacks in a number of protocols to iov_iter.
Cross-tree stuff:
- Remove support for Crypto TFM cloning (unblocked after the TCP Auth
Option rework). This feature regressed performance for all crypto
API users, since it changed crypto transformation objects into
reference-counted objects.
- Add FCrypt-PCBC implementation to rxrpc and remove it from the
global crypto API as obsolete and insecure.
Wireless:
- Major rework of station bandwidth handling, fixing issues with
lower capability than AP.
- Cleanups for EMLSR spec issues (drafts differed).
- More Neighbor Awareness Networking (Wi-Fi Aware) work (multicast,
schedule improvements, multi-station etc.)
- Some Ultra High Reliability (UHR) / IEEE 802.11bn (D1.4) work
(e.g. non-primary channel access, UHR DBE support).
- Fine Timing Measurement ranging (i.e. distance measurement) APIs.
Netfilter:
- Use per-rule hash initval in nf_conncount. This avoids unnecessary
lock contention with short keys (e.g. conntrack zones) in different
namespaces.
- Various safety improvements, both in packet parsing and object
lifetimes. Notably add refcounts to conntrack timeout policy.
Deletions:
- Remove TLS + sockmap integration. TLS wants to pin user pages to
avoid a copy, and sockmap wants to write to the input stream. More
work on this integration is clearly needed, and we can't find any
users (original author admitted that they never deployed it).
- Remove support for TLS offload with TCP Offload Engine (the far
more common opportunistic offload is retained). The locking looks
unfixable (driver sleeps under TCP spin locks) and people from the
vendor that added this are AWOL.
- Remove more ATM code, trying to leave behind only what PPPoATM
needs, AAL5 and br2684 with permanent circuits.
- Remove AppleTalk. Let it join hamradio in our out of tree protocol
graveyard, I mean, repository.
- Disable 32-bit x_tables compatibility (32bit binaries on 64bit
kernel) interface in user namespaces. To be deleted completely,
soon.
- Remove 5/10 MHz support from cfg80211/mac80211.
Drivers:
- Software:
- Support DEVMEM/DMABUF Tx over NETMEM_TX_NO_DMA devices (netkit)
- bonding: add knob to strictly follow 802.3ad for link state
- New drivers:
- Alibaba Elastic Ethernet Adaptor (cloud vNIC).
- NXP NETC switch within i.MX94.
- DPLL:
- Add operational state to pins (implement in zl3073x).
- Add generic DPLL type, for daisy-chaining DPLLs (implement in ice).
- Ethernet high-speed NICs:
- Huawei (hinic3):
- enhance tc flow offload support with queue selection,
tunnels
- nVidia/Mellanox:
- avoid over-copying payload to the skb's linear part (up to
60% win for LRO on slow CPUs like ARM64 V2)
- expose more per-queue stats over the standard API
- support additional, unprivileged PFs in the DPU
configuration
- support Socket Direct (multi-PF) with switchdev offloads
- add a pool / frag allocator for DMA mapped buffers for
control objects, save memory on systems with 64kB page size
- take advantage of the ability to dynamically change RSS
table size, even when table is configured by the user
- increase the max RSS table size for even traffic
distribution
- Ethernet NICs:
- Marvell/Aquantia:
- AQC113 PTP support
- Realtek USB (r8152):
- support 10Gbit Link Speeds and Energy-Efficient Ethernet
(EEE)
- support firmware loaded (for RTL8157/RTL8159)
- support for the RTL8159
- Intel (ixgbe):
- support Energy-Efficient Ethernet (EEE) on E610 devices
- Ethernet switches:
- Airoha:
- support multiple netdevs on a single GDM block / port
- Marvell (mv88e6xxx):
- support SERDES of mv88e6321
- Microchip (ksz8/9):
- rework the driver callbacks to remove one indirection layer
- Motorcomm (yt921x):
- support port rate policing
- support TBF qdisc offload
- support ACL/flower offload
- nVidia/Mellanox:
- expose per-PG rx_discards
- Realtek:
- rtl8365mb: bridge offloading and VLAN support
- Ethernet PHYs:
- Airoha:
- support Airoha AN8801R Gigabit PHYs.
- Micrel:
- implement 3 low-loss cable tunables
- Realtek:
- support MDI swapping for RTL8226-CG
- support MDIO for RTL931x
- Qualcomm:
- at803x: Rx and Tx clock management for IPQ5018 PHY
- Motorcomm:
- support YT8522 100M RMII PHY
- set drive strength in YT8531s RGMII
- TI:
- dp83822: add optional external PHY clock
- Bluetooth:
- hci_sync: add support for HCI_LE_Set_Host_Feature [v2]
- SMP: use AES-CMAC library API
- Intel:
- support Product level reset
- support smart trigger dump
- Mediatek:
- add event filter to filter specific event
- Realtek:
- fix RTL8761B/BU broken LE extended scan
- WiFi:
- Broadcom (b43):
- new support for a 11n device
- MediaTek (mt76):
- support mt7927
- mt792x: broken usb transport detection
- mt7921: regulatory improvements
- Qualcomm (ath9k):
- GPIO interface improvements
- Qualcomm (ath12k):
- WDS support
- replace dynamic memory allocation in WMI Rx path
- thermal throttling/cooling device support
- 6 GHz incumbent interference detection
- channel 177 in 5 GHz
- Realtek (rt89):
- RTL8922AU support
- USB 3 mode switch for performance
- better monitor radiotap support
- RTL8922DE preparations"
* tag 'net-next-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1778 commits)
ipv4: fib_rule: Move fib4_rules_exit() to ->exit().
net: serialize netif_running() check in enqueue_to_backlog()
net: skmsg: preserve sg.copy across SG transforms
appletalk: move the protocol out of tree
appletalk: stop storing per-interface state in struct net_device
selftests/bpf: test that TLS crypto is rejected on a sockmap socket
selftests/bpf: drop the unused kTLS program from test_sockmap
selftests/bpf: remove sockmap + ktls tests
tls: remove dead sockmap (psock) handling from the SW path
tls: reject the combination of TLS and sockmap
atm: remove orphaned uAPI for deleted drivers, protocols and SVCs
atm: remove unused ATM PHY operations
atm: remove the unused pre_send and send_bh device operations
atm: remove the unused change_qos device operation
atm: remove SVC socket support and the signaling daemon interface
atm: remove the local ATM (NSAP) address registry
atm: remove dead SONET PHY ioctls
atm: remove the unused send_oam / push_oam callbacks
atm: remove AAL3/4 transport support
net: dsa: sja1105: fix lastused timestamp in flower stats
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
- Support for "allocation tokens" (currently available in Clang 22+)
for smarter partitioning of kmalloc caches based on the allocated
object type, which can be enabled instead of the "random"
per-caller-address-hash partitioning.
It should be able to deterministically separate types containing a
pointer from those that do not (Marco Elver)
- Improvements and simplification of the kmem_cache_alloc_bulk() and
mempool_alloc_bulk() API. This includes adaptation of callers
(Christoph Hellwig)
- Performance improvements and cleanups related mostly to sheaves
refill (Hao Li, Shengming Hu, Vlastimil Babka)
- Several fixups for the slabinfo tool (Xuewen Wang)
* tag 'slab-for-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slab: do not limit zeroing to orig_size when only red zoning is enabled
mm/slub: preserve original size in _kmalloc_nolock_noprof retry path
mm: simplify the mempool_alloc_bulk API
mm/slab: improve kmem_cache_alloc_bulk
mm/slub: detach and reattach partial slabs in batch
mm/slub: introduce helpers for node partial slab state
mm/slub: use empty sheaf helpers for oversized sheaves
tools/mm/slabinfo: remove redundant slab->partial assignment
tools/mm/slabinfo: remove dead assignment in get_obj_and_str()
tools/mm/slabinfo: Fix trace disable logic inversion
MAINTAINERS: add slab-related scripts and tools to SLAB ALLOCATOR
mm/slub: fix typo in sheaves comment
mm, slab: simplify returning slab in __refill_objects_node()
mm, slab: add an optimistic __slab_try_return_freelist()
slab: fix kernel-docs for mm-api
slab: improve KMALLOC_PARTITION_RANDOM randomness
slab: support for compiler-assisted type-based slab cache partitioning
mm/slub: defer freelist construction until after bulk allocation from a new slab
|
|
This adds observability for the io_uring zcrx rx-buf-len configuration.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/20260612211709.1456966-3-dtatulea@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc
Pull nolibc updates from Thomas Weißschuh:
- New architectures: OpenRISC and 32-bit parisc
- New library functionality: alloca(), assert(), creat() and
ftruncate()
- Automatic large file support
- Proper 64-bit system call argument passing on x32 and MIPS N32
- Cleanups of the testmatrix
- Various bugfixes and cleanups
* tag 'nolibc-20260614-for-7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc: (37 commits)
selftests/nolibc: test against -Wwrite-strings
selftests/nolibc: use mutable buffer for execve() argv string
tools/nolibc: cast default values of program_invocation_name
tools/nolibc: add ftruncate()
tools/nolibc: add a helper to split a 64-bit argument into 32-bit halves
selftests/nolibc: enable CONFIG_TMPFS for sparc32
tools/nolibc: stackprotector: Avoid stalling program startup if crng is not init yet
tools/nolibc: getopt: Fix potential out of bounds access
selftests/nolibc: test open mode handling
tools/nolibc: always pass mode to open syscall
tools/nolibc: split open mode handling into a macro
tools/nolibc: split implicit open flags into a macro
tools/nolibc: add support for 32-bit parisc
selftests/nolibc: avoid function pointer comparisons
tools/nolibc: add support for OpenRISC / or1k
selftests/nolibc: use vmlinux for MIPS tests
selftests/nolibc: trim IMAGE mappings
selftests/nolibc: trim DEFCONFIG mappings
selftests/nolibc: trim QEMU_ARCH mappings
selftests/nolibc: use QEMU_ARCH for QEMU_ARCH_USER
...
|
|
Allow uprobe_multi link to identify the target binary by an already
opened file descriptor.
Adding new BPF_F_UPROBE_MULTI_PATH_FD flag and the path_fd field for
the attr.link_create.uprobe_multi struct.
When the flag is set, we resolve the target from path_fd, without the
flag, we keep the existing string path behavior.
I don't see a use case for supporting O_PATH file descriptors, because
we need to read the binary first to get probes offsets, so I'm using
the CLASS(fd, f), which fails for O_PATH fds.
Assisted-by: Codex:GPT-5.4
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260611114230.950379-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull openat2 updates from Christian Brauner:
"Features:
- Add O_EMPTYPATH to openat(2)/openat2(2). To get an operable file
descriptor from an O_PATH file descriptor it is possible to use
openat(fd, ".", O_DIRECTORY) for directories, but other file types
require going through open("/proc/<pid>/fd/<nr>") and thus depend
on a functioning procfs.
With O_EMPTYPATH an empty path string is accepted and LOOKUP_EMPTY
is set at path resolution time, allowing to reopen the file behind
the file descriptor directly. Selftests are included.
- Add an OPENAT2_REGULAR flag for openat2(2) which refuses to open
anything but regular files with the new EFTYPE error code.
This implements the "ability to only open regular files" feature
requested by userspace via uapi-group.org and protects services
from being redirected to fifos, device nodes, and friends.
All atomic_open implementations were audited for OPENAT2_REGULAR
handling. Explicit checks were added to ceph, gfs2, nfs (v4), and
cifs/smb - these are the filesystems whose atomic_open can
encounter an existing non-regular file and would otherwise call
finish_open() on it or return a misleading error code.
The remaining implementations (9p, fuse, vboxsf, nfs v2/v3) only
call finish_open() on freshly created files and use
finish_no_open() for lookup hits, letting the VFS catch non-regular
files via the do_open() safety net.
Cleanups:
- Migrate the openat2 selftests to the kselftest harness and move
them under selftests/filesystems/. The tests were written in the
early days of selftests' TAP support and the modern kselftest
harness is much easier to follow and maintain. The contents of the
tests are unchanged and the new emptypath tests are ported on top.
- Make the LAST_XXX last-type constants private to fs/namei.c. The
only user outside of fs/namei.c was ksmbd which only needs to know
whether the last component is a regular one, so
vfs_path_parent_lookup() now performs the LAST_NORM check
internally. The ints are replaced with a dedicated enum last_type"
* tag 'vfs-7.2-rc1.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
vfs: replace ints with enum last_type for LAST_XXX
vfs: make LAST_XXX private to fs/namei.c
selftests: openat2: port emptypath_test to kselftest harness
kselftest/openat2: test for OPENAT2_REGULAR flag
openat2: new OPENAT2_REGULAR flag support
openat2: introduce EFTYPE error code
selftest: add tests for O_EMPTYPATH
vfs: add O_EMPTYPATH to openat(2)/openat2(2)
selftests: openat2: migrate to kselftest harness
selftests: openat2: switch from custom ARRAY_LEN to ARRAY_SIZE
selftests: openat2: move helpers to header
selftests: move openat2 tests to selftests/filesystems/
|
|
Cross-merge networking fixes after downstream PR (net-7.1-rc8).
Conflicts:
drivers/net/ethernet/wangxun/txgbe/txgbe_aml.c
f67aead16e85 ("net: txgbe: rework service event handling")
57d39faed4c9 ("net: txgbe: improve functions of AML 40G devices")
net/rds/info.c
512db8267b73 ("rds: mark snapshot pages dirty in rds_info_getsockopt()")
6e94eeb2a2a6 ("rds: convert to getsockopt_iter")
Adjacent changes:
include/net/sock.h
1ee90b77b727 ("net: guard timestamp cmsgs to real error queue skbs")
f0de88303d5e ("net: make is_skb_wmem() available to modules")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When an 802.3ad (LACP) bonding interface has no slaves in the
collecting/distributing state, the bonding master still reports
carrier as up as long as at least 'min_links' slaves have carrier.
In this situation, only one slave is effectively used for TX/RX,
while traffic received on other slaves is dropped. Upper-layer
daemons therefore consider the interface operational, even though
traffic may be blackholed if the lack of LACP negotiation means
the partner is not ready to deal with traffic.
Introduce a configuration knob to control this behavior. It allows
the bonding master to assert carrier only when at least 'min_links'
slaves are in Collecting_Distributing state.
The default mode preserves the existing behavior. This patch only
introduces the knob; its behavior is implemented in the subsequent
commit.
Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Link: https://patch.msgid.link/20260603150331.1919611-4-louis.scalbert@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add missing IFLA_BOND_BROADCAST_NEIGH in if_link uapi header.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Link: https://patch.msgid.link/20260603150331.1919611-2-louis.scalbert@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Adding support to use session attachment with tracing_multi link.
Adding new BPF_TRACE_FSESSION_MULTI program attach type, that follows
the BPF_TRACE_FSESSION behaviour but on the tracing_multi link.
Such program is called on entry and exit of the attached function
and allows to pass cookie value from entry to exit execution.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-16-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add support to specify cookies for tracing_multi link.
Cookies are provided in array where each value is paired with provided
BTF ID value with the same array index.
Such cookie can be retrieved by bpf program with bpf_get_attach_cookie
helper call.
We need to sort cookies array together with ids array in check_dup_ids,
to keep the id->cookie relation.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-15-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Adding new link to allow to attach program to multiple function
BTF IDs. The link is represented by struct bpf_tracing_multi_link.
To configure the link, new fields are added to bpf_attr::link_create
to pass array of BTF IDs;
struct {
__aligned_u64 ids;
__u32 cnt;
} tracing_multi;
Each BTF ID represents function (BTF_KIND_FUNC) that the link will
attach bpf program to.
We use previously added bpf_trampoline_multi_attach/detach functions
to attach/detach the link.
The linkinfo/fdinfo callbacks will be implemented in following changes.
Note this is supported only for archs (x86_64) with ftrace direct and
have single ops support.
CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS &&
CONFIG_HAVE_SINGLE_FTRACE_DIRECT_OPS
Note using sort_r (instead of plain sort) in check_dup_ids, because we
will use the swap callback in following changes.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-14-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Adding new program attach types multi tracing attachment:
BPF_TRACE_FENTRY_MULTI
BPF_TRACE_FEXIT_MULTI
and their base support in verifier code.
Programs with such attach type will use specific link attachment
interface coming in following changes.
This was suggested by Andrii some (long) time ago and turned out
to be easier than having special program flag for that.
Bpf programs with such types have 'bpf_multi_func' function set as
their attach_btf_id and keep module reference when it's specified
by attach_prog_fd.
They are also accepted as sleepable programs during verification,
and the real validation for specific BTF_IDs/functions will happen
during the multi link attachment in following changes.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-11-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Since there're 4 bytes padding at the end of struct bpf_prog_info, they
won't be checked by bpf_check_uarg_tail_zero().
pahole -C bpf_prog_info ./vmlinux
struct bpf_prog_info {
...
__u32 attach_btf_obj_id; /* 220 4 */
__u32 attach_btf_id; /* 224 4 */
/* size: 232, cachelines: 4, members: 38 */
/* sum members: 224 */
/* sum bitfield members: 1 bits, bit holes: 1, sum bit holes: 31 bits */
/* padding: 4 */
/* forced alignments: 9 */
/* last cacheline: 40 bytes */
} __attribute__((__aligned__(8)));
If a future kernel extension adds a new 4-byte field, older userspace
programs allocating this structure on the stack might inadvertently pass
uninitialized stack garbage into the new field, permanently breaking
backward compatibility. -- sashiko [1]
Fix it by changing sizeof(info) to
offsetofend(struct bpf_prog_info, attach_btf_id).
And, add "__u32 :32" to the tail of struct bpf_prog_info.
[1] https://lore.kernel.org/bpf/20260513224823.6494FC19425@smtp.kernel.org/
Fixes: aba64c7da983 ("bpf: Add verified_insns to bpf_prog_info and fdinfo")
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260605155249.20772-3-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Since there're 4 bytes padding at the end of struct bpf_map_info, they
won't be checked by bpf_check_uarg_tail_zero().
pahole -C bpf_map_info ./vmlinux
struct bpf_map_info {
...
__u64 hash __attribute__((__aligned__(8))); /* 88 8 */
__u32 hash_size; /* 96 4 */
/* size: 104, cachelines: 2, members: 18 */
/* padding: 4 */
/* forced alignments: 1 */
/* last cacheline: 40 bytes */
} __attribute__((__aligned__(8)));
If a future kernel extension adds a new 4-byte field, older userspace
programs allocating this structure on the stack might inadvertently pass
uninitialized stack garbage into the new field, permanently breaking
backward compatibility. -- sashiko [1]
Fix it by changing sizeof(info) to
offsetofend(struct bpf_map_info, hash_size).
And, add "__u32 :32" to the tail of struct bpf_map_info.
[1] https://lore.kernel.org/bpf/20260513224823.6494FC19425@smtp.kernel.org/
Fixes: ea2e6467ac36 ("bpf: Return hashes of maps in BPF_OBJ_GET_INFO_BY_FD")
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260605155249.20772-2-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Use rhashtable_lookup_likely() for lookups, rhashtable_remove_fast()
for deletes, and rhashtable_lookup_get_insert_fast() for inserts.
Updates modify values in place under RCU rather than allocating a
new element and swapping the pointer (as regular htab does). This
trades read consistency for performance: concurrent readers may
see partial updates. BPF_F_LOCK support and special-field
handling (timers, kptrs, etc.) follow in a later commit.
Initialize rhashtable with bpf_mem_alloc element cache. Require
BPF_F_NO_PREALLOC. Limit max_entries to 2^31. Free elements via
rhashtable_free_and_destroy().
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20260605-rhash-v7-4-5b8e05f8630d@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
After commit 9b93f7e32774 ("tools/getdelays: use the static UAPI headers
from tools/include/uapi"), the Makefile was changed to use
-I../include/uapi/ instead of -I../../usr/include to ensure tools always
use the up-to-date UAPI headers.
However, only linux/taskstats.h was added to tools/include/uapi/ in commit
e5bbb35a07b3 ("tools headers UAPI: sync linux/taskstats.h"), but
linux/acct.h was missing.
This causes procacct.c to fail to compile with:
procacct.c:234:37: error: 'AGROUP' undeclared (first use in this function)
gcc -I../include/uapi/ getdelays.c -o getdelays
gcc -I../include/uapi/ procacct.c -o procacct
procacct.c: In function `print_procacct':
procacct.c:234:37: error: `AGROUP' undeclared (first use in this function)
did you mean `NOGROUP'?
234 | , t->version >= 12 ? (t->ac_flag & AGROUP ? 'P' : 'T') : '?'
| ^~~~~~
| NOGROUP
procacct.c:234:37: note: each undeclared ident
because procacct.c uses the AGROUP macro defined in linux/acct.h.
Add the missing linux/acct.h to complete the static UAPI header set.
Link: https://lore.kernel.org/20260527213558929EhiHHy9EDTMjmg3uuDOMi@zte.com.cn
Fixes: 9b93f7e32774 ("tools/getdelays: use the static UAPI headers from tools/include/uapi")
Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn>
Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Cc: Fan Yu <fan.yu9@zte.com.cn>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The kmem_cache_alloc_bulk return value is weird. It returns the number
of allocated objects, but that must always be 0 or the requested number
based on the implementations and the handling in the callers, but that
assumption is not actually documented anywhere, which confuses automated
review tools.
Fix this by returning a bool if the allocation succeeded and adding a
kerneldoc comment explaining the API.
[rob.clark@oss.qualcomm.com: fixups in
msm_iommu_pagetable_prealloc_allocate() ]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> # skbuff
Link: https://patch.msgid.link/20260528093437.2519248-2-hch@lst.de
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
|
|
With -Wwrite-strings the plain assignment triggers a warning as a
'const char *' is assigned to a 'char *', removing the const qualifier.
Casting the const away is fine, as there is no valid modification that
can be done to an empty string anyways.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260525-nolibc-write-strings-v2-1-ab5cc16c7b23@weissschuh.net
|
|
On architectures with 32-bit longs, call the compat syscall
__NR_ftruncate64. As off_t is 64-bit it must be split into 2 registers.
Unlike llseek() which passes the high and low parts in explicitly named
arguments, the order here is endian independent.
Some architectures (arm, mips, ppc) require this pair of registers to
be aligned to an even register, so add custom _sys_ftruncate64()
wrappers for those.
A test case for ftruncate is added which validates negative length or
invalid fd return the appropriate error, and checks the length is
correct on success.
Co-developed-by: Jordan Richards <jordanrichards@google.com>
Signed-off-by: Jordan Richards <jordanrichards@google.com>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Reviewed-by: Daniel Palmer <daniel@thingy.jp>
Link: https://patch.msgid.link/20260521-nolibc-ftruncate-v1-3-5384a83b2402@weissschuh.net
|
|
On 32-bit architectures some system calls require a single 64-bit
argument to be passed as two 32-bit halves.
Add a helper to easily split such arguments. This works on little and
bit endian.
Signed-off-by: Daniel Palmer <daniel@thingy.jp>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260521-nolibc-ftruncate-v1-2-5384a83b2402@weissschuh.net
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
init yet
We are using the getrandom syscall to get a random seed for the
stack protector canary but we are calling it with no flags which means
it'll block until there is some real randomness to return.
This means that if the crng is not ready yet program startup will
block and if you are unlucky that could be for a long time and
look like the program has crashed.
Even if the call to getrandom does not yield any random data,
we will still initialize the canary.
Fixes: 7188d4637e95 ("tools/nolibc: add support for stack protector")
Signed-off-by: Daniel Palmer <daniel@thingy.jp>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260522090726.726985-1-daniel@thingy.jp
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
Introduce a new error code EFTYPE for wrong file type operations.
EFTYPE is already used in BSD systems like FreeBSD and macOS.
This will be used by the upcoming OPENAT2_REGULAR flag support to
return a specific error when a path doesn't refer to a regular file.
Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Link: https://patch.msgid.link/20260328172314.45807-2-dorjoychy111@gmail.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Aleksa Sarai <aleksa@amutable.com>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
|
|
Add tests for the new O_EMPTYPATH flag of openat(2)/openat2(2).
Also, the current openat2 tests include a helper header file that
defines the necessary structs and constants to use openat2(2), such as
struct open_how. This may result in conflicting definitions when the
system header openat2.h is present as well.
So add openat2.h generated by 'make headers' to the uapi header
files in ./tools/include and remove the helper file definitions of
the current openat2 selftests.
Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
Link: https://patch.msgid.link/20260424114611.1678641-3-jkoolstra@xs4all.nl
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
|
|
Running clang-tidy on a program that uses getopt() from nolibc
this warning appears:
getopt.h:80:6: warning: Out of bound access to memory after the end of the string literal [clang-analyzer-security.ArrayBound]
80 | if (optstring[i] == ':') {
This looks like a very unlikely case that an argument
inside of argv is being changed between getopt() calls.
Adding a check for d becoming 0 in the guard after the loop
stops getopt() getting far enough to access beyond the end
of the array and seems to correct the issue.
Fixes: bae3cd708e8a ("tools/nolibc: add getopt()")
Assisted-by: Claude:claude-4.6-sonnet # reproducer
Signed-off-by: Daniel Palmer <daniel@thingy.jp>
Link: https://patch.msgid.link/20260520111931.1027758-1-daniel@thingy.jp
[Thomas: clean up commit message a bit]
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
When O_TMPFILE is set, the open mode needs to be passed to the kernel as
per the documentation. Currently this is not done.
Instead of checking for O_TMPFILE explicitly and making the conditionals
more complex, just always pass the mode to the kernel. If no value was
passed the mode will be garbage, but the kernel will ignore it anyways.
Fixes: a7604ba149e7 ("tools/nolibc/sys: make open() take a vararg on the 3rd argument")
Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/lkml/afRfjdovT6pNtwtP@1wt.eu/
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260514-nolibc-open-tmpfile-v2-3-b4c6c5efa266@weissschuh.net
|
|
This logic is duplicated and some upcoming extensions would require even
more duplicated logic.
Move it into a macro to avoid the duplication and allow cleaner changes.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260514-nolibc-open-tmpfile-v2-2-b4c6c5efa266@weissschuh.net
|
|
This logic is duplicated and its current form will be in the way of some
upcoming simplificiations.
Move it into a macro to avoid the duplication and enable some cleanups.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260514-nolibc-open-tmpfile-v2-1-b4c6c5efa266@weissschuh.net
|
|
Add generic BPF syscall support for passing common attributes.
The initial set of common attributes includes:
1. 'log_buf': User-provided buffer for storing logs.
2. 'log_size': Size of the log buffer.
3. 'log_level': Log verbosity level.
4. 'log_true_size': Actual log size reported by kernel.
The common-attribute pointer and its size are passed as the 4th and 5th
syscall arguments. A new command bit, 'BPF_COMMON_ATTRS' ('1 << 16'),
indicates that common attributes are supplied.
This commit adds syscall and uapi plumbing. Command-specific handling is
added in follow-up patches.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260512153157.28382-2-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Cross-merge BPF and other fixes after downstream PR.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Extend nolibc to target the 32-bit parisc architecture.
64-bit is not yet supported.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch/msgid.link/20260428-nolibc-hppa-v5-2-d843d573111a@weissschuh.net
|
|
Add support for OpenRISC / or1k to nolibc.
_start() uses the same wrapper construct as in arch-sh.h.
libgcc is necessary as OpenRISC is missing 64-bit multiplication.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Stafford Horne <shorne@gmail.com>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260429-nolibc-openrisc-v2-1-8d7d7a2f3fec@weissschuh.net
|
|
With commit dacbfc167808 ("crypto: af_alg - Annotate struct af_alg_iv
with __counted_by"), two selftests, test_tag and crypto_sanity, now
indirectly rely on the __counted_by macro. On systems with commit
dacbfc167808 in the installed UAPI headers, the selftests build fails
with:
In file included from tools/testing/selftests/bpf/prog_tests/crypto_sanity.c:7:
/usr/include/linux/if_alg.h:45:22: error: expected ‘:’, ‘,’, ‘;’, ‘}’ or ‘__attribute__’ before ‘__counted_by’
45 | __u8 iv[] __counted_by(ivlen);
| ^~~~~~~~~~~~
This patch fixes it by regenerating stddef.h in tools/include using the
instructions from commit a778f5d46b62 ("tools/headers: Pull in stddef.h
to uapi to fix BPF selftests build in CI").
Fixes: dacbfc167808 ("crypto: af_alg - Annotate struct af_alg_iv with __counted_by")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/8da8ef16055aa452d940668ed5359ce54adc6b0b.1777715500.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
nolibc can natively handle large files. Tell this to the kernel by
always using O_LARGEFILE when opening files. This is also how other
libcs do it.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260418-nolibc-largefile-v1-6-b91f0775bac3@weissschuh.net
|
|
The N32 system call ABI expects 64-bit values directly in registers.
This does not work on nolibc currently, as a 'long' is only 32 bits
wide. Switch the system call wrappers to use 'long long' instead which
can handle 64-bit values on N32. As on N64 'long' and 'long long' are
the same, this does not change the behavior there.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260418-nolibc-largefile-v1-5-b91f0775bac3@weissschuh.net
|
|
The x32 system call ABI expects 64-bit values directly in registers.
This does not work on nolibc currently, as a 'long' is only 32 bits
wide. Switch the system call wrappers to use 'long long' instead which
can handle 64-bit values on x32. As on x86_64 'long' and 'long long' are
the same, this does not change the behavior there.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260418-nolibc-largefile-v1-4-b91f0775bac3@weissschuh.net
|
|
Currently all system call wrappers return 'long' integers which can be
directly cast to 'void *' if the returned value is actually a pointer.
An upcoming change will change the system call wrappers to sometimes
return 'long long' which can not be cast to a pointer directly.
Add explicit cast through 'long' to prepare for this.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260418-nolibc-largefile-v1-3-b91f0775bac3@weissschuh.net
|
|
In the architecture specific system call glue, all arguments are
currently casted to 'long' to fit into registers. This works for
pointers as 'long' has the same size as pointers.
However the system call registers for X32 and MIPS N32 need to be
'long long' to work correctly for 64-bit values expected by the system
call ABI. Casting a pointer to a 'long long' will produce a compiler
warning while casting 64-bit integers to 'long' will truncate those.
Add a helper which can be used to correctly cast both pointers and
integers into 'long long' registers. Cast the pointers through
'unsigned' to avoid any sign extensions.
Both builtins have been available since at least GCC 3 and clang 3.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260418-nolibc-largefile-v1-2-b91f0775bac3@weissschuh.net
|
|
On some architectures the llseek system call contains a leading
underscore. Treat it the same way as llseek and prefer it over the
plain lseek system call as is necessary for 64-bit offset handling.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260418-nolibc-largefile-v1-1-b91f0775bac3@weissschuh.net
|
|
creat() is a simple wrapper around open().
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260419-nolibc-open-mode-v1-2-8dc5a960daa7@weissschuh.net
|
|
The stack protector is configured in _start_c() so we shouldn't
use it before then.
Add __nolibc_no_stack_protector to _start_c() to avoid the compiler
generating stack protector code for _start_c() and thus using it
before its configured.
Signed-off-by: Daniel Palmer <daniel@thingy.jp>
Link: https://patch.msgid.link/20260425111315.3191461-3-daniel@thingy.jp
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
To avoid polluting the namespace rename __no_stack_protector to
__nolibc_no_stack_protector so its now within the nolibc umbrella.
Suggested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Daniel Palmer <daniel@thingy.jp>
Link: https://patch.msgid.link/20260425111315.3191461-2-daniel@thingy.jp
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
Clang may convert the loop to find _auxv into a call to wcslen() which
is missing on nolibc. -fsanitize needs to be disabled for this to
happen.
Use the same pattern as in the nolibc strlen() implementation to avoid
the function call generation.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260418-nolibc-wcslen-v1-1-671271b8ea63@weissschuh.net
|
|
Functions which are known at compile-time to result in ENOSYS can be
surprising to the user. For example using old UAPI headers might mean
that stat() will always fail although the kernel would have the system
call available at runtime. Nowadays __nolibc_enosys() should never be
called for normal applications.
Switch the silent ENOSYS return into a compile-time error, so the user
is aware about the issue. Prefer the 'error' attribute as it provides
the best diagnostics. If the users defines NOLIBC_COMPILE_TIME_ENOSYS
the old, silent fallback is kept.
Also add a test which validates that the error can be optimized away.
Reported-by: Willy Tarreau <w@1wt.eu>
Closes: https://lore.kernel.org/lkml/acizRIq2xrFUNHNS@1wt.eu/
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260404-nolibc-enosys-v1-1-e0aba47bdee4@weissschuh.net
|
|
Add the wide-used alloca() function. As it is highly machine and
compiler dependent, just defer to the compiler builtin. This has
been available since GCC 4 and clang 3.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260409-nolibc-alloca-v1-1-ed02f68dfaf9@weissschuh.net
|
|
Add the standard assert() macro from the assert.h header.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260409-nolibc-assert-v1-1-42da8b367e23@weissschuh.net
|
|
BTF maximum vlen is encoded using 16 bits with a maximum vlen
of 65535. This has sufficed for structs, function parameters
and enumerated type values. However, with upcoming BTF location
information - in particular information about inline sites -
this limit is surpassed. Use bits 16-23 - currently unused in
BTF info - to extend to 24 bits, giving a max vlen of (2^24 - 1),
or 16 million.
Also extend BTF kind encoding from 5 to 7 bits, giving a maximum
available number of kinds of 128. Since with the BTF location work
we use another 3 kinds, we are fast approaching the current limit
of 32.
Convert BTF_MAX_* values to enums to allow them to be encoded in
kernel BTF; this will allow us to detect if the running kernel
supports a 24-bit vlen or not. Add one for max _possible_
(not used) kind.
Fix up a few places in the kernel where a 16-bit vlen is assumed;
remove BTF_INFO_MASK as now all bits are used.
The vlen expansion was suggested by Andrii in [1]; the kind expansion
is tackled here too as it may be needed also to support new kinds
in BTF.
[1] https://lore.kernel.org/bpf/CAEf4BzZx=X6vGqcA8SPU6D+v6k+TR=ZewebXMuXtpmML058piw@mail.gmail.com/
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20260417143023.1551481-2-alan.maguire@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock updates from Mike Rapoport:
- improve debuggability of reserve_mem kernel parameter handling with
print outs in case of a failure and debugfs info showing what was
actually reserved
- Make memblock_free_late() and free_reserved_area() use the same core
logic for freeing the memory to buddy and ensure it takes care of
updating memblock arrays when ARCH_KEEP_MEMBLOCK is enabled.
* tag 'memblock-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
x86/alternative: delay freeing of smp_locks section
memblock: warn when freeing reserved memory before memory map is initialized
memblock, treewide: make memblock_free() handle late freeing
memblock: make free_reserved_area() update memblock if ARCH_KEEP_MEMBLOCK=y
memblock: extract page freeing from free_reserved_area() into a helper
memblock: make free_reserved_area() more robust
mm: move free_reserved_area() to mm/memblock.c
powerpc: opal-core: pair alloc_pages_exact() with free_pages_exact()
powerpc: fadump: pair alloc_pages_exact() with free_pages_exact()
memblock: reserve_mem: fix end caclulation in reserve_mem_release_by_name()
memblock: move reserve_bootmem_range() to memblock.c and make it static
memblock: Add reserve_mem debugfs info
memblock: Print out errors on reserve_mem parser
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Namhyung Kim:
"perf report:
- Add 'comm_nodigit' sort key to combine similar threads that only
have different numbers in the comm. In the following example, the
'comm_nodigit' will have samples from all threads starting with
"bpfrb/" into an entry "bpfrb/<N>".
$ perf report -s comm_nodigit,comm -H
...
#
# Overhead CommandNoDigit / Command
# ........... ........................
#
20.30% swapper
20.30% swapper
13.37% chrome
13.37% chrome
10.07% bpfrb/<N>
7.47% bpfrb/0
0.70% bpfrb/1
0.47% bpfrb/3
0.46% bpfrb/2
0.25% bpfrb/4
0.23% bpfrb/5
0.20% bpfrb/6
0.14% bpfrb/10
0.07% bpfrb/7
- Support flat layout for symfs. The --symfs option is to specify the
location of debugging symbol files. The default 'hierarchy' layout
would search the symbol file using the same path of the original
file under the symfs root. The new 'flat' layout would search only
in the root directory.
- Update 'simd' sort key for ARM SIMD flags to cover ASE/SME and more
predicate flags.
perf stat:
- Add --pmu-filter option to select specific PMUs. This would be
useful when you measure metrics from multiple instance of uncore
PMUs with similar names.
# perf stat -M cpa_p0_avg_bw
Performance counter stats for 'system wide':
19,417,779,115 hisi_sicl0_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw
0 hisi_sicl0_cpa0/cpa_p0_wr_dat/
0 hisi_sicl0_cpa0/cpa_p0_rd_dat_64b/
0 hisi_sicl0_cpa0/cpa_p0_rd_dat_32b/
19,417,751,103 hisi_sicl10_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw
0 hisi_sicl10_cpa0/cpa_p0_wr_dat/
0 hisi_sicl10_cpa0/cpa_p0_rd_dat_64b/
0 hisi_sicl10_cpa0/cpa_p0_rd_dat_32b/
19,417,730,679 hisi_sicl2_cpa0/cpa_cycles/ # 0.31 cpa_p0_avg_bw
75,635,749 hisi_sicl2_cpa0/cpa_p0_wr_dat/
18,520,640 hisi_sicl2_cpa0/cpa_p0_rd_dat_64b/
0 hisi_sicl2_cpa0/cpa_p0_rd_dat_32b/
19,417,674,227 hisi_sicl8_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw
0 hisi_sicl8_cpa0/cpa_p0_wr_dat/
0 hisi_sicl8_cpa0/cpa_p0_rd_dat_64b/
0 hisi_sicl8_cpa0/cpa_p0_rd_dat_32b/
19.417734480 seconds time elapsed
With --pmu-filter, users can select only hisi_sicl2_cpa0 PMU.
# perf stat --pmu-filter hisi_sicl2_cpa0 -M cpa_p0_avg_bw
Performance counter stats for 'system wide':
6,234,093,559 cpa_cycles # 0.60 cpa_p0_avg_bw
50,548,465 cpa_p0_wr_dat
7,552,182 cpa_p0_rd_dat_64b
0 cpa_p0_rd_dat_32b
6.234139320 seconds time elapsed
Data type profiling:
- Quality improvements by tracking register state more precisely
- Ensure array members to get the type
- Handle more cases for global variables
Vendor event/metric updates:
- Update various Intel events and metrics
- Add NVIDIA Tegra 410 Olympus events
Internal changes:
- Verify perf.data header for maliciously crafted files
- Update perf test to cover more usages and make them robust
- Move a couple of copied kernel headers not to annoy objtool build
- Fix a bug in map sorting in name order
- Remove some unused codes
Misc:
- Fix module symbol resolution with non-zero text address
- Add -t/--threads option to `perf bench mem mmap`
- Track duration of exit*() syscall by `perf trace -s`
- Add core.addr2line-timeout and core.addr2line-disable-warn config
items"
* tag 'perf-tools-for-v7.1-2026-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (131 commits)
perf loongarch: Fix build failure with CONFIG_LIBDW_DWARF_UNWIND
perf annotate: Use jump__delete when freeing LoongArch jumps
perf test: Fixes for check branch stack sampling
perf test: Fix inet_pton probe failure and unroll call graph
perf build: fix "argument list too long" in second location
perf header: Add sanity checks to HEADER_BPF_BTF processing
perf header: Sanity check HEADER_BPF_PROG_INFO
perf header: Sanity check HEADER_PMU_CAPS
perf header: Sanity check HEADER_HYBRID_TOPOLOGY
perf header: Sanity check HEADER_CACHE
perf header: Sanity check HEADER_GROUP_DESC
perf header: Sanity check HEADER_PMU_MAPPINGS
perf header: Sanity check HEADER_MEM_TOPOLOGY
perf header: Sanity check HEADER_NUMA_TOPOLOGY
perf header: Sanity check HEADER_CPU_TOPOLOGY
perf header: Sanity check HEADER_NRCPUS and HEADER_CPU_DOMAIN_INFO
perf header: Bump up the max number of command line args allowed
perf header: Validate nr_domains when reading HEADER_CPU_DOMAIN_INFO
perf sample: Fix documentation typo
perf arm_spe: Improve SIMD flags setting
...
|