Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf and wireless.
Nothing scary here. Feels like the first wave of regressions from v6.5
is addressed - one outstanding fix still to come in TLS for the
sendpage rework.
Current release - regressions:
- udp: fix __ip_append_data()'s handling of MSG_SPLICE_PAGES
- dsa: fix older DSA drivers using phylink
Previous releases - regressions:
- gro: fix misuse of CB in udp socket lookup
- mlx5: unregister devlink params in case interface is down
- Revert "wifi: ath11k: Enable threaded NAPI"
Previous releases - always broken:
- sched: cls_u32: fix match key mis-addressing
- sched: bind logic fixes for cls_fw, cls_u32 and cls_route
- add bound checks to a number of places which hand-parse netlink
- bpf: disable preemption in perf_event_output helpers code
- qed: fix scheduling in a tasklet while getting stats
- avoid using APIs which are not hardirq-safe in couple of drivers,
when we may be in a hard IRQ (netconsole)
- wifi: cfg80211: fix return value in scan logic, avoid page
allocator warning
- wifi: mt76: mt7615: do not advertise 5 GHz on first PHY of MT7615D
(DBDC)
Misc:
- drop handful of inactive maintainers, put some new in place"
* tag 'net-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (98 commits)
MAINTAINERS: update TUN/TAP maintainers
test/vsock: remove vsock_perf executable on `make clean`
tcp_metrics: fix data-race in tcpm_suck_dst() vs fastopen
tcp_metrics: annotate data-races around tm->tcpm_net
tcp_metrics: annotate data-races around tm->tcpm_vals[]
tcp_metrics: annotate data-races around tm->tcpm_lock
tcp_metrics: annotate data-races around tm->tcpm_stamp
tcp_metrics: fix addr_same() helper
prestera: fix fallback to previous version on same major version
udp: Fix __ip_append_data()'s handling of MSG_SPLICE_PAGES
net/mlx5e: Set proper IPsec source port in L4 selector
net/mlx5: fs_core: Skip the FTs in the same FS_TYPE_PRIO_CHAINS fs_prio
net/mlx5: fs_core: Make find_closest_ft more generic
wifi: brcmfmac: Fix field-spanning write in brcmf_scan_params_v2_to_v1()
vxlan: Fix nexthop hash size
ip6mr: Fix skb_under_panic in ip6mr_cache_report()
s390/qeth: Don't call dev_close/dev_open (DOWN/UP)
net: tap_open(): set sk_uid from current_fsuid()
net: tun_chr_open(): set sk_uid from current_fsuid()
net: dcb: choose correct policy to parse DCB_ATTR_BCN
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- A fix for a performance problem in QubesOS, adding a way to drain the
queue of grants experiencing delayed unmaps faster
- A patch enabling the use of static event channels from user mode,
which was omitted when introducing supporting static event channels
- A fix for a problem where Xen related code didn't check properly for
running in a Xen environment, resulting in a WARN splat
* tag 'for-linus-6.5a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: speed up grant-table reclaim
xen/evtchn: Introduce new IOCTL to bind static evtchn
xenbus: check xen_domain in xenbus_probe_initcall
|
|
Pull block fixes from Jens Axboe:
"A few fixes that should go into the current kernel release, mainly:
- Set of fixes for dasd (Stefan)
- Handle interruptible waits returning because of a signal for ublk
(Ming)"
* tag 'block-6.5-2023-07-28' of git://git.kernel.dk/linux:
ublk: return -EINTR if breaking from waiting for existed users in DEL_DEV
ublk: fail to recover device if queue setup is interrupted
ublk: fail to start device if queue setup is interrupted
block: Fix a source code comment in include/uapi/linux/blkzoned.h
s390/dasd: print copy pair message only for the correct error
s390/dasd: fix hanging device after request requeue
s390/dasd: use correct number of retries for ERP requests
s390/dasd: fix hanging device after quiesce/resume
|
|
Typical misuse of
nla_parse_nested(array, XXX_MAX, ...);
array must be declared as
struct nlattr *array[XXX_MAX + 1];
v2: Based on feedbacks from Ido Schimmel and Zahari Doychev,
I also changed TCA_FLOWER_KEY_CFM_OPT_MAX and cfm_opt_policy
definitions.
syzbot reported:
BUG: KASAN: stack-out-of-bounds in __nla_validate_parse+0x136/0x2bd0 lib/nlattr.c:588
Write of size 32 at addr ffffc90003a0ee20 by task syz-executor296/5014
CPU: 0 PID: 5014 Comm: syz-executor296 Not tainted 6.5.0-rc2-syzkaller-00307-gd192f5382581 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:364 [inline]
print_report+0x163/0x540 mm/kasan/report.c:475
kasan_report+0x175/0x1b0 mm/kasan/report.c:588
kasan_check_range+0x27e/0x290 mm/kasan/generic.c:187
__asan_memset+0x23/0x40 mm/kasan/shadow.c:84
__nla_validate_parse+0x136/0x2bd0 lib/nlattr.c:588
__nla_parse+0x40/0x50 lib/nlattr.c:700
nla_parse_nested include/net/netlink.h:1262 [inline]
fl_set_key_cfm+0x1e3/0x440 net/sched/cls_flower.c:1718
fl_set_key+0x2168/0x6620 net/sched/cls_flower.c:1884
fl_tmplt_create+0x1fe/0x510 net/sched/cls_flower.c:2666
tc_chain_tmplt_add net/sched/cls_api.c:2959 [inline]
tc_ctl_chain+0x131d/0x1ac0 net/sched/cls_api.c:3068
rtnetlink_rcv_msg+0x82b/0xf50 net/core/rtnetlink.c:6424
netlink_rcv_skb+0x1df/0x430 net/netlink/af_netlink.c:2549
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x7c3/0x990 net/netlink/af_netlink.c:1365
netlink_sendmsg+0xa2a/0xd60 net/netlink/af_netlink.c:1914
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg net/socket.c:748 [inline]
____sys_sendmsg+0x592/0x890 net/socket.c:2494
___sys_sendmsg net/socket.c:2548 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2577
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f54c6150759
Code: 48 83 c4 28 c3 e8 d7 19 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe06c30578 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f54c619902d RCX: 00007f54c6150759
RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000003
RBP: 00007ffe06c30590 R08: 0000000000000000 R09: 00007ffe06c305f0
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f54c61c35f0
R13: 00007ffe06c30778 R14: 0000000000000001 R15: 0000000000000001
</TASK>
The buggy address belongs to stack of task syz-executor296/5014
and is located at offset 32 in frame:
fl_set_key_cfm+0x0/0x440 net/sched/cls_flower.c:374
This frame has 1 object:
[32, 56) 'nla_cfm_opt'
The buggy address belongs to the virtual mapping at
[ffffc90003a08000, ffffc90003a11000) created by:
copy_process+0x5c8/0x4290 kernel/fork.c:2330
Fixes: 7cfffd5fed3e ("net: flower: add support for matching cfm fields")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Zahari Doychev <zdoychev@maxlinear.com>
Link: https://lore.kernel.org/r/20230726145815.943910-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Xen 4.17 supports the creation of static evtchns. To allow user space
application to bind static evtchns introduce new ioctl
"IOCTL_EVTCHN_BIND_STATIC". Existing IOCTL doing more than binding
that’s why we need to introduce the new IOCTL to only bind the static
event channels.
Static evtchns to be available for use during the lifetime of the
guest. When the application exits, __unbind_from_irq() ends up being
called from release() file operations because of that static evtchns
are getting closed. To avoid closing the static event channel, add the
new bool variable "is_static" in "struct irq_info" to mark the event
channel static when creating the event channel to avoid closing the
static evtchn.
Also, take this opportunity to remove the open-coded version of the
evtchn close in drivers/xen/evtchn.c file and use xen_evtchn_close().
Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Link: https://lore.kernel.org/r/ae7329bf1713f83e4aad4f3fa0f316258c40a3e9.1689677042.git.rahul.singh@arm.com
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
syzkaller found a warning in packet_getname() [0], where we try to
copy 16 bytes to sockaddr_ll.sll_addr[8].
Some devices (ip6gre, vti6, ip6tnl) have 16 bytes address expressed
by struct in6_addr. Also, Infiniband has 32 bytes as MAX_ADDR_LEN.
The write seems to overflow, but actually not since we use struct
sockaddr_storage defined in __sys_getsockname() and its size is 128
(_K_SS_MAXSIZE) bytes. Thus, we have sufficient room after sll_addr[]
as __data[].
To avoid the warning, let's add a flex array member union-ed with
sll_addr.
Another option would be to use strncpy() and limit the copied length
to sizeof(sll_addr), but it will return the partial address and break
an application that passes sockaddr_storage to getsockname().
[0]:
memcpy: detected field-spanning write (size 16) of single field "sll->sll_addr" at net/packet/af_packet.c:3604 (size 8)
WARNING: CPU: 0 PID: 255 at net/packet/af_packet.c:3604 packet_getname+0x25c/0x3a0 net/packet/af_packet.c:3604
Modules linked in:
CPU: 0 PID: 255 Comm: syz-executor750 Not tainted 6.5.0-rc1-00330-g60cc1f7d0605 #4
Hardware name: linux,dummy-virt (DT)
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : packet_getname+0x25c/0x3a0 net/packet/af_packet.c:3604
lr : packet_getname+0x25c/0x3a0 net/packet/af_packet.c:3604
sp : ffff800089887bc0
x29: ffff800089887bc0 x28: ffff000010f80f80 x27: 0000000000000003
x26: dfff800000000000 x25: ffff700011310f80 x24: ffff800087d55000
x23: dfff800000000000 x22: ffff800089887c2c x21: 0000000000000010
x20: ffff00000de08310 x19: ffff800089887c20 x18: ffff800086ab1630
x17: 20646c6569662065 x16: 6c676e697320666f x15: 0000000000000001
x14: 1fffe0000d56d7ca x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000 x9 : 3e60944c3da92b00
x8 : 3e60944c3da92b00 x7 : 0000000000000001 x6 : 0000000000000001
x5 : ffff8000898874f8 x4 : ffff800086ac99e0 x3 : ffff8000803f8808
x2 : 0000000000000001 x1 : 0000000100000000 x0 : 0000000000000000
Call trace:
packet_getname+0x25c/0x3a0 net/packet/af_packet.c:3604
__sys_getsockname+0x168/0x24c net/socket.c:2042
__do_sys_getsockname net/socket.c:2057 [inline]
__se_sys_getsockname net/socket.c:2054 [inline]
__arm64_sys_getsockname+0x7c/0x94 net/socket.c:2054
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52
el0_svc_common+0x134/0x240 arch/arm64/kernel/syscall.c:139
do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:188
el0_svc+0x2c/0x7c arch/arm64/kernel/entry-common.c:647
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:665
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591
Fixes: df8fc4e934c1 ("kbuild: Enable -fstrict-flex-arrays=3")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230724213425.22920-3-kuniyu@amazon.com
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix the symbolic names for zone conditions in the blkzoned.h header
file.
Cc: Hannes Reinecke <hare@suse.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Fixes: 6a0cb1bc106f ("block: Implement support for zoned block devices")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230706201422.3987341-1-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
"Small but important fixes and a trivial cleanup"
* tag 'fuse-update-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: ioctl: translate ENOSYS in outarg
fuse: revalidate: don't invalidate if interrupted
fuse: Apply flags2 only when userspace set the FUSE_INIT_EXT
fuse: remove duplicate check for nodeid
fuse: add feature flag for expire-only
|
|
Pull in the currently staged SCSI fixes for 6.5.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
- Check for NULL bdev in LoadPin (Matthias Kaehlcke)
- Revert unwanted KUnit FORTIFY build default
- Fix 1-element array causing boot warnings with xhci-hub
* tag 'hardening-v6.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
usb: ch9: Replace bmSublinkSpeedAttr 1-element array with flexible array
Revert "fortify: Allow KUnit test to build without FORTIFY"
dm: verity-loadpin: Add NULL pointer check for 'bdev' parameter
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"These are cleanups for architecture specific header files:
- the comments in include/linux/syscalls.h have gone out of sync and
are really pointless, so these get removed
- The asm/bitsperlong.h header no longer needs to be architecture
specific on modern compilers, so use a generic version for newer
architectures that use new enough userspace compilers
- A cleanup for virt_to_pfn/virt_to_bus to have proper type checking,
forcing the use of pointers"
* tag 'asm-generic-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
syscalls: Remove file path comments from headers
tools arch: Remove uapi bitsperlong.h of hexagon and microblaze
asm-generic: Unify uapi bitsperlong.h for arm64, riscv and loongarch
m68k/mm: Make pfn accessors static inlines
arm64: memory: Make virt_to_pfn() a static inline
ARM: mm: Make virt_to_pfn() a static inline
asm-generic/page.h: Make pfn accessors static inlines
xen/netback: Pass (void *) to virt_to_page()
netfs: Pass a pointer to virt_to_page()
cifs: Pass a pointer to virt_to_page() in cifsglob
cifs: Pass a pointer to virt_to_page()
riscv: mm: init: Pass a pointer to virt_to_page()
ARC: init: Pass a pointer to virt_to_pfn() in init
m68k: Pass a pointer to virt_to_pfn() virt_to_page()
fs/proc/kcore.c: Pass a pointer to virt_addr_valid()
|
|
The new qTimestamp attribute was added to UFS 4.0 spec, in order to
synchronize timestamp between device logs and the host. The spec recommends
to send this attribute upon device power-on Reset/HW reset or when
switching to Active state (using SSU command). Due to this attribute, the
attribute's max value was extended to 8 bytes. As a result, the new
definition of struct utp_upiu_query_v4_0 was added.
Signed-off-by: Arthur Simchaev <Arthur.Simchaev@wdc.com>
-----------------
Changes to v2:
- Adressed Bart's comments
- Add missed response variable to ufshcd_set_timestamp_attr
Link: https://lore.kernel.org/r/20230626103320.8737-1-arthur.simchaev@wdc.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Since commit 2d47c6956ab3 ("ubsan: Tighten UBSAN_BOUNDS on GCC"),
UBSAN_BOUNDS no longer pretends 1-element arrays are unbounded. Walking
bmSublinkSpeedAttr will trigger a warning, so make it a proper flexible
array. Add a union to keep the struct size identical for userspace in
case anything was depending on the old size.
False positive warning was:
UBSAN: array-index-out-of-bounds in drivers/usb/host/xhci-hub.c:231:31 index 1 is out of range for type '__le32 [1]'
for this line of code:
ssp_cap->bmSublinkSpeedAttr[offset++] = cpu_to_le32(attr);
Reported-by: Borislav Petkov <bp@alien8.de>
Closes: https://lore.kernel.org/lkml/2023062945-fencing-pebble-0411@gregkh/
Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Closes: https://lore.kernel.org/lkml/9a8e34ad-8a8b-3830-4878-3c2c82e69dd9@alu.unizg.hr/
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Tested-by: "Borislav Petkov (AMD)" <bp@alien8.de>
Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Reviewed-by: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230629190900.never.787-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- Lots of improvement at atomisp driver, which is starting to look in
good shape
- Mediatek vcodec driver has gained support for av1 and hevc stateless
codecs
- New sensor driver: ov01a10
- verisilicon driver has gained AV1 entropy helpers
- tegra-video has gained support for Tegra20 parallel input
- dvb core has gained an extra property to better support DVB-S2X
- as usual, lots of cleanups, fixes and improvements on media drivers
* tag 'media/v6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (253 commits)
media: wl128x: fix a clang warning
media: dvb: mb86a20s: get rid of a clang-15 warning
media: cec: i2c: ch7322: also select REGMAP
media: add HAS_IOPORT dependencies
media: tc358746: select CONFIG_GENERIC_PHY
media: mediatek: vcodec: Add dbgfs help function
media: mediatek: vcodec: Add encode to support dbgfs
media: mediatek: vcodec: Change dbgfs interface to support encode
media: mediatek: vcodec: Get each instance format type
media: mediatek: vcodec: Get each context resolution information
media: mediatek: vcodec: Add a debugfs file to get different useful information
media: mediatek: vcodec: Add debug params to control different log level
media: mediatek: vcodec: Add debugfs interface to get debug information
media: mediatek: vcodec: support stateless AV1 decoder
media: verisilicon: Conditionally ignore native formats
media: verisilicon: Enable AV1 decoder on rk3588
media: verisilicon: Add film grain feature to AV1 driver
media: verisilicon: Add Rockchip AV1 decoder
media: verisilicon: Add AV1 entropy helpers
media: verisilicon: Compute motion vectors size for AV1 frames
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire updates from Takashi Sakamoto:
"This consist of three parts; UAPI update, OHCI driver update, and
several bug fixes.
Firstly, the 1394 OHCI specification defines method to retrieve
hardware time stamps for asynchronous communication, which was
previously unavailable in user space. This adds new events to the
UAPI, allowing applications to retrieve the time when asynchronous
packet are received and sent. The new events are tested in the
bleeding edge of libhinawa and look to work well. The new version of
libhinawa will be released after current merge window is closed:
https://git.kernel.org/pub/scm/libs/ieee1394/libhinawa.git/
Secondly, the FireWire stack includes a PCM device driver for 1394
OHCI hardware, This change modernizes the driver by managed resource
(devres) framework.
Lastly, bug fixes for firewire-net and firewire-core"
* tag 'firewire-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394: (25 commits)
firewire: net: fix use after free in fwnet_finish_incoming_packet()
firewire: core: obsolete usage of GFP_ATOMIC at building node tree
firewire: ohci: release buffer for AR req/resp contexts when managed resource is released
firewire: ohci: use devres for content of configuration ROM
firewire: ohci: use devres for IT, IR, AT/receive, and AT/request contexts
firewire: ohci: use devres for list of isochronous contexts
firewire: ohci: use devres for requested IRQ
firewire: ohci: use devres for misc DMA buffer
firewire: ohci: use devres for MMIO region mapping
firewire: ohci: use devres for PCI-related resources
firewire: ohci: use devres for memory object of ohci structure
firewire: fix warnings to generate UAPI documentation
firewire: fix build failure due to missing module license
firewire: cdev: implement new event relevant to phy packet with time stamp
firewire: cdev: add new event to notify phy packet with time stamp
firewire: cdev: code refactoring to dispatch event for phy packet
firewire: cdev: implement new event to notify response subaction with time stamp
firewire: cdev: add new event to notify response subaction with time stamp
firewire: cdev: code refactoring to operate event of response
firewire: core: implement variations to send request and wait for response with time stamp
...
|
|
Pull more block updates from Jens Axboe:
"Mostly items that came in a bit late for the initial pull request,
wanted to make sure they had the appropriate amount of linux-next soak
before going upstream.
Outside of stragglers, just generic fixes for either merge window
items, or longer standing bugs"
* tag 'block-6.5-2023-07-03' of git://git.kernel.dk/linux: (25 commits)
md/raid0: add discard support for the 'original' layout
nvme: disable controller on reset state failure
nvme: sync timeout work on failed reset
nvme: ensure unquiesce on teardown
cdrom/gdrom: Fix build error
nvme: improved uring polling
block: add request polling helper
nvme-mpath: fix I/O failure with EAGAIN when failing over I/O
nvme: host: fix command name spelling
blk-sysfs: add a new attr_group for blk_mq
blk-iocost: move wbt_enable/disable_default() out of spinlock
blk-wbt: cleanup rwb_enabled() and wbt_disabled()
blk-wbt: remove dead code to handle wbt enable/disable with io inflight
blk-wbt: don't create wbt sysfs entry if CONFIG_BLK_WBT is disabled
blk-mq: fix two misuses on RQF_USE_SCHED
blk-throttle: Fix io statistics for cgroup v1
bcache: Fix bcache device claiming
bcache: Alloc holder object before async registration
raid10: avoid spin_lock from fastpath from raid10_unplug()
md: fix 'delete_mutex' deadlock
...
|
|
Pull virtio updates from Michael Tsirkin:
- resume support in vdpa/solidrun
- structure size optimizations in virtio_pci
- new pds_vdpa driver
- immediate initialization mechanism for vdpa/ifcvf
- interrupt bypass for vdpa/mlx5
- multiple worker support for vhost
- viirtio net in Intel F2000X-PL support for vdpa/ifcvf
- fixes, cleanups all over the place
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits)
vhost: Make parameter name match of vhost_get_vq_desc()
vduse: fix NULL pointer dereference
vhost: Allow worker switching while work is queueing
vhost_scsi: add support for worker ioctls
vhost: allow userspace to create workers
vhost: replace single worker pointer with xarray
vhost: add helper to parse userspace vring state/file
vhost: remove vhost_work_queue
vhost_scsi: flush IO vqs then send TMF rsp
vhost_scsi: convert to vhost_vq_work_queue
vhost_scsi: make SCSI cmd completion per vq
vhost_sock: convert to vhost_vq_work_queue
vhost: convert poll work to be vq based
vhost: take worker or vq for flushing
vhost: take worker or vq instead of dev for queueing
vhost, vhost_net: add helper to check if vq has work
vhost: add vhost_worker pointer to vhost_virtqueue
vhost: dynamically allocate vhost_worker
vhost: create worker at end of vhost_dev_set_owner
virtio_bt: call scheduler when we free unused buffs
...
|
|
Pull kvm updates from Paolo Bonzini:
"ARM64:
- Eager page splitting optimization for dirty logging, optionally
allowing for a VM to avoid the cost of hugepage splitting in the
stage-2 fault path.
- Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact
with services that live in the Secure world. pKVM intervenes on
FF-A calls to guarantee the host doesn't misuse memory donated to
the hyp or a pKVM guest.
- Support for running the split hypervisor with VHE enabled, known as
'hVHE' mode. This is extremely useful for testing the split
hypervisor on VHE-only systems, and paves the way for new use cases
that depend on having two TTBRs available at EL2.
- Generalized framework for configurable ID registers from userspace.
KVM/arm64 currently prevents arbitrary CPU feature set
configuration from userspace, but the intent is to relax this
limitation and allow userspace to select a feature set consistent
with the CPU.
- Enable the use of Branch Target Identification (FEAT_BTI) in the
hypervisor.
- Use a separate set of pointer authentication keys for the
hypervisor when running in protected mode, as the host is untrusted
at runtime.
- Ensure timer IRQs are consistently released in the init failure
paths.
- Avoid trapping CTR_EL0 on systems with Enhanced Virtualization
Traps (FEAT_EVT), as it is a register commonly read from userspace.
- Erratum workaround for the upcoming AmpereOne part, which has
broken hardware A/D state management.
RISC-V:
- Redirect AMO load/store misaligned traps to KVM guest
- Trap-n-emulate AIA in-kernel irqchip for KVM guest
- Svnapot support for KVM Guest
s390:
- New uvdevice secret API
- CMM selftest and fixes
- fix racy access to target CPU for diag 9c
x86:
- Fix missing/incorrect #GP checks on ENCLS
- Use standard mmu_notifier hooks for handling APIC access page
- Drop now unnecessary TR/TSS load after VM-Exit on AMD
- Print more descriptive information about the status of SEV and
SEV-ES during module load
- Add a test for splitting and reconstituting hugepages during and
after dirty logging
- Add support for CPU pinning in demand paging test
- Add support for AMD PerfMonV2, with a variety of cleanups and minor
fixes included along the way
- Add a "nx_huge_pages=never" option to effectively avoid creating NX
hugepage recovery threads (because nx_huge_pages=off can be toggled
at runtime)
- Move handling of PAT out of MTRR code and dedup SVM+VMX code
- Fix output of PIC poll command emulation when there's an interrupt
- Add a maintainer's handbook to document KVM x86 processes,
preferred coding style, testing expectations, etc.
- Misc cleanups, fixes and comments
Generic:
- Miscellaneous bugfixes and cleanups
Selftests:
- Generate dependency files so that partial rebuilds work as
expected"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (153 commits)
Documentation/process: Add a maintainer handbook for KVM x86
Documentation/process: Add a label for the tip tree handbook's coding style
KVM: arm64: Fix misuse of KVM_ARM_VCPU_POWER_OFF bit index
RISC-V: KVM: Remove unneeded semicolon
RISC-V: KVM: Allow Svnapot extension for Guest/VM
riscv: kvm: define vcpu_sbi_ext_pmu in header
RISC-V: KVM: Expose IMSIC registers as attributes of AIA irqchip
RISC-V: KVM: Add in-kernel virtualization of AIA IMSIC
RISC-V: KVM: Expose APLIC registers as attributes of AIA irqchip
RISC-V: KVM: Add in-kernel emulation of AIA APLIC
RISC-V: KVM: Implement device interface for AIA irqchip
RISC-V: KVM: Skeletal in-kernel AIA irqchip support
RISC-V: KVM: Set kvm_riscv_aia_nr_hgei to zero
RISC-V: KVM: Add APLIC related defines
RISC-V: KVM: Add IMSIC related defines
RISC-V: KVM: Implement guest external interrupt line management
KVM: x86: Remove PRIx* definitions as they are solely for user space
s390/uv: Update query for secret-UVCs
s390/uv: replace scnprintf with sysfs_emit
s390/uvdevice: Add 'Lock Secret Store' UVC
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt driver updates from Greg KH:
"Here is the big set of USB and Thunderbolt driver updates for 6.5-rc1.
Included in here are:
- Lots of USB4/Thunderbolt additions and updates for new hardware
types and fixes as people are starting to get access to the
hardware in the wild
- new gadget controller driver, cdns2, added
- new typec drivers added
- xhci driver updates
- typec driver updates
- usbip driver fixes
- usb-serial driver updates and fixes
- lots of smaller USB driver updates
All of these have been in linux-next for a while with no reported
problems"
* tag 'usb-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (265 commits)
usb: host: xhci-plat: Set XHCI_STATE_REMOVING before resuming XHCI HC
usb: host: xhci: Do not re-initialize the XHCI HC if being removed
usb: typec: nb7vpq904m: fix CONFIG_DRM dependency
usbip: usbip_host: Replace strlcpy with strscpy
usb: dwc3: gadget: Propagate core init errors to UDC during pullup
USB: serial: option: add LARA-R6 01B PIDs
usb: ulpi: Make container_of() no-op in to_ulpi_dev()
usb: gadget: legacy: fix error return code in gfs_bind
usb: typec: fsa4480: add support for Audio Accessory Mode
usb: typec: fsa4480: rework mux & switch setup to handle more states
usb: typec: ucsi: call typec_set_mode on non-altmode partner change
USB: gadget: f_hid: make hidg_class a static const structure
USB: gadget: f_printer: make usb_gadget_class a static const structure
USB: mon: make mon_bin_class a static const structure
USB: gadget: udc: core: make udc_class a static const structure
USB: roles: make role_class a static const structure
dt-bindings: usb: dwc3: Add interrupt-names property support for wakeup interrupt
dt-bindings: usb: Add StarFive JH7110 USB controller
dt-bindings: usb: dwc3: Add IPQ9574 compatible
usb: cdns2: Fix spelling mistake in a trace message "Wakupe" -> "Wakeup"
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull Char/Misc updates from Greg KH:
"Here is the big set of char/misc and other driver subsystem updates
for 6.5-rc1.
Lots of different, tiny, stuff in here, from a range of smaller driver
subsystems, including pulls from some substems directly:
- IIO driver updates and additions
- W1 driver updates and fixes (and a new maintainer!)
- FPGA driver updates and fixes
- Counter driver updates
- Extcon driver updates
- Interconnect driver updates
- Coresight driver updates
- mfd tree tag merge needed for other updates on top of that, lots of
small driver updates as patches, including:
- static const updates for class structures
- nvmem driver updates
- pcmcia driver fix
- lots of other small driver updates and fixes
All of these have been in linux-next for a while with no reported
problems"
* tag 'char-misc-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (243 commits)
bsr: fix build problem with bsr_class static cleanup
comedi: make all 'class' structures const
char: xillybus: make xillybus_class a static const structure
xilinx_hwicap: make icap_class a static const structure
virtio_console: make port class a static const structure
ppdev: make ppdev_class a static const structure
char: misc: make misc_class a static const structure
/dev/mem: make mem_class a static const structure
char: lp: make lp_class a static const structure
dsp56k: make dsp56k_class a static const structure
bsr: make bsr_class a static const structure
oradax: make 'cl' a static const structure
hwtracing: hisi_ptt: Fix potential sleep in atomic context
hwtracing: hisi_ptt: Advertise PERF_PMU_CAP_NO_EXCLUDE for PTT PMU
hwtracing: hisi_ptt: Export available filters through sysfs
hwtracing: hisi_ptt: Add support for dynamically updating the filter list
hwtracing: hisi_ptt: Factor out filter allocation and release operation
samples: pfsm: add CC_CAN_LINK dependency
misc: fastrpc: check return value of devm_kasprintf()
coresight: dummy: Update type of mode parameter in dummy_{sink,source}_enable()
...
|
|
This patch drops the requirement that we can only switch workers if work
has not been queued by using RCU for the vq based queueing paths and a
mutex for the device wide flush.
We can also use this to support SIGKILL properly in the future where we
should exit almost immediately after getting that signal. With this
patch, when get_signal returns true, we can set the vq->worker to NULL
and do a synchronize_rcu to prevent new work from being queued to the
vhost_task that has been killed.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230626232307.97930-18-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
For vhost-scsi with 3 vqs or more and a workload that tries to use
them in parallel like:
fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k \
--ioengine=libaio --iodepth=128 --numjobs=3
the single vhost worker thread will become a bottlneck and we are stuck
at around 500K IOPs no matter how many jobs, virtqueues, and CPUs are
used.
To better utilize virtqueues and available CPUs, this patch allows
userspace to create workers and bind them to vqs. You can have N workers
per dev and also share N workers with M vqs on that dev.
This patch adds the interface related code and the next patch will hook
vhost-scsi into it. The patches do not try to hook net and vsock into
the interface because:
1. multiple workers don't seem to help vsock. The problem is that with
only 2 virtqueues we never fully use the existing worker when doing
bidirectional tests. This seems to match vhost-scsi where we don't see
the worker as a bottleneck until 3 virtqueues are used.
2. net already has a way to use multiple workers.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230626232307.97930-16-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull flexible-array update from Gustavo Silva:
"Transform a zero-length array into a C99 flexible-array member.
This addresses a build failure with Clang by fixing multiple
'-Warray-bounds' warnings in drivers/staging/ks7010/ks_wlan_net.c"
* tag 'flex-array-transformations-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
uapi: wireless: Replace zero-length array with flexible-array member
|
|
Common KVM changes for 6.5:
- Fix unprotected vcpu->pid dereference via debugfs
- Fix KVM_BUG() and KVM_BUG_ON() macros with 64-bit conditionals
- Refactor failure path in kvm_io_bus_unregister_dev() to simplify the code
- Misc cleanups
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.5
- Eager page splitting optimization for dirty logging, optionally
allowing for a VM to avoid the cost of block splitting in the stage-2
fault path.
- Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact with
services that live in the Secure world. pKVM intervenes on FF-A calls
to guarantee the host doesn't misuse memory donated to the hyp or a
pKVM guest.
- Support for running the split hypervisor with VHE enabled, known as
'hVHE' mode. This is extremely useful for testing the split
hypervisor on VHE-only systems, and paves the way for new use cases
that depend on having two TTBRs available at EL2.
- Generalized framework for configurable ID registers from userspace.
KVM/arm64 currently prevents arbitrary CPU feature set configuration
from userspace, but the intent is to relax this limitation and allow
userspace to select a feature set consistent with the CPU.
- Enable the use of Branch Target Identification (FEAT_BTI) in the
hypervisor.
- Use a separate set of pointer authentication keys for the hypervisor
when running in protected mode, as the host is untrusted at runtime.
- Ensure timer IRQs are consistently released in the init failure
paths.
- Avoid trapping CTR_EL0 on systems with Enhanced Virtualization Traps
(FEAT_EVT), as it is a register commonly read from userspace.
- Erratum workaround for the upcoming AmpereOne part, which has broken
hardware A/D state management.
As a consequence of the hVHE series reworking the arm64 software
features framework, the for-next/module-alloc branch from the arm64 tree
comes along for the ride.
|
|
KVM/riscv changes for 6.5
- Redirect AMO load/store misaligned traps to KVM guest
- Trap-n-emulate AIA in-kernel irqchip for KVM guest
- Svnapot support for KVM Guest
|
|
Pull VFIO updates from Alex Williamson:
- Adjust log levels for common messages (Oleksandr Natalenko, Alex
Williamson)
- Support for dynamic MSI-X allocation (Reinette Chatre)
- Enable and report PCIe AtomicOp Completer capabilities (Alex
Williamson)
- Cleanup Kconfigs for vfio bus drivers (Alex Williamson)
- Add support for CDX bus based devices (Nipun Gupta)
- Fix race with concurrent mdev initialization (Eric Farman)
* tag 'vfio-v6.5-rc1' of https://github.com/awilliam/linux-vfio:
vfio/mdev: Move the compat_class initialization to module init
vfio/cdx: add support for CDX bus
vfio/fsl: Create Kconfig sub-menu
vfio/platform: Cleanup Kconfig
vfio/pci: Cleanup Kconfig
vfio/pci-core: Add capability for AtomicOp completer support
vfio/pci: Also demote hiding standard cap messages
vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X
vfio/pci: Support dynamic MSI-X
vfio/pci: Probe and store ability to support dynamic MSI-X
vfio/pci: Use bitfield for struct vfio_pci_core_device flags
vfio/pci: Update stale comment
vfio/pci: Remove interrupt context counter
vfio/pci: Use xarray for interrupt context storage
vfio/pci: Move to single error path
vfio/pci: Prepare for dynamic interrupt context storage
vfio/pci: Remove negative check on unsigned vector
vfio/pci: Consolidate irq cleanup on MSI/MSI-X disable
vfio/pci: demote hiding ecap messages to debug level
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Export pcie_retrain_link() for use outside ASPM
- Add Data Link Layer Link Active Reporting as another way for
pcie_retrain_link() to determine the link is up
- Work around link training failures (especially on the ASMedia
ASM2824 switch) by training first at 2.5GT/s and then attempting
higher rates
Resource management:
- When we coalesce host bridge windows, remove invalidated resources
from the resource tree so future allocations work correctly
Hotplug:
- Cancel bringup sequence if card is not present, to keep from
blinking Power Indicator indefinitely
- Reassign bridge resources if necessary for ACPI hotplug
Driver binding:
- Convert platform_device .remove() callbacks to return void instead
of a mostly useless int
Power management:
- Reduce wait time for secondary bus to be ready to speed up resume
- Avoid putting EloPOS E2/S2/H2 (as well as Elo i2) PCIe Ports in
D3cold
- Call _REG when transitioning D-states so AML that uses the PCI
config space OpRegion works, which fixes some ASMedia GPIO
controllers after resume
Virtualization:
- Delay extra 250ms after FLR of Solidigm P44 Pro NVMe to avoid KVM
hang when guest is rebooted
- Add function 1 DMA alias quirk for Marvell 88SE9235
Error handling:
- Unexport pci_save_aer_state() since it's only used in drivers/pci/
- Drop recommendation for drivers to configure AER Capability, since
the PCI core does this for all devices
ASPM:
- Disable ASPM on MFD function removal to avoid use-after-free
- Tighten up pci_enable_link_state() and pci_disable_link_state()
interfaces so they don't enable/disable states the driver didn't
specify
- Avoid link retraining race that can happen if ASPM sets link
control parameters while the link is in the midst of training for
some other reason
Endpoint framework:
- Change "PCI Endpoint Virtual NTB driver" Kconfig prompt to be
different from "PCI Endpoint NTB driver"
- Automatically create a function specific attributes group for
endpoint drivers to avoid reference counting issues
- Fix many EPC test issues
- Return pci_epf_type_add_cfs() error if EPF has no driver
- Add kernel-doc for pci_epc_raise_irq() and pci_epc_map_msi_irq()
MSI vector parameters
- Pass EPF device ID to driver probe functions
- Return -EALREADY if EPC has already been started/stopped
- Add linkdown notifier support and use it in qcom-ep
- Add Bus Master Enable event support and use it in qcom-ep
- Add Qualcomm Modem Host Interface (MHI) endpoint driver
- Add Layerscape PME interrupt handling to manage link-up
notification
Cadence PCIe controller driver:
- Wait for link retrain to complete when working around the J721E
i2085 erratum with Gen2 mode
Faraday FTPC100 PCI controller driver:
- Release clock resources on error paths
Freescale i.MX6 PCIe controller driver:
- Save and restore Root Port MSI control to work around hardware defect
Intel VMD host bridge driver:
- Reset VMD config register between soft reboots
- Capture pci_reset_bus() return value instead of printing junk when
it fails
Qualcomm PCIe controller driver:
- Add SDX65 endpoint compatible string to DT binding
- Disable register write access after init for IP v2.3.3, v2.9.0
- Use DWC helpers for enabling/disabling writes to DBI registers
- Hide slot hotplug capability for IP v1.0.0, v1.9.0, v2.1.0, v2.3.2,
v2.3.3, v2.7.0, v2.9.0
- Reuse v2.3.2 post-init sequence for v2.4.0
Renesas R-Car PCIe controller driver:
- Remove unused static pcie_base and pcie_dev
Rockchip PCIe controller driver:
- Remove writes to unused registers
- Write endpoint Device ID using correct register
- Assert PCI Configuration Enable bit after probe so endpoint
responds instead of generating Request Retry Status messages
- Poll waiting for PHY PLLs to lock
- Update RK3399 example DT binding to be valid
- Use RK3399 PCIE_CLIENT_LEGACY_INT_CTRL to generate INTx instead of
manually generating PCIe message
- Use multiple windows to avoid address translation conflicts
- Use u32 (not u16) when accessing 32-bit registers
- Hide MSI-X Capability, since RK3399 can't generate MSI-X
- Set endpoint controller required alignment to 256
Synopsys DesignWare PCIe controller driver:
- Wait for link to come up only if we've initiated link training
Miscellaneous:
- Add pci_clear_master() stub for non-CONFIG_PCI"
* tag 'pci-v6.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (116 commits)
Documentation: PCI: correct spelling
PCI: vmd: Fix uninitialized variable usage in vmd_enable_domain()
PCI: xgene-msi: Convert to platform remove callback returning void
PCI: tegra: Convert to platform remove callback returning void
PCI: rockchip-host: Convert to platform remove callback returning void
PCI: mvebu: Convert to platform remove callback returning void
PCI: mt7621: Convert to platform remove callback returning void
PCI: mediatek-gen3: Convert to platform remove callback returning void
PCI: mediatek: Convert to platform remove callback returning void
PCI: iproc: Convert to platform remove callback returning void
PCI: hisi-error: Convert to platform remove callback returning void
PCI: dwc: Convert to platform remove callback returning void
PCI: j721e: Convert to platform remove callback returning void
PCI: brcmstb: Convert to platform remove callback returning void
PCI: altera-msi: Convert to platform remove callback returning void
PCI: altera: Convert to platform remove callback returning void
PCI: aardvark: Convert to platform remove callback returning void
PCI: rcar: Use correct product family name for Renesas R-Car
PCI: layerscape: Add the endpoint linkup notifier support
PCI: endpoint: pci-epf-vntb: Fix typo in comments
...
|
|
Pull SCSI updates from James Bottomley:
"Updates to the usual drivers (ufs, pm80xx, libata-scsi, smartpqi,
lpfc, qla2xxx).
We have a couple of major core changes impacting other systems:
- Command Duration Limits, which spills into block and ATA
- block level Persistent Reservation Operations, which touches block,
nvme, target and dm
Both of these are added with merge commits containing a cover letter
explaining what's going on"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (187 commits)
scsi: core: Improve warning message in scsi_device_block()
scsi: core: Replace scsi_target_block() with scsi_block_targets()
scsi: core: Don't wait for quiesce in scsi_device_block()
scsi: core: Don't wait for quiesce in scsi_stop_queue()
scsi: core: Merge scsi_internal_device_block() and device_block()
scsi: sg: Increase number of devices
scsi: bsg: Increase number of devices
scsi: qla2xxx: Remove unused nvme_ls_waitq wait queue
scsi: ufs: ufs-pci: Add support for Intel Arrow Lake
scsi: sd: sd_zbc: Use PAGE_SECTORS_SHIFT
scsi: ufs: wb: Add explicit flush_threshold sysfs attribute
scsi: ufs: ufs-qcom: Switch to the new ICE API
scsi: ufs: dt-bindings: qcom: Add ICE phandle
scsi: ufs: ufs-mediatek: Set UFSHCD_QUIRK_MCQ_BROKEN_RTC quirk
scsi: ufs: ufs-mediatek: Set UFSHCD_QUIRK_MCQ_BROKEN_INTR quirk
scsi: ufs: core: Add host quirk UFSHCD_QUIRK_MCQ_BROKEN_RTC
scsi: ufs: core: Add host quirk UFSHCD_QUIRK_MCQ_BROKEN_INTR
scsi: ufs: core: Remove dedicated hwq for dev command
scsi: ufs: core: mcq: Fix the incorrect OCS value for the device command
scsi: ufs: dt-bindings: samsung,exynos: Drop unneeded quotes
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support for ACPI
- Various cleanups to the ISA string parsing, including making them
case-insensitive
- Support for the vector extension
- Support for independent irq/softirq stacks
- Our CPU DT binding now has "unevaluatedProperties: false"
* tag 'riscv-for-linus-6.5-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (78 commits)
riscv: hibernate: remove WARN_ON in save_processor_state
dt-bindings: riscv: cpus: switch to unevaluatedProperties: false
dt-bindings: riscv: cpus: add a ref the common cpu schema
riscv: stack: Add config of thread stack size
riscv: stack: Support HAVE_SOFTIRQ_ON_OWN_STACK
riscv: stack: Support HAVE_IRQ_EXIT_ON_IRQ_STACK
RISC-V: always report presence of extensions formerly part of the base ISA
dt-bindings: riscv: explicitly mention assumption of Zicntr & Zihpm support
RISC-V: remove decrement/increment dance in ISA string parser
RISC-V: rework comments in ISA string parser
RISC-V: validate riscv,isa at boot, not during ISA string parsing
RISC-V: split early & late of_node to hartid mapping
RISC-V: simplify register width check in ISA string parsing
perf: RISC-V: Limit the number of counters returned from SBI
riscv: replace deprecated scall with ecall
riscv: uprobes: Restore thread.bad_cause
riscv: mm: try VMA lock-based page fault handling first
riscv: mm: Pre-allocate PGD entries for vmalloc/modules area
RISC-V: hwprobe: Expose Zba, Zbb, and Zbs
RISC-V: Track ISA extensions per hart
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Extend KCSAN support to 32-bit and BookE. Add some KCSAN annotations
- Make ELFv2 ABI the default for 64-bit big-endian kernel builds, and
use the -mprofile-kernel option (kernel specific ftrace ABI) for big
endian ELFv2 kernels
- Add initial Dynamic Execution Control Register (DEXCR) support, and
allow the ROP protection instructions to be used on Power 10
- Various other small features and fixes
Thanks to Aditya Gupta, Aneesh Kumar K.V, Benjamin Gray, Brian King,
Christophe Leroy, Colin Ian King, Dmitry Torokhov, Gaurav Batra, Jean
Delvare, Joel Stanley, Marco Elver, Masahiro Yamada, Nageswara R Sastry,
Nathan Chancellor, Naveen N Rao, Nayna Jain, Nicholas Piggin, Paul
Gortmaker, Randy Dunlap, Rob Herring, Rohan McLure, Russell Currey,
Sachin Sant, Timothy Pearson, Tom Rix, and Uwe Kleine-König.
* tag 'powerpc-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (76 commits)
powerpc: remove checks for binutils older than 2.25
powerpc: Fail build if using recordmcount with binutils v2.37
powerpc/iommu: TCEs are incorrectly manipulated with DLPAR add/remove of memory
powerpc/iommu: Only build sPAPR access functions on pSeries
powerpc: powernv: Annotate data races in opal events
powerpc: Mark writes registering ipi to host cpu through kvm and polling
powerpc: Annotate accesses to ipi message flags
powerpc: powernv: Fix KCSAN datarace warnings on idle_state contention
powerpc: Mark [h]ssr_valid accesses in check_return_regs_valid
powerpc: qspinlock: Enforce qnode writes prior to publishing to queue
powerpc: qspinlock: Mark accesses to qnode lock checks
powerpc/powernv/pci: Remove last IODA1 defines
powerpc/powernv/pci: Remove MVE code
powerpc/powernv/pci: Remove ioda1 support
powerpc: 52xx: Make immr_id DT match tables static
powerpc: mpc512x: Remove open coded "ranges" parsing
powerpc: fsl_soc: Use of_range_to_resource() for "ranges" parsing
powerpc: fsl: Use of_property_read_reg() to parse "reg"
powerpc: fsl_rio: Use of_range_to_resource() for "ranges" parsing
macintosh: Use of_property_read_reg() to parse "reg"
...
|
|
Pull rdma updates from Jason Gunthorpe:
"This cycle saw a focus on rxe and bnxt_re drivers:
- Code cleanups for irdma, rxe, rtrs, hns, vmw_pvrdma
- rxe uses workqueues instead of tasklets
- rxe has better compliance around access checks for MRs and rereg_mr
- mana supportst he 'v2' FW interface for RX coalescing
- hfi1 bug fix for stale cache entries in its MR cache
- mlx5 buf fix to handle FW failures when destroying QPs
- erdma HW has a new doorbell allocation mechanism for uverbs that is
secure
- Lots of small cleanups and rework in bnxt_re:
- Use the common mmap functions
- Support disassociation
- Improve FW command flow
- support for 'low latency push'"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (71 commits)
RDMA/bnxt_re: Fix an IS_ERR() vs NULL check
RDMA/bnxt_re: Fix spelling mistake "priviledged" -> "privileged"
RDMA/bnxt_re: Remove duplicated include in bnxt_re/main.c
RDMA/bnxt_re: Refactor code around bnxt_qplib_map_rc()
RDMA/bnxt_re: Remove incorrect return check from slow path
RDMA/bnxt_re: Enable low latency push
RDMA/bnxt_re: Reorg the bar mapping
RDMA/bnxt_re: Move the interface version to chip context structure
RDMA/bnxt_re: Query function capabilities from firmware
RDMA/bnxt_re: Optimize the bnxt_re_init_hwrm_hdr usage
RDMA/bnxt_re: Add disassociate ucontext support
RDMA/bnxt_re: Use the common mmap helper functions
RDMA/bnxt_re: Initialize opcode while sending message
RDMA/cma: Remove NULL check before dev_{put, hold}
RDMA/rxe: Simplify cq->notify code
RDMA/rxe: Fixes mr access supported list
RDMA/bnxt_re: optimize the parameters passed to helper functions
RDMA/bnxt_re: remove redundant cmdq_bitmap
RDMA/bnxt_re: use firmware provided max request timeout
RDMA/bnxt_re: cancel all control path command waiters upon error
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
- Support for fanotify events returning file handles for filesystems
not exportable via NFS
- Improved error handling exportfs functions
- Add missing FS_OPEN events when unusual open helpers are used
* tag 'fsnotify_for_v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: move fsnotify_open() hook into do_dentry_open()
exportfs: check for error return value from exportfs_encode_*()
fanotify: support reporting non-decodeable file handles
exportfs: allow exporting non-decodeable file handles to userspace
exportfs: add explicit flag to request non-decodeable file handles
exportfs: change connectable argument to bit flags
|
|
Pull drm updates from Dave Airlie:
"There is one set of patches to misc for a i915 gsc/mei proxy driver.
Otherwise it's mostly amdgpu/i915/msm, lots of hw enablement and lots
of refactoring.
core:
- replace strlcpy with strscpy
- EDID changes to support further conversion to struct drm_edid
- Move i915 DSC parameter code to common DRM helpers
- Add Colorspace functionality
aperture:
- ignore framebuffers with non-primary devices
fbdev:
- use fbdev i/o helpers
- add Kconfig options for fb_ops helpers
- use new fb io helpers directly in drivers
sysfs:
- export DRM connector ID
scheduler:
- Avoid an infinite loop
ttm:
- store function table in .rodata
- Add query for TTM mem limit
- Add NUMA awareness to pools
- Export ttm_pool_fini()
bridge:
- fsl-ldb: support i.MX6SX
- lt9211, lt9611: remove blanking packets
- tc358768: implement input bus formats, devm cleanups
- ti-snd65dsi86: implement wait_hpd_asserted
- analogix: fix endless probe loop
- samsung-dsim: support swapped clock, fix enabling, support var
clock
- display-connector: Add support for external power supply
- imx: Fix module linking
- tc358762: Support reset GPIO
panel:
- nt36523: Support Lenovo J606F
- st7703: Support Anbernic RG353V-V2
- InnoLux G070ACE-L01 support
- boe-tv101wum-nl6: Improve initialization
- sharp-ls043t1le001: Mode fixes
- simple: BOE EV121WXM-N10-1850, S6D7AA0
- Ampire AM-800480L1TMQW-T00H
- Rocktech RK043FN48H
- Starry himax83102-j02
- Starry ili9882t
amdgpu:
- add new ctx query flag to handle reset better
- add new query/set shadow buffer for rdna3
- DCN 3.2/3.1.x/3.0.x updates
- Enable DC_FP on loongarch
- PCIe fix for RDNA2
- improve DC FAMS/SubVP support for better power management
- partition support for lots of engines
- Take NUMA into account when allocating memory
- Add new DRM_AMDGPU_WERROR config parameter to help with CI
- Initial SMU13 overdrive support
- Add support for new colorspace KMS API
- W=1 fixes
amdkfd:
- Query TTM mem limit rather than hardcoding it
- GC 9.4.3 partition support
- Handle NUMA for partitions
- Add debugger interface for enabling gdb
- Add KFD event age tracking
radeon:
- Fix possible UAF
i915:
- new getparam for PXP support
- GSC/MEI proxy driver
- Meteorlake display enablement
- avoid clearing preallocated framebuffers with TTM
- implement framebuffer mmap support
- Disable sampler indirect state in bindless heap
- Enable fdinfo for GuC backends
- GuC loading and firmware table handling fixes
- Various refactors for multi-tile enablement
- Define MOCS and PAT tables for MTL
- GSC/MEI support for Meteorlake
- PMU multi-tile support
- Large driver kernel doc cleanup
- Allow VRR toggling and arbitrary refresh rates
- Support async flips on linear buffers on display ver 12+
- Expose CRTC CTM property on ILK/SNB/VLV
- New debugfs for display clock frequencies
- Hotplug refactoring
- Display refactoring
- I915_GEM_CREATE_EXT_SET_PAT for Mesa on Meteorlake
- Use large rings for compute contexts
- HuC loading for MTL
- Allow user to set cache at BO creation
- MTL powermanagement enhancements
- Switch to dedicated workqueues to stop using flush_scheduled_work()
- Move display runtime init under display/
- Remove 10bit gamma on desktop gen3 parts, they don't support it
habanalabs:
- uapi: return 0 for user queries if there was a h/w or f/w error
- Add pci health check when we lose connection with the firmware.
This can be used to distinguish between pci link down and firmware
getting stuck.
- Add more info to the error print when TPC interrupt occur.
- Firmware fixes
msm:
- Adreno A660 bindings
- SM8350 MDSS bindings fix
- Added support for DPU on sm6350 and sm6375 platforms
- Implemented tearcheck support to support vsync on SM150 and newer
platforms
- Enabled missing features (DSPP, DSC, split display) on sc8180x,
sc8280xp, sm8450
- Added support for DSI and 28nm DSI PHY on MSM8226 platform
- Added support for DSI on sm6350 and sm6375 platforms
- Added support for display controller on MSM8226 platform
- A690 GPU support
- Move cmdstream dumping out of fence signaling path
- a610 support
- Support for a6xx devices without GMU
nouveau:
- NULL ptr before deref fixes
armada:
- implement fbdev emulation as client
sun4i:
- fix mipi-dsi dotclock
- release clocks
vc4:
- rgb range toggle property
- BT601 / BT2020 HDMI support
vkms:
- convert to drmm helpers
- add reflection and rotation support
- fix rgb565 conversion
gma500:
- fix iomem access
shmobile:
- support renesas soc platform
- enable fbdev
mxsfb:
- Add support for i.MX93 LCDIF
stm:
- dsi: Use devm_ helper
- ltdc: Fix potential invalid pointer deref
renesas:
- Group drivers in renesas subdirectory to prepare for new platform
- Drop deprecated R-Car H3 ES1.x support
meson:
- Add support for MIPI DSI displays
virtio:
- add sync object support
mediatek:
- Add display binding document for MT6795"
* tag 'drm-next-2023-06-29' of git://anongit.freedesktop.org/drm/drm: (1791 commits)
drm/i915: Fix a NULL vs IS_ERR() bug
drm/i915: make i915_drm_client_fdinfo() reference conditional again
drm/i915/huc: Fix missing error code in intel_huc_init()
drm/i915/gsc: take a wakeref for the proxy-init-completion check
drm/msm/a6xx: Add A610 speedbin support
drm/msm/a6xx: Add A619_holi speedbin support
drm/msm/a6xx: Use adreno_is_aXYZ macros in speedbin matching
drm/msm/a6xx: Use "else if" in GPU speedbin rev matching
drm/msm/a6xx: Fix some A619 tunables
drm/msm/a6xx: Add A610 support
drm/msm/a6xx: Add support for A619_holi
drm/msm/adreno: Disable has_cached_coherent in GMU wrapper configurations
drm/msm/a6xx: Introduce GMU wrapper support
drm/msm/a6xx: Move CX GMU power counter enablement to hw_init
drm/msm/a6xx: Extend and explain UBWC config
drm/msm/a6xx: Remove both GBIF and RBBM GBIF halt on hw init
drm/msm/a6xx: Add a helper for software-resetting the GPU
drm/msm/a6xx: Improve a6xx_bus_clear_pending_transactions()
drm/msm/a6xx: Move a6xx_bus_clear_pending_transactions to a6xx_gpu
drm/msm/a6xx: Move force keepalive vote removal to a6xx_gmu_force_off()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"Lots of changes as usual, but the only significant stuff in ALSA core
part is the MIDI 2.0 support, while ASoC core kept receiving the code
refactoring. The majority of changes are seen rather in device
drivers, and quite a few new drivers can be found there.
Here we go, some highlights:
ALSA and ASoC Core:
- Support of MIDI 2.0 devices: rawmidi and sequencer API have been
extended for the support of the new UMP (Universal MIDI Packet)
protocol, USB audio driver got the USB MIDI 2.0 interface support
- Continued refactoring around ASoC DAI links and the ordering of
trigger callbacks
- PCM ABI extension for better drain support
ASoC Drivers:
- Conversions of many drivers to use maple tree based caches
- Everlasting improvement works on ASoC Intel drivers
- Compressed audio support for Qualcomm
- Support for AMD SoundWire, Analog Devices SSM3515, Google
Chameleon, Ingenic X1000, Intel systems with various CODECs,
Loongson platforms, Maxim MAX98388, Mediatek MT8188, Nuvoton
NAU8825C, NXP platforms with NAU8822, Qualcomm WSA884x, StarFive
JH7110, Texas Instruments TAS2781
HD-audio:
- Quirks for HP and ASUS machines
- CS35L41 HD-audio codec fixes
- Loongson HD-audio support
Misc:
- A new virtual PCM test driver for kselftests
- Continued refactoring and improvements on the legacy emu10k1
driver"
* tag 'sound-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (556 commits)
ALSA: hda/realtek: Enable mute/micmute LEDs and limit mic boost on EliteBook
ASoC: hdmi-codec: fix channel info for compressed formats
ALSA: pcm: fix ELD constraints for (E)AC3, DTS(-HD) and MLP formats
ASoC: core: Always store of_node when getting DAI link component
ASoC: tas2781: Fix error code in tas2781_load_calibration()
ASoC: amd: update pm_runtime enable sequence
ALSA: ump: Export MIDI1 / UMP conversion helpers
ASoC: tas2781: fix Kconfig dependencies
ASoC: amd: acp: remove acp poweroff function
ASoC: amd: acp: clear pdm dma interrupt mask
ASoC: codecs: max98090: Allow dsp_a mode
ASoC: qcom: common: add default jack dapm pins
ASoC: loongson: fix address space confusion
ASoC: dt-bindings: microchip,sama7g5-pdmc: Simplify "microchip,mic-pos" constraints
ASoC: tegra: Remove stale comments in AHUB
ASoC: tegra: Use normal system sleep for ASRC
ALSA: hda/realtek: Add quirks for ROG ALLY CS35l41 audio
ASoC: fsl-asoc-card: Allow passing the number of slots in use
ASoC: codecs: wsa884x: Add WSA884x family of speakers
ASoC: dt-bindings: qcom,wsa8840: Add WSA884x family of speakers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking changes from Jakub Kicinski:
"WiFi 7 and sendpage changes are the biggest pieces of work for this
release. The latter will definitely require fixes but I think that we
got it to a reasonable point.
Core:
- Rework the sendpage & splice implementations
Instead of feeding data into sockets page by page extend sendmsg
handlers to support taking a reference on the data, controlled by a
new flag called MSG_SPLICE_PAGES
Rework the handling of unexpected-end-of-file to invoke an
additional callback instead of trying to predict what the right
combination of MORE/NOTLAST flags is
Remove the MSG_SENDPAGE_NOTLAST flag completely
- Implement SCM_PIDFD, a new type of CMSG type analogous to
SCM_CREDENTIALS, but it contains pidfd instead of plain pid
- Enable socket busy polling with CONFIG_RT
- Improve reliability and efficiency of reporting for ref_tracker
- Auto-generate a user space C library for various Netlink families
Protocols:
- Allow TCP to shrink the advertised window when necessary, prevent
sk_rcvbuf auto-tuning from growing the window all the way up to
tcp_rmem[2]
- Use per-VMA locking for "page-flipping" TCP receive zerocopy
- Prepare TCP for device-to-device data transfers, by making sure
that payloads are always attached to skbs as page frags
- Make the backoff time for the first N TCP SYN retransmissions
linear. Exponential backoff is unnecessarily conservative
- Create a new MPTCP getsockopt to retrieve all info
(MPTCP_FULL_INFO)
- Avoid waking up applications using TLS sockets until we have a full
record
- Allow using kernel memory for protocol ioctl callbacks, paving the
way to issuing ioctls over io_uring
- Add nolocalbypass option to VxLAN, forcing packets to be fully
encapsulated even if they are destined for a local IP address
- Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure
in-kernel ECMP implementation (e.g. Open vSwitch) select the same
link for all packets. Support L4 symmetric hashing in Open vSwitch
- PPPoE: make number of hash bits configurable
- Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client
(ipconfig)
- Add layer 2 miss indication and filtering, allowing higher layers
(e.g. ACL filters) to make forwarding decisions based on whether
packet matched forwarding state in lower devices (bridge)
- Support matching on Connectivity Fault Management (CFM) packets
- Hide the "link becomes ready" IPv6 messages by demoting their
printk level to debug
- HSR: don't enable promiscuous mode if device offloads the proto
- Support active scanning in IEEE 802.15.4
- Continue work on Multi-Link Operation for WiFi 7
BPF:
- Add precision propagation for subprogs and callbacks. This allows
maintaining verification efficiency when subprograms are used, or
in fact passing the verifier at all for complex programs,
especially those using open-coded iterators
- Improve BPF's {g,s}setsockopt() length handling. Previously BPF
assumed the length is always equal to the amount of written data.
But some protos allow passing a NULL buffer to discover what the
output buffer *should* be, without writing anything
- Accept dynptr memory as memory arguments passed to helpers
- Add routing table ID to bpf_fib_lookup BPF helper
- Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands
- Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark
maps as read-only)
- Show target_{obj,btf}_id in tracing link fdinfo
- Addition of several new kfuncs (most of the names are
self-explanatory):
- Add a set of new dynptr kfuncs: bpf_dynptr_adjust(),
bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size()
and bpf_dynptr_clone().
- bpf_task_under_cgroup()
- bpf_sock_destroy() - force closing sockets
- bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs
Netfilter:
- Relax set/map validation checks in nf_tables. Allow checking
presence of an entry in a map without using the value
- Increase ip_vs_conn_tab_bits range for 64BIT builds
- Allow updating size of a set
- Improve NAT tuple selection when connection is closing
Driver API:
- Integrate netdev with LED subsystem, to allow configuring HW
"offloaded" blinking of LEDs based on link state and activity
(i.e. packets coming in and out)
- Support configuring rate selection pins of SFP modules
- Factor Clause 73 auto-negotiation code out of the drivers, provide
common helper routines
- Add more fool-proof helpers for managing lifetime of MDIO devices
associated with the PCS layer
- Allow drivers to report advanced statistics related to Time Aware
scheduler offload (taprio)
- Allow opting out of VF statistics in link dump, to allow more VFs
to fit into the message
- Split devlink instance and devlink port operations
New hardware / drivers:
- Ethernet:
- Synopsys EMAC4 IP support (stmmac)
- Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches
- Marvell 88E6250 7 port switches
- Microchip LAN8650/1 Rev.B0 PHYs
- MediaTek MT7981/MT7988 built-in 1GE PHY driver
- WiFi:
- Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps
- Realtek RTL8723DS (SDIO variant)
- Realtek RTL8851BE
- CAN:
- Fintek F81604
Drivers:
- Ethernet NICs:
- Intel (100G, ice):
- support dynamic interrupt allocation
- use meta data match instead of VF MAC addr on slow-path
- nVidia/Mellanox:
- extend link aggregation to handle 4, rather than just 2 ports
- spawn sub-functions without any features by default
- OcteonTX2:
- support HTB (Tx scheduling/QoS) offload
- make RSS hash generation configurable
- support selecting Rx queue using TC filters
- Wangxun (ngbe/txgbe):
- add basic Tx/Rx packet offloads
- add phylink support (SFP/PCS control)
- Freescale/NXP (enetc):
- report TAPRIO packet statistics
- Solarflare/AMD:
- support matching on IP ToS and UDP source port of outer
header
- VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6
- add devlink dev info support for EF10
- Virtual NICs:
- Microsoft vNIC:
- size the Rx indirection table based on requested
configuration
- support VLAN tagging
- Amazon vNIC:
- try to reuse Rx buffers if not fully consumed, useful for ARM
servers running with 16kB pages
- Google vNIC:
- support TCP segmentation of >64kB frames
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- enable USXGMII (88E6191X)
- Microchip:
- lan966x: add support for Egress Stage 0 ACL engine
- lan966x: support mapping packet priority to internal switch
priority (based on PCP or DSCP)
- Ethernet PHYs:
- Broadcom PHYs:
- support for Wake-on-LAN for BCM54210E/B50212E
- report LPI counter
- Microsemi PHYs: support RGMII delay configuration (VSC85xx)
- Micrel PHYs: receive timestamp in the frame (LAN8841)
- Realtek PHYs: support optional external PHY clock
- Altera TSE PCS: merge the driver into Lynx PCS which it is a
variant of
- CAN: Kvaser PCIEcan:
- support packet timestamping
- WiFi:
- Intel (iwlwifi):
- major update for new firmware and Multi-Link Operation (MLO)
- configuration rework to drop test devices and split the
different families
- support for segmented PNVM images and power tables
- new vendor entries for PPAG (platform antenna gain) feature
- Qualcomm 802.11ax (ath11k):
- Multiple Basic Service Set Identifier (MBSSID) and Enhanced
MBSSID Advertisement (EMA) support in AP mode
- support factory test mode
- RealTek (rtw89):
- add RSSI based antenna diversity
- support U-NII-4 channels on 5 GHz band
- RealTek (rtl8xxxu):
- AP mode support for 8188f
- support USB RX aggregation for the newer chips"
* tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1602 commits)
net: scm: introduce and use scm_recv_unix helper
af_unix: Skip SCM_PIDFD if scm->pid is NULL.
net: lan743x: Simplify comparison
netlink: Add __sock_i_ino() for __netlink_diag_dump().
net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses
Revert "af_unix: Call scm_recv() only after scm_set_cred()."
phylink: ReST-ify the phylink_pcs_neg_mode() kdoc
libceph: Partially revert changes to support MSG_SPLICE_PAGES
net: phy: mscc: fix packet loss due to RGMII delays
net: mana: use vmalloc_array and vcalloc
net: enetc: use vmalloc_array and vcalloc
ionic: use vmalloc_array and vcalloc
pds_core: use vmalloc_array and vcalloc
gve: use vmalloc_array and vcalloc
octeon_ep: use vmalloc_array and vcalloc
net: usb: qmi_wwan: add u-blox 0x1312 composition
perf trace: fix MSG_SPLICE_PAGES build error
ipvlan: Fix return value of ipvlan_queue_xmit()
netfilter: nf_tables: fix underflow in chain reference counter
netfilter: nf_tables: unbind non-anonymous set if rule construction fails
...
|
|
Drivers can poll requests directly, so use that. We just need to ensure
the driver's request was allocated from a polled hctx, so a special
driver flag is added to struct io_uring_cmd.
The allows unshared and multipath namespaces to use the same polling
callback, and multipath is guaranteed to get the same queue as the
command was submitted on. Previously multipath polling might check a
different path and poll the wrong info.
The other bonus is we don't need a bio payload in order to poll,
allowing commands like 'flush' and 'write zeroes' to be submitted on the
same high priority queue as read and write commands.
Finally, using the request based polling skips the unnecessary bio
overhead.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230612190343.2087040-3-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
"One small core feature this time around but mostly driver improvements
and additions for SPI:
- Add support for controlling the idle state of MOSI, some systems
can support this and depending on the system integration may need
it to avoid glitching in some situations
- Support for polling mode in the S3C64xx driver and DMA on the
Qualcomm QSPI driver
- Support for several Allwinner SoCs, AMD Pensando Elba, Intel Mount
Evans, Renesas RZ/V2M, and ST STM32H7"
* tag 'spi-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (66 commits)
spi: dt-bindings: atmel,at91rm9200-spi: fix broken sam9x7 compatible
spi: dt-bindings: atmel,at91rm9200-spi: add sam9x7 compatible
spi: Add support for Renesas CSI
spi: dt-bindings: Add bindings for RZ/V2M CSI
spi: sun6i: Use the new helper to derive the xfer timeout value
spi: atmel: Prevent false timeouts on long transfers
spi: dt-bindings: stm32: do not disable spi-slave property for stm32f4-f7
spi: Create a helper to derive adaptive timeouts
spi: spi-geni-qcom: correctly handle -EPROBE_DEFER from dma_request_chan()
spi: stm32: disable spi-slave property for stm32f4-f7
spi: stm32: introduction of stm32h7 SPI device mode support
spi: stm32: use dmaengine_terminate_{a}sync instead of _all
spi: stm32: renaming of spi_master into spi_controller
spi: dw: Remove misleading comment for Mount Evans SoC
spi: dt-bindings: snps,dw-apb-ssi: Add compatible for Intel Mount Evans SoC
spi: dw: Add compatible for Intel Mount Evans SoC
spi: s3c64xx: Use dev_err_probe()
spi: s3c64xx: Use the managed spi master allocation function
spi: spl022: Probe defer is no error
spi: spi-imx: fix mixing of native and gpio chipselects for imx51/imx53/imx6 variants
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
- Yosry Ahmed brought back some cgroup v1 stats in OOM logs
- Yosry has also eliminated cgroup's atomic rstat flushing
- Nhat Pham adds the new cachestat() syscall. It provides userspace
with the ability to query pagecache status - a similar concept to
mincore() but more powerful and with improved usability
- Mel Gorman provides more optimizations for compaction, reducing the
prevalence of page rescanning
- Lorenzo Stoakes has done some maintanance work on the
get_user_pages() interface
- Liam Howlett continues with cleanups and maintenance work to the
maple tree code. Peng Zhang also does some work on maple tree
- Johannes Weiner has done some cleanup work on the compaction code
- David Hildenbrand has contributed additional selftests for
get_user_pages()
- Thomas Gleixner has contributed some maintenance and optimization
work for the vmalloc code
- Baolin Wang has provided some compaction cleanups,
- SeongJae Park continues maintenance work on the DAMON code
- Huang Ying has done some maintenance on the swap code's usage of
device refcounting
- Christoph Hellwig has some cleanups for the filemap/directio code
- Ryan Roberts provides two patch series which yield some
rationalization of the kernel's access to pte entries - use the
provided APIs rather than open-coding accesses
- Lorenzo Stoakes has some fixes to the interaction between pagecache
and directio access to file mappings
- John Hubbard has a series of fixes to the MM selftesting code
- ZhangPeng continues the folio conversion campaign
- Hugh Dickins has been working on the pagetable handling code, mainly
with a view to reducing the load on the mmap_lock
- Catalin Marinas has reduced the arm64 kmalloc() minimum alignment
from 128 to 8
- Domenico Cerasuolo has improved the zswap reclaim mechanism by
reorganizing the LRU management
- Matthew Wilcox provides some fixups to make gfs2 work better with the
buffer_head code
- Vishal Moola also has done some folio conversion work
- Matthew Wilcox has removed the remnants of the pagevec code - their
functionality is migrated over to struct folio_batch
* tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits)
mm/hugetlb: remove hugetlb_set_page_subpool()
mm: nommu: correct the range of mmap_sem_read_lock in task_mem()
hugetlb: revert use of page_cache_next_miss()
Revert "page cache: fix page_cache_next/prev_miss off by one"
mm/vmscan: fix root proactive reclaim unthrottling unbalanced node
mm: memcg: rename and document global_reclaim()
mm: kill [add|del]_page_to_lru_list()
mm: compaction: convert to use a folio in isolate_migratepages_block()
mm: zswap: fix double invalidate with exclusive loads
mm: remove unnecessary pagevec includes
mm: remove references to pagevec
mm: rename invalidate_mapping_pagevec to mapping_try_invalidate
mm: remove struct pagevec
net: convert sunrpc from pagevec to folio_batch
i915: convert i915_gpu_error to use a folio_batch
pagevec: rename fbatch_count()
mm: remove check_move_unevictable_pages()
drm: convert drm_gem_put_pages() to use a folio_batch
i915: convert shmem_sg_free_table() to use a folio_batch
scatterlist: add sg_set_folio()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"There are three areas of note:
A bunch of strlcpy()->strscpy() conversions ended up living in my tree
since they were either Acked by maintainers for me to carry, or got
ignored for multiple weeks (and were trivial changes).
The compiler option '-fstrict-flex-arrays=3' has been enabled
globally, and has been in -next for the entire devel cycle. This
changes compiler diagnostics (though mainly just -Warray-bounds which
is disabled) and potential UBSAN_BOUNDS and FORTIFY _warning_
coverage. In other words, there are no new restrictions, just
potentially new warnings. Any new FORTIFY warnings we've seen have
been fixed (usually in their respective subsystem trees). For more
details, see commit df8fc4e934c12b.
The under-development compiler attribute __counted_by has been added
so that we can start annotating flexible array members with their
associated structure member that tracks the count of flexible array
elements at run-time. It is possible (likely?) that the exact syntax
of the attribute will change before it is finalized, but GCC and Clang
are working together to sort it out. Any changes can be made to the
macro while we continue to add annotations.
As an example of that last case, I have a treewide commit waiting with
such annotations found via Coccinelle:
https://git.kernel.org/linus/adc5b3cb48a049563dc673f348eab7b6beba8a9b
Also see commit dd06e72e68bcb4 for more details.
Summary:
- Fix KMSAN vs FORTIFY in strlcpy/strlcat (Alexander Potapenko)
- Convert strreplace() to return string start (Andy Shevchenko)
- Flexible array conversions (Arnd Bergmann, Wyes Karny, Kees Cook)
- Add missing function prototypes seen with W=1 (Arnd Bergmann)
- Fix strscpy() kerndoc typo (Arne Welzel)
- Replace strlcpy() with strscpy() across many subsystems which were
either Acked by respective maintainers or were trivial changes that
went ignored for multiple weeks (Azeem Shaikh)
- Remove unneeded cc-option test for UBSAN_TRAP (Nick Desaulniers)
- Add KUnit tests for strcat()-family
- Enable KUnit tests of FORTIFY wrappers under UML
- Add more complete FORTIFY protections for strlcat()
- Add missed disabling of FORTIFY for all arch purgatories.
- Enable -fstrict-flex-arrays=3 globally
- Tightening UBSAN_BOUNDS when using GCC
- Improve checkpatch to check for strcpy, strncpy, and fake flex
arrays
- Improve use of const variables in FORTIFY
- Add requested struct_size_t() helper for types not pointers
- Add __counted_by macro for annotating flexible array size members"
* tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (54 commits)
netfilter: ipset: Replace strlcpy with strscpy
uml: Replace strlcpy with strscpy
um: Use HOST_DIR for mrproper
kallsyms: Replace all non-returning strlcpy with strscpy
sh: Replace all non-returning strlcpy with strscpy
of/flattree: Replace all non-returning strlcpy with strscpy
sparc64: Replace all non-returning strlcpy with strscpy
Hexagon: Replace all non-returning strlcpy with strscpy
kobject: Use return value of strreplace()
lib/string_helpers: Change returned value of the strreplace()
jbd2: Avoid printing outside the boundary of the buffer
checkpatch: Check for 0-length and 1-element arrays
riscv/purgatory: Do not use fortified string functions
s390/purgatory: Do not use fortified string functions
x86/purgatory: Do not use fortified string functions
acpi: Replace struct acpi_table_slit 1-element array with flex-array
clocksource: Replace all non-returning strlcpy with strscpy
string: use __builtin_memcpy() in strlcpy/strlcat
staging: most: Replace all non-returning strlcpy with strscpy
drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve updates from Kees Cook:
- Fix a few comments for correctness and typos (Baruch Siach)
- Small simplifications for binfmt (Christophe JAILLET)
- Set p_align to 4 for PT_NOTE in core dump (Fangrui Song)
* tag 'execve-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
binfmt_elf: fix comment typo s/reset/regset/
elf: correct note name comment
binfmt: Slightly simplify elf_fdpic_map_file()
binfmt: Use struct_size()
coredump, vmcore: Set p_align to 4 for PT_NOTE
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
- A SafeSetID patch to correct what appears to be a cut-n-paste typo in
the code causing a UID to be printed where a GID was desired.
This is coming via the LSM tree because we haven't been able to get a
response from the SafeSetID maintainer (Micah Morton) in several
months. Hopefully we are able to get in touch with Micah, but until
we do I'm going to pick them up in the LSM tree.
- A small fix to the reiserfs LSM xattr code.
We're continuing to work through some issues with the reiserfs code
as we try to fixup the LSM xattr handling, but in the process we're
uncovering some ugly problems in reiserfs and we may just end up
removing the LSM xattr support in reiserfs prior to reiserfs'
removal.
For better or worse, this shouldn't impact any of the reiserfs users,
as we discovered that LSM xattrs on reiserfs were completely broken,
meaning no one is currently using the combo of reiserfs and a file
labeling LSM.
- A tweak to how the cap_user_data_t struct/typedef is declared in the
header file to appease the Sparse gods.
- In the process of trying to sort out the SafeSetID lost-maintainer
problem I realized that I needed to update the labeled networking
entry to "Supported".
- Minor comment/documentation and spelling fixes.
* tag 'lsm-pr-20230626' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
device_cgroup: Fix kernel-doc warnings in device_cgroup
SafeSetID: fix UID printed instead of GID
MAINTAINERS: move labeled networking to "supported"
capability: erase checker warnings about struct __user_cap_data_struct
lsm: fix a number of misspellings
reiserfs: Initialize sec->length in reiserfs_security_init().
capability: fix kernel-doc warnings in capability.c
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:
- Fix the style of protected key API driver source: use x-mas tree for
all local variable declarations
- Rework protected key API driver to not use the struct pkey_protkey
and pkey_clrkey anymore. Both structures have a fixed size buffer,
but with the support of ECC protected key these buffers are not big
enough. Use dynamic buffers internally and transparently for
userspace
- Add support for a new 'non CCA clear key token' with ECC clear keys
supported: ECC P256, ECC P384, ECC P521, ECC ED25519 and ECC ED448.
This makes it possible to derive a protected key from the ECC clear
key input via PKEY_KBLOB2PROTK3 ioctl, while currently the only way
to derive is via PCKMO instruction
- The s390 PMU of PAI crypto and extension 1 NNPA counters use atomic_t
for reference counting. Replace this with the proper data type
refcount_t
- Select ARCH_SUPPORTS_INT128, but limit this to clang for now, since
gcc generates inefficient code, which may lead to stack overflows
- Replace one-element array with flexible-array member in struct
vfio_ccw_parent and refactor the rest of the code accordingly. Also,
prefer struct_size() over sizeof() open- coded versions
- Introduce OS_INFO_FLAGS_ENTRY pointing to a flags field and
OS_INFO_FLAG_REIPL_CLEAR flag that informs a dumper whether the
system memory should be cleared or not once dumped
- Fix a hang when a user attempts to remove a VFIO-AP mediated device
attached to a guest: add VFIO_DEVICE_GET_IRQ_INFO and
VFIO_DEVICE_SET_IRQS IOCTLs and wire up the VFIO bus driver callback
to request a release of the device
- Fix calculation for R_390_GOTENT relocations for modules
- Allow any user space process with CAP_PERFMON capability read and
display the CPU Measurement facility counter sets
- Rework large statically-defined per-CPU cpu_cf_events data structure
and replace it with dynamically allocated structures created when a
perf_event_open() system call is invoked or /dev/hwctr device is
accessed
* tag 's390-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/cpum_cf: rework PER_CPU_DEFINE of struct cpu_cf_events
s390/cpum_cf: open access to hwctr device for CAP_PERFMON privileged process
s390/module: fix rela calculation for R_390_GOTENT
s390/vfio-ap: wire in the vfio_device_ops request callback
s390/vfio-ap: realize the VFIO_DEVICE_SET_IRQS ioctl
s390/vfio-ap: realize the VFIO_DEVICE_GET_IRQ_INFO ioctl
s390/pkey: add support for ecc clear key
s390/pkey: do not use struct pkey_protkey
s390/pkey: introduce reverse x-mas trees
s390/zcore: conditionally clear memory on reipl
s390/ipl: add REIPL_CLEAR flag to os_info
vfio/ccw: use struct_size() helper
vfio/ccw: replace one-element array with flexible-array member
s390: select ARCH_SUPPORTS_INT128
s390/pai_ext: replace atomic_t with refcount_t
s390/pai_crypto: replace atomic_t with refcount_t
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- Introduce cmpxchg128() -- aka. the demise of cmpxchg_double()
The cmpxchg128() family of functions is basically & functionally the
same as cmpxchg_double(), but with a saner interface.
Instead of a 6-parameter horror that forced u128 - u64/u64-halves
layout details on the interface and exposed users to complexity,
fragility & bugs, use a natural 3-parameter interface with u128
types.
- Restructure the generated atomic headers, and add kerneldoc comments
for all of the generic atomic{,64,_long}_t operations.
The generated definitions are much cleaner now, and come with
documentation.
- Implement lock_set_cmp_fn() on lockdep, for defining an ordering when
taking multiple locks of the same type.
This gets rid of one use of lockdep_set_novalidate_class() in the
bcache code.
- Fix raw_cpu_generic_try_cmpxchg() bug due to an unintended variable
shadowing generating garbage code on Clang on certain ARM builds.
* tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
locking/atomic: scripts: fix ${atomic}_dec_if_positive() kerneldoc
percpu: Fix self-assignment of __old in raw_cpu_generic_try_cmpxchg()
locking/atomic: treewide: delete arch_atomic_*() kerneldoc
locking/atomic: docs: Add atomic operations to the driver basic API documentation
locking/atomic: scripts: generate kerneldoc comments
docs: scripts: kernel-doc: accept bitwise negation like ~@var
locking/atomic: scripts: simplify raw_atomic*() definitions
locking/atomic: scripts: simplify raw_atomic_long*() definitions
locking/atomic: scripts: split pfx/name/sfx/order
locking/atomic: scripts: restructure fallback ifdeffery
locking/atomic: scripts: build raw_atomic_long*() directly
locking/atomic: treewide: use raw_atomic*_<op>()
locking/atomic: scripts: add trivial raw_atomic*_<op>()
locking/atomic: scripts: factor out order template generation
locking/atomic: scripts: remove leftover "${mult}"
locking/atomic: scripts: remove bogus order parameter
locking/atomic: xtensa: add preprocessor symbols
locking/atomic: x86: add preprocessor symbols
locking/atomic: sparc: add preprocessor symbols
locking/atomic: sh: add preprocessor symbols
...
|
|
Linux 6.4
Resolve conflicts between rdma rc and next in rxe_cq matching linux-next:
drivers/infiniband/sw/rxe/rxe_cq.c:
https://lore.kernel.org/r/20230622115246.365d30ad@canb.auug.org.au
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
1) Allow slightly larger IPVS connection table size from Kconfig for
64-bit arch, from Abhijeet Rastogi.
2) Since IPVS connection table might be larger than 2^20 after previous
patch, allow to limit it depending on the available memory.
Moreover, use kvmalloc. From Julian Anastasov.
3) Do not rebuild VLAN header in nft_payload when matching source and
destination MAC address.
4) Remove nested rcu read lock side in ip_set_test(), from Florian Westphal.
5) Allow to update set size, also from Florian.
6) Improve NAT tuple selection when connection is closing,
from Florian Westphal.
7) Support for resetting set element stateful expression, from Phil Sutter.
8) Use NLA_POLICY_MAX to narrow down maximum attribute value in nf_tables,
from Florian Westphal.
* tag 'nf-next-23-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_tables: limit allowed range via nla_policy
netfilter: nf_tables: Introduce NFT_MSG_GETSETELEM_RESET
netfilter: snat: evict closing tcp entries on reply tuple collision
netfilter: nf_tables: permit update of set size
netfilter: ipset: remove rcu_read_lock_bh pair from ip_set_test
netfilter: nft_payload: rebuild vlan header when needed
ipvs: dynamically limit the connection hash table
ipvs: increase ip_vs_conn_tab_bits range for 64BIT
====================
Link: https://lore.kernel.org/r/20230626064749.75525-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull block updates from Jens Axboe:
- NVMe pull request via Keith:
- Various cleanups all around (Irvin, Chaitanya, Christophe)
- Better struct packing (Christophe JAILLET)
- Reduce controller error logs for optional commands (Keith)
- Support for >=64KiB block sizes (Daniel Gomez)
- Fabrics fixes and code organization (Max, Chaitanya, Daniel
Wagner)
- bcache updates via Coly:
- Fix a race at init time (Mingzhe Zou)
- Misc fixes and cleanups (Andrea, Thomas, Zheng, Ye)
- use page pinning in the block layer for dio (David)
- convert old block dio code to page pinning (David, Christoph)
- cleanups for pktcdvd (Andy)
- cleanups for rnbd (Guoqing)
- use the unchecked __bio_add_page() for the initial single page
additions (Johannes)
- fix overflows in the Amiga partition handling code (Michael)
- improve mq-deadline zoned device support (Bart)
- keep passthrough requests out of the IO schedulers (Christoph, Ming)
- improve support for flush requests, making them less special to deal
with (Christoph)
- add bdev holder ops and shutdown methods (Christoph)
- fix the name_to_dev_t() situation and use cases (Christoph)
- decouple the block open flags from fmode_t (Christoph)
- ublk updates and cleanups, including adding user copy support (Ming)
- BFQ sanity checking (Bart)
- convert brd from radix to xarray (Pankaj)
- constify various structures (Thomas, Ivan)
- more fine grained persistent reservation ioctl capability checks
(Jingbo)
- misc fixes and cleanups (Arnd, Azeem, Demi, Ed, Hengqi, Hou, Jan,
Jordy, Li, Min, Yu, Zhong, Waiman)
* tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linux: (266 commits)
scsi/sg: don't grab scsi host module reference
ext4: Fix warning in blkdev_put()
block: don't return -EINVAL for not found names in devt_from_devname
cdrom: Fix spectre-v1 gadget
block: Improve kernel-doc headers
blk-mq: don't insert passthrough request into sw queue
bsg: make bsg_class a static const structure
ublk: make ublk_chr_class a static const structure
aoe: make aoe_class a static const structure
block/rnbd: make all 'class' structures const
block: fix the exclusive open mask in disk_scan_partitions
block: add overflow checks for Amiga partition support
block: change all __u32 annotations to __be32 in affs_hardblocks.h
block: fix signed int overflow in Amiga partition support
block: add capacity validation in bdev_add_partition()
block: fine-granular CAP_SYS_ADMIN for Persistent Reservation
block: disallow Persistent Reservation on partitions
reiserfs: fix blkdev_put() warning from release_journal_dev()
block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions()
block: document the holder argument to blkdev_get_by_path
...
|
|
Pull io_uring updates from Jens Axboe:
"Nothing major in this release, just a bunch of cleanups and some
optimizations around networking mostly.
- clean up file request flags handling (Christoph)
- clean up request freeing and CQ locking (Pavel)
- support for using pre-registering the io_uring fd at setup time
(Josh)
- Add support for user allocated ring memory, rather than having the
kernel allocate it. Mostly for packing rings into a huge page (me)
- avoid an unnecessary double retry on receive (me)
- maintain ordering for task_work, which also improves performance
(me)
- misc cleanups/fixes (Pavel, me)"
* tag 'for-6.5/io_uring-2023-06-23' of git://git.kernel.dk/linux: (39 commits)
io_uring: merge conditional unlock flush helpers
io_uring: make io_cq_unlock_post static
io_uring: inline __io_cq_unlock
io_uring: fix acquire/release annotations
io_uring: kill io_cq_unlock()
io_uring: remove IOU_F_TWQ_FORCE_NORMAL
io_uring: don't batch task put on reqs free
io_uring: move io_clean_op()
io_uring: inline io_dismantle_req()
io_uring: remove io_free_req_tw
io_uring: open code io_put_req_find_next
io_uring: add helpers to decode the fixed file file_ptr
io_uring: use io_file_from_index in io_msg_grab_file
io_uring: use io_file_from_index in __io_sync_cancel
io_uring: return REQ_F_ flags from io_file_get_flags
io_uring: remove io_req_ffs_set
io_uring: remove a confusing comment above io_file_get_flags
io_uring: remove the mode variable in io_file_get_flags
io_uring: remove __io_file_supports_nowait
io_uring: wait interruptibly for request completions on exit
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:
"This contains the work to extend move_mount() to allow adding a mount
beneath the topmost mount of a mount stack.
There are two LWN articles about this. One covers the original patch
series in [1]. The other in [2] summarizes the session and roughly the
discussion between Al and me at LSFMM. The second article also goes
into some good questions from attendees.
Since all details are found in the relevant commit with a technical
dive into semantics and locking at the end I'm only adding the
motivation and core functionality for this from commit message and
leave out the invasive details. The code is also heavily commented and
annotated as well which was explicitly requested.
TL;DR:
> mount -t ext4 /dev/sda /mnt
|
└─/mnt /dev/sda ext4
> mount --beneath -t xfs /dev/sdb /mnt
|
└─/mnt /dev/sdb xfs
└─/mnt /dev/sda ext4
> umount /mnt
|
└─/mnt /dev/sdb xfs
The longer motivation is that various distributions are adding or are
in the process of adding support for system extensions and in the
future configuration extensions through various tools. A more detailed
explanation on system and configuration extensions can be found on the
manpage which is listed below at [3].
System extension images may – dynamically at runtime — extend the
/usr/ and /opt/ directory hierarchies with additional files. This is
particularly useful on immutable system images where a /usr/ and/or
/opt/ hierarchy residing on a read-only file system shall be extended
temporarily at runtime without making any persistent modifications.
When one or more system extension images are activated, their /usr/
and /opt/ hierarchies are combined via overlayfs with the same
hierarchies of the host OS, and the host /usr/ and /opt/ overmounted
with it ("merging"). When they are deactivated, the mount point is
disassembled — again revealing the unmodified original host version of
the hierarchy ("unmerging"). Merging thus makes the extension's
resources suddenly appear below the /usr/ and /opt/ hierarchies as if
they were included in the base OS image itself. Unmerging makes them
disappear again, leaving in place only the files that were shipped
with the base OS image itself.
System configuration images are similar but operate on directories
containing system or service configuration.
On nearly all modern distributions mount propagation plays a crucial
role and the rootfs of the OS is a shared mount in a peer group
(usually with peer group id 1):
TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID
/ / ext4 shared:1 29 1
On such systems all services and containers run in a separate mount
namespace and are pivot_root()ed into their rootfs. A separate mount
namespace is almost always used as it is the minimal isolation
mechanism services have. But usually they are even much more isolated
up to the point where they almost become indistinguishable from
containers.
Mount propagation again plays a crucial role here. The rootfs of all
these services is a slave mount to the peer group of the host rootfs.
This is done so the service will receive mount propagation events from
the host when certain files or directories are updated.
In addition, the rootfs of each service, container, and sandbox is
also a shared mount in its separate peer group:
TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID
/ / ext4 shared:24 master:1 71 47
For people not too familiar with mount propagation, the master:1 means
that this is a slave mount to peer group 1. Which as one can see is
the host rootfs as indicated by shared:1 above. The shared:24
indicates that the service rootfs is a shared mount in a separate peer
group with peer group id 24.
A service may run other services. Such nested services will also have
a rootfs mount that is a slave to the peer group of the outer service
rootfs mount.
For containers things are just slighly different. A container's rootfs
isn't a slave to the service's or host rootfs' peer group. The rootfs
mount of a container is simply a shared mount in its own peer group:
TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID
/home/ubuntu/debian-tree / ext4 shared:99 61 60
So whereas services are isolated OS components a container is treated
like a separate world and mount propagation into it is restricted to a
single well known mount that is a slave to the peer group of the
shared mount /run on the host:
TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID
/propagate/debian-tree /run/host/incoming tmpfs master:5 71 68
Here, the master:5 indicates that this mount is a slave to the peer
group with peer group id 5. This allows to propagate mounts into the
container and served as a workaround for not being able to insert
mounts into mount namespaces directly. But the new mount api does
support inserting mounts directly. For the interested reader the
blogpost in [4] might be worth reading where I explain the old and the
new approach to inserting mounts into mount namespaces.
Containers of course, can themselves be run as services. They often
run full systems themselves which means they again run services and
containers with the exact same propagation settings explained above.
The whole system is designed so that it can be easily updated,
including all services in various fine-grained ways without having to
enter every single service's mount namespace which would be
prohibitively expensive. The mount propagation layout has been
carefully chosen so it is possible to propagate updates for system
extensions and configurations from the host into all services.
The simplest model to update the whole system is to mount on top of
/usr, /opt, or /etc on the host. The new mount on /usr, /opt, or /etc
will then propagate into every service. This works cleanly the first
time. However, when the system is updated multiple times it becomes
necessary to unmount the first update on /opt, /usr, /etc and then
propagate the new update. But this means, there's an interval where
the old base system is accessible. This has to be avoided to protect
against downgrade attacks.
The vfs already exposes a mechanism to userspace whereby mounts can be
mounted beneath an existing mount. Such mounts are internally referred
to as "tucked". The patch series exposes the ability to mount beneath
a top mount through the new MOVE_MOUNT_BENEATH flag for the
move_mount() system call. This allows userspace to seamlessly upgrade
mounts. After this series the only thing that will have changed is
that mounting beneath an existing mount can be done explicitly instead
of just implicitly.
The crux is that the proposed mechanism already exists and that it is
so powerful as to cover cases where mounts are supposed to be updated
with new versions. Crucially, it offers an important flexibility.
Namely that updates to a system may either be forced or can be delayed
and the umount of the top mount be left to a service if it is a
cooperative one"
Link: https://lwn.net/Articles/927491 [1]
Link: https://lwn.net/Articles/934094 [2]
Link: https://man7.org/linux/man-pages/man8/systemd-sysext.8.html [3]
Link: https://brauner.io/2023/02/28/mounting-into-mount-namespaces.html [4]
Link: https://github.com/flatcar/sysext-bakery
Link: https://fedoraproject.org/wiki/Changes/Unified_Kernel_Support_Phase_1
Link: https://fedoraproject.org/wiki/Changes/Unified_Kernel_Support_Phase_2
Link: https://github.com/systemd/systemd/pull/26013
* tag 'v6.5/vfs.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: allow to mount beneath top mount
fs: use a for loop when locking a mount
fs: properly document __lookup_mnt()
fs: add path_mounted()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Miscellaneous features, cleanups, and fixes for vfs and individual fs
Features:
- Use mode 0600 for file created by cachefilesd so it can be run by
unprivileged users. This aligns them with directories which are
already created with mode 0700 by cachefilesd
- Reorder a few members in struct file to prevent some false sharing
scenarios
- Indicate that an eventfd is used a semaphore in the eventfd's
fdinfo procfs file
- Add a missing uapi header for eventfd exposing relevant uapi
defines
- Let the VFS protect transitions of a superblock from read-only to
read-write in addition to the protection it already provides for
transitions from read-write to read-only. Protecting read-only to
read-write transitions allows filesystems such as ext4 to perform
internal writes, keeping writers away until the transition is
completed
Cleanups:
- Arnd removed the architecture specific arch_report_meminfo()
prototypes and added a generic one into procfs.h. Note, we got a
report about a warning in amdpgpu codepaths that suggested this was
bisectable to this change but we concluded it was a false positive
- Remove unused parameters from split_fs_names()
- Rename put_and_unmap_page() to unmap_and_put_page() to let the name
reflect the order of the cleanup operation that has to unmap before
the actual put
- Unexport buffer_check_dirty_writeback() as it is not used outside
of block device aops
- Stop allocating aio rings from highmem
- Protecting read-{only,write} transitions in the VFS used open-coded
barriers in various places. Replace them with proper little helpers
and document both the helpers and all barrier interactions involved
when transitioning between read-{only,write} states
- Use flexible array members in old readdir codepaths
Fixes:
- Use the correct type __poll_t for epoll and eventfd
- Replace all deprecated strlcpy() invocations, whose return value
isn't checked with an equivalent strscpy() call
- Fix some kernel-doc warnings in fs/open.c
- Reduce the stack usage in jffs2's xattr codepaths finally getting
rid of this: fs/jffs2/xattr.c:887:1: error: the frame size of 1088
bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
royally annoying compilation warning
- Use __FMODE_NONOTIFY instead of FMODE_NONOTIFY where an int and not
fmode_t is required to avoid fmode_t to integer degradation
warnings
- Create coredumps with O_WRONLY instead of O_RDWR. There's a long
explanation in that commit how O_RDWR is actually a bug which we
found out with the help of Linus and git archeology
- Fix "no previous prototype" warnings in the pipe codepaths
- Add overflow calculations for remap_verify_area() as a signed
addition overflow could be triggered in xfstests
- Fix a null pointer dereference in sysv
- Use an unsigned variable for length calculations in jfs avoiding
compilation warnings with gcc 13
- Fix a dangling pipe pointer in the watch queue codepath
- The legacy mount option parser provided as a fallback by the VFS
for filesystems not yet converted to the new mount api did prefix
the generated mount option string with a leading ',' causing issues
for some filesystems
- Fix a repeated word in a comment in fs.h
- autofs: Update the ctime when mtime is updated as mandated by
POSIX"
* tag 'v6.5/vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits)
readdir: Replace one-element arrays with flexible-array members
fs: Provide helpers for manipulating sb->s_readonly_remount
fs: Protect reconfiguration of sb read-write from racing writes
eventfd: add a uapi header for eventfd userspace APIs
autofs: set ctime as well when mtime changes on a dir
eventfd: show the EFD_SEMAPHORE flag in fdinfo
fs/aio: Stop allocating aio rings from HIGHMEM
fs: Fix comment typo
fs: unexport buffer_check_dirty_writeback
fs: avoid empty option when generating legacy mount string
watch_queue: prevent dangling pipe pointer
fs.h: Optimize file struct to prevent false sharing
highmem: Rename put_and_unmap_page() to unmap_and_put_page()
cachefiles: Allow the cache to be non-root
init: remove unused names parameter in split_fs_names()
jfs: Use unsigned variable for length calculations
fs/sysv: Null check to prevent null-ptr-deref bug
fs: use UB-safe check for signed addition overflow in remap_verify_area
procfs: consolidate arch_report_meminfo declaration
fs: pipe: reveal missing function protoypes
...
|