summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2024-02-21acpi/ghes: Remove CXL CPER notificationsDan Williams1-18/+0
Initial tests with the CXL CPER implementation identified that error reports were being duplicated in the log and the trace event [1]. Then it was discovered that the notification handler took sleeping locks while the GHES event handling runs in spin_lock_irqsave() context [2] While the duplicate reporting was fixed in v6.8-rc4, the fix for the sleeping-lock-vs-atomic collision would enjoy more time to settle and gain some test cycles. Given how late it is in the development cycle, remove the CXL hookup for now and try again during the next merge window. Note that end result is that v6.8 does not emit CXL CPER payloads to the kernel log, but this is in line with the CXL trend to move error reporting to trace events instead of the kernel log. Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: http://lore.kernel.org/r/20240108165855.00002f5a@Huawei.com [1] Closes: http://lore.kernel.org/r/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain [2] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-21workqueue: Clean up enum work_bits and related constantsTejun Heo1-26/+32
The bits of work->data are used for a few different purposes. How the bits are used is determined by enum work_bits. The planned disable/enable support will add another use, so let's clean it up a bit in preparation. - Let WORK_STRUCT_*_BIT's values be determined by enum definition order. - Deliminate different bit sections the same way using SHIFT and BITS values. - Rename __WORK_OFFQ_CANCELING to WORK_OFFQ_CANCELING_BIT for consistency. - Introduce WORK_STRUCT_PWQ_SHIFT and replace WORK_STRUCT_FLAG_MASK and WORK_STRUCT_WQ_DATA_MASK with WQ_STRUCT_PWQ_MASK for clarity. - Improve documentation. No functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
2024-02-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds2-2/+5
Pull rdma fixes from Jason Gunthorpe: "Mostly irdma and bnxt_re fixes: - Missing error unwind in hf1 - For bnxt - fix fenching behavior to work on new chips, fail unsupported SRQ resize back to userspace, propogate SRQ FW failure back to userspace. - Correctly fail unsupported SRQ resize back to userspace in bnxt - Adjust a memcpy in mlx5 to not overflow a struct field. - Prevent userspace from triggering mlx5 fw syndrome logging from sysfs - Use the correct access mode for MLX5_IB_METHOD_DEVX_OBJ_MODIFY to avoid a userspace failure on modify - For irdma - Don't UAF a concurrent tasklet during destroy, prevent userspace from issuing invalid QP attrs, fix a possible CQ overflow, capture a missing HW async error event - sendmsg() triggerable memory access crash in hfi1 - Fix the srpt_service_guid parameter to not crash due to missing function pointer - Don't leak objects in error unwind in qedr - Don't weirdly cast function pointers in srpt" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/srpt: fix function pointer cast warnings RDMA/qedr: Fix qedr_create_user_qp error flow RDMA/srpt: Support specifying the srpt_service_guid parameter IB/hfi1: Fix sdma.h tx->num_descs off-by-one error RDMA/irdma: Add AE for too many RNRS RDMA/irdma: Set the CQ read threshold for GEN 1 RDMA/irdma: Validate max_send_wr and max_recv_wr RDMA/irdma: Fix KASAN issue with tasklet RDMA/mlx5: Relax DEVX access upon modify commands IB/mlx5: Don't expose debugfs entries for RRoCE general parameters if not supported RDMA/mlx5: Fix fortify source warning while accessing Eth segment RDMA/bnxt_re: Add a missing check in bnxt_qplib_query_srq RDMA/bnxt_re: Return error for SRQ resize RDMA/bnxt_re: Fix unconditional fence for newer adapters RDMA/bnxt_re: Remove a redundant check inside bnxt_re_vf_res_config RDMA/bnxt_re: Avoid creating fence MR for newer adapters IB/hfi1: Fix a memleak in init_credit_return
2024-02-21mm/swap: fix race when skipping swapcacheKairui Song1-0/+5
When skipping swapcache for SWP_SYNCHRONOUS_IO, if two or more threads swapin the same entry at the same time, they get different pages (A, B). Before one thread (T0) finishes the swapin and installs page (A) to the PTE, another thread (T1) could finish swapin of page (B), swap_free the entry, then swap out the possibly modified page reusing the same entry. It breaks the pte_same check in (T0) because PTE value is unchanged, causing ABA problem. Thread (T0) will install a stalled page (A) into the PTE and cause data corruption. One possible callstack is like this: CPU0 CPU1 ---- ---- do_swap_page() do_swap_page() with same entry <direct swapin path> <direct swapin path> <alloc page A> <alloc page B> swap_read_folio() <- read to page A swap_read_folio() <- read to page B <slow on later locks or interrupt> <finished swapin first> ... set_pte_at() swap_free() <- entry is free <write to page B, now page A stalled> <swap out page B to same swap entry> pte_same() <- Check pass, PTE seems unchanged, but page A is stalled! swap_free() <- page B content lost! set_pte_at() <- staled page A installed! And besides, for ZRAM, swap_free() allows the swap device to discard the entry content, so even if page (B) is not modified, if swap_read_folio() on CPU0 happens later than swap_free() on CPU1, it may also cause data loss. To fix this, reuse swapcache_prepare which will pin the swap entry using the cache flag, and allow only one thread to swap it in, also prevent any parallel code from putting the entry in the cache. Release the pin after PT unlocked. Racers just loop and wait since it's a rare and very short event. A schedule_timeout_uninterruptible(1) call is added to avoid repeated page faults wasting too much CPU, causing livelock or adding too much noise to perf statistics. A similar livelock issue was described in commit 029c4628b2eb ("mm: swap: get rid of livelock in swapin readahead") Reproducer: This race issue can be triggered easily using a well constructed reproducer and patched brd (with a delay in read path) [1]: With latest 6.8 mainline, race caused data loss can be observed easily: $ gcc -g -lpthread test-thread-swap-race.c && ./a.out Polulating 32MB of memory region... Keep swapping out... Starting round 0... Spawning 65536 workers... 32746 workers spawned, wait for done... Round 0: Error on 0x5aa00, expected 32746, got 32743, 3 data loss! Round 0: Error on 0x395200, expected 32746, got 32743, 3 data loss! Round 0: Error on 0x3fd000, expected 32746, got 32737, 9 data loss! Round 0 Failed, 15 data loss! This reproducer spawns multiple threads sharing the same memory region using a small swap device. Every two threads updates mapped pages one by one in opposite direction trying to create a race, with one dedicated thread keep swapping out the data out using madvise. The reproducer created a reproduce rate of about once every 5 minutes, so the race should be totally possible in production. After this patch, I ran the reproducer for over a few hundred rounds and no data loss observed. Performance overhead is minimal, microbenchmark swapin 10G from 32G zram: Before: 10934698 us After: 11157121 us Cached: 13155355 us (Dropping SWP_SYNCHRONOUS_IO flag) [kasong@tencent.com: v4] Link: https://lkml.kernel.org/r/20240219082040.7495-1-ryncsn@gmail.com Link: https://lkml.kernel.org/r/20240206182559.32264-1-ryncsn@gmail.com Fixes: 0bcac06f27d7 ("mm, swap: skip swapcache for swapin of synchronous device") Reported-by: "Huang, Ying" <ying.huang@intel.com> Closes: https://lore.kernel.org/lkml/87bk92gqpx.fsf_-_@yhuang6-desk2.ccr.corp.intel.com/ Link: https://github.com/ryncsn/emm-test-project/tree/master/swap-stress-race [1] Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: Yu Zhao <yuzhao@google.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Chris Li <chrisl@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Barry Song <21cnbao@gmail.com> Cc: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-20fs/select: rework stack allocation hack for clangArnd Bergmann1-4/+0
A while ago, we changed the way that select() and poll() preallocate a temporary buffer just under the size of the static warning limit of 1024 bytes, as clang was frequently going slightly above that limit. The warnings have recently returned and I took another look. As it turns out, clang is not actually inherently worse at reserving stack space, it just happens to inline do_select() into core_sys_select(), while gcc never inlines it. Annotate do_select() to never be inlined and in turn remove the special case for the allocation size. This should give the same behavior for both clang and gcc all the time and once more avoids those warnings. Fixes: ad312f95d41c ("fs/select: avoid clang stack usage warning") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240216202352.2492798-1-arnd@kernel.org Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-20block: pass a queue_limits argument to blk_alloc_diskChristoph Hellwig1-3/+7
Pass a queue_limits to blk_alloc_disk and apply it if non-NULL. This will allow allocating queues with valid queue limits instead of setting the values one at a time later. Also change blk_alloc_disk to return an ERR_PTR instead of just NULL which can't distinguish errors. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Link: https://lore.kernel.org/r/20240215071055.2201424-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-20Merge tag 'v6.8-rc5' into timers/core, to resolve conflictIngo Molnar19-35/+138
There's a conflict between this recent upstream fix: dad6a09f3148 ("hrtimer: Report offline hrtimer enqueue") and a pending commit in the timers tree: 1a4729ecafc2 ("hrtimers: Move hrtimer base related definitions into hrtimer_defs.h") Resolve it by applying the upstream fix to the new <linux/hrtimer_defs.h> header. Conflict: include/linux/hrtimer.h Semantic conflict: include/linux/hrtimer_defs.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-02-19smp: Make __smp_processor_id() 0-argument macroAlexey Dobriyan1-1/+1
smp_processor_id family of macros never accepted any arguments. #define __smp_processor_id(x) works by accident (see C99 6.10.3 §4). __smp_processor_id() gets 1 (empty) argument and passes it down to raw_smp_processor_id() which doesn't accept arguments. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/0037d1f2-8153-4b33-b43e-f4b6ecd710ac@p183
2024-02-19jiffies: Transform comment about time_* functions into DOC blockAnna-Maria Behnsen1-6/+9
This general note about time_* functions is also useful to be available in kernel documentation. Therefore transform it into a kernel-doc DOC block with proper formatting. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240123164702.55612-6-anna-maria@linutronix.de
2024-02-19hrtimers: Update formatting of documentationAnna-Maria Behnsen1-11/+3
Documentation of functions lacks the annotations which are used by kernel-doc and *.rst to make appearance in rendered documents more user-friendly. Use those annotations to improve user-friendliness. While at it prevent duplication of comments and use a reference instead. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240123164702.55612-3-anna-maria@linutronix.de
2024-02-19hrtimers: Move hrtimer base related definitions into hrtimer_defs.hAnna-Maria Behnsen2-103/+102
hrtimer base related struct definitions are part of hrtimers.h as it is required there. With this, also the struct documentation which is for core code internal use, is exposed into the general api. To prevent this, move all core internal definitions and the related includes into hrtimer_defs.h. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240123164702.55612-2-anna-maria@linutronix.de
2024-02-17Merge tag 'char-misc-6.8-rc5' of ↵Linus Torvalds3-4/+7
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char / miscdriver fixes from Greg KH: "Here is a small set of char/misc and IIO driver fixes for 6.8-rc5. Included in here are: - lots of iio driver fixes for reported issues - nvmem device naming fixup for reported problem - interconnect driver fixes for reported issues All of these have been in linux-next for a while with no reported the issues (the nvmem patch was included in a different branch in linux-next before sent to me for inclusion here)" * tag 'char-misc-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits) nvmem: include bit index in cell sysfs file name iio: adc: ad4130: only set GPIO_CTRL if pin is unused iio: adc: ad4130: zero-initialize clock init data interconnect: qcom: x1e80100: Add missing ACV enable_mask interconnect: qcom: sm8650: Use correct ACV enable_mask iio: accel: bma400: Fix a compilation problem iio: commom: st_sensors: ensure proper DMA alignment iio: hid-sensor-als: Return 0 for HID_USAGE_SENSOR_TIME_TIMESTAMP iio: move LIGHT_UVA and LIGHT_UVB to the end of iio_modifier staging: iio: ad5933: fix type mismatch regression iio: humidity: hdc3020: fix temperature offset iio: adc: ad7091r8: Fix error code in ad7091r8_gpio_setup() iio: adc: ad_sigma_delta: ensure proper DMA alignment iio: imu: adis: ensure proper DMA alignment iio: humidity: hdc3020: Add Makefile, Kconfig and MAINTAINERS entry iio: imu: bno055: serdev requires REGMAP iio: magnetometer: rm3100: add boundary check for the value read from RM3100_REG_TMRC iio: pressure: bmp280: Add missing bmp085 to SPI id table iio: core: fix memleak in iio_device_register_sysfs interconnect: qcom: sm8550: Enable sync_state ...
2024-02-17Merge tag 'tty-6.8-rc5' of ↵Linus Torvalds1-5/+27
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty / serial fixes from Greg KH: "Here are three small tty and serial driver fixes for 6.8-rc5: - revert a 8250_pci1xxxx off-by-one change that was incorrect - two changes to fix the transmit path of the mxs-auart driver, fixing a regression in the 6.2 release All of these have been in linux-next for over a week with no reported issues" * tag 'tty-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: mxs-auart: fix tx serial: core: introduce uart_port_tx_flags() serial: 8250_pci1xxxx: partially revert off by one patch
2024-02-17Merge tag 'usb-6.8-rc5' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB / Thunderbolt fixes from Greg KH: "Here are two small fixes for 6.8-rc5: - thunderbolt to fix a reported issue on many platforms - dwc3 driver revert of a commit that caused problems in -rc1 Both of these changes have been in linux-next for over a week with no reported issues" * tag 'usb-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: Revert "usb: dwc3: Support EBC feature of DWC_usb31" thunderbolt: Fix setting the CNS bit in ROUTER_CS_5
2024-02-17x86/numa: Fix the address overlap check in numa_fill_memblks()Alison Schofield1-0/+2
numa_fill_memblks() fills in the gaps in numa_meminfo memblks over a physical address range. To do so, it first creates a list of existing memblks that overlap that address range. The issue is that it is off by one when comparing to the end of the address range, so memblks that do not overlap are selected. The impact of selecting a memblk that does not actually overlap is that an existing memblk may be filled when the expected action is to do nothing and return NUMA_NO_MEMBLK to the caller. The caller can then add a new NUMA node and memblk. Replace the broken open-coded search for address overlap with the memblock helper memblock_addrs_overlap(). Update the kernel doc and in code comments. Suggested by: "Huang, Ying" <ying.huang@intel.com> Fixes: 8f012db27c95 ("x86/numa: Introduce numa_fill_memblks()") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/10a3e6109c34c21a8dd4c513cf63df63481a2b07.1705085543.git.alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-17Merge tag 'block-6.8-2024-02-16' of git://git.kernel.dk/linuxLinus Torvalds1-0/+1
Pull block fixes from Jens Axboe: "Just an nvme pull request via Keith: - Fabrics connection error handling (Chaitanya) - Use relaxed effects to reduce unnecessary queue freezes (Keith)" * tag 'block-6.8-2024-02-16' of git://git.kernel.dk/linux: nvmet: remove superfluous initialization nvme: implement support for relaxed effects nvme-fabrics: fix I/O connect error handling
2024-02-16Merge tag 'trace-v6.8-rc4' of ↵Linus Torvalds1-7/+10
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix the #ifndef that didn't have the 'CONFIG_' prefix on HAVE_DYNAMIC_FTRACE_WITH_REGS The fix to have dynamic trampolines work with x86 broke arm64 as the config used in the #ifdef was HAVE_DYNAMIC_FTRACE_WITH_REGS and not CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS which removed the fix that the previous fix was to fix. - Fix tracing_on state The code to test if "tracing_on" is set incorrectly used ring_buffer_record_is_on() which returns false if the ring buffer isn't able to be written to. But the ring buffer disable has several bits that disable it. One is internal disabling which is used for resizing and other modifications of the ring buffer. But the "tracing_on" user space visible flag should only report if tracing is actually on and not internally disabled, as this can cause confusion as writing "1" when it is disabled will not enable it. Instead use ring_buffer_record_is_set_on() which shows the user space visible settings. - Fix a false positive kmemleak on saved cmdlines Now that the saved_cmdlines structure is allocated via alloc_page() and not via kmalloc() it has become invisible to kmemleak. The allocation done to one of its pointers was flagged as a dangling allocation leak. Make kmemleak aware of this allocation and free. - Fix synthetic event dynamic strings An update that cleaned up the synthetic event code removed the return value of trace_string(), and had it return zero instead of the length, causing dynamic strings in the synthetic event to always have zero size. - Clean up documentation and header files for seq_buf * tag 'trace-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: seq_buf: Fix kernel documentation seq_buf: Don't use "proxy" headers tracing/synthetic: Fix trace_string() return value tracing: Inform kmemleak of saved_cmdlines allocation tracing: Use ring_buffer_record_is_set_on() in tracer_tracing_is_on() tracing: Fix HAVE_DYNAMIC_FTRACE_WITH_REGS ifdef
2024-02-16Merge tag 'gpio-fixes-for-v6.8-rc5' of ↵Linus Torvalds1-0/+18
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - add missing stubs for functions that are not built with GPIOLIB disabled * tag 'gpio-fixes-for-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpiolib: add gpio_device_get_label() stub for !GPIOLIB gpiolib: add gpio_device_get_base() stub for !GPIOLIB gpiolib: add gpiod_to_gpio_device() stub for !GPIOLIB
2024-02-16io_uring/napi: enable even with a timeout of 0Jens Axboe1-0/+1
1 usec is not as short as it used to be, and it makes sense to allow 0 for a busy poll timeout - this means just do one loop to check if we have anything available. Add a separate ->napi_enabled to check if napi has been enabled or not. While at it, move the writing of the ctx napi values after we've copied the old values back to userspace. This ensures that if the call fails, we'll be in the same state as we were before, rather than some indeterminate state. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-15Merge tag 'net-6.8-rc5' of ↵Linus Torvalds2-8/+8
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from can, wireless and netfilter. Current release - regressions: - af_unix: fix task hung while purging oob_skb in GC - pds_core: do not try to run health-thread in VF path Current release - new code bugs: - sched: act_mirred: don't zero blockid when net device is being deleted Previous releases - regressions: - netfilter: - nat: restore default DNAT behavior - nf_tables: fix bidirectional offload, broken when unidirectional offload support was added - openvswitch: limit the number of recursions from action sets - eth: i40e: do not allow untrusted VF to remove administratively set MAC address Previous releases - always broken: - tls: fix races and bugs in use of async crypto - mptcp: prevent data races on some of the main socket fields, fix races in fastopen handling - dpll: fix possible deadlock during netlink dump operation - dsa: lan966x: fix crash when adding interface under a lag when some of the ports are disabled - can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock Misc: - a handful of fixes and reliability improvements for selftests - fix sysfs documentation missing net/ in paths - finish the work of squashing the missing MODULE_DESCRIPTION() warnings in networking" * tag 'net-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits) net: fill in MODULE_DESCRIPTION()s for missing arcnet net: fill in MODULE_DESCRIPTION()s for mdio_devres net: fill in MODULE_DESCRIPTION()s for ppp net: fill in MODULE_DESCRIPTION()s for fddik/skfp net: fill in MODULE_DESCRIPTION()s for plip net: fill in MODULE_DESCRIPTION()s for ieee802154/fakelb net: fill in MODULE_DESCRIPTION()s for xen-netback net: ravb: Count packets instead of descriptors in GbEth RX path pppoe: Fix memory leak in pppoe_sendmsg() net: sctp: fix skb leak in sctp_inq_free() net: bcmasp: Handle RX buffer allocation failure net-timestamp: make sk_tskey more predictable in error path selftests: tls: increase the wait in poll_partial_rec_async ice: Add check for lport extraction to LAG init netfilter: nf_tables: fix bidirectional offload regression netfilter: nat: restore default DNAT behavior netfilter: nft_set_pipapo: fix missing : in kdoc igc: Remove temporary workaround igb: Fix string truncation warnings in igb_set_fw_version can: netlink: Fix TDCO calculation using the old data bittiming ...
2024-02-15update workarounds for gcc "asm goto" issueLinus Torvalds2-4/+12
In commit 4356e9f841f7 ("work around gcc bugs with 'asm goto' with outputs") I did the gcc workaround unconditionally, because the cause of the bad code generation wasn't entirely clear. In the meantime, Jakub Jelinek debugged the issue, and has come up with a fix in gcc [2], which also got backported to the still maintained branches of gcc-11, gcc-12 and gcc-13. Note that while the fix technically wasn't in the original gcc-14 branch, Jakub says: "while it is true that no GCC 14 snapshots until today (or whenever the fix will be committed) have the fix, for GCC trunk it is up to the distros to use the latest snapshot if they use it at all and would allow better testing of the kernel code without the workaround, so that if there are other issues they won't be discovered years later. Most userland code doesn't actually use asm goto with outputs..." so we will consider gcc-14 to be fixed - if somebody is using gcc snapshots of the gcc-14 before the fix, they should upgrade. Note that while the bug goes back to gcc-11, in practice other gcc changes seem to have effectively hidden it since gcc-12.1 as per a bisect by Jakub. So even a gcc-14 snapshot without the fix likely doesn't show actual problems. Also, make the default 'asm_goto_output()' macro mark the asm as volatile by hand, because of an unrelated gcc issue [1] where it doesn't match the documented behavior ("asm goto is always volatile"). Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103979 [1] Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 [2] Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/ Requested-by: Jakub Jelinek <jakub@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Andrew Pinski <quic_apinski@quicinc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-02-15seq_buf: Fix kernel documentationAndy Shevchenko1-6/+6
There are plenty of issues with the kernel documentation here: - misspelled word "sequence" - different style of returned value descriptions - missed Return sections - unaligned style of ASCII / NUL-terminated / etc - wrong function references Fix all these. Link: https://lkml.kernel.org/r/20240215152506.598340-1-andriy.shevchenko@linux.intel.com Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-02-15seq_buf: Don't use "proxy" headersAndy Shevchenko1-1/+4
Update header inclusions to follow IWYU (Include What You Use) principle. Link: https://lkml.kernel.org/r/20240215142255.400264-1-andriy.shevchenko@linux.intel.com Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-02-15genirq/msi: Provide MSI_FLAG_PARENT_PM_DEVThomas Gleixner1-0/+2
Some platform-MSI implementations require that power management is redirected to the underlying interrupt chip device. To make this work with per device MSI domains provide a new feature flag and let the core code handle the setup of dev->pm_dev when set during device MSI domain creation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240127161753.114685-14-apatel@ventanamicro.com
2024-02-15genirq/msi: Provide allocation/free functions for "wired" MSI interruptsThomas Gleixner1-0/+17
To support wire to MSI bridges proper in the MSI core infrastructure it is required to have separate allocation/free interfaces which can be invoked from the regular irqdomain allocaton/free functions. The mechanism for allocation is: - Allocate the next free MSI descriptor index in the domain - Store the hardware interrupt number and the trigger type which was extracted by the irqdomain core from the firmware spec in the MSI descriptor device cookie so it can be retrieved by the underlying interrupt domain and interrupt chip - Use the regular MSI allocation mechanism for the newly allocated index which returns a fully initialized Linux interrupt on succes This works because: - the domains have a fixed size - each hardware interrupt is only allocated once - the underlying domain does not care about the MSI index it only cares about the hardware interrupt number and the trigger type The free function looks up the MSI index in the MSI descriptor of the provided Linux interrupt number and uses the regular index based free functions of the MSI core. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240127161753.114685-12-apatel@ventanamicro.com
2024-02-15genirq/msi: Optionally use dev->fwnode for device domainThomas Gleixner1-0/+2
To support wire to MSI domains via the MSI infrastructure it is required to use the firmware node of the device which implements this for creating the MSI domain. Otherwise the existing firmware match mechanisms to find the correct irqdomain for a wired interrupt which is connected to a wire to MSI bridge would fail. This cannot be used for the general case because not all devices provide firmware nodes and all regular per device MSI domains are directly associated to the device and have not be searched for. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240127161753.114685-11-apatel@ventanamicro.com
2024-02-15genirq/msi: Provide DOMAIN_BUS_WIRED_TO_MSIThomas Gleixner1-0/+1
Provide a domain bus token for the upcoming support for wire to MSI device domains so the domain can be distinguished from regular device MSI domains. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240127161753.114685-10-apatel@ventanamicro.com
2024-02-15genirq/msi: Provide optional translation opThomas Gleixner1-0/+5
irq_create_fwspec_mapping() requires translation of the firmware spec to a hardware interrupt number and the trigger type information. Wired interrupts which are connected to a wire to MSI bridge, like MBIGEN are allocated that way. So far MBIGEN provides a regular irqdomain which then hooks backwards into the MSI infrastructure. That's an unholy mess and will be replaced with per device MSI domains which are regular MSI domains. Interrupts on MSI domains are not supported by irq_create_fwspec_mapping(), but for making the wire to MSI bridges sane it makes sense to provide a special allocation/free interface in the MSI infrastructure. That avoids the backdoors into the core MSI allocation code and just shares all the regular MSI infrastructure. Provide an optional translation callback in msi_domain_ops which can be utilized by these wire to MSI bridges. No other MSI domain should provide a translation callback. The default translation callback of the MSI irqdomains will warn when it is invoked on a non-prepared MSI domain. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240127161753.114685-8-apatel@ventanamicro.com
2024-02-15platform-msi: Remove unused interfacesThomas Gleixner1-3/+0
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2024-02-15platform-msi: Prepare for real per device domainsThomas Gleixner1-0/+4
Provide functions to create and remove per device MSI domains which replace the platform-MSI domains. The new model is that each of the devices which utilize platform-MSI gets now its private MSI domain which is "customized" in size and with a device specific function to write the MSI message into the device. This is the same functionality as platform-MSI but it avoids all the down sides of platform MSI, i.e. the extra ID book keeping, the special data structure in the msi descriptor. Further the domains are only created when the devices are really in use, so the burden is on the usage and not on the infrastructure. Fill in the domain template and provide two functions to init/allocate and remove a per device MSI domain. Until all users and parent domain providers are converted, the init/alloc function invokes the original platform-MSI code when the irqdomain which is associated to the device does not provide MSI parent functionality yet. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240127161753.114685-6-apatel@ventanamicro.com
2024-02-15genirq/irqdomain: Add DOMAIN_BUS_DEVICE_MSIThomas Gleixner1-0/+1
Add a new domain bus token to prepare for device MSI which aims to replace the existing platform MSI maze. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240127161753.114685-5-apatel@ventanamicro.com
2024-02-15genirq/msi: Extend msi_parent_opsThomas Gleixner1-0/+8
Supporting per device MSI domains on ARM64, RISC-V and the zoo of interrupt mechanisms needs a bit more information than what the initial x86 implementation provides. Add the following fields: - required_flags: The flags which a parent domain requires to be set - bus_select_token: The bus token of the parent domain for select() - bus_select_mask: A bitmask of supported child domain bus types This allows to provide library functions which can be shared between various interrupt chip implementations and avoids replicating mostly similar code all over the place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240127161753.114685-4-apatel@ventanamicro.com
2024-02-15Merge tag 'mips-fixes_6.8_2' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Thomas Bogendoerfer: - Fix for broken ipv6 checksums - Fix handling of exceptions in delay slots * tag 'mips-fixes_6.8_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: mm/memory: Use exception ip to search exception tables MIPS: Clear Cause.BD in instruction_pointer_set ptrace: Introduce exception_ip arch hook MIPS: Add 'memory' clobber to csum_ipv6_magic() inline assembler
2024-02-14rcu-tasks: Repair RCU Tasks Trace quiescence checkPaul E. McKenney1-2/+2
The context-switch-time check for RCU Tasks Trace quiescence expects current->trc_reader_special.b.need_qs to be zero, and if so, updates it to TRC_NEED_QS_CHECKED. This is backwards, because if this value is zero, there is no RCU Tasks Trace grace period in flight, an thus no need for a quiescent state. Instead, when a grace period starts, this field is set to TRC_NEED_QS. This commit therefore changes the check from zero to TRC_NEED_QS. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2024-02-14rcu/sync: remove un-used rcu_sync_enter_start functionOnkarnath1-1/+0
With commit '6a010a49b63a ("cgroup: Make !percpu threadgroup_rwsem operations optional")' usage of rcu_sync_enter_start is removed. So this function can also be removed. In the words of Oleg Nesterov: __rcu_sync_enter(wait => false) is a better alternative if someone needs rcu_sync_enter_start() again. Link: https://lore.kernel.org/all/20220725121208.GB28662@redhat.com/ Signed-off-by: Onkarnath <onkarnath.1@samsung.com> Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2024-02-14Merge tag 'iio-fixes-for-6.8a' of ↵Greg Kroah-Hartman3-4/+7
http://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-linus Jonathan writes: IIO: 1st set of fixes for the 6.8 cycle Usual mixed bag of issues introduced this cycle and fixes for long term issues that have been identified recently + one case where I messed up a merge resolution and dropped the build file changes. Most important is the userspace ABI fix for the iio_modifier enum where we accidentally added new entries in the middle rather than at the end. IIO Core - Close a memory leak in an error path. - Move LIGHT_UVA and LIGHT_UVB definitions to end of the iio_modifier enum to avoid breaking older userspace. (not yet in a released kernel thankfully). adi,adis - Fix a DMA buffer alignment issue that was missing in series that fixed these across IIO. adi,ad-sigma-delta - Fix a DMA buffer alignment issue that was missing in series that fixed these across IIO. adi,ad4130 - Zero init remaining fields of clock init data. - Only set GPIO control bits on pins that aren't in use for anything else. adi,ad5933 - Fix an old bug due to type mismatch. This is a rare device so good to get some new test coverage. adi,ad7091r - Use right variable for an error return code. bosch,bma400 - Add missing CONFIG_REGMAP_I2C dependency. bosch,bmp280: - Add missing bmp085 ID to the SPI table to avoid mismatch with the of_device_id table. hid-sensors: - Avoid returning an error for timestamp read back that succeeds. pni,rm3100 - Check value read from RM31000_REG_TMRC register is valid before using it. Hardening to avoid a real world issue seen on some faulty hardware. st,st-sensors - Fix a DMA buffer alignment issue that was missing in series that fixed these across IIO. ti,hdc3020 - Add missing Kconfig and Makefile entrees accidentally dropped when patches were applied. - Fix wrong temperature offset (negated) * tag 'iio-fixes-for-6.8a' of http://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: iio: adc: ad4130: only set GPIO_CTRL if pin is unused iio: adc: ad4130: zero-initialize clock init data iio: accel: bma400: Fix a compilation problem iio: commom: st_sensors: ensure proper DMA alignment iio: hid-sensor-als: Return 0 for HID_USAGE_SENSOR_TIME_TIMESTAMP iio: move LIGHT_UVA and LIGHT_UVB to the end of iio_modifier staging: iio: ad5933: fix type mismatch regression iio: humidity: hdc3020: fix temperature offset iio: adc: ad7091r8: Fix error code in ad7091r8_gpio_setup() iio: adc: ad_sigma_delta: ensure proper DMA alignment iio: imu: adis: ensure proper DMA alignment iio: humidity: hdc3020: Add Makefile, Kconfig and MAINTAINERS entry iio: imu: bno055: serdev requires REGMAP iio: magnetometer: rm3100: add boundary check for the value read from RM3100_REG_TMRC iio: pressure: bmp280: Add missing bmp085 to SPI id table iio: core: fix memleak in iio_device_register_sysfs
2024-02-13nvme: implement support for relaxed effectsKeith Busch1-0/+1
NVM Express TP4167 provides a way for controllers to report a relaxed execution constraint. Specifically, it notifies of exclusivity for IO vs. admin commands instead of grouping these together. If set, then we don't need to freeze IO in order to execute that admin command. The freezing distrupts IO processes, so it's nice to avoid that if the controller tells us it's not necessary. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-02-13block: pass a queue_limits argument to blk_mq_alloc_diskChristoph Hellwig1-3/+4
Pass a queue_limits to blk_mq_alloc_disk and apply it if non-NULL. This will allow allocating queues with valid queue limits instead of setting the values one at a time later. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240213073425.1621680-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-13block: pass a queue_limits argument to blk_mq_init_queueChristoph Hellwig1-1/+2
Pass a queue_limits to blk_mq_init_queue and apply it if non-NULL. This will allow allocating queues with valid queue limits instead of setting the values one at a time later. Also rename the function to blk_mq_alloc_queue as that is a much better name for a function that allocates a queue and always pass the queuedata argument instead of having a separate version for the extra argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240213073425.1621680-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-13block: add a max_user_discard_sectors queue limitChristoph Hellwig1-0/+1
Add a new max_user_discard_sectors limit that mirrors max_user_sectors and stores the value that the user manually set. This now allows updates of the max_hw_discard_sectors to not worry about the user limit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240213073425.1621680-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-13block: add an API to atomically update queue limitsChristoph Hellwig1-0/+23
Add a new queue_limits_{start,commit}_update pair of functions that allows taking an atomic snapshot of queue limits, update it, and commit it if it passes validity checking. Also use the low-level validation helper to implement blk_set_default_limits instead of duplicating the initialization. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240213073425.1621680-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-13block: move max_{open,active}_zones to struct queue_limitsChristoph Hellwig1-6/+6
The maximum number of open and active zones is a limit on the queue and should be places there so that we can including it in the upcoming queue limits batch update API. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240213073425.1621680-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-13gpiolib: add gpio_device_get_label() stub for !GPIOLIBKrzysztof Kozlowski1-0/+6
Add empty stub of gpio_device_get_label() when GPIOLIB is not enabled. Cc: <stable@vger.kernel.org> Fixes: d1f7728259ef ("gpiolib: provide gpio_device_get_label()") Suggested-by: kernel test robot <lkp@intel.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-02-13gpiolib: add gpio_device_get_base() stub for !GPIOLIBKrzysztof Kozlowski1-0/+6
Add empty stub of gpio_device_get_base() when GPIOLIB is not enabled. Cc: <stable@vger.kernel.org> Fixes: 8c85a102fc4e ("gpiolib: provide gpio_device_get_base()") Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-02-13gpiolib: add gpiod_to_gpio_device() stub for !GPIOLIBKrzysztof Kozlowski1-0/+6
Add empty stub of gpiod_to_gpio_device() when GPIOLIB is not enabled. Cc: <stable@vger.kernel.org> Fixes: 370232d096e3 ("gpiolib: provide gpiod_to_gpio_device()") Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-02-13ptrace: Introduce exception_ip arch hookJiaxun Yang1-0/+4
On architectures with delay slot, architecture level instruction pointer (or program counter) in pt_regs may differ from where exception was triggered. Introduce exception_ip hook to invoke architecture code and determine actual instruction pointer to the exception. Link: https://lore.kernel.org/lkml/00d1b813-c55f-4365-8d81-d70258e10b16@app.fastmail.com/ Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2024-02-12block: support PI at non-zero offset within metadataKanchan Joshi2-0/+2
Block layer integrity processing assumes that protection information (PI) is placed in the first bytes of each metadata block. Remove this limitation and include the metadata before the PI in the calculation of the guard tag. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Chinmay Gameti <c.gameti@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240201130126.211402-3-joshi.k@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-12block: remove gfp_flags from blkdev_zone_mgmtJohannes Thumshirn1-1/+1
Now that all callers pass in GFP_KERNEL to blkdev_zone_mgmt() and use memalloc_no{io,fs}_{save,restore}() to define the allocation scope, we can drop the gfp_mask parameter from blkdev_zone_mgmt() as well as blkdev_zone_reset_all() and blkdev_zone_reset_all_emulated(). Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/r/20240128-zonefs_nofs-v3-5-ae3b7c8def61@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-12Merge tag 'vfs-6.8-rc5.fixes' of ↵Linus Torvalds1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix performance regression introduced by moving the security permission hook out of do_clone_file_range() and into its caller vfs_clone_file_range(). This causes the security hook to be called in situation were it wasn't called before as the fast permission checks were left in do_clone_file_range(). Fix this by merging the two implementations back together and restoring the old ordering: fast permission checks first, expensive ones later. - Tweak mount_setattr() permission checking so that mount properties on the real rootfs can be changed. When we added mount_setattr() we added additional checks compared to legacy mount(2). If the mount had a parent then verify that the caller and the mount namespace the mount is attached to match and if not make sure that it's an anonymous mount. But the real rootfs falls into neither category. It is neither an anoymous mount because it is obviously attached to the initial mount namespace but it also obviously doesn't have a parent mount. So that means legacy mount(2) allows changing mount properties on the real rootfs but mount_setattr(2) blocks this. This causes regressions (See the commit for details). Fix this by relaxing the check. If the mount has a parent or if it isn't a detached mount, verify that the mount namespaces of the caller and the mount are the same. Technically, we could probably write this even simpler and check that the mount namespaces match if it isn't a detached mount. But the slightly longer check makes it clearer what conditions one needs to think about. * tag 'vfs-6.8-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: relax mount_setattr() permission checks remap_range: merge do_clone_file_range() into vfs_clone_file_range()
2024-02-12filelock: always define for_each_file_lock()Jeff Layton1-5/+3
...and eliminate the stub version when CONFIG_FILE_LOCKING is disabled. This silences the following warning that crept in recently: fs/ceph/locks.c: In function 'ceph_count_locks': fs/ceph/locks.c:380:27: error: unused variable 'lock' [-Werror=unused-variable] 380 | struct file_lock *lock; Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202402062210.3YyBVGF1-lkp@intel.com/ Fixes: 75cabec0111b ("filelock: add some new helper functions") Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240212-flsplit3-v1-1-019f0ad6bf69@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>