| Age | Commit message (Collapse) | Author | Files | Lines |
|
Patch series "initial work on making VMA flags a bitmap", v3.
We are in the rather silly situation that we are running out of VMA flags
as they are currently limited to a system word in size.
This leads to absurd situations where we limit features to 64-bit
architectures only because we simply do not have the ability to add a flag
for 32-bit ones.
This is very constraining and leads to hacks or, in the worst case, simply
an inability to implement features we want for entirely arbitrary reasons.
This also of course gives us something of a Y2K type situation in mm where
we might eventually exhaust all of the VMA flags even on 64-bit systems.
This series lays the groundwork for getting away from this limitation by
establishing VMA flags as a bitmap whose size we can increase in future
beyond 64 bits if required.
This is necessarily a highly iterative process given the extensive use of
VMA flags throughout the kernel, so we start by performing basic steps.
Firstly, we declare VMA flags by bit number rather than by value,
retaining the VM_xxx fields but in terms of these newly introduced
VMA_xxx_BIT fields.
While we are here, we use sparse annotations to ensure that, when dealing
with VMA bit number parameters, we cannot be passed values which are not
declared as such - providing some useful type safety.
We then introduce an opaque VMA flag type, much like the opaque mm_struct
flag type introduced in commit bb6525f2f8c4 ("mm: add bitmap mm->flags
field"), which we establish in union with vma->vm_flags (but still set at
system word size meaning there is no functional or data type size change).
We update the vm_flags_xxx() helpers to use this new bitmap, introducing
sensible helpers to do so.
This series lays the foundation for further work to expand the use of
bitmap VMA flags and eventually eliminate these arbitrary restrictions.
This patch (of 4):
In order to lay the groundwork for VMA flags being a bitmap rather than a
system word in size, we need to be able to consistently refer to VMA flags
by bit number rather than value.
Take this opportunity to do so in an enum which we which is additionally
useful for tooling to extract metadata from.
This additionally makes it very clear which bits are being used for what
at a glance.
We use the VMA_ prefix for the bit values as it is logical to do so since
these reference VMAs. We consistently suffix with _BIT to make it clear
what the values refer to.
We declare bit values even when the flags that use them would not be
enabled by config options as this is simply clearer and clearly defines
what bit numbers are used for what, at no additional cost.
We declare a sparse-bitwise type vma_flag_t which ensures that users can't
pass around invalid VMA flags by accident and prepares for future work
towards VMA flags being a bitmap where we want to ensure bit values are
type safe.
To make life easier, we declare some macro helpers - DECLARE_VMA_BIT()
allows us to avoid duplication in the enum bit number declarations (and
maintaining the sparse __bitwise attribute), and INIT_VM_FLAG() is used to
assist with declaration of flags.
Unfortunately we can't declare both in the enum, as we run into issue with
logic in the kernel requiring that flags are preprocessor definitions, and
additionally we cannot have a macro which declares another macro so we
must define each flag macro directly.
Additionally, update the VMA userland testing vma_internal.h header to
include these changes.
We also have to fix the parameters to the vma_flag_*_atomic() functions
since VMA_MAYBE_GUARD_BIT is now of type vma_flag_t and sparse will
complain otherwise.
We have to update some rather silly if-deffery found in mm/task_mmu.c
which would otherwise break.
Finally, we update the rust binding helper as now it cannot auto-detect
the flags at all.
Link: https://lkml.kernel.org/r/cover.1764064556.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/3a35e5a0bcfa00e84af24cbafc0653e74deda64a.1764064556.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Alice Ryhl <aliceryhl@google.com> [rust]
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Trevor Gross <tmgross@umich.edu>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Ritesh reported that timeouts occurred frequently for rqspinlock despite
reentrancy on the same lock on the same CPU in [0]. This patch closes
one of the races leading to this behavior, and reduces the frequency of
timeouts.
We currently have a tiny window between the fast-path cmpxchg and the
grabbing of the lock entry where an NMI could land, attempt the same
lock that was just acquired, and end up timing out. This is not ideal.
Instead, move the lock entry acquisition from the fast path to before
the cmpxchg, and remove the grabbing of the lock entry in the slow path,
assuming it was already taken by the fast path. The TAS fallback is
invoked directly without being preceded by the typical fast path,
therefore we must continue to grab the deadlock detection entry in that
case.
Case on lock leading to missed AA:
cmpxchg lock A
<NMI>
... rqspinlock acquisition of A
... timeout
</NMI>
grab_held_lock_entry(A)
There is a similar case when unlocking the lock. If the NMI lands
between the WRITE_ONCE and smp_store_release, it is possible that we end
up in a situation where the NMI fails to diagnose the AA condition,
leading to a timeout.
Case on unlock leading to missed AA:
WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL)
<NMI>
... rqspinlock acquisition of A
... timeout
</NMI>
smp_store_release(A->locked, 0)
The patch changes the order on unlock to smp_store_release() succeeded
by WRITE_ONCE() of NULL. This avoids the missed AA detection described
above, but may lead to a false positive if the NMI lands between these
two statements, which is acceptable (and preferred over a timeout).
The original intention of the reverse order on unlock was to prevent the
following possible misdiagnosis of an ABBA scenario:
grab entry A
lock A
grab entry B
lock B
unlock B
smp_store_release(B->locked, 0)
grab entry B
lock B
grab entry A
lock A
! <detect ABBA>
WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL)
If the store release were is after the WRITE_ONCE, the other CPU would
not observe B in the table of the CPU unlocking the lock B. However,
since the threads are obviously participating in an ABBA deadlock, it
is no longer appealing to use the order above since it may lead to a
250 ms timeout due to missed AA detection.
[0]: https://lore.kernel.org/bpf/CAH6OuBTjG+N=+GGwcpOUbeDN563oz4iVcU3rbse68egp9wj9_A@mail.gmail.com
Fixes: 0d80e7f951be ("rqspinlock: Choose trylock fallback for NMI waiters")
Reported-by: Ritesh Oedayrajsingh Varma <ritesh@superluminal.eu>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251128232802.1031906-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Raw NAND changes:
* The major change in this MR will be the support for the Allwinner H616
NAND controller, which lead to numerous changes and cleanups in the
driver.
* Another notable change on this driver is the use of
field_get()/field_prep(), but since the global support for this
helpers is going to be merged in the same release as we start using
these helpers, it implies undefining them in the first place to avoid
warnings. Depending on the merging order (Yuri's bitmap branch or
mtd/next), a temporary warning may arise.
* Marvell drivers layout handling changes have also landed, they fix
previous definitions and abuses that have been made previously, which
implied to relax the ECC parameters validation in the core a bit.
* The Cadence NAND controller driver gets NV-DDR interface support.
SPI NAND changes:
* Support for FudanMicro FM25S01BI3 and ESMT F50L1G41LC is added.
Aside from these main changes, there is the usual load of fixes and API
updates.
|
|
Change an empty line into a blank kernel-doc line to prevent
a kernel-doc warning:
Warning: ../include/uapi/linux/i2c.h:38 bad line:
Fixes: bfb3939c51d5 ("i2c: refactor documentation of struct i2c_msg")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
|
|
The CAN bus support enabled with CONFIG_CAN provides a socket-based
access to CAN interfaces. With the introduction of the latest CAN protocol
CAN XL additional configuration status information needs to be exposed to
the network layer than formerly provided by standard Linux network drivers.
This requires the CAN driver infrastructure to be selected by default.
As the CAN network layer can only operate on CAN interfaces anyway all
distributions and common default configs enable at least one CAN driver.
So selecting CONFIG_CAN_DEV when CONFIG_CAN is selected by the user has
no effect on established configurations but solves potential build issues
when CONFIG_CAN[_XXX]=y is set together with CANFIG_CAN_DEV=m
Fixes: 1a620a723853 ("can: raw: instantly reject unsupported CAN frames")
Reported-by: Vincent Mailhol <mailhol@kernel.org>
Closes: https://lore.kernel.org/all/CAMZ6RqL_nGszwoLPXn1Li8op-ox4k3Hs6p=Hw6+w0W=DTtobPw@mail.gmail.com/
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511280531.YnWW2Rxc-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202511280842.djCQ0N0O-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202511282325.uVQFRTkA-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202511291520.guIE1QHj-lkp@intel.com/
Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20251129090500.17484-1-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following batch contains Netfilter updates for net-next:
0) Add sanity check for maximum encapsulations in bridge vlan,
reported by the new AI robot.
1) Move the flowtable path discovery code to its own file, the
nft_flow_offload.c mixes the nf_tables evaluation with the path
discovery logic, just split this in two for clarity.
2) Consolidate flowtable xmit path by using dev_queue_xmit() and the
real device behind the layer 2 vlan/pppoe device. This allows to
inline encapsulation. After this update, hw_ifidx can be removed
since both ifidx and hw_ifidx now point to the same device.
3) Support for IPIP encapsulation in the flowtable, extend selftest
to cover for this new layer 3 offload, from Lorenzo Bianconi.
4) Push down the skb into the conncount API to fix duplicates in the
conncount list for packets with non-confirmed conntrack entries,
this is due to an optimization introduced in d265929930e2
("netfilter: nf_conncount: reduce unnecessary GC").
From Fernando Fernandez Mancera.
5) In conncount, disable BH when performing garbage collection
to consolidate existing behaviour in the conncount API, also
from Fernando.
6) A matching packet with a confirmed conntrack invokes GC if
conncount reaches the limit in an attempt to release slots.
This allows the existing extensions to be used for real conntrack
counting, not just limiting new connections, from Fernando.
7) Support for updating ct count objects in nf_tables, from Fernando.
8) Extend nft_flowtables.sh selftest to send IPv6 TCP traffic,
from Lorenzo Bianconi.
9) Fixes for UAPI kernel-doc documentation, from Randy Dunlap.
* tag 'nf-next-25-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_tables: improve UAPI kernel-doc comments
netfilter: ip6t_srh: fix UAPI kernel-doc comments format
selftests: netfilter: nft_flowtable.sh: Add the capability to send IPv6 TCP traffic
netfilter: nft_connlimit: add support to object update operation
netfilter: nft_connlimit: update the count if add was skipped
netfilter: nf_conncount: make nf_conncount_gc_list() to disable BH
netfilter: nf_conncount: rework API to use sk_buff directly
selftests: netfilter: nft_flowtable.sh: Add IPIP flowtable selftest
netfilter: flowtable: Add IPIP tx sw acceleration
netfilter: flowtable: Add IPIP rx sw acceleration
netfilter: flowtable: use tuple address to calculate next hop
netfilter: flowtable: remove hw_ifidx
netfilter: flowtable: inline pppoe encapsulation in xmit path
netfilter: flowtable: inline vlan encapsulation in xmit path
netfilter: flowtable: consolidate xmit path
netfilter: flowtable: move path discovery infrastructure to its own file
netfilter: flowtable: check for maximum number of encapsulations in bridge vlan
====================
Link: https://patch.msgid.link/20251128002345.29378-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:
====================
Apart from the usual small things just driver updates:
- mt76:
- WED support for >32-bit DMA
- airoha NPU support
- regdomain improvements
- continued WiFi7/MLO work
- rtw89
- support USB devices RTL8852AU and RTL8852CU
- initial work for RTL8922DE
- improved injection support
- rtl8xxxu: 40 MHz connection fixes/support
- brcmfmac: Acer A1 840 tablet quirk
* tag 'wireless-next-2025-11-27' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (152 commits)
wifi: mac80211: allow sharing identical chanctx for S1G interfaces
wifi: nl80211: vendor-cmd: intel: fix a blank kernel-doc line warning
wifi: cfg80211: include s1g_primary_2mhz when comparing chandefs
wifi: cfg80211: include s1g_primary_2mhz when sending chandef
wifi: ieee80211: correct FILS status codes
mt76: mt7615: Fix memory leak in mt7615_mcu_wtbl_sta_add()
wifi: mt76: mt792x: fix wifi init fail by setting MCU_RUNNING after CLC load
wifi: mt76: Strip whitespace from build ddate
wifi: mt76: mt7996: Add missing locking in mt7996_mac_sta_rc_work()
wifi: mt76: mt7996: skip ieee80211_iter_keys() on scanning link remove
wifi: mt76: mt7996: skip deflink accounting for offchannel links
wifi: mt76: Move mt76_abort_scan out of mt76_reset_device()
wifi: mt76: mt7996: move mt7996_update_beacons under mt76 mutex
wifi: mt76: mt7996: grab mt76 mutex in mt7996_mac_sta_event()
wifi: mt76: mt7925: ensure the 6GHz A-MPDU density cap from the hardware.
wifi: mt76: mt7996: fix EMI rings for RRO
wifi: mt76: mt7996: fix using wrong phy to start in mt7996_mac_restart()
wifi: mt76: mt7996: fix MLO set key and group key issues
wifi: mt76: mt7996: fix MLD group index assignment
wifi: mt76: mt7996: use correct link_id when filling TXD and TXP
...
====================
Link: https://patch.msgid.link/20251127103806.17776-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Switch to use i3c_xfer instead of i3c_priv_xfer because framework update to
support HDR mode. i3c_priv_xfer is now an alias of i3c_xfer.
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20251106-i3c_ddr-v11-2-33a6a66ed095@nxp.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
Rename struct i3c_priv_xfer to struct i3c_xfer, since private xfer in the
I3C spec refers only to SDR transfers. Ref: i3c spec ver1.2, section 3,
Technical Overview.
i3c_xfer will be used for both SDR and HDR.
Rename enum i3c_hdr_mode to i3c_xfer_mode. Previous definition need match
CCC GET_CAP1 bit position. Use 31 as SDR transfer mode.
Add i3c_device_do_xfers() with an xfer mode argument, while keeping
i3c_device_do_priv_xfers() as a wrapper that calls i3c_device_do_xfers()
with I3C_SDR for backward compatibility.
Introduce a 'cmd' field in struct i3c_xfer as an anonymous union with
'rnw', since HDR mode uses read/write commands instead of the SDR address
bit.
Add .i3c_xfers() callback for master controllers. If not implemented, fall
back to SDR with .priv_xfers(). The .priv_xfers() API can be removed once
all controllers switch to .i3c_xfers().
Add 'mode_mask' bitmask to advertise controller capability.
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20251106-i3c_ddr-v11-1-33a6a66ed095@nxp.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
Device specific VFIO driver variant for Xe will implement VF migration.
Export everything that's needed for migration ops.
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251127093934.1462188-4-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc / IIO fixes from Greg KH:
"Here are some much-delayed char/misc/iio driver fixes for 6.18-rc8.
Fixes in here include:
- lots of iio driver bugfixes for reported issues.
- counter driver bugfix
- slimbus driver bugfix
- mei tiny bugfix
- nvmem layout uevent bugfix
All of these have been in linux-next for a while, but due to travel on
my side, I haven't had a chance to get them to you"
* tag 'char-misc-6.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (23 commits)
nvmem: layouts: fix nvmem_layout_bus_uevent
iio: accel: bmc150: Fix irq assumption regression
most: usb: fix double free on late probe failure
slimbus: ngd: Fix reference count leak in qcom_slim_ngd_notify_slaves
firmware: stratix10-svc: fix bug in saving controller data
mei: fix error flow in probe
iio: st_lsm6dsx: Fixed calibrated timestamp calculation
iio: humditiy: hdc3020: fix units for thresholds and hysteresis
iio: humditiy: hdc3020: fix units for temperature and humidity measurement
iio: imu: st_lsm6dsx: fix array size for st_lsm6dsx_settings fields
iio: accel: fix ADXL355 startup race condition
iio: adc: ad7124: fix temperature channel
iio:common:ssp_sensors: Fix an error handling path ssp_probe()
iio: adc: ad7280a: fix ad7280_store_balance_timer()
iio: buffer-dmaengine: enable .get_dma_dev()
iio: buffer-dma: support getting the DMA channel
iio: buffer: support getting dma channel from the buffer
iio: pressure: bmp280: correct meas_time_us calculation
iio: adc: stm32-dfsdm: fix st,adc-alt-channel property handling
iio: adc: ad7380: fix SPI offload trigger rate
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB/Thunderbolt fixes from Greg KH:
"Here are some last-minutes USB and Thunderbolt driver fixes and new
device ids for 6.18-rc8. Included in here are:
- usb storage quirk fixup
- xhci driver fixes for reported issues
- usb gadget driver fixes
- dwc3 driver fixes
- UAS driver fixup
- thunderbolt new device ids
- usb-serial driver new ids
All of these have been in linux-next with no reported issues, many for
many weeks"
* tag 'usb-6.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (21 commits)
usb: gadget: renesas_usbf: Handle devm_pm_runtime_enable() errors
USB: storage: Remove subclass and protocol overrides from Novatek quirk
usb: uas: fix urb unmapping issue when the uas device is remove during ongoing data transfer
usb: dwc3: Fix race condition between concurrent dwc3_remove_requests() call paths
xhci: dbgtty: fix device unregister
usb: storage: sddr55: Reject out-of-bound new_pba
USB: serial: option: add support for Rolling RW101R-GL
usb: typec: ucsi: psy: Set max current to zero when disconnected
usb: gadget: f_eem: Fix memory leak in eem_unwrap
usb: dwc3: pci: Sort out the Intel device IDs
usb: dwc3: pci: add support for the Intel Nova Lake -S
drivers/usb/dwc3: fix PCI parent check
usb: storage: Fix memory leak in USB bulk transport
xhci: sideband: Fix race condition in sideband unregister
xhci: dbgtty: Fix data corruption when transmitting data form DbC to host
xhci: fix stale flag preventig URBs after link state error is cleared
USB: serial: ftdi_sio: add support for u-blox EVK-M101
usb: cdns3: Fix double resource release in cdns3_pci_probe
usb: gadget: udc: fix use-after-free in usb_gadget_state_work
usb: renesas_usbhs: Fix synchronous external abort on unbind
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox
Pull mailbox fixes from Jassi Brar:
- omap: check for pending msgs only when mbox is exclusive
- mailbox-test: debugfs_create_dir error checking
- mtk:
- cmdq: fix DMA address handling
- gpueb: Add missing 'static' to mailbox ops struct
- pcc: don't zero error register
- th1520: fix clock imbalance on probe failure
* tag 'mailbox-fixes-v6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox:
mailbox: th1520: fix clock imbalance on probe failure
mailbox: pcc: don't zero error register
mailbox: mtk-gpueb: Add missing 'static' to mailbox ops struct
mailbox: mtk-cmdq: Refine DMA address handling for the command buffer
mailbox: mailbox-test: Fix debugfs_create_dir error checking
mailbox: omap-mailbox: Check for pending msgs only when mbox is exclusive
|
|
Speculative prefetches from CPU to GPU memory until the GPU is
ready after reset can cause harmless corrected RAS events to
be logged on Grace systems. It is thus preferred that the
mapping not be re-established until the GPU is ready post reset.
The GPU readiness can be checked through BAR0 registers similar
to the checking at the time of device probe.
It can take several seconds for the GPU to be ready. So it is
desirable that the time overlaps as much of the VM startup as
possible to reduce impact on the VM bootup time. The GPU
readiness state is thus checked on the first fault/huge_fault
request or read/write access which amortizes the GPU readiness
time.
The first fault and read/write checks the GPU state when the
reset_done flag - which denotes whether the GPU has just been
reset. The memory_lock is taken across map/access to avoid
races with GPU reset.
Also check if the memory is enabled, before waiting for GPU
to be ready. Otherwise the readiness check would block for 30s.
Lastly added PM handling wrapping on read/write access.
Cc: Shameer Kolothum <skolothumtho@nvidia.com>
Cc: Alex Williamson <alex@shazbot.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Vikram Sethi <vsethi@nvidia.com>
Reviewed-by: Shameer Kolothum <skolothumtho@nvidia.com>
Suggested-by: Alex Williamson <alex@shazbot.org>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251127170632.3477-7-ankita@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Refactor vfio_pci_mmap_huge_fault to take out the implementation
to map the VMA to the PTE/PMD/PUD as a separate function.
Export the new function to be used by nvgrace-gpu module.
Move the alignment check code to verify that pfn and VMA VA is
aligned to the page order to the header file and make it inline.
No functional change is intended.
Cc: Shameer Kolothum <skolothumtho@nvidia.com>
Cc: Alex Williamson <alex@shazbot.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Reviewed-by: Shameer Kolothum <skolothumtho@nvidia.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251127170632.3477-2-ankita@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Thanks to a device generating an ACS violation during bus reset,
lockdep reported the following circular locking issue:
CPU0: SET_IRQS (MSI/X): holds igate, acquires memory_lock
CPU1: HOT_RESET: holds memory_lock, acquires pci_bus_sem
CPU2: AER: holds pci_bus_sem, acquires igate
This results in a potential 3-way deadlock.
Remove the pci_bus_sem->igate leg of the triangle by using RCU
to peek at the eventfd rather than locking it with igate.
Fixes: 3be3a074cf5b ("vfio-pci: Don't use device_lock around AER interrupt setup")
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20251124223623.2770706-1-alex@shazbot.org
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Modify kernel-doc comments in sbitmap.h to prevent warnings:
Warning: include/linux/sbitmap.h:84 struct member 'alloc_hint' not
described in 'sbitmap'
Warning: include/linux/sbitmap.h:151 struct member 'ws_active' not
described in 'sbitmap_queue'
Warning: include/linux/sbitmap.h:552 No description found for
return value of 'sbq_wait_ptr'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add __kfifo_alloc_node() by refactoring and reusing __kfifo_alloc(),
and define kfifo_alloc_node() macro to support NUMA-aware memory
allocation.
The new __kfifo_alloc_node() function accepts a NUMA node parameter
and uses kmalloc_array_node() instead of kmalloc_array() for
node-specific allocation. The existing __kfifo_alloc() now calls
__kfifo_alloc_node() with NUMA_NO_NODE to maintain backward
compatibility.
This enables users to allocate kfifo buffers on specific NUMA nodes,
which is important for performance in NUMA systems where the kfifo
will be primarily accessed by threads running on specific nodes.
Cc: Stefani Seibold <stefani@seibold.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This is just apply Kuai's patch in [1] with mirror changes.
blk_mq_realloc_hw_ctxs() will free the 'queue_hw_ctx'(e.g. undate
submit_queues through configfs for null_blk), while it might still be
used from other context(e.g. switch elevator to none):
t1 t2
elevator_switch
blk_mq_unquiesce_queue
blk_mq_run_hw_queues
queue_for_each_hw_ctx
// assembly code for hctx = (q)->queue_hw_ctx[i]
mov 0x48(%rbp),%rdx -> read old queue_hw_ctx
__blk_mq_update_nr_hw_queues
blk_mq_realloc_hw_ctxs
hctxs = q->queue_hw_ctx
q->queue_hw_ctx = new_hctxs
kfree(hctxs)
movslq %ebx,%rax
mov (%rdx,%rax,8),%rdi ->uaf
This problem was found by code review, and I comfirmed that the concurrent
scenario do exist(specifically 'q->queue_hw_ctx' can be changed during
blk_mq_run_hw_queues()), however, the uaf problem hasn't been repoduced yet
without hacking the kernel.
Sicne the queue is freezed in __blk_mq_update_nr_hw_queues(), fix the
problem by protecting 'queue_hw_ctx' through rcu where it can be accessed
without grabbing 'q_usage_counter'.
[1] https://lore.kernel.org/all/20220225072053.2472431-1-yukuai3@huawei.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Fengnan Chang <changfengnan@bytedance.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
After commit 4e5cc99e1e48 ("blk-mq: manage hctx map via xarray"), we use
an xarray instead of array to store hctx, but in poll mode, each time
in blk_mq_poll, we need use xa_load to find corresponding hctx, this
introduce some costs. In my test, xa_load may cost 3.8% cpu.
This patch revert previous change, eliminates the overhead of xa_load
and can result in a 3% performance improvement.
Signed-off-by: Fengnan Chang <changfengnan@bytedance.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Merge PM QoS updates and a cpupower utility update for 6.19-rc1:
- Introduce and document a QoS limit on CPU exit latency during wakeup
from suspend-to-idle (Ulf Hansson)
- Add support for building libcpupower statically (Zuo An)
* pm-qos:
Documentation: power/cpuidle: Document the CPU system wakeup latency QoS
cpuidle: Respect the CPU system wakeup QoS limit for cpuidle
sched: idle: Respect the CPU system wakeup QoS limit for s2idle
pmdomain: Respect the CPU system wakeup QoS limit for cpuidle
pmdomain: Respect the CPU system wakeup QoS limit for s2idle
PM: QoS: Introduce a CPU system wakeup QoS limit
* pm-tools:
tools/power/cpupower: Support building libcpupower statically
|
|
'for-next/efi-preempt', 'for-next/assembler-macro', 'for-next/typos', 'for-next/sme-ptrace-disable', 'for-next/local-tlbi-page-reused', 'for-next/mpam', 'for-next/acpi' and 'for-next/documentation', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
perf: arm_spe: Add support for filtering on data source
perf: Add perf_event_attr::config4
perf/imx_ddr: Add support for PMU in DB (system interconnects)
perf/imx_ddr: Get and enable optional clks
perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe()
dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL
arch_topology: Provide a stub topology_core_has_smt() for !CONFIG_GENERIC_ARCH_TOPOLOGY
perf/arm-ni: Fix and optimise register offset calculation
perf: arm_pmuv3: Add new Cortex and C1 CPU PMUs
perf: arm_cspmu: fix error handling in arm_cspmu_impl_unregister()
perf/arm-ni: Add NoC S3 support
perf/arm_cspmu: nvidia: Add pmevfiltr2 support
perf/arm_cspmu: nvidia: Add revision id matching
perf/arm_cspmu: Add pmpidr support
perf/arm_cspmu: Add callback to reset filter config
perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
* for-next/misc:
: Miscellaneous patches
arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros
arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT
arm64: mm: use untagged address to calculate page index
arm64: mm: make linear mapping permission update more robust for patial range
arm64/mm: Elide TLB flush in certain pte protection transitions
arm64/mm: Rename try_pgd_pgtable_alloc_init_mm
arm64/mm: Allow __create_pgd_mapping() to propagate pgtable_alloc() errors
arm64: add unlikely hint to MTE async fault check in el0_svc_common
arm64: acpi: add newline to deferred APEI warning
arm64: entry: Clean out some indirection
arm64/mm: Ensure PGD_SIZE is aligned to 64 bytes when PA_BITS = 52
arm64/mm: Drop cpu_set_[default|idmap]_tcr_t0sz()
arm64: remove unused ARCH_PFN_OFFSET
arm64: use SOFTIRQ_ON_OWN_STACK for enabling softirq stack
arm64: Remove assertion on CONFIG_VMAP_STACK
* for-next/kselftest:
: arm64 kselftest patches
kselftest/arm64: Align zt-test register dumps
* for-next/efi-preempt:
: arm64: Make EFI calls preemptible
arm64/efi: Call EFI runtime services without disabling preemption
arm64/efi: Move uaccess en/disable out of efi_set_pgd()
arm64/efi: Drop efi_rt_lock spinlock from EFI arch wrapper
arm64/fpsimd: Permit kernel mode NEON with IRQs off
arm64/fpsimd: Don't warn when EFI execution context is preemptible
efi/runtime-wrappers: Keep track of the efi_runtime_lock owner
efi: Add missing static initializer for efi_mm::cpus_allowed_lock
* for-next/assembler-macro:
: arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in headers
arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers
arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers
* for-next/typos:
: Random typo/spelling fixes
arm64: Fix double word in comments
arm64: Fix typos and spelling errors in comments
* for-next/sme-ptrace-disable:
: Support disabling streaming mode via ptrace on SME only systems
kselftest/arm64: Cover disabling streaming mode without SVE in fp-ptrace
kselftst/arm64: Test NT_ARM_SVE FPSIMD format writes on non-SVE systems
arm64/sme: Support disabling streaming mode via ptrace on SME only systems
* for-next/local-tlbi-page-reused:
: arm64, mm: avoid TLBI broadcast if page reused in write fault
arm64, tlbflush: don't TLBI broadcast if page reused in write fault
mm: add spurious fault fixing support for huge pmd
* for-next/mpam: (34 commits)
: Basic Arm MPAM driver (more to follow)
MAINTAINERS: new entry for MPAM Driver
arm_mpam: Add kunit tests for props_mismatch()
arm_mpam: Add kunit test for bitmap reset
arm_mpam: Add helper to reset saved mbwu state
arm_mpam: Use long MBWU counters if supported
arm_mpam: Probe for long/lwd mbwu counters
arm_mpam: Consider overflow in bandwidth counter state
arm_mpam: Track bandwidth counter state for power management
arm_mpam: Add mpam_msmon_read() to read monitor value
arm_mpam: Add helpers to allocate monitors
arm_mpam: Probe and reset the rest of the features
arm_mpam: Allow configuration to be applied and restored during cpu online
arm_mpam: Use a static key to indicate when mpam is enabled
arm_mpam: Register and enable IRQs
arm_mpam: Extend reset logic to allow devices to be reset any time
arm_mpam: Add a helper to touch an MSC from any CPU
arm_mpam: Reset MSC controls from cpuhp callbacks
arm_mpam: Merge supported features during mpam_enable() into mpam_class
arm_mpam: Probe the hardware features resctrl supports
arm_mpam: Add helpers for managing the locking around the mon_sel registers
...
* for-next/acpi:
: arm64 acpi updates
ACPI: GTDT: Get rid of acpi_arch_timer_mem_init()
* for-next/documentation:
: arm64 Documentation updates
Documentation/arm64: Fix the typo of register names
|
|
Merge energy model management updates and operating performance points
(OPP) library changes for 6.19-rc1:
- Add support for sending netlink notifications to user space on energy
model updates (Changwoo Mini, Peng Fan)
- Minor improvements to the Rust OPP interface (Tamir Duberstein)
- Fixes to scope-based pointers in the OPP library (Viresh Kumar)
* pm-em:
PM: EM: Add to em_pd_list only when no failure
PM: EM: Notify an event when the performance domain changes
PM: EM: Implement em_notify_pd_created/updated()
PM: EM: Implement em_notify_pd_deleted()
PM: EM: Implement em_nl_get_pd_table_doit()
PM: EM: Implement em_nl_get_pds_doit()
PM: EM: Add an iterator and accessor for the performance domain
PM: EM: Add a skeleton code for netlink notification
PM: EM: Add em.yaml and autogen files
PM: EM: Expose the ID of a performance domain via debugfs
PM: EM: Assign a unique ID when creating a performance domain
* pm-opp:
rust: opp: simplify callers of `to_c_str_array`
OPP: Initialize scope-based pointers inline
rust: opp: fix broken rustdoc link
|
|
GCE can only fetch the command buffer address from a 32-bit register.
Some SoCs support a 35-bit command buffer address for GCE, which
requires a right shift of 3 bits before setting the address into
the 32-bit register. A comment has been added to the header of
cmdq_get_shift_pa() to explain this requirement.
To prevent the GCE command buffer address from being DMA mapped beyond
its supported bit range, the DMA bit mask for the device is set during
initialization.
Additionally, to ensure the correct shift is applied when setting or
reading the register that stores the GCE command buffer address,
new APIs, cmdq_convert_gce_addr() and cmdq_revert_gce_addr(), have
been introduced for consistent operations on this register.
The variable type for the command buffer address has been standardized
to dma_addr_t to prevent handling issues caused by type mismatches.
Fixes: 0858fde496f8 ("mailbox: cmdq: variablize address shift in platform")
Signed-off-by: Jason-JH Lin <jason-jh.lin@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
|
|
Merge cpuidle and power capping updates for 6.19-rc1:
- Use residency threshold in polling state override decisions in the
menu cpuidle governor (Aboorva Devarajan)
- Add sanity check for exit latency and target residency in the cpufreq
core (Rafael Wysocki)
- Use this_cpu_ptr() where possible in the teo governor (Christian
Loehle)
- Rework the handling of tick wakeups in the teo cpuidle governor to
increase the likelihood of stopping the scheduler tick in the cases
when tick wakeups can be counted as non-timer ones (Rafael Wysocki)
- Fix a reverse condition in the teo cpuidle governor and drop a
misguided target residency check from it (Rafael Wysocki)
- Clean up muliple minor defects in the teo cpuidle governor (Rafael
Wysocki)
- Update header inclusion to make it follow the Include What You Use
principle (Andy Shevchenko)
- Enable MSR-based RAPL PMU support in the intel_rapl power capping
driver and arrange for using it on the Panther Lake and Wildcat Lake
processors (Kuppuswamy Sathyanarayanan)
- Add support for Nova Lake and Wildcat Lake processors to the
intel_rapl power capping driver (Kaushlendra Kumar, Srinivas
Pandruvada)
* pm-cpuidle:
cpuidle: Warn instead of bailing out if target residency check fails
cpuidle: Update header inclusion
cpuidle: governors: teo: Add missing space to the description
cpuidle: governors: teo: Simplify intercepts-based state lookup
cpuidle: governors: teo: Fix tick_intercepts handling in teo_update()
cpuidle: governors: teo: Rework the handling of tick wakeups
cpuidle: governors: teo: Decay metrics below DECAY_SHIFT threshold
cpuidle: governors: teo: Use s64 consistently in teo_update()
cpuidle: governors: teo: Drop redundant function parameter
cpuidle: governors: teo: Drop misguided target residency check
cpuidle: teo: Use this_cpu_ptr() where possible
cpuidle: Add sanity check for exit latency and target residency
cpuidle: menu: Use residency threshold in polling state override decisions
* pm-powercap:
powercap: intel_rapl: Enable MSR-based RAPL PMU support
powercap: intel_rapl: Prepare read_raw() interface for atomic-context callers
powercap: intel_rapl: Add support for Nova Lake processors
powercap: intel_rapl: Add support for Wildcat Lake platform
|
|
Merge updates related to system suspend and hibernation for 6.19-rc1:
- Replace snprintf() with scnprintf() in show_trace_dev_match()
(Kaushlendra Kumar)
- Fix memory allocation error handling in pm_vt_switch_required()
(Malaya Kumar Rout)
- Introduce CALL_PM_OP() macro and use it to simplify code in
generic PM operations (Kaushlendra Kumar)
- Add module param to backtrace all CPUs in the device power management
watchdog (Sergey Senozhatsky)
- Rework message printing in swsusp_save() (Rafael Wysocki)
- Make it possible to change the number of hibernation compression
threads (Xueqin Luo)
- Clarify that only cgroup1 freezer uses PM freezer (Tejun Heo)
- Add document on debugging shutdown hangs to PM documentation and
correct a mistaken configuration option in it (Mario Limonciello)
- Shut down wakeup source timer before removing the wakeup source from
the list (Kaushlendra Kumar, Rafael Wysocki)
- Introduce new PMSG_POWEROFF event for system shutdown handling with
the help of PM device callbacks (Mario Limonciello)
- Make pm_test delay interruptible by wakeup events (Riwen Lu)
- Clean up kernel-doc comment style usage in the core hibernation
code and remove unuseful comments from it (Sunday Adelodun, Rafael
Wysocki)
- Add support for handling wakeup events and aborting the suspend
process while it is syncing file systems (Samuel Wu, Rafael Wysocki)
* pm-sleep: (21 commits)
PM: hibernate: Extra cleanup of comments in swap handling code
PM: sleep: Call pm_sleep_fs_sync() instead of ksys_sync_helper()
PM: sleep: Add support for wakeup during filesystem sync
PM: hibernate: Clean up kernel-doc comment style usage
PM: suspend: Make pm_test delay interruptible by wakeup events
usb: sl811-hcd: Add PM_EVENT_POWEROFF into suspend callbacks
scsi: Add PM_EVENT_POWEROFF into suspend callbacks
PM: Introduce new PMSG_POWEROFF event
PM: wakeup: Update after recent wakeup source removal ordering change
PM: wakeup: Delete timer before removing wakeup source from list
Documentation: power: Correct a mistaken configuration option
Documentation: power: Add document on debugging shutdown hangs
freezer: Clarify that only cgroup1 freezer uses PM freezer
PM: hibernate: add sysfs interface for hibernate_compression_threads
PM: hibernate: make compression threads configurable
PM: hibernate: dynamically allocate crc->unc_len/unc for configurable threads
PM: hibernate: Rework message printing in swsusp_save()
PM: dpm_watchdog: add module param to backtrace all CPUs
PM: sleep: Introduce CALL_PM_OP() macro to simplify code
PM: console: Fix memory allocation error handling in pm_vt_switch_required()
...
|
|
Merge a core power management update and runtime PM framework updates
for 6.19-rc1:
- Add WQ_UNBOUND to pm_wq workqueue (Marco Crivellari)
- Add runtime PM wrapper macros for ACQUIRE()/ACQUIRE_ERR() and use
them in the PCI core and the ACPI TAD driver (Rafael Wysocki)
- Improve runtime PM in the ACPI TAD driver (Rafael Wysocki)
- Update pm_runtime_allow/forbid() documentation (Rafael Wysocki)
- Fix typos in runtime.c comments (Malaya Kumar Rout)
* pm-core:
PM: WQ_UNBOUND added to pm_wq workqueue
* pm-runtime:
PCI/sysfs: Use PM_RUNTIME_ACQUIRE()/PM_RUNTIME_ACQUIRE_ERR()
ACPI: TAD: Use PM_RUNTIME_ACQUIRE()/PM_RUNTIME_ACQUIRE_ERR()
PM: runtime: Wrapper macros for ACQUIRE()/ACQUIRE_ERR()
PM: runtime: fix typos in runtime.c comments
ACPI: TAD: Improve runtime PM using guard macros
ACPI: TAD: Rearrange runtime PM operations in acpi_tad_remove()
PM: runtime: docs: Update pm_runtime_allow/forbid() documentation
|
|
Merge an ACPICA change, device ACPI properties handling update, ACPI
power management updates, and an ACPI battery driver update for
6.19-rc1:
- Avoid walking the ACPI namespace in the AML interpreter if the
starting node cannot be determined (Cryolitia PukNgae)
- Use min() instead of min_t() in the ACPI device properties handling
code to avoid discarding significant bits (David Laight)
- Fix potential fwnode refcount leak in acpi_fwnode_graph_parse_endpoint()
that may prevent the parent fwnode from being released (Haotian Zhang)
- Rework acpi_graph_get_next_endpoint() to use ACPI functions only, remove
unnecessary contitionals from it to make it easier to follow, and make
acpi_get_next_subnode() static (Sakari Ailus)
- Drop unused function acpi_get_lps0_constraint(), make some Low-Power
S0 callback functions for suspend-to-idle static, and rearrange the
code retrieving Low-Power S0 constraits so it only runs when the
constraits are actually used (Rafael Wysocki)
- Drop redundant locking from the ACPI battery driver (Rafael Wysocki)
* acpica:
ACPICA: Avoid walking the Namespace if start_node is NULL
* acpi-property:
ACPI: property: use min() instead of min_t()
ACPI: property: Fix fwnode refcount leak in acpi_fwnode_graph_parse_endpoint()
ACPI: property: Rework acpi_graph_get_next_endpoint()
ACPI: property: Use ACPI functions in acpi_graph_get_next_endpoint() only
ACPI: property: Make acpi_get_next_subnode() static
* acpi-pm:
ACPI: PM: s2idle: Only retrieve constraints when needed
ACPI: PM: s2idle: Staticise LPS0 callback functions
ACPI: PM: s2idle: Drop acpi_get_lps0_constraint()
* acpi-battery:
ACPI: battery: Drop redundant locking
|
|
Add a ':' to the end of struct member names to prevent kernel-doc
warnings:
Warning: include/linux/gpio/regmap.h:108 struct member 'regmap_irq_line'
not described in 'gpio_regmap_config'
Warning: include/linux/gpio/regmap.h:108 struct member 'regmap_irq_flags'
not described in 'gpio_regmap_config'
Fixes: 553b75d4bfe9 ("gpio: regmap: Allow to allocate regmap-irq device")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Michael Walle <mwalle@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20251128062739.845403-1-rdunlap@infradead.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
If the amp is still reporting FIRMWARE_MISSING after cs35l56_patch()
has completed it is helpful to log a warning.
After a complete firmware download the FIRMWARE_MISSING flag will be
clear. If this isn't the case, the driver should log a message to
report this.
The amp can produce basic audio output without firmware, as a fallback,
so this wasn't originally logged as a warning condition because the amp
is still in an operational state - just not with full functionality.
However, it was not at all obvious to an end user that anything is
unusual.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://patch.msgid.link/20251128112520.40067-1-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
I've been playing with this to allow for moderately flexible usage of
the get_unused_fd_flags() + create file + fd_install() pattern that's
used quite extensively.
How callers allocate files is really heterogenous so it's not really
convenient to fold them into a single class. It's possibe to split them
into subclasses like for anon inodes. I think that's not necessarily
nice as well.
My take is to add two primites:
(1) FD_ADD() the simple cases a file is installed:
fd = FD_ADD(O_CLOEXEC, open_file(some, args)));
if (fd >= 0)
kvm_get_kvm(vcpu->kvm);
return fd;
(2) FD_PREPARE() that captures all the cases where access to fd or file
or additional work before publishing the fd is needed:
FD_PREPARE(fdf, open_flag, file_open_handle(&path, open_flag));
if (fdf.err)
return fdf.err;
if (copy_to_user(/* something something */))
return -EFAULT;
return fd_publish(fdf);
I've converted all of the easy cases over to it and it gets rid of an
aweful lot of convoluted cleanup logic.
It's centered around struct fd_prepare. FD_PREPARE() encapsulates all of
allocation and cleanup logic and must be followed by a call to
fd_publish() which associates the fd with the file and installs it into
the callers fdtable. If fd_publish() isn't called both are deallocated.
It mandates a specific order namely that first we allocate the fd and
then instantiate the file. But that shouldn't be a problem nearly
everyone I've converted uses this exact pattern anyway.
There's a bunch of additional cases where it would be easy to convert
them to this pattern. For example, the whole sync file stuff in dma
currently retains the containing structure of the file instead of the
file itself even though it's only used to allocate files. Changing that
would make it fall into the FD_PREPARE() pattern easily. I've not done
that work yet.
There's room for extending this in a way that wed'd have subclasses for
some particularly often use patterns but as I said I'm not even sure
that's worth it.
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-1-b6efa1706cfd@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Some devices, namely Lenovo Legion devices, have an "extreme" mode where
power draw is at the maximum limit of the cooling hardware. Add a new
"max-power" platform profile to properly reflect this operating mode.
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Signed-off-by: Derek J. Clark <derekjohn.clark@gmail.com>
Reviewed-by: Armin Wolf <W_Armin@gmx.de>
Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Link: https://patch.msgid.link/20251127151605.1018026-2-derekjohn.clark@gmail.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
The definition of struct delegation uses stdint.h integer types. Add the
necessary headers to ensure that always works.
Fixes: 1602bad16d7d ("vfs: expose delegation support to userland")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The HDCP GSC implementation is different for both i915 and xe. Add it to
the display parent interface, and call the hooks via the parent
interface.
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/e397073e91f8aa7518754b3b79f65c1936be91ad.1764090990.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
'nvidia/tegra', 'intel/vt-d', 'amd/amd-vi' and 'core' into next
|
|
If the IOVA is limited to less than 48 the page table will be constructed
with a 3 level configuration which is unsupported by hardware.
Like the second stage the caller needs to pass in both the top_level an
the vasz to specify a table that has more levels than required to hold the
IOVA range.
Fixes: 6cbc09b7719e ("iommu/vt-d: Restore previous domain::aperture_end calculation")
Reported-by: Calvin Owens <calvin@wbinvd.org>
Closes: https://lore.kernel.org/r/8f257d2651eb8a4358fcbd47b0145002e5f1d638.1764237717.git.calvin@wbinvd.org
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Calvin Owens <calvin@wbinvd.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
VT-d second stage HW specifies both the maximum IOVA and the supported
table walk starting points. Weirdly there is HW that only supports a 4
level walk but has a maximum IOVA that only needs 3.
The current code miscalculates this and creates a wrongly sized page table
which ultimately fails the compatibility check for number of levels.
This is fixed by allowing the page table to be created with both a vasz
and top_level input. The vasz will set the aperture for the domain while
the top_level will set the page table geometry.
Add top_level to vtdss and correct the logic in VT-d to generate the right
top_level and vasz from mgaw and sagaw.
Fixes: d373449d8e97 ("iommu/vt-d: Use the generic iommu page table")
Reported-by: Calvin Owens <calvin@wbinvd.org>
Closes: https://lore.kernel.org/r/8f257d2651eb8a4358fcbd47b0145002e5f1d638.1764237717.git.calvin@wbinvd.org
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Calvin Owens <calvin@wbinvd.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
This 0x88C3 is registered to Infineon Technologies Corporate Research ST
and are used by MaxLinear.
Infineon made a spin off called Lantiq.
Lantiq was acquired by Intel
MaxLinear acquired Intels Connected Home division.
The product FAQ from MaxLinear describes it's history from the F24S.
The driver for the gsw1xx is based on Lantiq showing it's similarities.
Ref https://standards-oui.ieee.org/ethertype/eth.txt
Signed-off-by: Peter Enderborg <Peter.Enderborg@axis.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Convert all the legacy code directly accessing the pp fields in net_iov
to access them through @desc in net_iov.
Signed-off-by: Byungchul Park <byungchul@sk.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The trace_marker_raw file in tracefs takes a buffer from user space that
contains an id as well as a raw data string which is usually a binary
structure. The structure used has the following:
struct raw_data_entry {
struct trace_entry ent;
unsigned int id;
char buf[];
};
Since the passed in "cnt" variable is both the size of buf as well as the
size of id, the code to allocate the location on the ring buffer had:
size = struct_size(entry, buf, cnt - sizeof(entry->id));
Which is quite ugly and hard to understand. Instead, add a helper macro
called struct_offset() which then changes the above to a simple and easy
to understand:
size = struct_offset(entry, id) + cnt;
This will likely come in handy for other use cases too.
Link: https://lore.kernel.org/all/CAHk-=whYZVoEdfO1PmtbirPdBMTV9Nxt9f09CK0k6S+HJD3Zmg@mail.gmail.com/
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Link: https://patch.msgid.link/20251126145249.05b1770a@gandalf.local.home
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
In include/uapi/linux/netfilter/nf_tables.h,
correct the kernel-doc comments for mistyped enum names and enum values to
avoid these kernel-doc warnings and improve the documentation:
nf_tables.h:896: warning: Enum value 'NFT_EXTHDR_OP_TCPOPT' not described
in enum 'nft_exthdr_op'
nf_tables.h:896: warning: Excess enum value 'NFT_EXTHDR_OP_TCP' description
in 'nft_exthdr_op'
nf_tables.h:1210: warning: expecting prototype for enum
nft_flow_attributes. Prototype was for enum nft_offload_attributes instead
nf_tables.h:1428: warning: expecting prototype for enum nft_reject_code.
Prototype was for enum nft_reject_inet_code instead
(add beginning '@' to each enum value description:)
nf_tables.h:1493: warning: Enum value 'NFTA_TPROXY_FAMILY' not described
in enum 'nft_tproxy_attributes'
nf_tables.h:1493: warning: Enum value 'NFTA_TPROXY_REG_ADDR' not described
in enum 'nft_tproxy_attributes'
nf_tables.h:1493: warning: Enum value 'NFTA_TPROXY_REG_PORT' not described
in enum 'nft_tproxy_attributes'
nf_tables.h:1796: warning: expecting prototype for enum
nft_device_attributes. Prototype was for enum
nft_devices_attributes instead
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Fix the kernel-doc format for struct members to be "@member" instead of
"@ member" to avoid kernel-doc warnings.
Warning: ip6t_srh.h:60 struct member 'next_hdr' not described in 'ip6t_srh'
Warning: ip6t_srh.h:60 struct member 'hdr_len' not described in 'ip6t_srh'
Warning: ip6t_srh.h:60 struct member 'segs_left' not described
in 'ip6t_srh'
Warning: ip6t_srh.h:60 struct member 'last_entry' not described
in 'ip6t_srh'
Warning: ip6t_srh.h:60 struct member 'tag' not described in 'ip6t_srh'
Warning: ip6t_srh.h:60 struct member 'mt_flags' not described in 'ip6t_srh'
Warning: ip6t_srh.h:60 struct member 'mt_invflags' not described
in 'ip6t_srh'
Warning: ip6t_srh.h:93 struct member 'next_hdr' not described
in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'hdr_len' not described in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'segs_left' not described
in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'last_entry' not described
in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'tag' not described in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'psid_addr' not described
in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'nsid_addr' not described
in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'lsid_addr' not described
in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'psid_msk' not described
in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'nsid_msk' not described
in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'lsid_msk' not described
in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'mt_flags' not described
in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'mt_invflags' not described
in 'ip6t_srh1'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When using nf_conncount infrastructure for non-confirmed connections a
duplicated track is possible due to an optimization introduced since
commit d265929930e2 ("netfilter: nf_conncount: reduce unnecessary GC").
In order to fix this introduce a new conncount API that receives
directly an sk_buff struct. It fetches the tuple and zone and the
corresponding ct from it. It comes with both existing conncount variants
nf_conncount_count_skb() and nf_conncount_add_skb(). In addition remove
the old API and adjust all the users to use the new one.
This way, for each sk_buff struct it is possible to check if there is a
ct present and already confirmed. If so, skip the add operation.
Fixes: d265929930e2 ("netfilter: nf_conncount: reduce unnecessary GC")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Introduce sw acceleration for rx path of IPIP tunnels relying on the
netfilter flowtable infrastructure. Subsequent patches will add sw
acceleration for IPIP tunnels tx path.
This series introduces basic infrastructure to accelerate other tunnel
types (e.g. IP6IP6).
IPIP rx sw acceleration can be tested running the following scenario where
the traffic is forwarded between two NICs (eth0 and eth1) and an IPIP
tunnel is used to access a remote site (using eth1 as the underlay device):
ETH0 -- TUN0 <==> ETH1 -- [IP network] -- TUN1 (192.168.100.2)
$ip addr show
6: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 00:00:22:33:11:55 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.2/24 scope global eth0
valid_lft forever preferred_lft forever
7: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 00:11:22:33:11:55 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.1/24 scope global eth1
valid_lft forever preferred_lft forever
8: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
link/ipip 192.168.1.1 peer 192.168.1.2
inet 192.168.100.1/24 scope global tun0
valid_lft forever preferred_lft forever
$ip route show
default via 192.168.100.2 dev tun0
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.2
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.1
192.168.100.0/24 dev tun0 proto kernel scope link src 192.168.100.1
$nft list ruleset
table inet filter {
flowtable ft {
hook ingress priority filter
devices = { eth0, eth1 }
}
chain forward {
type filter hook forward priority filter; policy accept;
meta l4proto { tcp, udp } flow add @ft
}
}
Reproducing the scenario described above using veths I got the following
results:
- TCP stream received from the IPIP tunnel:
- net-next: (baseline) ~ 71Gbps
- net-next + IPIP flowtbale support: ~101Gbps
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
hw_ifidx was originally introduced to store the real netdevice as a
requirement for the hardware offload support in:
73f97025a972 ("netfilter: nft_flow_offload: use direct xmit if hardware offload is enabled")
Since ("netfilter: flowtable: consolidate xmit path"), ifidx and
hw_ifidx points to the real device in the xmit path, remove it.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Use dev_queue_xmit() for the XMIT_NEIGH case. Store the interface index
of the real device behind the vlan/pppoe device, this introduces an
extra lookup for the real device in the xmit path because rt->dst.dev
provides the vlan/pppoe device.
XMIT_NEIGH now looks more similar to XMIT_DIRECT but the check for stale
dst and the neighbour lookup still remain in place which is convenient
to deal with network topology changes.
Note that nft_flow_route() needs to relax the check for _XMIT_NEIGH so
the existing basic xfrm offload (which only works in one direction) does
not break.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This file contains the path discovery that is run from the forward chain
for the packet offloading the flow into the flowtable. This consists
of a series of calls to dev_fill_forward_path() for each device stack.
More topologies may be supported in the future, so move this code to its
own file to separate it from the nftables flow_offload expression.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Introduce a generic infrastructure for tracking recoverable hardware
errors (HW errors that are visible to the OS but does not cause a panic)
and record them for vmcore consumption. This aids post-mortem crash
analysis tools by preserving a count and timestamp for the last occurrence
of such errors. On the other side, correctable errors, which the OS
typically remains unaware of because the underlying hardware handles them
transparently, are less relevant for crash dump and therefore are NOT
tracked in this infrastructure.
Add centralized logging for sources of recoverable hardware errors based
on the subsystem it has been notified.
hwerror_data is write-only at kernel runtime, and it is meant to be read
from vmcore using tools like crash/drgn. For example, this is how it
looks like when opening the crashdump from drgn.
>>> prog['hwerror_data']
(struct hwerror_info[1]){
{
.count = (int)844,
.timestamp = (time64_t)1752852018,
},
...
This helps fleet operators quickly triage whether a crash may be
influenced by hardware recoverable errors (which executes a uncommon code
path in the kernel), especially when recoverable errors occurred shortly
before a panic, such as the bug fixed by commit ee62ce7a1d90 ("page_pool:
Track DMA-mapped pages and unmap them when destroying the pool")
This is not intended to replace full hardware diagnostics but provides a
fast way to correlate hardware events with kernel panics quickly.
Rare machine check exceptions—like those indicated by mce_flags.p5 or
mce_flags.winchip—are not accounted for in this method, as they fall
outside the intended usage scope for this feature's user base.
[leitao@debian.org: add hw-recoverable-errors to toctree]
Link: https://lkml.kernel.org/r/20251127-vmcoreinfo_fix-v1-1-26f5b1c43da9@debian.org
Link: https://lkml.kernel.org/r/20251010-vmcore_hw_error-v5-1-636ede3efe44@debian.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Suggested-by: Tony Luck <tony.luck@intel.com>
Suggested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com> [APEI]
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Bob Moore <robert.moore@intel.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Konrad Rzessutek Wilk <konrad.wilk@oracle.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: "Oliver O'Halloran" <oohall@gmail.com>
Cc: Omar Sandoval <osandov@osandov.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The ability to preserve a memfd allows userspace to use KHO and LUO to
transfer its memory contents to the next kernel. This is useful in many
ways. For one, it can be used with IOMMUFD as the backing store for IOMMU
page tables. Preserving IOMMUFD is essential for performing a hypervisor
live update with passthrough devices. memfd support provides the first
building block for making that possible.
For another, applications with a large amount of memory that takes time to
reconstruct, reboots to consume kernel upgrades can be very expensive.
memfd with LUO gives those applications reboot-persistent memory that they
can use to quickly save and reconstruct that state.
While memfd is backed by either hugetlbfs or shmem, currently only support
on shmem is added. To be more precise, support for anonymous shmem files
is added.
The handover to the next kernel is not transparent. All the properties of
the file are not preserved; only its memory contents, position, and size.
The recreated file gets the UID and GID of the task doing the restore, and
the task's cgroup gets charged with the memory.
Once preserved, the file cannot grow or shrink, and all its pages are
pinned to avoid migrations and swapping. The file can still be read from
or written to.
Use vmalloc to get the buffer to hold the folios, and preserve it using
kho_preserve_vmalloc(). This doesn't have the size limit.
Link: https://lkml.kernel.org/r/20251125165850.3389713-15-pasha.tatashin@soleen.com
Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
Co-developed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Tested-by: David Matlack <dmatlack@google.com>
Cc: Aleksander Lobakin <aleksander.lobakin@intel.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: anish kumar <yesanishhere@gmail.com>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Chanwoo Choi <cw00.choi@samsung.com>
Cc: Chen Ridong <chenridong@huawei.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Daniel Wagner <wagi@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Jeffery <djeffery@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guixin Liu <kanie@linux.alibaba.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Matthew Maurer <mmaurer@google.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Myugnjoo Ham <myungjoo.ham@samsung.com>
Cc: Parav Pandit <parav@nvidia.com>
Cc: Pratyush Yadav <pratyush@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Samiullah Khawaja <skhawaja@google.com>
Cc: Song Liu <song@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stuart Hayes <stuart.w.hayes@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: William Tu <witu@nvidia.com>
Cc: Yoann Congal <yoann.congal@smile.fr>
Cc: Zhu Yanjun <yanjun.zhu@linux.dev>
Cc: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently file handlers only get the serialized_data field to store their
state. This field has a pointer to the serialized state of the file, and
it becomes a part of LUO file's serialized state.
File handlers can also need some runtime state to track information that
shouldn't make it in the serialized data.
One such example is a vmalloc pointer. While kho_preserve_vmalloc()
preserves the memory backing a vmalloc allocation, it does not store the
original vmap pointer, since that has no use being passed to the next
kernel. The pointer is needed to free the memory in case the file is
unpreserved.
Provide a private field in struct luo_file and pass it to all the
callbacks. The field's can be set by preserve, and must be freed by
unpreserve.
Link: https://lkml.kernel.org/r/20251125165850.3389713-14-pasha.tatashin@soleen.com
Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
Co-developed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Tested-by: David Matlack <dmatlack@google.com>
Cc: Aleksander Lobakin <aleksander.lobakin@intel.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: anish kumar <yesanishhere@gmail.com>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Chanwoo Choi <cw00.choi@samsung.com>
Cc: Chen Ridong <chenridong@huawei.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Daniel Wagner <wagi@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Jeffery <djeffery@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guixin Liu <kanie@linux.alibaba.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Matthew Maurer <mmaurer@google.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Myugnjoo Ham <myungjoo.ham@samsung.com>
Cc: Parav Pandit <parav@nvidia.com>
Cc: Pratyush Yadav <pratyush@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Samiullah Khawaja <skhawaja@google.com>
Cc: Song Liu <song@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stuart Hayes <stuart.w.hayes@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: William Tu <witu@nvidia.com>
Cc: Yoann Congal <yoann.congal@smile.fr>
Cc: Zhu Yanjun <yanjun.zhu@linux.dev>
Cc: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|