summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)AuthorFilesLines
2025-02-05x86/smp: Eliminate mwait_play_dead_cpuid_hint()Patryk Wlazlyn1-47/+7
Currently, mwait_play_dead_cpuid_hint() looks up the MWAIT hint of the deepest idle state by inspecting CPUID leaf 0x5 with the assumption that, if the number of sub-states for a given major C-state is nonzero, those sub-states are always represented by consecutive numbers starting from 0. This assumption is not based on the documented platform behavior and in fact it is not met on recent Intel platforms. For example, Intel's Sierra Forest report two C-states with two substates each in cpuid leaf 0x5: Name* target cstate target subcstate (mwait hint) =========================================================== C1 0x00 0x00 C1E 0x00 0x01 -- 0x10 ---- C6S 0x20 0x22 C6P 0x20 0x23 -- 0x30 ---- /* No more (sub)states all the way down to the end. */ =========================================================== * Names of the cstates are not included in the CPUID leaf 0x5, they are taken from the product specific documentation. Notice that hints 0x20 and 0x21 are not defined for C-state 0x20 (C6), so the existing MWAIT hint lookup in mwait_play_dead_cpuid_hint() based on the CPUID leaf 0x5 contents does not work in this case. Instead of using MWAIT hint lookup that is not guaranteed to work, make native_play_dead() rely on the idle driver for the given platform to put CPUs going offline into appropriate idle state and, if that fails, fall back to hlt_play_dead(). Accordingly, drop mwait_play_dead_cpuid_hint() altogether and make native_play_dead() call cpuidle_play_dead() instead of it unconditionally with the assumption that it will not return if it is successful. Still, in case cpuidle_play_dead() fails, call hlt_play_dead() at the end. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/all/20250205155211.329780-5-artem.bityutskiy%40linux.intel.com
2025-02-05ACPI/processor_idle: Add FFH state handlingPatryk Wlazlyn1-0/+10
Recent Intel platforms will depend on the idle driver to pass the correct hint for playing dead via mwait_play_dead_with_hint(). Expand the existing enter_dead interface with handling for FFH states and pass the MWAIT hint to the mwait_play_dead code. Suggested-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/all/20250205155211.329780-3-artem.bityutskiy%40linux.intel.com
2025-02-05x86/smp: Allow calling mwait_play_dead with an arbitrary hintPatryk Wlazlyn2-41/+50
Introduce a helper function to allow offlined CPUs to enter idle states with a specific MWAIT hint. The new helper will be used in subsequent patches by the acpi_idle and intel_idle drivers. No functional change intended. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/all/20250205155211.329780-2-artem.bityutskiy%40linux.intel.com
2025-02-05x86/xen: remove unneeded dummy push from xen_hypercall_hvm()Juergen Gross1-6/+0
Stack alignment of the kernel in 64-bit mode is 8, not 16, so the dummy push in xen_hypercall_hvm() for aligning the stack to 16 bytes can be removed. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-02-05x86/xen: add FRAME_END to xen_hypercall_hvm()Juergen Gross1-0/+1
xen_hypercall_hvm() is missing a FRAME_END at the end, add it. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202502030848.HTNTTuo9-lkp@intel.com/ Fixes: b4845bb63838 ("x86/xen: add central hypercall functions") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-02-05x86/xen: fix xen_hypercall_hvm() to not clobber %rbxJuergen Gross1-2/+2
xen_hypercall_hvm(), which is used when running as a Xen PVH guest at most only once during early boot, is clobbering %rbx. Depending on whether the caller relies on %rbx to be preserved across the call or not, this clobbering might result in an early crash of the system. This can be avoided by using an already saved register instead of %rbx. Fixes: b4845bb63838 ("x86/xen: add central hypercall functions") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-02-05perf/x86/intel: Support PEBS counters snapshottingKan Liang6-25/+284
The counters snapshotting is a new adaptive PEBS extension, which can capture programmable counters, fixed-function counters, and performance metrics in a PEBS record. The feature is available in the PEBS format V6. The target counters can be configured in the new fields of MSR_PEBS_CFG. Then the PEBS HW will generate the bit mask of counters (Counters Group Header) followed by the content of all the requested counters into a PEBS record. The current Linux perf sample read feature can read all events in the group when any event in the group is overflowed. But the rdpmc in the NMI/overflow handler has a small gap from overflow. Also, there is some overhead for each rdpmc read. The counters snapshotting feature can be used as an accurate and low-overhead replacement. Extend intel_update_topdown_event() to accept the value from PEBS records. Add a new PEBS_CNTR flag to indicate a sample read group that utilizes the counters snapshotting feature. When the group is scheduled, the PEBS configure can be updated accordingly. To prevent the case that a PEBS record value might be in the past relative to what is already in the event, perf always stops the PMU and drains the PEBS buffer before updating the corresponding event->count. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20250121152303.3128733-4-kan.liang@linux.intel.com
2025-02-05perf/x86/intel: Avoid disable PMU if !cpuc->enabled in sample readKan Liang3-29/+25
The WARN_ON(this_cpu_read(cpu_hw_events.enabled)) in the intel_pmu_save_and_restart_reload() is triggered, when sampling read topdown events. In a NMI handler, the cpu_hw_events.enabled is set and used to indicate the status of core PMU. The generic pmu->pmu_disable_count, updated in the perf_pmu_disable/enable pair, is not touched. However, the perf_pmu_disable/enable pair is invoked when sampling read in a NMI handler. The cpuc->enabled is mistakenly set by the perf_pmu_enable(). Avoid disabling PMU if the core PMU is already disabled. Merge the logic together. Fixes: 7b2c05a15d29 ("perf/x86/intel: Generic support for hardware TopDown metrics") Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20250121152303.3128733-2-kan.liang@linux.intel.com
2025-02-05perf/x86/intel: Apply static call for drain_pebsPeter Zijlstra (Intel)3-2/+3
The x86_pmu_drain_pebs static call was introduced in commit 7c9903c9bf71 ("x86/perf, static_call: Optimize x86_pmu methods"), but it's not really used to replace the old method. Apply the static call for drain_pebs. Fixes: 7c9903c9bf71 ("x86/perf, static_call: Optimize x86_pmu methods") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20250121152303.3128733-1-kan.liang@linux.intel.com
2025-02-04x86/cpu: Fix #define name for Intel CPU model 0x5ATony Luck4-4/+4
This CPU was mistakenly given the name INTEL_ATOM_AIRMONT_MID. But it uses a Silvermont core, not Airmont. Change #define name to INTEL_ATOM_SILVERMONT_MID2 Reported-by: Christian Ludloff <ludloff@gmail.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20241007165701.19693-1-tony.luck%40intel.com
2025-02-04KVM: x86/mmu: Ensure NX huge page recovery thread is alive before wakingSean Christopherson1-7/+26
When waking a VM's NX huge page recovery thread, ensure the thread is actually alive before trying to wake it. Now that the thread is spawned on-demand during KVM_RUN, a VM without a recovery thread is reachable via the related module params. BUG: kernel NULL pointer dereference, address: 0000000000000040 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:vhost_task_wake+0x5/0x10 Call Trace: <TASK> set_nx_huge_pages+0xcc/0x1e0 [kvm] param_attr_store+0x8a/0xd0 module_attr_store+0x1a/0x30 kernfs_fop_write_iter+0x12f/0x1e0 vfs_write+0x233/0x3e0 ksys_write+0x60/0xd0 do_syscall_64+0x5b/0x160 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f3b52710104 </TASK> Modules linked in: kvm_intel kvm CR2: 0000000000000040 Fixes: 931656b9e2ff ("kvm: defer huge page recovery vhost task to later") Cc: stable@vger.kernel.org Cc: Keith Busch <kbusch@kernel.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250124234623.3609069-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-04KVM: remove kvm_arch_post_init_vmPaolo Bonzini1-6/+1
The only statement in a kvm_arch_post_init_vm implementation can be moved into the x86 kvm_arch_init_vm. Do so and remove all traces from architecture-independent code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-04kvm: x86: SRSO_USER_KERNEL_NO is not synthesizedPaolo Bonzini1-1/+1
SYNTHESIZED_F() generally is used together with setup_force_cpu_cap(), i.e. when it makes sense to present the feature even if cpuid does not have it *and* the VM is not able to see the difference. For example, it can be used when mitigations on the host automatically protect the guest as well. The "SYNTHESIZED_F(SRSO_USER_KERNEL_NO)" line came in as a conflict resolution between the CPUID overhaul from the KVM tree and support for the feature in the x86 tree. Using it right now does not hurt, or make a difference for that matter, because there is no setup_force_cpu_cap(X86_FEATURE_SRSO_USER_KERNEL_NO). However, it is a little less future proof in case such a setup_force_cpu_cap() appears later, for a case where the kernel somehow is not vulnerable but the guest would have to apply the mitigation. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-03perf/amd/ibs: Ceil sample_period to min_periodRavi Bangoria1-0/+7
The sample_period needs to be recalibrated after every sample to match the desired sampling freq for a 'freq mode event'. Since the next sample_period is calculated by generic kernel, PMU specific constraints are not (explicitly) reckoned. The sample_period value is programmed in a MaxCnt field of IBS PMUs, and the MaxCnt field has following constraints: 1) MaxCnt must be multiple of 0x10. Kernel keeps track of residual / over-counted period into period_left, which should take care of this constraint by programming MaxCnt with (sample_period & ~0xF) and adding remaining period into the next sample. 2) MaxCnt must be >= 0x10 for IBS Fetch PMU and >= 0x90 for IBS Op PMU. Currently, IBS PMU driver allows sample_period below min_period, which is an undefined HW behavior. Reset sample_period to min_period whenever it's less than that. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20250115054438.1021-9-ravi.bangoria@amd.com
2025-02-03perf/amd/ibs: Add ->check_period() callbackRavi Bangoria1-0/+24
IBS Fetch and IBS Op PMUs have constraints on sample period. The sample period is verified at the time of opening an event but not at the ioctl() interface. Hence, a user can open an event with valid period but change it later with ioctl(). Add a ->check_period() callback to verify the period provided at ioctl() is also valid. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/r/20250115054438.1021-8-ravi.bangoria@amd.com
2025-02-03perf/amd/ibs: Add PMU specific minimum periodRavi Bangoria1-8/+16
0x10 is the minimum sample period for IBS Fetch and 0x90 for IBS Op. Current IBS PMU driver uses 0x10 for both the PMUs, which is incorrect. Fix it by adding PMU specific minimum period values in struct perf_ibs. Also, bail out opening a 'sample period mode' event if the user requested sample period is less than PMU supported minimum value. For a 'freq mode' event, start calibrating sample period from PMU specific minimum period. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/r/20250115054438.1021-7-ravi.bangoria@amd.com
2025-02-03perf/amd/ibs: Don't allow freq mode event creation through ->config interfaceRavi Bangoria1-0/+3
Most perf_event_attr->config bits directly maps to IBS_{FETCH|OP}_CTL MSR. Since the sample period is programmed in these control registers, IBS PMU driver allows opening an IBS event by setting sample period value directly in perf_event_attr->config instead of using explicit perf_event_attr->sample_period interface. However, this logic is not applicable for freq mode events since the semantics of control register fields are applicable only to fixed sample period whereas the freq mode event adjusts sample period after each and every sample. Currently, IBS driver (unintentionally) allows creating freq mode event via ->config interface, which is semantically wrong as well as detrimental because it can be misused to bypass perf_event_max_sample_rate checks. Don't allow freq mode event creation through perf_event_attr->config interface. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/r/20250115054438.1021-6-ravi.bangoria@amd.com
2025-02-03perf/amd/ibs: Fix perf_ibs_op.cnt_mask for CurCntRavi Bangoria2-1/+3
IBS Op uses two counters: MaxCnt and CurCnt. MaxCnt is programmed with the desired sample period. IBS hw generates sample when CurCnt reaches to MaxCnt. The size of these counter used to be 20 bits but later they were extended to 27 bits. The 7 bit extension is indicated by CPUID Fn8000_001B_EAX[6 / OpCntExt]. perf_ibs->cnt_mask variable contains bit masks for MaxCnt and CurCnt. But IBS driver does not set upper 7 bits of CurCnt in cnt_mask even when OpCntExt CPUID bit is set. Fix this. IBS driver uses cnt_mask[CurCnt] bits only while disabling an event. Fortunately, CurCnt bits are not read from MSR while re-enabling the event, instead MaxCnt is programmed with desired period and CurCnt is set to 0. Hence, we did not see any issues so far. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/r/20250115054438.1021-5-ravi.bangoria@amd.com
2025-02-03perf/amd/ibs: Fix ->config to sample period calculation for OP PMURavi Bangoria1-4/+13
Instead of using standard perf_event_attr->freq=0 and ->sample_period fields, IBS event in 'sample period mode' can also be opened by setting period value directly in perf_event_attr->config in a MaxCnt bit-field format. IBS OP MaxCnt bits are defined as: (high bits) IbsOpCtl[26:20] = IbsOpMaxCnt[26:20] (low bits) IbsOpCtl[15:0] = IbsOpMaxCnt[19:4] Perf event sample period can be derived from MaxCnt bits as: sample_period = (high bits) | ((low_bits) << 4); However, current code just masks MaxCnt bits and shifts all of them, including high bits, which is incorrect. Fix it. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/r/20250115054438.1021-4-ravi.bangoria@amd.com
2025-02-03perf/amd/ibs: Remove pointless sample period checkRavi Bangoria1-7/+2
Valid perf event sample period value for IBS PMUs (Fetch and Op both) is limited to multiple of 0x10. perf_ibs_init() has this check: if (!event->attr.sample_freq && hwc->sample_period & 0x0f) return -EINVAL; But it's broken since hwc->sample_period will always be 0 when event->attr.sample_freq is 0 (irrespective of event->attr.freq value.) One option to fix this is to change the condition: - if (!event->attr.sample_freq && hwc->sample_period & 0x0f) + if (!event->attr.freq && hwc->sample_period & 0x0f) However, that will break all userspace tools which have been using IBS event with sample_period not multiple of 0x10. Another option is to remove the condition altogether and mask lower nibble _silently_, same as what current code is inadvertently doing. I'm preferring this approach as it keeps the existing behavior. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/r/20250115054438.1021-3-ravi.bangoria@amd.com
2025-02-03perf/amd/ibs: Remove IBS_{FETCH|OP}_CONFIG_MASK macrosRavi Bangoria1-5/+2
Definition of these macros are very simple and they are used at only one place. Get rid of unnecessary redirection. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/r/20250115054438.1021-2-ravi.bangoria@amd.com
2025-02-03x86/mm: Fix flush_tlb_range() when used for zapping normal PMDsJann Horn1-1/+1
On the following path, flush_tlb_range() can be used for zapping normal PMD entries (PMD entries that point to page tables) together with the PTE entries in the pointed-to page table: collapse_pte_mapped_thp pmdp_collapse_flush flush_tlb_range The arm64 version of flush_tlb_range() has a comment describing that it can be used for page table removal, and does not use any last-level invalidation optimizations. Fix the X86 version by making it behave the same way. Currently, X86 only uses this information for the following two purposes, which I think means the issue doesn't have much impact: - In native_flush_tlb_multi() for checking if lazy TLB CPUs need to be IPI'd to avoid issues with speculative page table walks. - In Hyper-V TLB paravirtualization, again for lazy TLB stuff. The patch "x86/mm: only invalidate final translations with INVLPGB" which is currently under review (see <https://lore.kernel.org/all/20241230175550.4046587-13-riel@surriel.com/>) would probably be making the impact of this a lot worse. Fixes: 016c4d92cd16 ("x86/mm/tlb: Add freed_tables argument to flush_tlb_mm_range") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20250103-x86-collapse-flush-fix-v1-1-3c521856cfa6@google.com
2025-02-03x86: re-enable EXECMEM_ROX supportMike Rapoport (Microsoft)1-0/+1
after rework of execmem ROX caches Signed-off-by: "Mike Rapoport (Microsoft)" <rppt@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250126074733.1384926-10-rppt@kernel.org
2025-02-03Revert "x86/module: prepare module loading for ROX allocations of text"Mike Rapoport (Microsoft)5-161/+112
The module code does not create a writable copy of the executable memory anymore so there is no need to handle it in module relocation and alternatives patching. This reverts commit 9bfc4824fd4836c16bb44f922bfaffba5da3e4f3. Signed-off-by: "Mike Rapoport (Microsoft)" <rppt@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250126074733.1384926-8-rppt@kernel.org
2025-02-03x86/mm/pat: restore large ROX pages after fragmentationKirill A. Shutemov2-4/+215
Change of attributes of the pages may lead to fragmentation of direct mapping over time and performance degradation when these pages contain executable code. With current code it's one way road: kernel tries to avoid splitting large pages, but it doesn't restore them back even if page attributes got compatible again. Any change to the mapping may potentially allow to restore large page. Add a hook to cpa_flush() path that will check if the pages in the range that were just touched can be mapped at PMD level. If the collapse at the PMD level succeeded, also attempt to collapse PUD level. The collapse logic runs only when a set_memory_ method explicitly sets CPA_COLLAPSE flag, for now this is only enabled in set_memory_rox(). CPUs don't like[1] to have to have TLB entries of different size for the same memory, but looks like it's okay as long as these entries have matching attributes[2]. Therefore it's critical to flush TLB before any following changes to the mapping. Note that we already allow for multiple TLB entries of different sizes for the same memory now in split_large_page() path. It's not a new situation. set_memory_4k() provides a way to use 4k pages on purpose. Kernel must not remap such pages as large. Re-use one of software PTE bits to indicate such pages. [1] See Erratum 383 of AMD Family 10h Processors [2] https://lore.kernel.org/linux-mm/1da1b025-cabc-6f04-bde5-e50830d1ecf0@amd.com/ [rppt@kernel.org: * s/restore/collapse/ * update formatting per peterz * use 'struct ptdesc' instead of 'struct page' for list of page tables to be freed * try to collapse PMD first and if it succeeds move on to PUD as peterz suggested * flush TLB twice: for changes done in the original CPA call and after collapsing of large pages * update commit message ] Signed-off-by: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Co-developed-by: "Mike Rapoport (Microsoft)" <rppt@kernel.org> Signed-off-by: "Mike Rapoport (Microsoft)" <rppt@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250126074733.1384926-4-rppt@kernel.org
2025-02-03x86/mm/pat: drop duplicate variable in cpa_flush()Mike Rapoport (Microsoft)1-2/+1
There is a 'struct cpa_data *data' parameter in cpa_flush() that is assigned to a local 'struct cpa_data *cpa' variable. Rename the parameter from 'data' to 'cpa' and drop declaration of the local 'cpa' variable. Signed-off-by: "Mike Rapoport (Microsoft)" <rppt@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250126074733.1384926-3-rppt@kernel.org
2025-02-03x86/mm/pat: cpa-test: fix length for CPA_ARRAY testMike Rapoport (Microsoft)1-1/+1
The CPA_ARRAY test always uses len[1] as numpages argument to change_page_attr_set() although the addresses array is different each iteration of the test loop. Replace len[1] with len[i] to have numpages matching the addresses array. Fixes: ecc729f1f471 ("x86/mm/cpa: Add ARRAY and PAGES_ARRAY selftests") Signed-off-by: "Mike Rapoport (Microsoft)" <rppt@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250126074733.1384926-2-rppt@kernel.org
2025-01-31Merge tag 'kbuild-v6.14' of ↵Linus Torvalds1-5/+1
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Support multiple hook locations for maint scripts of Debian package - Remove 'cpio' from the build tool requirement - Introduce gendwarfksyms tool, which computes CRCs for export symbols based on the DWARF information - Support CONFIG_MODVERSIONS for Rust - Resolve all conflicts in the genksyms parser - Fix several syntax errors in genksyms * tag 'kbuild-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (64 commits) kbuild: fix Clang LTO with CONFIG_OBJTOOL=n kbuild: Strip runtime const RELA sections correctly kconfig: fix memory leak in sym_warn_unmet_dep() kconfig: fix file name in warnings when loading KCONFIG_DEFCONFIG_LIST genksyms: fix syntax error for attribute before init-declarator genksyms: fix syntax error for builtin (u)int*x*_t types genksyms: fix syntax error for attribute after 'union' genksyms: fix syntax error for attribute after 'struct' genksyms: fix syntax error for attribute after abstact_declarator genksyms: fix syntax error for attribute before nested_declarator genksyms: fix syntax error for attribute before abstract_declarator genksyms: decouple ATTRIBUTE_PHRASE from type-qualifier genksyms: record attributes consistently for init-declarator genksyms: restrict direct-declarator to take one parameter-type-list genksyms: restrict direct-abstract-declarator to take one parameter-type-list genksyms: remove Makefile hack genksyms: fix last 3 shift/reduce conflicts genksyms: fix 6 shift/reduce conflicts and 5 reduce/reduce conflicts genksyms: reduce type_qualifier directly to decl_specifier genksyms: rename cvar_qualifier to type_qualifier ...
2025-01-31kbuild: Strip runtime const RELA sections correctlyArd Biesheuvel1-5/+1
Due to the fact that runtime const ELF sections are named without a leading period or double underscore, the RSTRIP logic that removes the static RELA sections from vmlinux fails to identify them. This results in a situation like below, where some sections that were supposed to get removed are left behind. [Nr] Name Type Address Off Size ES Flg Lk Inf Al [58] runtime_shift_d_hash_shift PROGBITS ffffffff83500f50 2900f50 000014 00 A 0 0 1 [59] .relaruntime_shift_d_hash_shift RELA 0000000000000000 55b6f00 000078 18 I 70 58 8 [60] runtime_ptr_dentry_hashtable PROGBITS ffffffff83500f68 2900f68 000014 00 A 0 0 1 [61] .relaruntime_ptr_dentry_hashtable RELA 0000000000000000 55b6f78 000078 18 I 70 60 8 [62] runtime_ptr_USER_PTR_MAX PROGBITS ffffffff83500f80 2900f80 000238 00 A 0 0 1 [63] .relaruntime_ptr_USER_PTR_MAX RELA 0000000000000000 55b6ff0 000d50 18 I 70 62 8 So tweak the match expression to strip all sections starting with .rel. While at it, consolidate the logic used by RISC-V, s390 and x86 into a single shared Makefile library command. Link: https://lore.kernel.org/all/CAHk-=wjk3ynjomNvFN8jf9A1k=qSc=JFF591W00uXj-qqNUxPQ@mail.gmail.com/ Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-31Merge tag 'x86-mm-2025-01-31' of ↵Linus Torvalds6-19/+55
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: - The biggest changes are the TLB flushing scalability optimizations, to update the mm_cpumask lazily and related changes. This feature has both a track record and a continued risk of performance regressions, so it was already delayed by a cycle - but it's all 100% perfect now™ (Rik van Riel) - Also miscellaneous fixes and cleanups. (Gautam Somani, Kirill Shutemov, Sebastian Andrzej Siewior) * tag 'x86-mm-2025-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Remove unnecessary include of <linux/extable.h> x86/mtrr: Rename mtrr_overwrite_state() to guest_force_mtrr_state() x86/mm/selftests: Fix typo in lam.c x86/mm/tlb: Only trim the mm_cpumask once a second x86/mm/tlb: Also remove local CPU from mm_cpumask if stale x86/mm/tlb: Add tracepoint for TLB flush IPI to stale CPU x86/mm/tlb: Update mm_cpumask lazily
2025-01-31Merge tag 'uml-for-linus-6.14-rc1' of ↵Linus Torvalds2-22/+0
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Richard Weinberger: - hostfs: Convert to writepages - many cleanups: removal of dead macros, missing __init * tag 'uml-for-linus-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: um: Remove unused asm/archparam.h header um: Include missing headers in asm/pgtable.h hostfs: Convert to writepages um: rtc: use RTC time when calculating the alarm um: Remove unused user_context function um: Remove unused THREAD_NAME_LEN macro um: Remove unused PGD_BOUND macro um: Mark setup_env_path as __init um: Mark install_fatal_handler as __init um: Mark set_stklim as __init um: Mark get_top_address as __init um: Mark parse_cache_line as __init um: Mark parse_host_cpu_flags as __init um: Count iomem_size only once in physmem calculation um: Remove obsolete fixmap support um: Remove unused MODULES_LEN macro
2025-01-31Merge tag 'rtc-6.14' of ↵Linus Torvalds2-7/+0
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC updates from Alexandre Belloni: "Not much this cycle, there are multiple small fixes. Core: - use boolean values with device_init_wakeup() Drivers: - pcf2127: add BSM support - pcf85063: fix possible out of bounds write" * tag 'rtc-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: rtc: pcf2127: add BSM support rtc: Remove hpet_rtc_dropped_irq() dt-bindings: rtc: mxc: Document fsl,imx31-rtc rtc: stm32: Use syscon_regmap_lookup_by_phandle_args rtc: zynqmp: Fix optional clock name property rtc: loongson: clear TOY_MATCH0_REG in loongson_rtc_isr() rtc: pcf85063: fix potential OOB write in PCF85063 NVMEM read rtc: tps6594: Fix integer overflow on 32bit systems rtc: use boolean values with device_init_wakeup() rtc: RTC_DRV_SPEAR should not default to y when compile-testing
2025-01-31Merge tag 'acpi-6.14-rc1-2' of ↵Linus Torvalds1-5/+45
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "Add a new ACPI-related quirk for Vexia EDU ATLA 10 tablet 5V (Hans de Goede) and fix the MADT parsing code so that CPUs with different entry types (LAPIC and x2APIC) are initialized in the order in which they appear in the MADT as required by the ACPI specification (Zhang Rui)" * tag 'acpi-6.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: x86/acpi: Fix LAPIC/x2APIC parsing order ACPI: x86: Add skip i2c clients quirk for Vexia EDU ATLA 10 tablet 5V
2025-01-30x86/boot: Use '-std=gnu11' to fix build with GCC 15Nathan Chancellor1-0/+1
GCC 15 changed the default C standard version to C23, which should not have impacted the kernel because it requests the gnu11 standard via '-std=' in the main Makefile. However, the x86 compressed boot Makefile uses its own set of KBUILD_CFLAGS without a '-std=' value (i.e., using the default), resulting in errors from the kernel's definitions of bool, true, and false in stddef.h, which are reserved keywords under C23. ./include/linux/stddef.h:11:9: error: expected identifier before ‘false’ 11 | false = 0, ./include/linux/types.h:35:33: error: two or more data types in declaration specifiers 35 | typedef _Bool bool; Set '-std=gnu11' in the x86 compressed boot Makefile to resolve the error and consistently use the same C standard version for the entire kernel. Closes: https://lore.kernel.org/4OAhbllK7x4QJGpZjkYjtBYNLd_2whHx9oFiuZcGwtVR4hIzvduultkgfAIRZI3vQpZylu7Gl929HaYFRGeMEalWCpeMzCIIhLxxRhq4U-Y=@protonmail.com/ Closes: https://lore.kernel.org/Z4467umXR2PZ0M1H@tucnak/ Reported-by: Kostadin Shishmanov <kostadinshishmanov@protonmail.com> Reported-by: Jakub Jelinek <jakub@redhat.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20250121-x86-use-std-consistently-gcc-15-v1-1-8ab0acf645cb%40kernel.org
2025-01-29Merge tag 'for-linus-6.14-rc1-tag' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: "Three minor fixes" * tag 'for-linus-6.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: update pvcalls_front_accept prototype Grab mm lock before grabbing pt lock xen: pcpu: remove unnecessary __ref annotation
2025-01-29Merge tag 'constfy-sysctl-6.14-rc1' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl table constification from Joel Granados: "All ctl_table declared outside of functions and that remain unmodified after initialization are const qualified. This prevents unintended modifications to proc_handler function pointers by placing them in the .rodata section. This is a continuation of the tree-wide effort started a few releases ago with the constification of the ctl_table struct arguments in the sysctl API done in 78eb4ea25cd5 ("sysctl: treewide: constify the ctl_table argument of proc_handlers")" * tag 'constfy-sysctl-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: treewide: const qualify ctl_tables where applicable
2025-01-29x86/sev: Disable jump tables in SEV startup codeArd Biesheuvel1-0/+4
When retpolines and IBT are both disabled, the compiler is free to use jump tables to optimize switch instructions. However, these are emitted by Clang as absolute references into .rodata: jmp *-0x7dfffe90(,%r9,8) R_X86_64_32S .rodata+0x170 Given that this code will execute before that address in .rodata has even been mapped, it is guaranteed to crash a SEV-SNP guest in a way that is difficult to diagnose. So disable jump tables when building this code. It would be better if we could attach this annotation to the __head macro but this appears to be impossible. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250127114334.1045857-6-ardb+git@google.com
2025-01-28treewide: const qualify ctl_tables where applicableJoel Granados2-2/+2
Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysctl_table and the ones calling register_net_sysctl (./net, drivers/inifiniband dirs). These are special cases as they use a registration function with a non-const qualified ctl_table argument or modify the arrays before passing them on to the registration function. Constifying ctl_table structs will prevent the modification of proc_handler function pointers as the arrays would reside in .rodata. This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide: constify the ctl_table argument of proc_handlers") constified all the proc_handlers. Created this by running an spatch followed by a sed command: Spatch: virtual patch @ depends on !(file in "net") disable optional_qualifier @ identifier table_name != { watchdog_hardlockup_sysctl, iwcm_ctl_table, ucma_ctl_table, memory_allocation_profiling_sysctls, loadpin_sysctl_table }; @@ + const struct ctl_table table_name [] = { ... }; sed: sed --in-place \ -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \ kernel/utsname_sysctl.c Reviewed-by: Song Liu <song@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/ Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Acked-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-01-27Merge tag 'mm-stable-2025-01-26-14-59' of ↵Linus Torvalds12-91/+54
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "The various patchsets are summarized below. Plus of course many indivudual patches which are described in their changelogs. - "Allocate and free frozen pages" from Matthew Wilcox reorganizes the page allocator so we end up with the ability to allocate and free zero-refcount pages. So that callers (ie, slab) can avoid a refcount inc & dec - "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to use large folios other than PMD-sized ones - "Fix mm/rodata_test" from Petr Tesarik performs some maintenance and fixes for this small built-in kernel selftest - "mas_anode_descend() related cleanup" from Wei Yang tidies up part of the mapletree code - "mm: fix format issues and param types" from Keren Sun implements a few minor code cleanups - "simplify split calculation" from Wei Yang provides a few fixes and a test for the mapletree code - "mm/vma: make more mmap logic userland testable" from Lorenzo Stoakes continues the work of moving vma-related code into the (relatively) new mm/vma.c - "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David Hildenbrand cleans up and rationalizes handling of gfp flags in the page allocator - "readahead: Reintroduce fix for improper RA window sizing" from Jan Kara is a second attempt at fixing a readahead window sizing issue. It should reduce the amount of unnecessary reading - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng addresses an issue where "huge" amounts of pte pagetables are accumulated: https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/ Qi's series addresses this windup by synchronously freeing PTE memory within the context of madvise(MADV_DONTNEED) - "selftest/mm: Remove warnings found by adding compiler flags" from Muhammad Usama Anjum fixes some build warnings in the selftests code when optional compiler warnings are enabled - "mm: don't use __GFP_HARDWALL when migrating remote pages" from David Hildenbrand tightens the allocator's observance of __GFP_HARDWALL - "pkeys kselftests improvements" from Kevin Brodsky implements various fixes and cleanups in the MM selftests code, mainly pertaining to the pkeys tests - "mm/damon: add sample modules" from SeongJae Park enhances DAMON to estimate application working set size - "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn provides some cleanups to memcg's hugetlb charging logic - "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song removes the global swap cgroup lock. A speedup of 10% for a tmpfs-based kernel build was demonstrated - "zram: split page type read/write handling" from Sergey Senozhatsky has several fixes and cleaups for zram in the area of zram_write_page(). A watchdog softlockup warning was eliminated - "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin Brodsky cleans up the pagetable destructor implementations. A rare use-after-free race is fixed - "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes simplifies and cleans up the debugging code in the VMA merging logic - "Account page tables at all levels" from Kevin Brodsky cleans up and regularizes the pagetable ctor/dtor handling. This results in improvements in accounting accuracy - "mm/damon: replace most damon_callback usages in sysfs with new core functions" from SeongJae Park cleans up and generalizes DAMON's sysfs file interface logic - "mm/damon: enable page level properties based monitoring" from SeongJae Park increases the amount of information which is presented in response to DAMOS actions - "mm/damon: remove DAMON debugfs interface" from SeongJae Park removes DAMON's long-deprecated debugfs interfaces. Thus the migration to sysfs is completed - "mm/hugetlb: Refactor hugetlb allocation resv accounting" from Peter Xu cleans up and generalizes the hugetlb reservation accounting - "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino removes a never-used feature of the alloc_pages_bulk() interface - "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park extends DAMOS filters to support not only exclusion (rejecting), but also inclusion (allowing) behavior - "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi introduces a new memory descriptor for zswap.zpool that currently overlaps with struct page for now. This is part of the effort to reduce the size of struct page and to enable dynamic allocation of memory descriptors - "mm, swap: rework of swap allocator locks" from Kairui Song redoes and simplifies the swap allocator locking. A speedup of 400% was demonstrated for one workload. As was a 35% reduction for kernel build time with swap-on-zram - "mm: update mips to use do_mmap(), make mmap_region() internal" from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that mmap_region() can be made MM-internal - "mm/mglru: performance optimizations" from Yu Zhao fixes a few MGLRU regressions and otherwise improves MGLRU performance - "Docs/mm/damon: add tuning guide and misc updates" from SeongJae Park updates DAMON documentation - "Cleanup for memfd_create()" from Isaac Manjarres does that thing - "mm: hugetlb+THP folio and migration cleanups" from David Hildenbrand provides various cleanups in the areas of hugetlb folios, THP folios and migration - "Uncached buffered IO" from Jens Axboe implements the new RWF_DONTCACHE flag which provides synchronous dropbehind for pagecache reading and writing. To permite userspace to address issues with massive buildup of useless pagecache when reading/writing fast devices - "selftests/mm: virtual_address_range: Reduce memory" from Thomas Weißschuh fixes and optimizes some of the MM selftests" * tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) mm/compaction: fix UBSAN shift-out-of-bounds warning s390/mm: add missing ctor/dtor on page table upgrade kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags() tools: add VM_WARN_ON_VMG definition mm/damon/core: use str_high_low() helper in damos_wmark_wait_us() seqlock: add missing parameter documentation for raw_seqcount_try_begin() mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh mm/page_alloc: remove the incorrect and misleading comment zram: remove zcomp_stream_put() from write_incompressible_page() mm: separate move/undo parts from migrate_pages_batch() mm/kfence: use str_write_read() helper in get_access_type() selftests/mm/mkdirty: fix memory leak in test_uffdio_copy() kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags() selftests/mm: virtual_address_range: avoid reading from VM_IO mappings selftests/mm: vm_util: split up /proc/self/smaps parsing selftests/mm: virtual_address_range: unmap chunks after validation selftests/mm: virtual_address_range: mmap() without PROT_WRITE selftests/memfd/memfd_test: fix possible NULL pointer dereference mm: add FGP_DONTCACHE folio creation flag mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue ...
2025-01-27Merge tag 'mm-nonmm-stable-2025-01-24-23-16' of ↵Linus Torvalds1-3/+2
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Mainly individually changelogged singleton patches. The patch series in this pull are: - "lib min_heap: Improve min_heap safety, testing, and documentation" from Kuan-Wei Chiu provides various tightenings to the min_heap library code - "xarray: extract __xa_cmpxchg_raw" from Tamir Duberstein preforms some cleanup and Rust preparation in the xarray library code - "Update reference to include/asm-<arch>" from Geert Uytterhoeven fixes pathnames in some code comments - "Converge on using secs_to_jiffies()" from Easwar Hariharan uses the new secs_to_jiffies() in various places where that is appropriate - "ocfs2, dlmfs: convert to the new mount API" from Eric Sandeen switches two filesystems to the new mount API - "Convert ocfs2 to use folios" from Matthew Wilcox does that - "Remove get_task_comm() and print task comm directly" from Yafang Shao removes now-unneeded calls to get_task_comm() in various places - "squashfs: reduce memory usage and update docs" from Phillip Lougher implements some memory savings in squashfs and performs some maintainability work - "lib: clarify comparison function requirements" from Kuan-Wei Chiu tightens the sort code's behaviour and adds some maintenance work - "nilfs2: protect busy buffer heads from being force-cleared" from Ryusuke Konishi fixes an issues in nlifs when the fs is presented with a corrupted image - "nilfs2: fix kernel-doc comments for function return values" from Ryusuke Konishi fixes some nilfs kerneldoc - "nilfs2: fix issues with rename operations" from Ryusuke Konishi addresses some nilfs BUG_ONs which syzbot was able to trigger - "minmax.h: Cleanups and minor optimisations" from David Laight does some maintenance work on the min/max library code - "Fixes and cleanups to xarray" from Kemeng Shi does maintenance work on the xarray library code" * tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (131 commits) ocfs2: use str_yes_no() and str_no_yes() helper functions include/linux/lz4.h: add some missing macros Xarray: use xa_mark_t in xas_squash_marks() to keep code consistent Xarray: remove repeat check in xas_squash_marks() Xarray: distinguish large entries correctly in xas_split_alloc() Xarray: move forward index correctly in xas_pause() Xarray: do not return sibling entries from xas_find_marked() ipc/util.c: complete the kernel-doc function descriptions gcov: clang: use correct function param names latencytop: use correct kernel-doc format for func params minmax.h: remove some #defines that are only expanded once minmax.h: simplify the variants of clamp() minmax.h: move all the clamp() definitions after the min/max() ones minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp() minmax.h: reduce the #define expansion of min(), max() and clamp() minmax.h: update some comments minmax.h: add whitespace around operators and after commas nilfs2: do not update mtime of renamed directory that is not moved nilfs2: handle errors that nilfs_prepare_chunk() may return CREDITS: fix spelling mistake ...
2025-01-26mm/memblock: add memblock_alloc_or_panic interfaceGuo Weikang6-30/+7
Before SLUB initialization, various subsystems used memblock_alloc to allocate memory. In most cases, when memory allocation fails, an immediate panic is required. To simplify this behavior and reduce repetitive checks, introduce `memblock_alloc_or_panic`. This function ensures that memory allocation failures result in a panic automatically, improving code readability and consistency across subsystems that require this behavior. [guoweikang.kernel@gmail.com: arch/s390: save_area_alloc default failure behavior changed to panic] Link: https://lkml.kernel.org/r/20250109033136.2845676-1-guoweikang.kernel@gmail.com Link: https://lore.kernel.org/lkml/Z2fknmnNtiZbCc7x@kernel.org/ Link: https://lkml.kernel.org/r/20250102072528.650926-1-guoweikang.kernel@gmail.com Signed-off-by: Guo Weikang <guoweikang.kernel@gmail.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390] Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-26asm-generic: pgalloc: provide generic __pgd_{alloc,free}Kevin Brodsky1-13/+11
We already have a generic implementation of alloc/free up to P4D level, as well as pgd_free(). Let's finish the work and add a generic PGD-level alloc helper as well. Unlike at lower levels, almost all architectures need some specific magic at PGD level (typically initialising PGD entries), so introducing a generic pgd_alloc() isn't worth it. Instead we introduce two new helpers, __pgd_alloc() and __pgd_free(), and make use of them in the arch-specific pgd_alloc() and pgd_free() wherever possible. To accommodate as many arch as possible, __pgd_alloc() takes a page allocation order. Because pagetable_alloc() allocates zeroed pages, explicit zeroing in pgd_alloc() becomes redundant and we can get rid of it. Some trivial implementations of pgd_free() also become unnecessary once __pgd_alloc() is used; remove them. Another small improvement is consistent accounting of PGD pages by using GFP_PGTABLE_{USER,KERNEL} as appropriate. Not all PGD allocations can be handled by the generic helpers. In particular, multiple architectures allocate PGDs from a kmem_cache, and those PGDs may not be page-sized. Link: https://lkml.kernel.org/r/20250103184415.2744423-6-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-26mm: pgtable: move __tlb_remove_table_one() in x86 to generic fileQi Zheng1-19/+0
The __tlb_remove_table_one() in x86 does not contain architecture-specific content, so move it to the generic file. Link: https://lkml.kernel.org/r/aab8a449bc67167943fd2cb5aab0a3a23b7b1cd7.1736317725.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-26mm: pgtable: introduce generic __tlb_remove_table()Qi Zheng1-17/+0
Several architectures (arm, arm64, riscv and x86) define exactly the same __tlb_remove_table(), just introduce generic __tlb_remove_table() to eliminate these duplications. The s390 __tlb_remove_table() is nearly the same, so also make s390 __tlb_remove_table() version generic. Link: https://lkml.kernel.org/r/ea372633d94f4d3f9f56a7ec5994bf050bf77e39.1736317725.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: Andreas Larsson <andreas@gaisler.com> [sparc] Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390] Acked-by: Arnd Bergmann <arnd@arndb.de> [asm-generic] Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-26x86: pgtable: move pagetable_dtor() to __tlb_remove_table()Qi Zheng3-12/+6
Move pagetable_dtor() to __tlb_remove_table(), so that ptlock and page table pages can be freed together (regardless of whether RCU is used). This prevents the use-after-free problem where the ptlock is freed immediately but the page table pages is freed later via RCU. Link: https://lkml.kernel.org/r/27b3cdc8786bebd4f748380bf82f796482718504.1736317725.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-26x86: pgtable: convert __tlb_remove_table() to use struct ptdescQi Zheng3-13/+19
Convert __tlb_remove_table() to use struct ptdesc, which will help to move pagetable_dtor() to __tlb_remove_table(). And page tables shouldn't have swap cache, so use pagetable_free() instead of free_page_and_swap_cache() to free page table pages. Link: https://lkml.kernel.org/r/39f60f93143ff77cf5d6b3c3e75af0ffc1480adb.1736317725.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-26mm: pgtable: introduce pagetable_dtor()Qi Zheng1-6/+6
The pagetable_p*_dtor() are exactly the same except for the handling of ptlock. If we make ptlock_free() handle the case where ptdesc->ptl is NULL and remove VM_BUG_ON_PAGE() from pmd_ptlock_free(), we can unify pagetable_p*_dtor() into one function. Let's introduce pagetable_dtor() to do this. Later, pagetable_dtor() will be moved to tlb_remove_ptdesc(), so that ptlock and page table pages can be freed together (regardless of whether RCU is used). This prevents the use-after-free problem where the ptlock is freed immediately but the page table pages is freed later via RCU. Link: https://lkml.kernel.org/r/47f44fff9dc68d9d9e9a0d6c036df275f820598a.1736317725.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390] Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-26mm: pgtable: add statistics for P4D level page tableQi Zheng1-0/+3
Like other levels of page tables, add statistics for P4D level page table. Link: https://lkml.kernel.org/r/d55fe3c286305aae84457da9e1066df99b3de125.1736317725.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-26asm-generic: pgalloc: provide generic p4d_{alloc_one,free}Kevin Brodsky1-18/+0
Four architectures currently implement 5-level pgtables: arm64, riscv, x86 and s390. The first three have essentially the same implementation for p4d_alloc_one() and p4d_free(), so we've got an opportunity to reduce duplication like at the lower levels. Provide a generic version of p4d_alloc_one() and p4d_free(), and make use of it on those architectures. Their implementation is the same as at PUD level, except that p4d_free() performs a runtime check by calling mm_p4d_folded(). 5-level pgtables depend on a runtime-detected hardware feature on all supported architectures, so we might as well include this check in the generic implementation. No runtime check is required in p4d_alloc_one() as the top-level p4d_alloc() already does the required check. Link: https://lkml.kernel.org/r/26d69c74a29183ecc335b9b407040d8e4cd70c6a.1736317725.git.zhengqi.arch@bytedance.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Arnd Bergmann <arnd@arndb.de> [asm-generic] Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-26Merge tag 'pci-v6.14-changes' of ↵Linus Torvalds1-0/+30
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Batch sizing of multiple BARs while memory decoding is disabled instead of disabling/enabling decoding for each BAR individually; this optimizes virtualized environments where toggling decoding enable is expensive (Alex Williamson) - Add host bridge .enable_device() and .disable_device() hooks for bridges that need to configure things like Requester ID to StreamID mapping when enabling devices (Frank Li) - Extend struct pci_ecam_ops with .enable_device() and .disable_device() hooks so drivers that use pci_host_common_probe() instead of their own .probe() have a way to set the .enable_device() callbacks (Marc Zyngier) - Drop 'No bus range found' message so we don't complain when DTs don't specify the default 'bus-range = <0x00 0xff>' (Bjorn Helgaas) - Rename the drivers/pci/of_property.c struct of_pci_range to of_pci_range_entry to avoid confusion with the global of_pci_range in include/linux/of_address.h (Bjorn Helgaas) Driver binding: - Update resource request API documentation to encourage callers to supply a driver name when requesting resources (Philipp Stanner) - Export pci_intx_unmanaged() and pcim_intx() (always managed) so callers of pci_intx() (which is sometimes managed) can explicitly choose the one they need (Philipp Stanner) - Convert drivers from pci_intx() to always-managed pcim_intx() or never-managed pci_intx_unmanaged(): amd_sfh, ata (ahci, ata_piix, pata_rdc, sata_sil24, sata_sis, sata_uli, sata_vsc), bnx2x, bna, ntb, qtnfmac, rtsx, tifm_7xx1, vfio, xen-pciback (Philipp Stanner) - Remove pci_intx_unmanaged() since pci_intx() is now always unmanaged and pcim_intx() is always managed (Philipp Stanner) Error handling: - Unexport pcie_read_tlp_log() to encourage drivers to use PCI core logging rather than building their own (Ilpo Järvinen) - Move TLP Log handling to its own file (Ilpo Järvinen) - Store number of supported End-End TLP Prefixes always so we can read the correct number of DWORDs from the TLP Prefix Log (Ilpo Järvinen) - Read TLP Prefixes in addition to the Header Log in pcie_read_tlp_log() (Ilpo Järvinen) - Add pcie_print_tlp_log() to consolidate printing of TLP Header and Prefix Log (Ilpo Järvinen) - Quirk the Intel Raptor Lake-P PIO log size to accommodate vendor BIOSes that don't configure it correctly (Takashi Iwai) ASPM: - Save parent L1 PM Substates config so when we restore it along with an endpoint's config, the parent info isn't junk (Jian-Hong Pan) Power management: - Avoid D3 for Root Ports on TUXEDO Sirius Gen1 with old BIOS because the system can't wake up from suspend (Werner Sembach) Endpoint framework: - Destroy the EPC device in devm_pci_epc_destroy(), which previously didn't call devres_release() (Zijun Hu) - Finish virtual EP removal in pci_epf_remove_vepf(), which previously caused a subsequent pci_epf_add_vepf() to fail with -EBUSY (Zijun Hu) - Write BAR_MASK before iATU registers in pci_epc_set_bar() so we don't depend on the BAR_MASK reset value being larger than the requested BAR size (Niklas Cassel) - Prevent changing BAR size/flags in pci_epc_set_bar() to prevent reads from bypassing the iATU if we reduced the BAR size (Niklas Cassel) - Verify address alignment when programming iATU so we don't attempt to write bits that are read-only because of the BAR size, which could lead to directing accesses to the wrong address (Niklas Cassel) - Implement artpec6 pci_epc_features so we can rely on all drivers supporting it so we can use it in EPC core code (Niklas Cassel) - Check for BARs of fixed size to prevent endpoint drivers from trying to change their size (Niklas Cassel) - Verify that requested BAR size is a power of two when endpoint driver sets the BAR (Niklas Cassel) Endpoint framework tests: - Clear pci-epf-test dma_chan_rx, not dma_chan_tx, after freeing dma_chan_rx (Mohamed Khalfella) - Correct the DMA MEMCPY test so it doesn't fail if the Endpoint supports both DMA_PRIVATE and DMA_MEMCPY (Manivannan Sadhasivam) - Add pci-epf-test and pci_endpoint_test support for capabilities (Niklas Cassel) - Add Endpoint test for consecutive BARs (Niklas Cassel) - Remove redundant comparison from Endpoint BAR test because a > 1MB BAR can always be exactly covered by iterating with a 1MB buffer (Hans Zhang) - Move and convert PCI Endpoint tests from tools/pci to Kselftests (Manivannan Sadhasivam) Apple PCIe controller driver: - Convert StreamID mapping configuration from a bus notifier to the .enable_device() and .disable_device() callbacks (Marc Zyngier) Freescale i.MX6 PCIe controller driver: - Add Requester ID to StreamID mapping configuration when enabling devices (Frank Li) - Use DWC core suspend/resume functions for imx6 (Frank Li) - Add suspend/resume support for i.MX8MQ, i.MX8Q, and i.MX95 (Richard Zhu) - Add DT compatible string 'fsl,imx8q-pcie-ep' and driver support for i.MX8Q series (i.MX8QM, i.MX8QXP, and i.MX8DXL) Endpoints (Frank Li) - Add DT binding for optional i.MX95 Refclk and driver support to enable it if the platform hasn't enabled it (Richard Zhu) - Configure PHY based on controller being in Root Complex or Endpoint mode (Frank Li) - Rely on dbi2 and iATU base addresses from DT via dw_pcie_get_resources() instead of hardcoding them (Richard Zhu) - Deassert apps_reset in imx_pcie_deassert_core_reset() since it is asserted in imx_pcie_assert_core_reset() (Richard Zhu) - Add missing reference clock enable or disable logic for IMX6SX, IMX7D, IMX8MM (Richard Zhu) - Remove redundant imx7d_pcie_init_phy() since imx7d_pcie_enable_ref_clk() does the same thing (Richard Zhu) Freescale Layerscape PCIe controller driver: - Simplify by using syscon_regmap_lookup_by_phandle_args() instead of syscon_regmap_lookup_by_phandle() followed by of_property_read_u32_array() (Krzysztof Kozlowski) Marvell MVEBU PCIe controller driver: - Add MODULE_DEVICE_TABLE() to enable module autoloading (Liao Chen) MediaTek PCIe Gen3 controller driver: - Use clk_bulk_prepare_enable() instead of separate clk_bulk_prepare() and clk_bulk_enable() (Lorenzo Bianconi) - Rearrange reset assert/deassert so they're both done in the *_power_up() callbacks (Lorenzo Bianconi) - Document that Airoha EN7581 requires PHY init and power-on before PHY reset deassert, unlike other MediaTek Gen3 controllers (Lorenzo Bianconi) - Move Airoha EN7581 post-reset delay from the en7581 clock .enable() method to mtk_pcie_en7581_power_up() (Lorenzo Bianconi) - Sleep instead of delay during Airoha EN7581 power-up, since this is a non-atomic context (Lorenzo Bianconi) - Skip PERST# assertion on Airoha EN7581 during probe and suspend/resume to avoid a hardware defect (Lorenzo Bianconi) - Enable async probe to reduce system startup time (Douglas Anderson) Microchip PolarFlare PCIe controller driver: - Set up the inbound address translation based on whether the platform allows coherent or non-coherent DMA (Daire McNamara) - Update DT binding such that platforms are DMA-coherent by default and must specify 'dma-noncoherent' if needed (Conor Dooley) Mobiveil PCIe controller driver: - Convert mobiveil-pcie.txt to YAML and update 'interrupt-names' and 'reg-names' (Frank Li) Qualcomm PCIe controller driver: - Add DT SM8550 and SM8650 optional 'global' interrupt for link events (Neil Armstrong) - Add DT 'compatible' strings for IPQ5424 PCIe controller (Manikanta Mylavarapu) - If 'global' IRQ is supported for detection of Link Up events, tell DWC core not to wait for link up (Krishna chaitanya chundru) Renesas R-Car PCIe controller driver: - Avoid passing stack buffer as resource name (King Dix) Rockchip PCIe controller driver: - Simplify clock and reset handling by using bulk interfaces (Anand Moon) - Pass typed rockchip_pcie (not void) pointer to rockchip_pcie_disable_clocks() (Anand Moon) - Return -ENOMEM, not success, when pci_epc_mem_alloc_addr() fails (Dan Carpenter) Rockchip DesignWare PCIe controller driver: - Use dll_link_up IRQ to detect Link Up and enumerate devices so users don't have to manually rescan (Niklas Cassel) - Tell DWC core not to wait for link up since the 'sys' interrupt is required and detects Link Up events (Niklas Cassel) Synopsys DesignWare PCIe controller driver: - Don't wait for link up in DWC core if driver can detect Link Up event (Krishna chaitanya chundru) - Update ICC and OPP votes after Link Up events (Krishna chaitanya chundru) - Always stop link in dw_pcie_suspend_noirq(), which is required at least for i.MX8QM to re-establish link on resume (Richard Zhu) - Drop racy and unnecessary LTSSM state check before sending PME_TURN_OFF message in dw_pcie_suspend_noirq() (Richard Zhu) - Add struct of_pci_range.parent_bus_addr for devices that need their immediate parent bus address, not the CPU address, e.g., to program an internal Address Translation Unit (iATU) (Frank Li) TI DRA7xx PCIe controller driver: - Simplify by using syscon_regmap_lookup_by_phandle_args() instead of syscon_regmap_lookup_by_phandle() followed by of_parse_phandle_with_fixed_args() or of_property_read_u32_index() (Krzysztof Kozlowski) Xilinx Versal CPM PCIe controller driver: - Add DT binding and driver support for Xilinx Versal CPM5 (Thippeswamy Havalige) MicroSemi Switchtec management driver: - Add Microchip PCI100X device IDs (Rakesh Babu Saladi) Miscellaneous: - Move reset related sysfs code from pci.c to pci-sysfs.c where other similar code lives (Ilpo Järvinen) - Simplify reset_method_store() memory management by using __free() instead of explicit kfree() cleanup (Ilpo Järvinen) - Constify struct bin_attribute for sysfs, VPD, P2PDMA, and the IBM ACPI hotplug driver (Thomas Weißschuh) - Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT (Dongdong Zhang) - Correct documentation of the 'config_acs=' kernel parameter (Akihiko Odaki)" * tag 'pci-v6.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (111 commits) PCI: Batch BAR sizing operations dt-bindings: PCI: microchip,pcie-host: Allow dma-noncoherent PCI: microchip: Set inbound address translation for coherent or non-coherent mode Documentation: Fix pci=config_acs= example PCI: Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT PCI: Don't include 'pm_wakeup.h' directly selftests: pci_endpoint: Migrate to Kselftest framework selftests: Move PCI Endpoint tests from tools/pci to Kselftests misc: pci_endpoint_test: Fix IOCTL return value dt-bindings: PCI: qcom: Document the IPQ5424 PCIe controller dt-bindings: PCI: qcom,pcie-sm8550: Document 'global' interrupt dt-bindings: PCI: mobiveil: Convert mobiveil-pcie.txt to YAML PCI: switchtec: Add Microchip PCI100X device IDs misc: pci_endpoint_test: Remove redundant 'remainder' test misc: pci_endpoint_test: Add consecutive BAR test misc: pci_endpoint_test: Add support for capabilities PCI: endpoint: pci-epf-test: Add support for capabilities PCI: endpoint: pci-epf-test: Fix check for DMA MEMCPY test PCI: endpoint: pci-epf-test: Set dma_chan_rx pointer to NULL on error PCI: dwc: Simplify config resource lookup ...