Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Many singleton patches against the MM code. The patch series which are
included in this merge do the following:
- Peng Zhang has done some mapletree maintainance work in the series
'maple_tree: add mt_free_one() and mt_attr() helpers'
'Some cleanups of maple tree'
- In the series 'mm: use memmap_on_memory semantics for dax/kmem'
Vishal Verma has altered the interworking between memory-hotplug
and dax/kmem so that newly added 'device memory' can more easily
have its memmap placed within that newly added memory.
- Matthew Wilcox continues folio-related work (including a few fixes)
in the patch series
'Add folio_zero_tail() and folio_fill_tail()'
'Make folio_start_writeback return void'
'Fix fault handler's handling of poisoned tail pages'
'Convert aops->error_remove_page to ->error_remove_folio'
'Finish two folio conversions'
'More swap folio conversions'
- Kefeng Wang has also contributed folio-related work in the series
'mm: cleanup and use more folio in page fault'
- Jim Cromie has improved the kmemleak reporting output in the series
'tweak kmemleak report format'.
- In the series 'stackdepot: allow evicting stack traces' Andrey
Konovalov to permits clients (in this case KASAN) to cause eviction
of no longer needed stack traces.
- Charan Teja Kalla has fixed some accounting issues in the page
allocator's atomic reserve calculations in the series 'mm:
page_alloc: fixes for high atomic reserve caluculations'.
- Dmitry Rokosov has added to the samples/ dorectory some sample code
for a userspace memcg event listener application. See the series
'samples: introduce cgroup events listeners'.
- Some mapletree maintanance work from Liam Howlett in the series
'maple_tree: iterator state changes'.
- Nhat Pham has improved zswap's approach to writeback in the series
'workload-specific and memory pressure-driven zswap writeback'.
- DAMON/DAMOS feature and maintenance work from SeongJae Park in the
series
'mm/damon: let users feed and tame/auto-tune DAMOS'
'selftests/damon: add Python-written DAMON functionality tests'
'mm/damon: misc updates for 6.8'
- Yosry Ahmed has improved memcg's stats flushing in the series 'mm:
memcg: subtree stats flushing and thresholds'.
- In the series 'Multi-size THP for anonymous memory' Ryan Roberts
has added a runtime opt-in feature to transparent hugepages which
improves performance by allocating larger chunks of memory during
anonymous page faults.
- Matthew Wilcox has also contributed some cleanup and maintenance
work against eh buffer_head code int he series 'More buffer_head
cleanups'.
- Suren Baghdasaryan has done work on Andrea Arcangeli's series
'userfaultfd move option'. UFFDIO_MOVE permits userspace heap
compaction algorithms to move userspace's pages around rather than
UFFDIO_COPY'a alloc/copy/free.
- Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm:
Add ksm advisor'. This is a governor which tunes KSM's scanning
aggressiveness in response to userspace's current needs.
- Chengming Zhou has optimized zswap's temporary working memory use
in the series 'mm/zswap: dstmem reuse optimizations and cleanups'.
- Matthew Wilcox has performed some maintenance work on the writeback
code, both code and within filesystems. The series is 'Clean up the
writeback paths'.
- Andrey Konovalov has optimized KASAN's handling of alloc and free
stack traces for secondary-level allocators, in the series 'kasan:
save mempool stack traces'.
- Andrey also performed some KASAN maintenance work in the series
'kasan: assorted clean-ups'.
- David Hildenbrand has gone to town on the rmap code. Cleanups, more
pte batching, folio conversions and more. See the series 'mm/rmap:
interface overhaul'.
- Kinsey Ho has contributed some maintenance work on the MGLRU code
in the series 'mm/mglru: Kconfig cleanup'.
- Matthew Wilcox has contributed lruvec page accounting code cleanups
in the series 'Remove some lruvec page accounting functions'"
* tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits)
mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
mm, treewide: introduce NR_PAGE_ORDERS
selftests/mm: add separate UFFDIO_MOVE test for PMD splitting
selftests/mm: skip test if application doesn't has root privileges
selftests/mm: conform test to TAP format output
selftests: mm: hugepage-mmap: conform to TAP format output
selftests/mm: gup_test: conform test to TAP format output
mm/selftests: hugepage-mremap: conform test to TAP format output
mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING
mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large
mm/memcontrol: remove __mod_lruvec_page_state()
mm/khugepaged: use a folio more in collapse_file()
slub: use a folio in __kmalloc_large_node
slub: use folio APIs in free_large_kmalloc()
slub: use alloc_pages_node() in alloc_slab_page()
mm: remove inc/dec lruvec page state functions
mm: ratelimit stat flush from workingset shrinker
kasan: stop leaking stack trace handles
mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE
mm/mglru: add dummy pmd_dirty()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
- SLUB: delayed freezing of CPU partial slabs (Chengming Zhou)
Freezing is an operation involving double_cmpxchg() that makes a slab
exclusive for a particular CPU. Chengming noticed that we use it also
in situations where we are not yet installing the slab as the CPU
slab, because freezing also indicates that the slab is not on the
shared list. This results in redundant freeze/unfreeze operation and
can be avoided by marking separately the shared list presence by
reusing the PG_workingset flag.
This approach neatly avoids the issues described in 9b1ea29bc0d7
("Revert "mm, slub: consider rest of partial list if acquire_slab()
fails"") as we can now grab a slab from the shared list in a quick
and guaranteed way without the cmpxchg_double() operation that
amplifies the lock contention and can fail.
As a result, lkp has reported 34.2% improvement of
stress-ng.rawudp.ops_per_sec
- SLAB removal and SLUB cleanups (Vlastimil Babka)
The SLAB allocator has been deprecated since 6.5 and nobody has
objected so far. We agreed at LSF/MM to wait until the next LTS,
which is 6.6, so we should be good to go now.
This doesn't yet erase all traces of SLAB outside of mm/ so some dead
code, comments or documentation remain, and will be cleaned up
gradually (some series are already in the works).
Removing the choice of allocators has already allowed to simplify and
optimize the code wiring up the kmalloc APIs to the SLUB
implementation.
* tag 'slab-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (34 commits)
mm/slub: free KFENCE objects in slab_free_hook()
mm/slub: handle bulk and single object freeing separately
mm/slub: introduce __kmem_cache_free_bulk() without free hooks
mm/slub: fix bulk alloc and free stats
mm/slub: optimize free fast path code layout
mm/slub: optimize alloc fastpath code layout
mm/slub: remove slab_alloc() and __kmem_cache_alloc_lru() wrappers
mm/slab: move kmalloc() functions from slab_common.c to slub.c
mm/slab: move kmalloc_slab() to mm/slab.h
mm/slab: move kfree() from slab_common.c to slub.c
mm/slab: move struct kmem_cache_node from slab.h to slub.c
mm/slab: move memcg related functions from slab.h to slub.c
mm/slab: move pre/post-alloc hooks from slab.h to slub.c
mm/slab: consolidate includes in the internal mm/slab.h
mm/slab: move the rest of slub_def.h to mm/slab.h
mm/slab: move struct kmem_cache_cpu declaration to slub.c
mm/slab: remove mm/slab.c and slab_def.h
mm/mempool/dmapool: remove CONFIG_DEBUG_SLAB ifdefs
mm/slab: remove CONFIG_SLAB code from slab common code
cpu/hotplug: remove CPUHP_SLAB_PREPARE hooks
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- Yafang Shao added task_get_cgroup1() helper to enable a similar BPF
helper so that BPF progs can be more useful on cgroup1 hierarchies.
While cgroup1 is mostly in maintenance mode, this addition is very
small while having an outsized usefulness for users who are still on
cgroup1. Yafang also optimized root cgroup list access by making it
RCU protected in the process.
- Waiman Long optimized rstat operation leading to substantially lower
and more consistent lock hold time while flushing the hierarchical
statistics. As the lock can be acquired briefly in various hot paths,
this reduction has cascading benefits.
- Waiman also improved the quality of isolation for cpuset's isolated
partitions. CPUs which are allocated to isolated partitions are now
excluded from running unbound work items and cpu_is_isolated() test
which is used by vmstat and memcg to reduce interference now includes
cpuset isolated CPUs. While it isn't there yet, the hope is
eventually reaching parity with the isolation level provided by the
`isolcpus` boot param but in a dynamic manner.
* tag 'cgroup-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Move rcu_head up near the top of cgroup_root
cgroup/cpuset: Include isolated cpuset CPUs in cpu_is_isolated() check
cgroup: Avoid false cacheline sharing of read mostly rstat_cpu
cgroup/rstat: Optimize cgroup_rstat_updated_list()
cgroup: Fix documentation for cpu.idle
cgroup/cpuset: Expose cpuset.cpus.isolated
workqueue: Move workqueue_set_unbound_cpumask() and its helpers inside CONFIG_SYSFS
cgroup/rstat: Reduce cpu_lock hold time in cgroup_rstat_flush_locked()
cgroup/cpuset: Take isolated CPUs out of workqueue unbound cpumask
cgroup/cpuset: Keep track of CPUs in isolated partitions
selftests/cgroup: Minor code cleanup and reorganization of test_cpuset_prs.sh
workqueue: Add workqueue_unbound_exclude_cpumask() to exclude CPUs from wq_unbound_cpumask
selftests: cgroup: Fixes a typo in a comment
cgroup: Add a new helper for cgroup1 hierarchy
cgroup: Add annotation for holding namespace_sem in current_cgns_cgroup_from_root()
cgroup: Eliminate the need for cgroup_mutex in proc_cgroup_show()
cgroup: Make operations on the cgroup root_list RCU safe
cgroup: Remove unnecessary list_empty()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Energy scheduling:
- Consolidate how the max compute capacity is used in the scheduler
and how we calculate the frequency for a level of utilization.
- Rework interface between the scheduler and the schedutil governor
- Simplify the util_est logic
Deadline scheduler:
- Work more towards reducing SCHED_DEADLINE starvation of low
priority tasks (e.g., SCHED_OTHER) tasks when higher priority tasks
monopolize CPU cycles, via the introduction of 'deadline servers'
(nested/2-level scheduling).
"Fair servers" to make use of this facility are not introduced yet.
EEVDF:
- Introduce O(1) fastpath for EEVDF task selection
NUMA balancing:
- Tune the NUMA-balancing vma scanning logic some more, to better
distribute the probability of a particular vma getting scanned.
Plus misc fixes, cleanups and updates"
* tag 'sched-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
sched/fair: Fix tg->load when offlining a CPU
sched/fair: Remove unused 'next_buddy_marked' local variable in check_preempt_wakeup_fair()
sched/fair: Use all little CPUs for CPU-bound workloads
sched/fair: Simplify util_est
sched/fair: Remove SCHED_FEAT(UTIL_EST_FASTUP, true)
arm64/amu: Use capacity_ref_freq() to set AMU ratio
cpufreq/cppc: Set the frequency used for computing the capacity
cpufreq/cppc: Move and rename cppc_cpufreq_{perf_to_khz|khz_to_perf}()
energy_model: Use a fixed reference frequency
cpufreq/schedutil: Use a fixed reference frequency
cpufreq: Use the fixed and coherent frequency for scaling capacity
sched/topology: Add a new arch_scale_freq_ref() method
freezer,sched: Clean saved_state when restoring it during thaw
sched/fair: Update min_vruntime for reweight_entity() correctly
sched/doc: Update documentation after renames and synchronize Chinese version
sched/cpufreq: Rework iowait boost
sched/cpufreq: Rework schedutil governor performance estimation
sched/pelt: Avoid underestimation of task utilization
sched/timers: Explain why idle task schedules out on remote timer enqueue
sched/cpuidle: Comment about timers requirements VS idle handler
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance events updates from Ingo Molnar:
- Add branch stack counters ABI extension to better capture the growing
amount of information the PMU exposes via branch stack sampling.
There's matching tooling support.
- Fix race when creating the nr_addr_filters sysfs file
- Add Intel Sierra Forest and Grand Ridge intel/cstate PMU support
- Add Intel Granite Rapids, Sierra Forest and Grand Ridge uncore PMU
support
- Misc cleanups & fixes
* tag 'perf-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Factor out topology_gidnid_map()
perf/x86/intel/uncore: Fix NULL pointer dereference issue in upi_fill_topology()
perf/x86/amd: Reject branch stack for IBS events
perf/x86/intel/uncore: Support Sierra Forest and Grand Ridge
perf/x86/intel/uncore: Support IIO free-running counters on GNR
perf/x86/intel/uncore: Support Granite Rapids
perf/x86/uncore: Use u64 to replace unsigned for the uncore offsets array
perf/x86/intel/uncore: Generic uncore_get_uncores and MMIO format of SPR
perf: Fix the nr_addr_filters fix
perf/x86/intel/cstate: Add Grand Ridge support
perf/x86/intel/cstate: Add Sierra Forest support
x86/smp: Export symbol cpu_clustergroup_mask()
perf/x86/intel/cstate: Cleanup duplicate attr_groups
perf/core: Fix narrow startup race when creating the perf nr_addr_filters sysfs file
perf/x86/intel: Support branch counters logging
perf/x86/intel: Reorganize attrs and is_visible
perf: Add branch_sample_call_stack
perf/x86: Add PERF_X86_EVENT_NEEDS_BRANCH_STACK flag
perf: Add branch stack counters
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq subsystem updates from Ingo Molnar:
- Add support for the IA55 interrupt controller on RZ/G3S SoC's
- Update/fix the Qualcom MPM Interrupt Controller driver's register
enumeration within the somewhat exotic "RPM Message RAM" MMIO-mapped
shared memory region that is used for other purposes as well
- Clean up the Xtensa built-in Programmable Interrupt Controller driver
(xtensa-pic) a bit
* tag 'irq-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/irq-xtensa-pic: Clean up
irqchip/qcom-mpm: Support passing a slice of SRAM as reg space
dt-bindings: interrupt-controller: mpm: Pass MSG RAM slice through phandle
dt-bindings: interrupt-controller: renesas,rzg2l-irqc: Document RZ/G3S
irqchip/renesas-rzg2l: Add support for suspend to RAM
irqchip/renesas-rzg2l: Add macro to retrieve TITSR register offset based on register's index
irqchip/renesas-rzg2l: Implement restriction when writing ISCR register
irqchip/renesas-rzg2l: Document structure members
irqchip/renesas-rzg2l: Align struct member names to tabs
irqchip/renesas-rzg2l: Use tabs instead of spaces
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molar:
"Lock guards:
- Use lock guards in the ptrace code
- Introduce conditional guards to extend to conditional lock
primitives like mutex_trylock()/mutex_lock_interruptible()/etc.
lockdep:
- Optimize 'struct lock_class' to be smaller
- Update file patterns in MAINTAINERS
mutexes:
- Document mutex lifetime rules a bit more"
* tag 'locking-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/mutex: Clarify that mutex_unlock(), and most other sleeping locks, can still use the lock object after it's unlocked
locking/mutex: Document that mutex_unlock() is non-atomic
ptrace: Convert ptrace_attach() to use lock guards
locking/lockdep: Slightly reorder 'struct lock_class' to save some memory
MAINTAINERS: Add include/linux/lockdep*.h
cleanup: Add conditional guard support
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
- Change global variables to local
- Add missing kernel-doc function parameter descriptions
- Remove unused parameter from a macro
- Remove obsolete Kconfig entry
- Fix comments
- Fix typos, mostly scripted, manually reviewed
and a micro-optimization got misplaced as a cleanup:
- Micro-optimize the asm code in secondary_startup_64_no_verify()
* tag 'x86-cleanups-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
arch/x86: Fix typos
x86/head_64: Use TESTB instead of TESTL in secondary_startup_64_no_verify()
x86/docs: Remove reference to syscall trampoline in PTI
x86/Kconfig: Remove obsolete config X86_32_SMP
x86/io: Remove the unused 'bw' parameter from the BUILDIO() macro
x86/mtrr: Document missing function parameters in kernel-doc
x86/setup: Make relocated_ramdisk a local variable of relocate_initrd()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"CPU features:
- Remove ARM64_HAS_NO_HW_PREFETCH copy_page() optimisation for ye
olde Thunder-X machines
- Avoid mapping KPTI trampoline when it is not required
- Make CPU capability API more robust during early initialisation
Early idreg overrides:
- Remove dependencies on core kernel helpers from the early
command-line parsing logic in preparation for moving this code
before the kernel is mapped
FPsimd:
- Restore kernel-mode fpsimd context lazily, allowing us to run
fpsimd code sequences in the kernel with pre-emption enabled
KBuild:
- Install 'vmlinuz.efi' when CONFIG_EFI_ZBOOT=y
- Makefile cleanups
LPA2 prep:
- Preparatory work for enabling the 'LPA2' extension, which will
introduce 52-bit virtual and physical addressing even with 4KiB
pages (including for KVM guests).
Misc:
- Remove dead code and fix a typo
MM:
- Pass NUMA node information for IRQ stack allocations
Perf:
- Add perf support for the Synopsys DesignWare PCIe PMU
- Add support for event counting thresholds (FEAT_PMUv3_TH)
introduced in Armv8.8
- Add support for i.MX8DXL SoCs to the IMX DDR PMU driver.
- Minor PMU driver fixes and optimisations
RIP VPIPT:
- Remove what support we had for the obsolete VPIPT I-cache policy
Selftests:
- Improvements to the SVE and SME selftests
Stacktrace:
- Refactor kernel unwind logic so that it can used by BPF unwinding
and, eventually, reliable backtracing
Sysregs:
- Update a bunch of register definitions based on the latest XML drop
from Arm"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (87 commits)
kselftest/arm64: Don't probe the current VL for unsupported vector types
efi/libstub: zboot: do not use $(shell ...) in cmd_copy_and_pad
arm64: properly install vmlinuz.efi
arm64/sysreg: Add missing system instruction definitions for FGT
arm64/sysreg: Add missing system register definitions for FGT
arm64/sysreg: Add missing ExtTrcBuff field definition to ID_AA64DFR0_EL1
arm64/sysreg: Add missing Pauth_LR field definitions to ID_AA64ISAR1_EL1
arm64: memory: remove duplicated include
arm: perf: Fix ARCH=arm build with GCC
arm64: Align boot cpucap handling with system cpucap handling
arm64: Cleanup system cpucap handling
MAINTAINERS: add maintainers for DesignWare PCIe PMU driver
drivers/perf: add DesignWare PCIe PMU driver
PCI: Move pci_clear_and_set_dword() helper to PCI header
PCI: Add Alibaba Vendor ID to linux/pci_ids.h
docs: perf: Add description for Synopsys DesignWare PCIe PMU driver
arm64: irq: set the correct node for shadow call stack
Revert "perf/arm_dmc620: Remove duplicate format attribute #defines"
arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD
arm64: fpsimd: Preserve/restore kernel mode NEON at context switch
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Add initial support to recognise the HeXin C2000 processor.
- Add papr-vpd and papr-sysparm character device drivers for VPD &
sysparm retrieval, so userspace tools can be adapted to avoid doing
raw firmware calls from userspace.
- Sched domains optimisations for shared processor partitions on
P9/P10.
- A series of optimisations for KVM running as a nested HV under
PowerVM.
- Other small features and fixes.
Thanks to Aditya Gupta, Aneesh Kumar K.V, Arnd Bergmann, Christophe
Leroy, Colin Ian King, Dario Binacchi, David Heidelberg, Geoff Levand,
Gustavo A. R. Silva, Haoran Liu, Jordan Niethe, Kajol Jain, Kevin Hao,
Kunwu Chan, Li kunyu, Li zeming, Masahiro Yamada, Michal Suchánek,
Nathan Lynch, Naveen N Rao, Nicholas Piggin, Randy Dunlap, Sathvika
Vasireddy, Srikar Dronamraju, Stephen Rothwell, Vaibhav Jain, and
Zhao Ke.
* tag 'powerpc-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (96 commits)
powerpc/ps3_defconfig: Disable PPC64_BIG_ENDIAN_ELF_ABI_V2
powerpc/86xx: Drop unused CONFIG_MPC8610
powerpc/powernv: Add error handling to opal_prd_range_is_valid
selftests/powerpc: Fix spelling mistake "EACCESS" -> "EACCES"
powerpc/hvcall: Reorder Nestedv2 hcall opcodes
powerpc/ps3: Add missing set_freezable() for ps3_probe_thread()
powerpc/mpc83xx: Use wait_event_freezable() for freezable kthread
powerpc/mpc83xx: Add the missing set_freezable() for agent_thread_fn()
powerpc/fsl: Fix fsl,tmu-calibration to match the schema
powerpc/smp: Dynamically build Powerpc topology
powerpc/smp: Avoid asym packing within thread_group of a core
powerpc/smp: Add __ro_after_init attribute
powerpc/smp: Disable MC domain for shared processor
powerpc/smp: Enable Asym packing for cores on shared processor
powerpc/sched: Cleanup vcpu_is_preempted()
powerpc: add cpu_spec.cpu_features to vmcoreinfo
powerpc/imc-pmu: Add a null pointer check in update_events_in_group()
powerpc/powernv: Add a null pointer check in opal_powercap_init()
powerpc/powernv: Add a null pointer check in opal_event_init()
powerpc/powernv: Add a null pointer check to scom_debug_init_one()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Borislav Petkov:
- Convert the hw error storm handling into a finer-grained, per-bank
solution which allows for more timely detection and reporting of
errors
- Start a documentation section which will hold down relevant RAS
features description and how they should be used
- Add new AMD error bank types
- Slim down and remove error type descriptions from the kernel side of
error decoding to rasdaemon which can be used from now on to decode
hw errors on AMD
- Mark pages containing uncorrectable errors as poison so that kdump
can avoid them and thus not cause another panic
- The usual cleanups and fixlets
* tag 'ras_core_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Handle Intel threshold interrupt storms
x86/mce: Add per-bank CMCI storm mitigation
x86/mce: Remove old CMCI storm mitigation code
Documentation: Begin a RAS section
x86/MCE/AMD: Add new MA_LLC, USR_DP, and USR_CP bank types
EDAC/mce_amd: Remove SMCA Extended Error code descriptions
x86/mce/amd, EDAC/mce_amd: Move long names to decoder module
x86/mce/inject: Clear test status value
x86/mce: Remove redundant check from mce_device_create()
x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel
|
|
commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has
changed the definition of MAX_ORDER to be inclusive. This has caused
issues with code that was not yet upstream and depended on the previous
definition.
To draw attention to the altered meaning of the define, rename MAX_ORDER
to MAX_PAGE_ORDER.
Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
NR_PAGE_ORDERS defines the number of page orders supported by the page
allocator, ranging from 0 to MAX_ORDER, MAX_ORDER + 1 in total.
NR_PAGE_ORDERS assists in defining arrays of page orders and allows for
more natural iteration over them.
[kirill.shutemov@linux.intel.com: fixup for kerneldoc warning]
Link: https://lkml.kernel.org/r/20240101111512.7empzyifq7kxtzk3@box
Link: https://lkml.kernel.org/r/20231228144704.14033-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Borislav Petkov:
- Add an informational message which gets issued when IA32 emulation
has been disabled on the cmdline
- Clarify in detail how /proc/cpuinfo is used on x86
- Fix a theoretical overflow in num_digits()
* tag 'x86_misc_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ia32: State that IA32 emulation is disabled
Documentation/x86: Document what /proc/cpuinfo is for
x86/lib: Fix overflow when counting digits
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs super updates from Christian Brauner:
"This contains the super work for this cycle including the long-awaited
series by Jan to make it possible to prevent writing to mounted block
devices:
- Writing to mounted devices is dangerous and can lead to filesystem
corruption as well as crashes. Furthermore syzbot comes with more
and more involved examples how to corrupt block device under a
mounted filesystem leading to kernel crashes and reports we can do
nothing about. Add tracking of writers to each block device and a
kernel cmdline argument which controls whether other writeable
opens to block devices open with BLK_OPEN_RESTRICT_WRITES flag are
allowed.
Note that this effectively only prevents modification of the
particular block device's page cache by other writers. The actual
device content can still be modified by other means - e.g. by
issuing direct scsi commands, by doing writes through devices lower
in the storage stack (e.g. in case loop devices, DM, or MD are
involved) etc. But blocking direct modifications of the block
device page cache is enough to give filesystems a chance to perform
data validation when loading data from the underlying storage and
thus prevent kernel crashes.
Syzbot can use this cmdline argument option to avoid uninteresting
crashes. Also users whose userspace setup does not need writing to
mounted block devices can set this option for hardening. We expect
that this will be interesting to quite a few workloads.
Btrfs is currently opted out of this because they still haven't
merged patches we require for this to work from three kernel
releases ago.
- Reimplement block device freezing and thawing as holder operations
on the block device.
This allows us to extend block device freezing to all devices
associated with a superblock and not just the main device. It also
allows us to remove get_active_super() and thus another function
that scans the global list of superblocks.
Freezing via additional block devices only works if the filesystem
chooses to use @fs_holder_ops for these additional devices as well.
That currently only includes ext4 and xfs.
Earlier releases switched get_tree_bdev() and mount_bdev() to use
@fs_holder_ops. The remaining nilfs2 open-coded version of
mount_bdev() has been converted to rely on @fs_holder_ops as well.
So block device freezing for the main block device will continue to
work as before.
There should be no regressions in functionality. The only special
case is btrfs where block device freezing for the main block device
never worked because sb->s_bdev isn't set. Block device freezing
for btrfs can be fixed once they can switch to @fs_holder_ops but
that can happen whenever they're ready"
* tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits)
block: Fix a memory leak in bdev_open_by_dev()
super: don't bother with WARN_ON_ONCE()
super: massage wait event mechanism
ext4: Block writes to journal device
xfs: Block writes to log device
fs: Block writes to mounted block devices
btrfs: Do not restrict writes to btrfs devices
block: Add config option to not allow writing to mounted devices
block: Remove blkdev_get_by_*() functions
bcachefs: Convert to bdev_open_by_path()
fs: handle freezing from multiple devices
fs: remove dead check
nilfs2: simplify device handling
fs: streamline thaw_super_locked
ext4: simplify device handling
xfs: simplify device handling
fs: simplify setup_bdev_super() calls
blkdev: comment fs_holder_ops
porting: document block device freeze and thaw changes
fs: remove unused helper
...
|
|
can still use the lock object after it's unlocked
Clarify the mutex lock lifetime rules a bit more.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231201121808.GL3818@noisy.programming.kicks-ass.net
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Improve the detection when to run atomic transfer handlers for kernels
with preemption disabled. This removes some false positive splats a
number of users were seeing if their driver didn't have support for
atomic transfers.
Also, fix a typo in the docs while we are here"
* tag 'i2c-for-6.7-final' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: core: Fix atomic xfer check for non-preempt config
Documentation/i2c: fix spelling error in i2c-address-translators
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless and netfilter.
We haven't accumulated much over the break. If it wasn't for the
uninterrupted stream of fixes for Intel drivers this PR would be very
slim. There was a handful of user reports, however, either they stood
out because of the lower traffic or users have had more time to test
over the break. The ones which are v6.7-relevant should be wrapped up.
Current release - regressions:
- Revert "net: ipv6/addrconf: clamp preferred_lft to the minimum
required", it caused issues on networks where routers send prefixes
with preferred_lft=0
- wifi:
- iwlwifi: pcie: don't synchronize IRQs from IRQ, prevent deadlock
- mac80211: fix re-adding debugfs entries during reconfiguration
Current release - new code bugs:
- tcp: print AO/MD5 messages only if there are any keys
Previous releases - regressions:
- virtio_net: fix missing dma unmap for resize, prevent OOM
Previous releases - always broken:
- mptcp: prevent tcp diag from closing listener subflows
- nf_tables:
- set transport header offset for egress hook, fix IPv4 mangling
- skip set commit for deleted/destroyed sets, avoid double deactivation
- nat: make sure action is set for all ct states, fix openvswitch
matching on ICMP packets in related state
- eth: mlxbf_gige: fix receive hang under heavy traffic
- eth: r8169: fix PCI error on system resume for RTL8168FP
- net: add missing getsockopt(SO_TIMESTAMPING_NEW) and cmsg handling"
* tag 'net-6.7-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
net/tcp: Only produce AO/MD5 logs if there are any keys
net: Implement missing SO_TIMESTAMPING_NEW cmsg support
bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()
net: ravb: Wait for operating mode to be applied
asix: Add check for usbnet_get_endpoints
octeontx2-af: Re-enable MAC TX in otx2_stop processing
octeontx2-af: Always configure NIX TX link credits based on max frame size
net/smc: fix invalid link access in dumping SMC-R connections
net/qla3xxx: fix potential memleak in ql_alloc_buffer_queues
virtio_net: fix missing dma unmap for resize
igc: Fix hicredit calculation
ice: fix Get link status data length
i40e: Restore VF MSI-X state during PCI reset
i40e: fix use-after-free in i40e_aqc_add_filters()
net: Save and restore msg_namelen in sock_sendmsg
netfilter: nft_immediate: drop chain reference counter on error
netfilter: nf_nat: fix action not being set for all ct states
net: bcmgenet: Fix FCS generation for fragmented skbuffs
mptcp: prevent tcp diag from closing listener subflows
MAINTAINERS: add Geliang as reviewer for MPTCP
...
|
|
* for-next/perf: (30 commits)
arm: perf: Fix ARCH=arm build with GCC
MAINTAINERS: add maintainers for DesignWare PCIe PMU driver
drivers/perf: add DesignWare PCIe PMU driver
PCI: Move pci_clear_and_set_dword() helper to PCI header
PCI: Add Alibaba Vendor ID to linux/pci_ids.h
docs: perf: Add description for Synopsys DesignWare PCIe PMU driver
Revert "perf/arm_dmc620: Remove duplicate format attribute #defines"
Documentation: arm64: Document the PMU event counting threshold feature
arm64: perf: Add support for event counting threshold
arm: pmu: Move error message and -EOPNOTSUPP to individual PMUs
KVM: selftests: aarch64: Update tools copy of arm_pmuv3.h
perf/arm_dmc620: Remove duplicate format attribute #defines
arm: pmu: Share user ABI format mechanism with SPE
arm64: perf: Include threshold control fields in PMEVTYPER mask
arm: perf: Convert remaining fields to use GENMASK
arm: perf: Use GENMASK for PMMIR fields
arm: perf/kvm: Use GENMASK for ARMV8_PMU_PMCR_N
arm: perf: Remove inlines from arm_pmuv3.c
drivers/perf: arm_dsu_pmu: Remove kerneldoc-style comment syntax
drivers/perf: Remove usage of the deprecated ida_simple_xx() API
...
|
|
The commit had a bug and might not have been the right approach anyway.
Fixes: 629df6701c8a ("net: ipv6/addrconf: clamp preferred_lft to the minimum required")
Fixes: ec575f885e3e ("Documentation: networking: explain what happens if temp_prefered_lft is too small or too large")
Reported-by: Dan Moulding <dan@danm.net>
Closes: https://lore.kernel.org/netdev/20231221231115.12402-1-dan@danm.net/
Link: https://lore.kernel.org/netdev/CAMMLpeTdYhd=7hhPi2Y7pwdPCgnnW5JYh-bu3hSc7im39uxnEA@mail.gmail.com/
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20231230043252.10530-1-alexhenrie24@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pick up these commits from Linus's tree:
b106bcf0f99a ("locking/osq_lock: Clarify osq_wait_next()")
563adbfc351b ("locking/osq_lock: Clarify osq_wait_next() calling convention")
7c2230982129 ("locking/osq_lock: Move the definition of optimistic_spin_node into osq_lock.c")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
During our experiment with zswap, we sometimes observe swap IOs due to
occasional zswap store failures and writebacks-to-swap. These swapping
IOs prevent many users who cannot tolerate swapping from adopting zswap to
save memory and improve performance where possible.
This patch adds the option to disable this behavior entirely: do not
writeback to backing swapping device when a zswap store attempt fail, and
do not write pages in the zswap pool back to the backing swap device (both
when the pool is full, and when the new zswap shrinker is called).
This new behavior can be opted-in/out on a per-cgroup basis via a new
cgroup file. By default, writebacks to swap device is enabled, which is
the previous behavior. Initially, writeback is enabled for the root
cgroup, and a newly created cgroup will inherit the current setting of its
parent.
Note that this is subtly different from setting memory.swap.max to 0, as
it still allows for pages to be stored in the zswap pool (which itself
consumes swap space in its current form).
This patch should be applied on top of the zswap shrinker series:
https://lore.kernel.org/linux-mm/20231130194023.4102148-1-nphamcs@gmail.com/
as it also disables the zswap shrinker, a major source of zswap
writebacks.
For the most part, this feature is motivated by internal parties who
have already established their opinions regarding swapping - the
workloads that are highly sensitive to IO, and especially those who are
using servers with really slow disk performance (for instance, massive
but slow HDDs). For these folks, it's impossible to convince them to
even entertain zswap if swapping also comes as a packaged deal.
Writeback disabling is quite a useful feature in these situations - on
a mixed workloads deployment, they can disable writeback for the more
IO-sensitive workloads, and enable writeback for other background
workloads.
For instance, on a server with HDD, I allocate memories and populate
them with random values (so that zswap store will always fail), and
specify memory.high low enough to trigger reclaim. The time it takes
to allocate the memories and just read through it a couple of times
(doing silly things like computing the values' average etc.):
zswap.writeback disabled:
real 0m30.537s
user 0m23.687s
sys 0m6.637s
0 pages swapped in
0 pages swapped out
zswap.writeback enabled:
real 0m45.061s
user 0m24.310s
sys 0m8.892s
712686 pages swapped in
461093 pages swapped out
(the last two lines are from vmstat -s).
[nphamcs@gmail.com: add a comment about recurring zswap store failures leading to reclaim inefficiency]
Link: https://lkml.kernel.org/r/20231221005725.3446672-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20231207192406.3809579-1-nphamcs@gmail.com
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Chris Li <chrisl@kernel.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: David Heidelberg <david@ixit.cz>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We removed all "bool compound" and RMAP_COMPOUND parameters. Let's remove
the remaining "compound" terminology by making COMPOUND_MAPPED match the
"folio->_entire_mapcount" terminology, renaming it to ENTIRELY_MAPPED.
ENTIRELY_MAPPED is only used when the whole folio is mapped using a single
page table entry (e.g., a single PMD mapping a PMD-sized THP). For now,
we don't support mapping any THP bigger than that, so ENTIRELY_MAPPED only
applies to PMD-mapped PMD-sized THP only.
Link: https://lkml.kernel.org/r/20231220224504.646757-40-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Refer to folio_remove_rmap_*() instaed.
Link: https://lkml.kernel.org/r/20231220224504.646757-32-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This documents the KSM advisor and its new knobs in /sys/fs/kernel/mm.
Link: https://lkml.kernel.org/r/20231218231054.1625219-5-shr@devkernel.io
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Implement the uABI of UFFDIO_MOVE ioctl.
UFFDIO_COPY performs ~20% better than UFFDIO_MOVE when the application
needs pages to be allocated [1]. However, with UFFDIO_MOVE, if pages are
available (in userspace) for recycling, as is usually the case in heap
compaction algorithms, then we can avoid the page allocation and memcpy
(done by UFFDIO_COPY). Also, since the pages are recycled in the
userspace, we avoid the need to release (via madvise) the pages back to
the kernel [2].
We see over 40% reduction (on a Google pixel 6 device) in the compacting
thread's completion time by using UFFDIO_MOVE vs. UFFDIO_COPY. This was
measured using a benchmark that emulates a heap compaction implementation
using userfaultfd (to allow concurrent accesses by application threads).
More details of the usecase are explained in [2]. Furthermore,
UFFDIO_MOVE enables moving swapped-out pages without touching them within
the same vma. Today, it can only be done by mremap, however it forces
splitting the vma.
[1] https://lore.kernel.org/all/1425575884-2574-1-git-send-email-aarcange@redhat.com/
[2] https://lore.kernel.org/linux-mm/CA+EESO4uO84SSnBhArH4HvLNhaUQ5nZKNKXqxRCyjniNVjp0Aw@mail.gmail.com/
Update for the ioctl_userfaultfd(2) manpage:
UFFDIO_MOVE
(Since Linux xxx) Move a continuous memory chunk into the
userfault registered range and optionally wake up the blocked
thread. The source and destination addresses and the number of
bytes to move are specified by the src, dst, and len fields of
the uffdio_move structure pointed to by argp:
struct uffdio_move {
__u64 dst; /* Destination of move */
__u64 src; /* Source of move */
__u64 len; /* Number of bytes to move */
__u64 mode; /* Flags controlling behavior of move */
__s64 move; /* Number of bytes moved, or negated error */
};
The following value may be bitwise ORed in mode to change the
behavior of the UFFDIO_MOVE operation:
UFFDIO_MOVE_MODE_DONTWAKE
Do not wake up the thread that waits for page-fault
resolution
UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES
Allow holes in the source virtual range that is being moved.
When not specified, the holes will result in ENOENT error.
When specified, the holes will be accounted as successfully
moved memory. This is mostly useful to move hugepage aligned
virtual regions without knowing if there are transparent
hugepages in the regions or not, but preventing the risk of
having to split the hugepage during the operation.
The move field is used by the kernel to return the number of
bytes that was actually moved, or an error (a negated errno-
style value). If the value returned in move doesn't match the
value that was specified in len, the operation fails with the
error EAGAIN. The move field is output-only; it is not read by
the UFFDIO_MOVE operation.
The operation may fail for various reasons. Usually, remapping of
pages that are not exclusive to the given process fail; once KSM
might deduplicate pages or fork() COW-shares pages during fork()
with child processes, they are no longer exclusive. Further, the
kernel might only perform lightweight checks for detecting whether
the pages are exclusive, and return -EBUSY in case that check fails.
To make the operation more likely to succeed, KSM should be
disabled, fork() should be avoided or MADV_DONTFORK should be
configured for the source VMA before fork().
This ioctl(2) operation returns 0 on success. In this case, the
entire area was moved. On error, -1 is returned and errno is
set to indicate the error. Possible errors include:
EAGAIN The number of bytes moved (i.e., the value returned in
the move field) does not equal the value that was
specified in the len field.
EINVAL Either dst or len was not a multiple of the system page
size, or the range specified by src and len or dst and len
was invalid.
EINVAL An invalid bit was specified in the mode field.
ENOENT
The source virtual memory range has unmapped holes and
UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES is not set.
EEXIST
The destination virtual memory range is fully or partially
mapped.
EBUSY
The pages in the source virtual memory range are either
pinned or not exclusive to the process. The kernel might
only perform lightweight checks for detecting whether the
pages are exclusive. To make the operation more likely to
succeed, KSM should be disabled, fork() should be avoided
or MADV_DONTFORK should be configured for the source virtual
memory area before fork().
ENOMEM Allocating memory needed for the operation failed.
ESRCH
The target process has exited at the time of a UFFDIO_MOVE
operation.
Link: https://lkml.kernel.org/r/20231206103702.3873743-3-surenb@google.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: ZhangPeng <zhangpeng362@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Correct to "stretched" from "streched" in "keeps clock streched on bus A
waiting for reply".
Signed-off-by: Attreyee Mukherjee <tintinm2017@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc driver fixes from Greg KH:
"Here are a small number of various driver fixes for 6.7-rc7 that
normally come through the char-misc tree, and one debugfs fix as well.
Included in here are:
- iio and hid sensor driver fixes for a number of small things
- interconnect driver fixes
- brcm_nvmem driver fixes
- debugfs fix for previous fix
- guard() definition in device.h so that many subsystems can start
using it for 6.8-rc1 (requested by Dan Williams to make future
merges easier)
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
debugfs: initialize cancellations earlier
Revert "iio: hid-sensor-als: Add light color temperature support"
Revert "iio: hid-sensor-als: Add light chromaticity support"
nvmem: brcm_nvram: store a copy of NVRAM content
dt-bindings: nvmem: mxs-ocotp: Document fsl,ocotp
driver core: Add a guard() definition for the device_lock()
interconnect: qcom: icc-rpm: Fix peak rate calculation
iio: adc: MCP3564: fix hardware identification logic
iio: adc: MCP3564: fix calib_bias and calib_scale range checks
iio: adc: meson: add separate config for axg SoC family
iio: adc: imx93: add four channels for imx93 adc
iio: adc: ti_am335x_adc: Fix return value check of tiadc_request_dma()
interconnect: qcom: sm8250: Enable sync_state
iio: triggered-buffer: prevent possible freeing of wrong buffer
iio: imu: inv_mpu6050: fix an error code problem in inv_mpu6050_read_raw
iio: imu: adis16475: use bit numbers in assign_bit()
iio: imu: adis16475: add spi_device_id table
iio: tmag5273: fix temperature offset
interconnect: Treat xlate() returning NULL node as an error
iio: common: ms_sensors: ms_sensors_i2c: fix humidity conversion time table
...
|
|
sched_feat(UTIL_EST_FASTUP) has been added to easily disable the feature
in order to check for possibly related regressions. After 3 years, it has
never been used and no regression has been reported. Let's remove it
and make fast increase a permanent behavior.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com>
Reviewed-by: Tang Yizhou <yizhou.tang@shopee.com>
Reviewed-by: Yanteng Si <siyanteng@loongson.cn> [for the Chinese translation]
Reviewed-by: Alex Shi <alexs@kernel.org>
Link: https://lore.kernel.org/r/20231201161652.1241695-2-vincent.guittot@linaro.org
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
commands
There are eight command inputs for 'state' DAMON sysfs file, and those are
verbosely explained in multiple paragraphs. It is not easy to find
explanation of specific command, and getting whole picture of supported
commands. Replace the paragraphs with a list.
Link: https://lkml.kernel.org/r/20231213190338.54146-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
'Sysfs Files Hierarchy' section of DAMON usage document shows whole
picture of the interface. Then sections for detailed explanation of the
files follow. Due to the amount of the files, navigating between the
whole picture and the section for specific files sometimes require no
subtle amount of scrolling. Add links from the whole picture to the
dedicated sections for making the navigation easier.
Link: https://lkml.kernel.org/r/20231213190338.54146-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The label for context DAMON sysfs directory section is having name
sysfs_contexts. The name would be better to be used for the contexts
directory. Rename it to represent a single context.
Link: https://lkml.kernel.org/r/20231213190338.54146-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The execution model and data structures section at the end of the design
document is briefly explaining how DAMON works overall. Knowing that
first may help better drawing the overall picture. It may also help
better understanding following detailed sections. Move it to the
beginning of the document.
Link: https://lkml.kernel.org/r/20231213190338.54146-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In preparation for adding support for anonymous multi-size THP, introduce
new sysfs structure that will be used to control the new behaviours. A
new directory is added under transparent_hugepage for each supported THP
size, and contains an `enabled` file, which can be set to "inherit" (to
inherit the global setting), "always", "madvise" or "never". For now, the
kernel still only supports PMD-sized anonymous THP, so only 1 directory is
populated.
The first half of the change converts transhuge_vma_suitable() and
hugepage_vma_check() so that they take a bitfield of orders for which the
user wants to determine support, and the functions filter out all the
orders that can't be supported, given the current sysfs configuration and
the VMA dimensions. The resulting functions are renamed to
thp_vma_suitable_orders() and thp_vma_allowable_orders() respectively.
Convenience functions that take a single, unencoded order and return a
boolean are also defined as thp_vma_suitable_order() and
thp_vma_allowable_order().
The second half of the change implements the new sysfs interface. It has
been done so that each supported THP size has a `struct thpsize`, which
describes the relevant metadata and is itself a kobject. This is pretty
minimal for now, but should make it easy to add new per-thpsize files to
the interface if needed in future (e.g. per-size defrag). Rather than
keep the `enabled` state directly in the struct thpsize, I've elected to
directly encode it into huge_anon_orders_[always|madvise|inherit]
bitfields since this reduces the amount of work required in
thp_vma_allowable_orders() which is called for every page fault.
See Documentation/admin-guide/mm/transhuge.rst, as modified by this
commit, for details of how the new sysfs interface works.
[ryan.roberts@arm.com: fix build warning when CONFIG_SYSFS is disabled]
Link: https://lkml.kernel.org/r/20231211125320.3997543-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20231207161211.2374093-4-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Barry Song <v-songbaohua@oppo.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Tested-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Itaru Kitayama <itaru.kitayama@gmail.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yin Fengwei <fengwei.yin@intel.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Pull drm fixes from Dave Airlie:
"More regular fixes, amdgpu, i915, mediatek and nouveau are most of
them this week. Nothing too major, then a few misc bits and pieces in
core, panel and ivpu.
drm:
- fix uninit problems in crtc
- fix fd ownership check
- edid: add modes in fallback paths
panel:
- move LG panel into DSI yaml
- ltk050h3146w: set burst mode
mediatek:
- mtk_disp_gamma: Fix breakage due to merge issue
- fix kernel oops if no crtc is found
- Add spinlock for setting vblank event in atomic_begin
- Fix access violation in mtk_drm_crtc_dma_dev_get
i915:
- Fix selftest engine reset count storage for multi-tile
- Fix out-of-bounds reads for engine reset counts
- Fix ADL+ remapped stride with CCS
- Fix intel_atomic_setup_scalers() plane_state handling
- Fix ADL+ tiled plane stride when the POT stride is smaller than the original
- Fix eDP 1.4 rate select method link configuration
amdgpu:
- Fix suspend fix that got accidently mangled last week
- Fix OD regression
- PSR fixes
- OLED Backlight regression fix
- JPEG 4.0.5 fix
- Misc display fixes
- SDMA 5.2 fix
- SDMA 2.4 regression fix
- GPUVM race fix
nouveau:
- fix gk20a instobj hierarchy
- fix headless iors inheritance regression
ivpu:
- fix WA initialisation"
* tag 'drm-fixes-2023-12-15' of git://anongit.freedesktop.org/drm/drm: (31 commits)
drm/nouveau/kms/nv50-: Don't allow inheritance of headless iors
drm/nouveau: Fixup gk20a instobj hierarchy
drm/amdgpu: warn when there are still mappings when a BO is destroyed v2
drm/amdgpu: fix tear down order in amdgpu_vm_pt_free
drm/amd: Fix a probing order problem on SDMA 2.4
drm/amdgpu/sdma5.2: add begin/end_use ring callbacks
drm/panel: ltk050h3146w: Set burst mode for ltk050h3148w
dt-bindings: panel-simple-dsi: move LG 5" HD TFT LCD panel into DSI yaml
drm/amd/display: Disable PSR-SU on Parade 0803 TCON again
drm/amd/display: Populate dtbclk from bounding box
drm/amd/display: Revert "Fix conversions between bytes and KB"
drm/amdgpu/jpeg: configure doorbell for each playback
drm/amd/display: Restore guard against default backlight value < 1 nit
drm/amd/display: fix hw rotated modes when PSR-SU is enabled
drm/amd/pm: fix pp_*clk_od typo
drm/amdgpu: fix buffer funcs setting order on suspend harder
drm/mediatek: Fix access violation in mtk_drm_crtc_dma_dev_get
drm/edid: also call add modes in EDID connector update fallback
drm/i915/edp: don't write to DP_LINK_BW_SET when using rate select
drm/i915: Fix ADL+ tiled plane stride when the POT stride is smaller than the original
...
|
|
Both imx23.dtsi and imx28.dtsi describe the OCOTP nodes in
the format:
compatible = "fsl,imx28-ocotp", "fsl,ocotp";
Document the "fsl,ocotp" entry to fix the following schema
warning:
efuse@8002c000: compatible: ['fsl,imx23-ocotp', 'fsl,ocotp'] is too long
from schema $id: http://devicetree.org/schemas/nvmem/mxs-ocotp.yaml#
Fixes: 2c504460f502 ("dt-bindings: nvmem: Convert MXS OCOTP to json-schema")
Cc: <Stable@vger.kernel.org>
Signed-off-by: Fabio Estevam <festevam@denx.de>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20231215111358.316727-2-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v6.7-rc6:
- Fix regression for checking if FD is master capable.
- Fix uninitialized variables in drm/crtc.
- Fix ivpu w/a.
- Refresh modes correctly when updating EDID.
- Small panel fixes.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/2d46b68f-c5a4-45e5-beb4-411569f4aac8@linux.intel.com
|
|
Alibaba's T-Head Yitan 710 SoC includes Synopsys' DesignWare Core PCIe
controller which implements PMU for performance and functional debugging to
facilitate system maintenance.
Document it to provide guidance on how to use it.
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Yicong Yang <yangyicong@hisilicon.com>
Tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Link: https://lore.kernel.org/r/20231208025652.87192-2-xueshuai@linux.alibaba.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Originally was in the panel-simple, but belongs to panel-simple-dsi.
See arch/arm/boot/dts/nvidia/tegra114-roth.dts for more details.
Resolves the following warning:
```
arch/arm/boot/dts/tegra114-roth.dt.yaml: panel@0: 'reg' does not match any of the regexes: 'pinctrl-[0-9]+'
From schema: Documentation/devicetree/bindings/display/panel/panel-simple.yaml
```
Fixes: 310abcea76e9 ("dt-bindings: display: convert simple lg panels to DT Schema")
Signed-off-by: David Heidelberg <david@ixit.cz>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Jessica Zhang <quic_jesszhan@quicinc.com>
Link: https://lore.kernel.org/r/20231212200934.99262-1-david@ixit.cz
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20231212200934.99262-1-david@ixit.cz
|
|
Until now the papr_sysparm APIs have been kernel-internal. But user
space needs access to PAPR system parameters too. The only method
available to user space today to get or set system parameters is using
sys_rtas() and /dev/mem to pass RTAS-addressable buffers between user
space and firmware. This is incompatible with lockdown and should be
deprecated.
So provide an alternative ABI to user space in the form of a
/dev/papr-sysparm character device with just two ioctl commands (get
and set). The data payloads involved are small enough to fit in the
ioctl argument buffer, making the code relatively simple.
Exposing the system parameters through sysfs has been considered but
it would be too awkward:
* The kernel currently does not have to contain an exhaustive list of
defined system parameters. This is a convenient property to maintain
because we don't have to update the kernel whenever a new parameter
is added to PAPR. Exporting a named attribute in sysfs for each
parameter would negate this.
* Some system parameters are text-based and some are not.
* Retrieval of at least one system parameter requires input data,
which a simple read-oriented interface can't support.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-11-e9eafd0c8c6c@linux.ibm.com
|
|
PowerVM LPARs may retrieve Vital Product Data (VPD) for system
components using the ibm,get-vpd RTAS function.
We can expose this to user space with a /dev/papr-vpd character
device, where the programming model is:
struct papr_location_code plc = { .str = "", }; /* obtain all VPD */
int devfd = open("/dev/papr-vpd", O_RDONLY);
int vpdfd = ioctl(devfd, PAPR_VPD_CREATE_HANDLE, &plc);
size_t size = lseek(vpdfd, 0, SEEK_END);
char *buf = malloc(size);
pread(devfd, buf, size, 0);
When a file descriptor is obtained from ioctl(PAPR_VPD_CREATE_HANDLE),
the file contains the result of a complete ibm,get-vpd sequence. The
file contents are immutable from the POV of user space. To get a new
view of the VPD, the client must create a new handle.
This design choice insulates user space from most of the complexities
that ibm,get-vpd brings:
* ibm,get-vpd must be called more than once to obtain complete
results.
* Only one ibm,get-vpd call sequence should be in progress at a time;
interleaved sequences will disrupt each other. Callers must have a
protocol for serializing their use of the function.
* A call sequence in progress may receive a "VPD changed, try again"
status, requiring the client to abandon the sequence and start
over.
The memory required for the VPD buffers seems acceptable, around 20KB
for all VPD on one of my systems. And the value of the
/rtas/ibm,vpd-size DT property (the estimated maximum size of VPD) is
consistently 300KB across various systems I've checked.
I've implemented support for this new ABI in the rtas_get_vpd()
function in librtas, which the vpdupdate command currently uses to
populate its VPD database. I've verified that an unmodified vpdupdate
binary generates an identical database when using a librtas.so that
prefers the new ABI.
Along with the papr-vpd.h header exposed to user space, this
introduces a common papr-miscdev.h uapi header to share a base ioctl
ID with similar drivers to come.
Tested-by: Michal Suchánek <msuchanek@suse.de>
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-9-e9eafd0c8c6c@linux.ibm.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
- Fix a couple of potential crashes, one introduced in 6.6 and one
in 5.10
- Fix misbehavior of virtiofs submounts on memory pressure
- Clarify naming in the uAPI for a recent feature
* tag 'fuse-fixes-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: disable FOPEN_PARALLEL_DIRECT_WRITES with FUSE_DIRECT_IO_ALLOW_MMAP
fuse: dax: set fc->dax to NULL in fuse_dax_conn_free()
fuse: share lookup state between submount and its parent
docs/fuse-io: Document the usage of DIRECT_IO_ALLOW_MMAP
fuse: Rename DIRECT_IO_RELAX to DIRECT_IO_ALLOW_MMAP
|
|
Update DAMON sysfs usage for newly added DAMOS quota goals interface.
Link: https://lkml.kernel.org/r/20231130023652.50284-10-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Update DAMON ABI document for the newly added DAMON sysfs files and inputs
for DAMOS quota goals.
Link: https://lkml.kernel.org/r/20231130023652.50284-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Document the DAMOS quota auto tuning feature on the design document.
Link: https://lkml.kernel.org/r/20231130023652.50284-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently, we only shrink the zswap pool when the user-defined limit is
hit. This means that if we set the limit too high, cold data that are
unlikely to be used again will reside in the pool, wasting precious
memory. It is hard to predict how much zswap space will be needed ahead
of time, as this depends on the workload (specifically, on factors such as
memory access patterns and compressibility of the memory pages).
This patch implements a memcg- and NUMA-aware shrinker for zswap, that is
initiated when there is memory pressure. The shrinker does not have any
parameter that must be tuned by the user, and can be opted in or out on a
per-memcg basis.
Furthermore, to make it more robust for many workloads and prevent
overshrinking (i.e evicting warm pages that might be refaulted into
memory), we build in the following heuristics:
* Estimate the number of warm pages residing in zswap, and attempt to
protect this region of the zswap LRU.
* Scale the number of freeable objects by an estimate of the memory
saving factor. The better zswap compresses the data, the fewer pages
we will evict to swap (as we will otherwise incur IO for relatively
small memory saving).
* During reclaim, if the shrinker encounters a page that is also being
brought into memory, the shrinker will cautiously terminate its
shrinking action, as this is a sign that it is touching the warmer
region of the zswap LRU.
As a proof of concept, we ran the following synthetic benchmark: build the
linux kernel in a memory-limited cgroup, and allocate some cold data in
tmpfs to see if the shrinker could write them out and improved the overall
performance. Depending on the amount of cold data generated, we observe
from 14% to 35% reduction in kernel CPU time used in the kernel builds.
[nphamcs@gmail.com: check shrinker enablement early, use less costly stat flushing]
Link: https://lkml.kernel.org/r/20231206194456.3234203-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20231130194023.4102148-7-nphamcs@gmail.com
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Due to the wild nature of the Qualcomm RPM Message RAM, the kernel can't
really use 'reg' to point to the MPM's slice of Message RAM without cutting
into an already-defined RPM MSG RAM node used for GLINK and SMEM.
Document passing the register space as a slice of SRAM through the
qcom,rpm-msg-ram property. This also makes 'reg' deprecated.
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230328-topic-msgram_mpm-v7-1-6ee2bfeaac2c@linaro.org
|
|
Document the RZ/G3S (R9108G045) interrupt controller. This has few extra
functionalities compared with RZ/G2UL but the already existing driver
can still be used.
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20231120111820.87398-9-claudiu.beznea.uj@bp.renesas.com
|
|
Commit
bf904d2762ee ("x86/pti/64: Remove the SYSCALL64 entry trampoline")
removed the syscall trampoline and instead opted to enable using the
default SYSCALL64 entry point by mapping the percpu TSS. Unfortunately,
the PTI documentation wasn't updated when the respective changes were
made, so bring the doc up to speed.
Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231102130204.41043-1-nik.borisov@suse.com
|