Age | Commit message (Collapse) | Author | Files | Lines |
|
commit 1b5487aefb1ce7a6b1f15a33297d1231306b4122 upstream.
Setting encryption as required in security flags was broken.
For example (to require all mounts to be encrypted by setting):
"echo 0x400c5 > /proc/fs/cifs/SecurityFlags"
Would return "Invalid argument" and log "Unsupported security flags"
This patch fixes that (e.g. allowing overriding the default for
SecurityFlags 0x00c5, including 0x40000 to require seal, ie
SMB3.1.1 encryption) so now that works and forces encryption
on subsequent mounts.
Acked-by: Bharath SM <bharathsm@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b88f55389ad27f05ed84af9e1026aa64dbfabc9a upstream.
The kernel sleep profile is no longer working due to a recursive locking
bug introduced by commit 42a20f86dc19 ("sched: Add wrapper for get_wchan()
to keep task blocked")
Booting with the 'profile=sleep' kernel command line option added or
executing
# echo -n sleep > /sys/kernel/profiling
after boot causes the system to lock up.
Lockdep reports
kthreadd/3 is trying to acquire lock:
ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: get_wchan+0x32/0x70
but task is already holding lock:
ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: try_to_wake_up+0x53/0x370
with the call trace being
lock_acquire+0xc8/0x2f0
get_wchan+0x32/0x70
__update_stats_enqueue_sleeper+0x151/0x430
enqueue_entity+0x4b0/0x520
enqueue_task_fair+0x92/0x6b0
ttwu_do_activate+0x73/0x140
try_to_wake_up+0x213/0x370
swake_up_locked+0x20/0x50
complete+0x2f/0x40
kthread+0xfb/0x180
However, since nobody noticed this regression for more than two years,
let's remove 'profile=sleep' support based on the assumption that nobody
needs this functionality.
Fixes: 42a20f86dc19 ("sched: Add wrapper for get_wchan() to keep task blocked")
Cc: stable@vger.kernel.org # v5.16+
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 00f58104202c472e487f0866fbd38832523fd4f9 ]
Since the introduction of mTHP, the docuementation has stated that
khugepaged would be enabled when any mTHP size is enabled, and disabled
when all mTHP sizes are disabled. There are 2 problems with this; 1.
this is not what was implemented by the code and 2. this is not the
desirable behavior.
Desirable behavior is for khugepaged to be enabled when any PMD-sized THP
is enabled, anon or file. (Note that file THP is still controlled by the
top-level control so we must always consider that, as well as the PMD-size
mTHP control for anon). khugepaged only supports collapsing to PMD-sized
THP so there is no value in enabling it when PMD-sized THP is disabled.
So let's change the code and documentation to reflect this policy.
Further, per-size enabled control modification events were not previously
forwarded to khugepaged to give it an opportunity to start or stop.
Consequently the following was resulting in khugepaged eroneously not
being activated:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo always > /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
[ryan.roberts@arm.com: v3]
Link: https://lkml.kernel.org/r/20240705102849.2479686-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20240705102849.2479686-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20240704091051.2411934-1-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Fixes: 3485b88390b0 ("mm: thp: introduce multi-size THP sysfs interface")
Closes: https://lore.kernel.org/linux-mm/7a0bbe69-1e3d-4263-b206-da007791a5c4@redhat.com/
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c793a62823d1ce8f70d9cfc7803e3ea436277cda ]
Use preempt_model_preemptible() to detect a preemptible kernel when
deciding whether or not to reschedule in order to drop a contended
spinlock or rwlock. Because PREEMPT_DYNAMIC selects PREEMPTION, kernels
built with PREEMPT_DYNAMIC=y will yield contended locks even if the live
preemption model is "none" or "voluntary". In short, make kernels with
dynamically selected models behave the same as kernels with statically
selected models.
Somewhat counter-intuitively, NOT yielding a lock can provide better
latency for the relevant tasks/processes. E.g. KVM x86's mmu_lock, a
rwlock, is often contended between an invalidation event (takes mmu_lock
for write) and a vCPU servicing a guest page fault (takes mmu_lock for
read). For _some_ setups, letting the invalidation task complete even
if there is mmu_lock contention provides lower latency for *all* tasks,
i.e. the invalidation completes sooner *and* the vCPU services the guest
page fault sooner.
But even KVM's mmu_lock behavior isn't uniform, e.g. the "best" behavior
can vary depending on the host VMM, the guest workload, the number of
vCPUs, the number of pCPUs in the host, why there is lock contention, etc.
In other words, simply deleting the CONFIG_PREEMPTION guard (or doing the
opposite and removing contention yielding entirely) needs to come with a
big pile of data proving that changing the status quo is a net positive.
Opportunistically document this side effect of preempt=full, as yielding
contended spinlocks can have significant, user-visible impact.
Fixes: c597bfddc9e9 ("sched: Provide Kconfig support for default dynamic preempt mode")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Link: https://lore.kernel.org/kvm/ef81ff36-64bb-4cfe-ae9b-e3acf47bff24@proxmox.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
If you try to set /proc/fs/cifs/SecurityFlags to 1 it
will set them to CIFSSEC_MUST_NTLMV2 which no longer is
relevant (the less secure ones like lanman have been removed
from cifs.ko) and is also missing some flags (like for
signing and encryption) and can even cause mount to fail,
so change this to set it to Kerberos in this case.
Also change the description of the SecurityFlags to remove mention
of flags which are no longer supported.
Cc: stable@vger.kernel.org
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial / console fixes from Greg KH:
"Here are a bunch of fixes/reverts for 6.10-rc6. Include in here are:
- revert the bunch of tty/serial/console changes that landed in -rc1
that didn't quite work properly yet.
Everyone agreed to just revert them for now and will work on making
them better for a future release instead of trying to quick fix the
existing changes this late in the release cycle
- 8250 driver port count bugfix
- Other tiny serial port bugfixes for reported issues
All of these have been in linux-next this week with no reported
issues"
* tag 'tty-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "printk: Save console options for add_preferred_console_match()"
Revert "printk: Don't try to parse DEVNAME:0.0 console options"
Revert "printk: Flag register_console() if console is set on command line"
Revert "serial: core: Add support for DEVNAME:0.0 style naming for kernel console"
Revert "serial: core: Handle serial console options"
Revert "serial: 8250: Add preferred console in serial8250_isa_init_ports()"
Revert "Documentation: kernel-parameters: Add DEVNAME:0.0 format for serial ports"
Revert "serial: 8250: Fix add preferred console for serial8250_isa_init_ports()"
Revert "serial: core: Fix ifdef for serial base console functions"
serial: bcm63xx-uart: fix tx after conversion to uart_port_tx_limited()
serial: core: introduce uart_port_tx_limited_flags()
Revert "serial: core: only stop transmit when HW fifo is empty"
serial: imx: set receiver level before starting uart
tty: mcf: MCF54418 has 10 UARTS
serial: 8250_omap: Implementation of Errata i2310
tty: serial: 8250: Fix port count mismatch with the device
|
|
ports"
This reverts commit 5c3a766e9f057ee7a54b5d7addff7fab02676fea.
Let's roll back all of the serial core and printk console changes that
went into 6.10-rc1 as there still are problems with them that need to be
sorted out.
Link: https://lore.kernel.org/r/ZnpRozsdw6zbjqze@tlindgre-MOBL1
Reported-by: Petr Mladek <pmladek@suse.com>
Reported-by: Tony Lindgren <tony@atomide.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
There was insufficient review and no agreement that this is the right
approach.
There are serious flaws with the implementation that make processes using
mlock() not even work with simple fork() [1] and we get reliable crashes
when rebooting.
Further, simply because we might be unmapping a single PTE of a large
mlocked folio, we shouldn't zero out the whole folio.
... especially because the code can also *corrupt* urelated memory because
kernel_init_pages(page, folio_nr_pages(folio));
Could end up writing outside of the actual folio if we work with a tail
page.
Let's revert it. Once there is agreement that this is the right approach,
the issues were fixed and there was reasonable review and proper testing,
we can consider it again.
[1] https://lkml.kernel.org/r/4da9da2f-73e4-45fd-b62f-a8a513314057@redhat.com
Link: https://lkml.kernel.org/r/20240605091710.38961-1-david@redhat.com
Fixes: ba42b524a040 ("mm: init_mlocked_on_free_v3")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: David Wang <00107082@163.com>
Closes: https://lore.kernel.org/lkml/20240528151340.4282-1-00107082@163.com/
Reported-by: Lance Yang <ioworker0@gmail.com>
Closes: https://lkml.kernel.org/r/20240601140917.43562-1-ioworker0@gmail.com
Acked-by: Lance Yang <ioworker0@gmail.com>
Cc: York Jasper Niebuhr <yjnworkstation@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"14 hotfixes, 6 of which are cc:stable.
All except the nilfs2 fix affect MM and all are singletons - see the
chagelogs for details"
* tag 'mm-hotfixes-stable-2024-06-07-15-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors
mm: fix xyz_noprof functions calling profiled functions
codetag: avoid race at alloc_slab_obj_exts
mm/hugetlb: do not call vma_add_reservation upon ENOMEM
mm/ksm: fix ksm_zero_pages accounting
mm/ksm: fix ksm_pages_scanned accounting
kmsan: do not wipe out origin when doing partial unpoisoning
vmalloc: check CONFIG_EXECMEM in is_vmalloc_or_module_addr()
mm: page_alloc: fix highatomic typing in multi-block buddies
nilfs2: fix potential kernel bug due to lack of writeback flag waiting
memcg: remove the lockdep assert from __mod_objcg_mlstate()
mm: arm64: fix the out-of-bounds issue in contpte_clear_young_dirty_ptes
mm: huge_mm: fix undefined reference to `mthp_stats' for CONFIG_SYSFS=n
mm: drop the 'anon_' prefix for swap-out mTHP counters
|
|
The mTHP swap related counters: 'anon_swpout' and 'anon_swpout_fallback'
are confusing with an 'anon_' prefix, since the shmem can swap out
non-anonymous pages. So drop the 'anon_' prefix to keep consistent with
the old swap counter names.
This is needed in 6.10-rcX to avoid having an inconsistent ABI out in the
field.
Link: https://lkml.kernel.org/r/7a8989c13299920d7589007a30065c3e2c19f0e0.1716431702.git.baolin.wang@linux.alibaba.com
Fixes: d0f048ac39f6 ("mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters")
Fixes: 42248b9d34ea ("mm: add docs for per-order mTHP counters and transhuge_page ABI")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Suggested-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
TOMOYO project has moved to SourceForge.net .
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
|
|
properties from cmdline
On x86/ACPI platforms touchscreens mostly just work without needing any
device/model specific configuration. But in some cases (mostly with Silead
and Goodix touchscreens) it is still necessary to manually specify various
touchscreen-properties on a per model basis.
touchscreen_dmi is a special place for DMI quirks for this, but it can be
challenging for users to figure out the right property values, especially
for Silead touchscreens where non of these can be read back from
the touchscreen-controller.
ATM users can only test touchscreen properties by editing touchscreen_dmi.c
and then building a completely new kernel which makes it unnecessary
difficult for users to test and submit properties when necessary for their
laptop / tablet model.
Add support for specifying properties on the kernel commandline to allow
users to easily figure out the right settings. See the added documentation
in kernel-parameters.txt for the commandline syntax.
Cc: Gregor Riepl <onitake@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240523143601.47555-1-hdegoede@redhat.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial updates from Greg KH:
"Here is the big set of tty/serial driver changes for 6.10-rc1.
Included in here are:
- Usual good set of api cleanups and evolution by Jiri Slaby to make
the serial interfaces move out of the 1990's by using kfifos
instead of hand-rolling their own logic.
- 8250_exar driver updates
- max3100 driver updates
- sc16is7xx driver updates
- exar driver updates
- sh-sci driver updates
- tty ldisc api addition to help refuse bindings
- other smaller serial driver updates
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (113 commits)
serial: Clear UPF_DEAD before calling tty_port_register_device_attr_serdev()
serial: imx: Raise TX trigger level to 8
serial: 8250_pnp: Simplify "line" related code
serial: sh-sci: simplify locking when re-issuing RXDMA fails
serial: sh-sci: let timeout timer only run when DMA is scheduled
serial: sh-sci: describe locking requirements for invalidating RXDMA
serial: sh-sci: protect invalidating RXDMA on shutdown
tty: add the option to have a tty reject a new ldisc
serial: core: Call device_set_awake_path() for console port
dt-bindings: serial: brcm,bcm2835-aux-uart: convert to dtschema
tty: serial: uartps: Add support for uartps controller reset
arm64: zynqmp: Add resets property for UART nodes
dt-bindings: serial: cdns,uart: Add optional reset property
serial: 8250_pnp: Switch to DEFINE_SIMPLE_DEV_PM_OPS()
serial: 8250_exar: Keep the includes sorted
serial: 8250_exar: Make type of bit the same in exar_ee_*_bit()
serial: 8250_exar: Use BIT() in exar_ee_read()
serial: 8250_exar: Switch to use dev_err_probe()
serial: 8250_exar: Return directly from switch-cases
serial: 8250_exar: Decrease indentation level
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-mm updates from Andrew Morton:
"Mainly singleton patches, documented in their respective changelogs.
Notable series include:
- Some maintenance and performance work for ocfs2 in Heming Zhao's
series "improve write IO performance when fragmentation is high".
- Some ocfs2 bugfixes from Su Yue in the series "ocfs2 bugs fixes
exposed by fstests".
- kfifo header rework from Andy Shevchenko in the series "kfifo:
Clean up kfifo.h".
- GDB script fixes from Florian Rommel in the series "scripts/gdb:
Fixes for $lx_current and $lx_per_cpu".
- After much discussion, a coding-style update from Barry Song
explaining one reason why inline functions are preferred over
macros. The series is "codingstyle: avoid unused parameters for a
function-like macro""
* tag 'mm-nonmm-stable-2024-05-19-11-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (62 commits)
fs/proc: fix softlockup in __read_vmcore
nilfs2: convert BUG_ON() in nilfs_finish_roll_forward() to WARN_ON()
scripts: checkpatch: check unused parameters for function-like macro
Documentation: coding-style: ask function-like macros to evaluate parameters
nilfs2: use __field_struct() for a bitwise field
selftests/kcmp: remove unused open mode
nilfs2: remove calls to folio_set_error() and folio_clear_error()
kernel/watchdog_perf.c: tidy up kerneldoc
watchdog: allow nmi watchdog to use raw perf event
watchdog: handle comma separated nmi_watchdog command line
nilfs2: make superblock data array index computation sparse friendly
squashfs: remove calls to set the folio error flag
squashfs: convert squashfs_symlink_read_folio to use folio APIs
scripts/gdb: fix detection of current CPU in KGDB
scripts/gdb: make get_thread_info accept pointers
scripts/gdb: fix parameter handling in $lx_per_cpu
scripts/gdb: fix failing KGDB detection during probe
kfifo: don't use "proxy" headers
media: stih-cec: add missing io.h
media: rc: add missing io.h
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
- Fix a sched_balance_newidle setting bug
- Fix bug in the setting of /sys/fs/cgroup/test/cpu.max.burst
- Fix variable-shadowing build warning
- Extend sched-domains debug output
- Fix documentation
- Fix comments
* tag 'sched-urgent-2024-05-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Fix incorrect initialization of the 'burst' parameter in cpu_max_write()
sched/fair: Remove stale FREQUENCY_UTIL comment
sched/fair: Fix initial util_avg calculation
docs: cgroup-v1: Clarify that domain levels are system-specific
sched/debug: Dump domains' level
sched/fair: Allow disabling sched_balance_newidle with sched_relax_domain_level
arch/topology: Fix variable naming to avoid shadowing
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
"The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs.
Notable series include:
- Lucas Stach has provided some page-mapping cleanup/consolidation/
maintainability work in the series "mm/treewide: Remove pXd_huge()
API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
one test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being
allocated: number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in
largely similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene"
Johannes Weiner has fixed the page allocator's handling of
migratetype requests, with resulting improvements in compaction
efficiency.
- In the series "make the hugetlb migration strategy consistent"
Baolin Wang has fixed a hugetlb migration issue, which should
improve hugetlb allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when
memory almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting"
Kairui Song has optimized pagecache insertion, yielding ~10%
performance improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initializatio code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various
page->flags cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series:
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert
hugetlb functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
series "mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs.
This is a simple first-cut implementation for now. The series is
"support multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in
the series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts
in the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call
it GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
path to use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes".
Fixes the initialization code so that migration between different
memory types works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant
driver in the series "mm: follow_pte() improvements and acrn
follow_pte() fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to
folio in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size
THP's in the series "mm: add per-order mTHP alloc and swpout
counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap
same-filled and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His
series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
optimizes the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback
instrumentation in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series
"Fix and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
the series "Improve anon_vma scalability for anon VMAs". Intel's
test bot reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests ni the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as
XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking""
* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
selftests: cgroup: add tests to verify the zswap writeback path
mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
mm/damon/core: fix return value from damos_wmark_metric_value
mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
selftests: cgroup: remove redundant enabling of memory controller
Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
Docs/mm/damon/design: use a list for supported filters
Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
selftests/damon: classify tests for functionalities and regressions
selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
selftests/damon: add a test for DAMOS quota goal
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
"Core:
- IOMMU memory usage observability - This will make the memory used
for IO page tables explicitly visible.
- Simplify arch_setup_dma_ops()
Intel VT-d:
- Consolidate domain cache invalidation
- Remove private data from page fault message
- Allocate DMAR fault interrupts locally
- Cleanup and refactoring
ARM-SMMUv2:
- Support for fault debugging hardware on Qualcomm implementations
- Re-land support for the ->domain_alloc_paging() callback
ARM-SMMUv3:
- Improve handling of MSI allocation failure
- Drop support for the "disable_bypass" cmdline option
- Major rework of the CD creation code, following on directly from
the STE rework merged last time around.
- Add unit tests for the new STE/CD manipulation logic
AMD-Vi:
- Final part of SVA changes with generic IO page fault handling
Renesas IPMMU:
- Add support for R8A779H0 hardware
... and a couple smaller fixes and updates across the sub-tree"
* tag 'iommu-updates-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (80 commits)
iommu/arm-smmu-v3: Make the kunit into a module
arm64: Properly clean up iommu-dma remnants
iommu/amd: Enable Guest Translation after reading IOMMU feature register
iommu/vt-d: Decouple igfx_off from graphic identity mapping
iommu/amd: Fix compilation error
iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry
iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd()
iommu/arm-smmu-v3: Move the CD generation for SVA into a function
iommu/arm-smmu-v3: Allocate the CD table entry in advance
iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr()
iommu/arm-smmu-v3: Consolidate clearing a CD table entry
iommu/arm-smmu-v3: Move the CD generation for S1 domains into a function
iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry()
iommu/arm-smmu-v3: Add an ops indirection to the STE code
iommu/arm-smmu-qcom: Don't build debug features as a kernel module
iommu/amd: Add SVA domain support
iommu: Add ops->domain_alloc_sva()
iommu/amd: Initial SVA support for AMD IOMMU
iommu/amd: Add support for enable/disable IOPF
iommu/amd: Add IO page fault notifier handler
...
|
|
Add a clarification that domain levels are system-specific
and where to check for system details.
Signed-off-by: Vitalii Bursov <vitaly@bursov.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/42b177a2e897cdf880caf9c2025f5b609e820334.1714488502.git.vitaly@bursov.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- New V4L2 ioctl VIDIOC_REMOVE_BUFS
- experimental support for using generic metaformats on V4L2 core
- New drivers: Intel IPU6 controller driver, Broadcom BCM283x/BCM271x
- More cleanups at atomisp driver
- Usual bunch of driver cleanups, improvements and fixes
* tag 'media/v6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (328 commits)
media: bcm2835-unicam: Depend on COMMON_CLK
Revert "media: v4l2-ctrls: show all owned controls in log_status"
media: ov2740: Ensure proper reset sequence on probe()
media: intel/ipu6: Don't print user-triggerable errors to kernel log
media: bcm2835-unicam: Fix driver path in MAINTAINERS
media: bcm2835-unicam: Fix a NULL vs IS_ERR() check
media: bcm2835-unicam: Do not print error when irq not found
media: bcm2835-unicam: Do not replace IRQ retcode during probe
media: bcm2835-unicam: Convert to platform remove callback returning void
media: media: intel/ipu6: Fix spelling mistake "remappinp" -> "remapping"
media: intel/ipu6: explicitly include vmalloc.h
media: cec.h: Fix kerneldoc
media: uvcvideo: Refactor iterators
media: v4l: async: refactor v4l2_async_create_ancillary_links
media: intel/ipu6: Don't re-allocate memory for firmware
media: dvb-frontends: tda10048: Fix integer overflow
media: tc358746: Use the correct div_ function
media: i2c: st-mipid02: Use the correct div function
media: tegra-vde: Refactor timeout handling
media: stk1160: Use min macro
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- The locking around cpuset hotplug processing has always been a bit of
mess which was worked around by making hotplug processing
asynchronous. The asynchronity isn't great and led to other issues.
We tried to make the behavior synchronous a while ago but that led to
lockdep splats. Waiman took another stab at cleaning up and making it
synchronous. The patch has been in -next for well over a month and
there haven't been any complaints, so fingers crossed.
- Tracepoints added to help understanding rstat lock contentions.
- A bunch of minor changes - doc updates, code cleanups and selftests.
* tag 'cgroup-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (24 commits)
cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepoints
selftests/cgroup: Drop define _GNU_SOURCE
docs: cgroup-v1: Update page cache removal functions
selftests/cgroup: fix uninitialized variables in test_zswap.c
selftests/cgroup: cpu_hogger init: use {} instead of {NULL}
selftests/cgroup: fix clang warnings: uninitialized fd variable
selftests/cgroup: fix clang build failures for abs() calls
cgroup/cpuset: Remove outdated comment in sched_partition_write()
cgroup/cpuset: Fix incorrect top_cpuset flags
cgroup/cpuset: Avoid clearing CS_SCHED_LOAD_BALANCE twice
cgroup/cpuset: Statically initialize more members of top_cpuset
cgroup: Avoid unnecessary looping in cgroup_no_v1()
cgroup, legacy_freezer: update comment for freezer_css_offline()
docs, cgroup: add entries for pids to cgroup-v2.rst
cgroup: don't call cgroup1_pidlist_destroy_all() for v2
cgroup_freezer: update comment for freezer_css_online()
cgroup/rstat: desc member cgrp in cgroup_rstat_flush_release
cgroup/rstat: add cgroup_rstat_lock helpers and tracepoints
cgroup/pids: Remove superfluous zeroing
docs: cgroup-v1: Fix description for css_online
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Complete rework of garbage collection of AF_UNIX sockets.
AF_UNIX is prone to forming reference count cycles due to fd
passing functionality. New method based on Tarjan's Strongly
Connected Components algorithm should be both faster and remove a
lot of workarounds we accumulated over the years.
- Add TCP fraglist GRO support, allowing chaining multiple TCP
packets and forwarding them together. Useful for small switches /
routers which lack basic checksum offload in some scenarios (e.g.
PPPoE).
- Support using SMP threads for handling packet backlog i.e. packet
processing from software interfaces and old drivers which don't use
NAPI. This helps move the processing out of the softirq jumble.
- Continue work of converting from rtnl lock to RCU protection.
Don't require rtnl lock when reading: IPv6 routing FIB, IPv6
address labels, netdev threaded NAPI sysfs files, bonding driver's
sysfs files, MPLS devconf, IPv4 FIB rules, netns IDs, tcp metrics,
TC Qdiscs, neighbor entries, ARP entries via ioctl(SIOCGARP), a lot
of the link information available via rtnetlink.
- Small optimizations from Eric to UDP wake up handling, memory
accounting, RPS/RFS implementation, TCP packet sizing etc.
- Allow direct page recycling in the bulk API used by XDP, for +2%
PPS.
- Support peek with an offset on TCP sockets.
- Add MPTCP APIs for querying last time packets were received/sent/acked
and whether MPTCP "upgrade" succeeded on a TCP socket.
- Add intra-node communication shortcut to improve SMC performance.
- Add IPv6 (and IPv{4,6}-over-IPv{4,6}) support to the GTP protocol
driver.
- Add HSR-SAN (RedBOX) mode of operation to the HSR protocol driver.
- Add reset reasons for tracing what caused a TCP reset to be sent.
- Introduce direction attribute for xfrm (IPSec) states. State can be
used either for input or output packet processing.
Things we sprinkled into general kernel code:
- Add bitmap_{read,write}(), bitmap_size(), expose BYTES_TO_BITS().
This required touch-ups and renaming of a few existing users.
- Add Endian-dependent __counted_by_{le,be} annotations.
- Make building selftests "quieter" by printing summaries like
"CC object.o" rather than full commands with all the arguments.
Netfilter:
- Use GFP_KERNEL to clone elements, to deal better with OOM
situations and avoid failures in the .commit step.
BPF:
- Add eBPF JIT for ARCv2 CPUs.
- Support attaching kprobe BPF programs through kprobe_multi link in
a session mode, meaning, a BPF program is attached to both function
entry and return, the entry program can decide if the return
program gets executed and the entry program can share u64 cookie
value with return program. "Session mode" is a common use-case for
tetragon and bpftrace.
- Add the ability to specify and retrieve BPF cookie for raw
tracepoint programs in order to ease migration from classic to raw
tracepoints.
- Add an internal-only BPF per-CPU instruction for resolving per-CPU
memory addresses and implement support in x86, ARM64 and RISC-V
JITs. This allows inlining functions which need to access per-CPU
state.
- Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
atomics in bpf_arena which can be JITed as a single x86
instruction. Support BPF arena on ARM64.
- Add a new bpf_wq API for deferring events and refactor
process-context bpf_timer code to keep common code where possible.
- Harden the BPF verifier's and/or/xor value tracking.
- Introduce crypto kfuncs to let BPF programs call kernel crypto
APIs.
- Support bpf_tail_call_static() helper for BPF programs with GCC 13.
- Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
program to have code sections where preemption is disabled.
Driver API:
- Skip software TC processing completely if all installed rules are
marked as HW-only, instead of checking the HW-only flag rule by
rule.
- Add support for configuring PoE (Power over Ethernet), similar to
the already existing support for PoDL (Power over Data Line)
config.
- Initial bits of a queue control API, for now allowing a single
queue to be reset without disturbing packet flow to other queues.
- Common (ethtool) statistics for hardware timestamping.
Tests and tooling:
- Remove the need to create a config file to run the net forwarding
tests so that a naive "make run_tests" can exercise them.
- Define a method of writing tests which require an external endpoint
to communicate with (to send/receive data towards the test
machine). Add a few such tests.
- Create a shared code library for writing Python tests. Expose the
YAML Netlink library from tools/ to the tests for easy Netlink
access.
- Move netfilter tests under net/, extend them, separate performance
tests from correctness tests, and iron out issues found by running
them "on every commit".
- Refactor BPF selftests to use common network helpers.
- Further work filling in YAML definitions of Netlink messages for:
nftables, team driver, bonding interfaces, vlan interfaces, VF
info, TC u32 mark, TC police action.
- Teach Python YAML Netlink to decode attribute policies.
- Extend the definition of the "indexed array" construct in the specs
to cover arrays of scalars rather than just nests.
- Add hyperlinks between definitions in generated Netlink docs.
Drivers:
- Make sure unsupported flower control flags are rejected by drivers,
and make more drivers report errors directly to the application
rather than dmesg (large number of driver changes from Asbjørn
Sloth Tønnesen).
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support multiple RSS contexts and steering traffic to them
- support XDP metadata
- make page pool allocations more NUMA aware
- Intel (100G, ice, idpf):
- extract datapath code common among Intel drivers into a library
- use fewer resources in switchdev by sharing queues with the PF
- add PFCP filter support
- add Ethernet filter support
- use a spinlock instead of HW lock in PTP clock ops
- support 5 layer Tx scheduler topology
- nVidia/Mellanox:
- 800G link modes and 100G SerDes speeds
- per-queue IRQ coalescing configuration
- Marvell Octeon:
- support offloading TC packet mark action
- Ethernet NICs consumer, embedded and virtual:
- stop lying about skb->truesize in USB Ethernet drivers, it
messes up TCP memory calculations
- Google cloud vNIC:
- support changing ring size via ethtool
- support ring reset using the queue control API
- VirtIO net:
- expose flow hash from RSS to XDP
- per-queue statistics
- add selftests
- Synopsys (stmmac):
- support controllers which require an RX clock signal from the
MII bus to perform their hardware initialization
- TI:
- icssg_prueth: support ICSSG-based Ethernet on AM65x SR1.0 devices
- icssg_prueth: add SW TX / RX Coalescing based on hrtimers
- cpsw: minimal XDP support
- Renesas (ravb):
- support describing the MDIO bus
- Realtek (r8169):
- add support for RTL8168M
- Microchip Sparx5:
- matchall and flower actions mirred and redirect
- Ethernet switches:
- nVidia/Mellanox:
- improve events processing performance
- Marvell:
- add support for MV88E6250 family internal PHYs
- Microchip:
- add DCB and DSCP mapping support for KSZ switches
- vsc73xx: convert to PHYLINK
- Realtek:
- rtl8226b/rtl8221b: add C45 instances and SerDes switching
- Many driver changes related to PHYLIB and PHYLINK deprecated API
cleanup
- Ethernet PHYs:
- Add a new driver for Airoha EN8811H 2.5 Gigabit PHY.
- micrel: lan8814: add support for PPS out and external timestamp trigger
- WiFi:
- Disable Wireless Extensions (WEXT) in all Wi-Fi 7 devices
drivers. Modern devices can only be configured using nl80211.
- mac80211/cfg80211
- handle color change per link for WiFi 7 Multi-Link Operation
- Intel (iwlwifi):
- don't support puncturing in 5 GHz
- support monitor mode on passive channels
- BZ-W device support
- P2P with HE/EHT support
- re-add support for firmware API 90
- provide channel survey information for Automatic Channel Selection
- MediaTek (mt76):
- mt7921 LED control
- mt7925 EHT radiotap support
- mt7920e PCI support
- Qualcomm (ath11k):
- P2P support for QCA6390, WCN6855 and QCA2066
- support hibernation
- ieee80211-freq-limit Device Tree property support
- Qualcomm (ath12k):
- refactoring in preparation of multi-link support
- suspend and hibernation support
- ACPI support
- debugfs support, including dfs_simulate_radar support
- RealTek:
- rtw88: RTL8723CS SDIO device support
- rtw89: RTL8922AE Wi-Fi 7 PCI device support
- rtw89: complete features of new WiFi 7 chip 8922AE including
BT-coexistence and Wake-on-WLAN
- rtw89: use BIOS ACPI settings to set TX power and channels
- rtl8xxxu: enable Management Frame Protection (MFP) support
- Bluetooth:
- support for Intel BlazarI and Filmore Peak2 (BE201)
- support for MediaTek MT7921S SDIO
- initial support for Intel PCIe BT driver
- remove HCI_AMP support"
* tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1827 commits)
selftests: netfilter: fix packetdrill conntrack testcase
net: gro: fix napi_gro_cb zeroed alignment
Bluetooth: btintel_pcie: Refactor and code cleanup
Bluetooth: btintel_pcie: Fix warning reported by sparse
Bluetooth: hci_core: Fix not handling hdev->le_num_of_adv_sets=1
Bluetooth: btintel: Fix compiler warning for multi_v7_defconfig config
Bluetooth: btintel_pcie: Fix compiler warnings
Bluetooth: btintel_pcie: Add *setup* function to download firmware
Bluetooth: btintel_pcie: Add support for PCIe transport
Bluetooth: btintel: Export few static functions
Bluetooth: HCI: Remove HCI_AMP support
Bluetooth: L2CAP: Fix div-by-zero in l2cap_le_flowctl_init()
Bluetooth: qca: Fix error code in qca_read_fw_build_info()
Bluetooth: hci_conn: Use __counted_by() and avoid -Wfamnae warning
Bluetooth: btintel: Add support for Filmore Peak2 (BE201)
Bluetooth: btintel: Add support for BlazarI
LE Create Connection command timeout increased to 20 secs
dt-bindings: net: bluetooth: Add MediaTek MT7921S SDIO Bluetooth
Bluetooth: compute LE flow credits based on recvbuf space
Bluetooth: hci_sync: Use cmd->num_cis instead of magic number
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Add a dm-crypt optional "high_priority" flag that enables the crypt
workqueues to use WQ_HIGHPRI.
- Export dm-crypt workqueues via sysfs (by enabling WQ_SYSFS) to allow
for improved visibility and controls over IO and crypt workqueues.
- Fix dm-crypt to no longer constrain max_segment_size to PAGE_SIZE.
This limit isn't needed given that the block core provides late bio
splitting if bio exceeds underlying limits (e.g. max_segment_size).
- Fix dm-crypt crypt_queue's use of WQ_UNBOUND to not use
WQ_CPU_INTENSIVE because it is meaningless with WQ_UNBOUND.
- Fix various issues with dm-delay target (ranging from a resource
teardown fix, a fix for hung task when using kthread mode, and other
improvements that followed from code inspection).
* tag 'for-6.10/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-delay: remove timer_lock
dm-delay: change locking to avoid contention
dm-delay: fix max_delay calculations
dm-delay: fix hung task introduced by kthread mode
dm-delay: fix workqueue delay_timer race
dm-crypt: don't set WQ_CPU_INTENSIVE for WQ_UNBOUND crypt_queue
dm: use queue_limits_set
dm-crypt: stop constraining max_segment_size to PAGE_SIZE
dm-crypt: export sysfs of all workqueues
dm-crypt: add the optional "high_priority" flag
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"The most interesting parts are probably the mm changes from Ryan which
optimise the creation of the linear mapping at boot and (separately)
implement write-protect support for userfaultfd.
Outside of our usual directories, the Kbuild-related changes under
scripts/ have been acked by Masahiro whilst the drivers/acpi/ parts
have been acked by Rafael and the addition of cpumask_any_and_but()
has been acked by Yury.
ACPI:
- Support for the Firmware ACPI Control Structure (FACS) signature
feature which is used to reboot out of hibernation on some systems
Kbuild:
- Support for building Flat Image Tree (FIT) images, where the kernel
Image is compressed alongside a set of devicetree blobs
Memory management:
- Optimisation of our early page-table manipulation for creation of
the linear mapping
- Support for userfaultfd write protection, which brings along some
nice cleanups to our handling of invalid but present ptes
- Extend our use of range TLBI invalidation at EL1
Perf and PMUs:
- Ensure that the 'pmu->parent' pointer is correctly initialised by
PMU drivers
- Avoid allocating 'cpumask_t' types on the stack in some PMU drivers
- Fix parsing of the CPU PMU "version" field in assembly code, as it
doesn't follow the usual architectural rules
- Add best-effort unwinding support for USER_STACKTRACE
- Minor driver fixes and cleanups
Selftests:
- Minor cleanups to the arm64 selftests (missing NULL check, unused
variable)
Miscellaneous:
- Add a command-line alias for disabling 32-bit application support
- Add part number for Neoverse-V2 CPUs
- Minor fixes and cleanups"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (64 commits)
arm64/mm: Fix pud_user_accessible_page() for PGTABLE_LEVELS <= 2
arm64/mm: Add uffd write-protect support
arm64/mm: Move PTE_PRESENT_INVALID to overlay PTE_NG
arm64/mm: Remove PTE_PROT_NONE bit
arm64/mm: generalize PMD_PRESENT_INVALID for all levels
arm64: simplify arch_static_branch/_jump function
arm64: Add USER_STACKTRACE support
arm64: Add the arm64.no32bit_el0 command line option
drivers/perf: hisi: hns3: Actually use devm_add_action_or_reset()
drivers/perf: hisi: hns3: Fix out-of-bound access when valid event group
drivers/perf: hisi_pcie: Fix out-of-bound access when valid event group
kselftest: arm64: Add a null pointer check
arm64: defer clearing DAIF.D
arm64: assembler: update stale comment for disable_step_tsk
arm64/sysreg: Update PIE permission encodings
kselftest/arm64: Remove unused parameters in abi test
perf/arm-spe: Assign parents for event_source device
perf/arm-smmuv3: Assign parents for event_source device
perf/arm-dsu: Assign parents for event_source device
perf/arm-dmc620: Assign parents for event_source device
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 interrupt handling updates from Thomas Gleixner:
"Add support for posted interrupts on bare metal.
Posted interrupts is a virtualization feature which allows to inject
interrupts directly into a guest without host interaction. The VT-d
interrupt remapping hardware sets the bit which corresponds to the
interrupt vector in a vector bitmap which is either used to inject the
interrupt directly into the guest via a virtualized APIC or in case
that the guest is scheduled out provides a host side notification
interrupt which informs the host that an interrupt has been marked
pending in the bitmap.
This can be utilized on bare metal for scenarios where multiple
devices, e.g. NVME storage, raise interrupts with a high frequency. In
the default mode these interrupts are handles independently and
therefore require a full roundtrip of interrupt entry/exit.
Utilizing posted interrupts this roundtrip overhead can be avoided by
coalescing these interrupt entries to a single entry for the posted
interrupt notification. The notification interrupt then demultiplexes
the pending bits in a memory based bitmap and invokes the
corresponding device specific handlers.
Depending on the usage scenario and device utilization throughput
improvements between 10% and 130% have been measured.
As this is only relevant for high end servers with multiple device
queues per CPU attached and counterproductive for situations where
interrupts are arriving at distinct times, the functionality is opt-in
via a kernel command line parameter"
* tag 'x86-irq-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/irq: Use existing helper for pending vector check
iommu/vt-d: Enable posted mode for device MSIs
iommu/vt-d: Make posted MSI an opt-in command line option
x86/irq: Extend checks for pending vectors to posted interrupts
x86/irq: Factor out common code for checking pending interrupts
x86/irq: Install posted MSI notification handler
x86/irq: Factor out handler invocation from common_interrupt()
x86/irq: Set up per host CPU posted interrupt descriptors
x86/irq: Reserve a per CPU IDT vector for posted MSIs
x86/irq: Add a Kconfig option for posted MSI
x86/irq: Remove bitfields in posted interrupt descriptor
x86/irq: Unionize PID.PIR for 64bit access w/o casting
KVM: VMX: Move posted interrupt descriptor out of VMX code
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Add cpufreq pressure feedback for the scheduler
- Rework misfit load-balancing wrt affinity restrictions
- Clean up and simplify the code around ::overutilized and
::overload access.
- Simplify sched_balance_newidle()
- Bump SCHEDSTAT_VERSION to 16 due to a cleanup of CPU_MAX_IDLE_TYPES
handling that changed the output.
- Rework & clean up <asm/vtime.h> interactions wrt arch_vtime_task_switch()
- Reorganize, clean up and unify most of the higher level
scheduler balancing function names around the sched_balance_*()
prefix
- Simplify the balancing flag code (sched_balance_running)
- Miscellaneous cleanups & fixes
* tag 'sched-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
sched/pelt: Remove shift of thermal clock
sched/cpufreq: Rename arch_update_thermal_pressure() => arch_update_hw_pressure()
thermal/cpufreq: Remove arch_update_thermal_pressure()
sched/cpufreq: Take cpufreq feedback into account
cpufreq: Add a cpufreq pressure feedback for the scheduler
sched/fair: Fix update of rd->sg_overutilized
sched/vtime: Do not include <asm/vtime.h> header
s390/irq,nmi: Include <asm/vtime.h> header directly
s390/vtime: Remove unused __ARCH_HAS_VTIME_TASK_SWITCH leftover
sched/vtime: Get rid of generic vtime_task_switch() implementation
sched/vtime: Remove confusing arch_vtime_task_switch() declaration
sched/balancing: Simplify the sg_status bitmask and use separate ->overloaded and ->overutilized flags
sched/fair: Rename set_rd_overutilized_status() to set_rd_overutilized()
sched/fair: Rename SG_OVERLOAD to SG_OVERLOADED
sched/fair: Rename {set|get}_rd_overload() to {set|get}_rd_overloaded()
sched/fair: Rename root_domain::overload to ::overloaded
sched/fair: Use helper functions to access root_domain::overload
sched/fair: Check root_domain::overload value before update
sched/fair: Combine EAS check with root_domain::overutilized access
sched/fair: Simplify the continue_balancing logic in sched_balance_newidle()
...
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-05-13
We've added 119 non-merge commits during the last 14 day(s) which contain
a total of 134 files changed, 9462 insertions(+), 4742 deletions(-).
The main changes are:
1) Add BPF JIT support for 32-bit ARCv2 processors, from Shahab Vahedi.
2) Add BPF range computation improvements to the verifier in particular
around XOR and OR operators, refactoring of checks for range computation
and relaxing MUL range computation so that src_reg can also be an unknown
scalar, from Cupertino Miranda.
3) Add support to attach kprobe BPF programs through kprobe_multi link in
a session mode, meaning, a BPF program is attached to both function entry
and return, the entry program can decide if the return program gets
executed and the entry program can share u64 cookie value with return
program. Session mode is a common use-case for tetragon and bpftrace,
from Jiri Olsa.
4) Fix a potential overflow in libbpf's ring__consume_n() and improve libbpf
as well as BPF selftest's struct_ops handling, from Andrii Nakryiko.
5) Improvements to BPF selftests in context of BPF gcc backend,
from Jose E. Marchesi & David Faust.
6) Migrate remaining BPF selftest tests from test_sock_addr.c to prog_test-
-style in order to retire the old test, run it in BPF CI and additionally
expand test coverage, from Jordan Rife.
7) Big batch for BPF selftest refactoring in order to remove duplicate code
around common network helpers, from Geliang Tang.
8) Another batch of improvements to BPF selftests to retire obsolete
bpf_tcp_helpers.h as everything is available vmlinux.h,
from Martin KaFai Lau.
9) Fix BPF map tear-down to not walk the map twice on free when both timer
and wq is used, from Benjamin Tissoires.
10) Fix BPF verifier assumptions about socket->sk that it can be non-NULL,
from Alexei Starovoitov.
11) Change BTF build scripts to using --btf_features for pahole v1.26+,
from Alan Maguire.
12) Small improvements to BPF reusing struct_size() and krealloc_array(),
from Andy Shevchenko.
13) Fix s390 JIT to emit a barrier for BPF_FETCH instructions,
from Ilya Leoshkevich.
14) Extend TCP ->cong_control() callback in order to feed in ack and
flag parameters and allow write-access to tp->snd_cwnd_stamp
from BPF program, from Miao Xu.
15) Add support for internal-only per-CPU instructions to inline
bpf_get_smp_processor_id() helper call for arm64 and riscv64 BPF JITs,
from Puranjay Mohan.
16) Follow-up to remove the redundant ethtool.h from tooling infrastructure,
from Tushar Vyavahare.
17) Extend libbpf to support "module:<function>" syntax for tracing
programs, from Viktor Malik.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits)
bpf: make list_for_each_entry portable
bpf: ignore expected GCC warning in test_global_func10.c
bpf: disable strict aliasing in test_global_func9.c
selftests/bpf: Free strdup memory in xdp_hw_metadata
selftests/bpf: Fix a few tests for GCC related warnings.
bpf: avoid gcc overflow warning in test_xdp_vlan.c
tools: remove redundant ethtool.h from tooling infra
selftests/bpf: Expand ATTACH_REJECT tests
selftests/bpf: Expand getsockname and getpeername tests
sefltests/bpf: Expand sockaddr hook deny tests
selftests/bpf: Expand sockaddr program return value tests
selftests/bpf: Retire test_sock_addr.(c|sh)
selftests/bpf: Remove redundant sendmsg test cases
selftests/bpf: Migrate ATTACH_REJECT test cases
selftests/bpf: Migrate expected_attach_type tests
selftests/bpf: Migrate wildcard destination rewrite test
selftests/bpf: Migrate sendmsg6 v4 mapped address tests
selftests/bpf: Migrate sendmsg deny test cases
selftests/bpf: Migrate WILDCARD_IP test
selftests/bpf: Handle SYSCALL_EPERM and SYSCALL_ENOTSUPP test cases
...
====================
Link: https://lore.kernel.org/r/20240513134114.17575-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull documentation updates from Jonathan Corbet:
"Another not-too-busy cycle for documentation, including:
- Some build-system changes to detect the variable fonts installed by
some distributions that can break the PDF build.
- Various updates and additions to the Spanish, Chinese, Italian, and
Japanese translations.
- Update the stable-kernel rules to match modern practice
... and the usual array of corrections, updates, and typo fixes"
* tag 'docs-6.10' of git://git.lwn.net/linux: (42 commits)
cgroup: Add documentation for missing zswap memory.stat
kernel-doc: Added "*" in $type_constants2 to fix 'make htmldocs' warning.
docs:core-api: fixed typos and grammar in printk-index page
Documentation: tracing: Fix spelling mistakes
docs/zh_CN/rust: Update the translation of quick-start to 6.9-rc4
docs/zh_CN/rust: Update the translation of general-information to 6.9-rc4
docs/zh_CN/rust: Update the translation of coding-guidelines to 6.9-rc4
docs/zh_CN/rust: Update the translation of arch-support to 6.9-rc4
docs: stable-kernel-rules: fix typo sent->send
docs/zh_CN: remove two inconsistent spaces
docs: scripts/check-variable-fonts.sh: Improve commands for detection
docs: stable-kernel-rules: create special tag to flag 'no backporting'
docs: stable-kernel-rules: explain use of stable@kernel.org (w/o @vger.)
docs: stable-kernel-rules: remove code-labels tags and a indention level
docs: stable-kernel-rules: call mainline by its name and change example
docs: stable-kernel-rules: reduce redundancy
docs, kprobes: Add riscv as supported architecture
Docs: typos/spelling
docs: kernel_include.py: Cope with docutils 0.21
docs: ja_JP/howto: Catch up update in v6.8
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull trusted keys updates from Jarkko Sakkinen:
"This contains a new key type for the Data Co-Processor (DCP), which is
an IP core built into many NXP SoCs such as i.mx6ull"
* tag 'keys-trusted-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
docs: trusted-encrypted: add DCP as new trust source
docs: document DCP-backed trusted keys kernel params
MAINTAINERS: add entry for DCP-based trusted keys
KEYS: trusted: Introduce NXP DCP-backed trusted keys
KEYS: trusted: improve scalability of trust source config
crypto: mxs-dcp: Add support for hardware-bound keys
|
|
Commit 452e9e6992fe ("filemap: Add filemap_remove_folio and
__filemap_remove_folio") reimplemented __delete_from_page_cache() as
__filemap_remove_folio() and delete_from_page_cache() as
filemap_remove_folio(). The compatibility wrappers were finally removed
in ece62684dcfb ("hugetlbfs: convert hugetlb_delete_from_page_cache() to
use folios") and 6ffcd825e7d0 ("mm: Remove __delete_from_page_cache()").
Update the remaining references to dead functions in the memcg
implementation memo.
Signed-off-by: Illia Ostapyshyn <illia@yshyn.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Pull RCU updates from Uladzislau Rezki:
- Fix a lockdep complain for lazy-preemptible kernel, remove redundant
BH disable for TINY_RCU, remove redundant READ_ONCE() in tree.c, fix
false positives KCSAN splat and fix buffer overflow in the
print_cpu_stall_info().
- Misc updates related to bpf, tracing and update the MAINTAINERS file.
- An improvement of a normal synchronize_rcu() call in terms of
latency. It maintains a separate track for sync. users only. This
approach bypasses per-cpu nocb-lists thus sync-users do not depend on
nocb-list length and how fast regular callbacks are processed.
- RCU tasks: switch tasks RCU grace periods to sleep at TASK_IDLE
priority, fix some comments, add some diagnostic warning to the
exit_tasks_rcu_start() and fix a buffer overflow in the
show_rcu_tasks_trace_gp_kthread().
- RCU torture: Increase memory to guest OS, fix a Tasks Rude RCU
testing, some updates for TREE09, dump mode information to debug GP
kthread state, remove redundant READ_ONCE(), fix some comments about
RCU_TORTURE_PIPE_LEN and pipe_count, remove some redundant pointer
initialization, fix a hung splat task by when the rcutorture tests
start to exit, fix invalid context warning, add '--do-kvfree'
parameter to torture test and use slow register unregister callbacks
only for rcutype test.
* tag 'rcu.next.v6.10' of https://github.com/urezki/linux: (48 commits)
rcutorture: Use rcu_gp_slow_register/unregister() only for rcutype test
torture: Scale --do-kvfree test time
rcutorture: Fix invalid context warning when enable srcu barrier testing
rcutorture: Make stall-tasks directly exit when rcutorture tests end
rcutorture: Removing redundant function pointer initialization
rcutorture: Make rcutorture support print rcu-tasks gp state
rcutorture: Use the gp_kthread_dbg operation specified by cur_ops
rcutorture: Re-use value stored to ->rtort_pipe_count instead of re-reading
rcutorture: Fix rcu_torture_one_read() pipe_count overflow comment
rcutorture: Remove extraneous rcu_torture_pipe_update_one() READ_ONCE()
rcu: Allocate WQ with WQ_MEM_RECLAIM bit set
rcu: Support direct wake-up of synchronize_rcu() users
rcu: Add a trace event for synchronize_rcu_normal()
rcu: Reduce synchronize_rcu() latency
rcu: Fix buffer overflow in print_cpu_stall_info()
rcu: Mollify sparse with RCU guard
rcu-tasks: Fix show_rcu_tasks_trace_gp_kthread buffer overflow
rcu-tasks: Fix the comments for tasks_rcu_exit_srcu_stall_timer
rcu-tasks: Replace exit_tasks_rcu_start() initialization with WARN_ON_ONCE()
rcu: Remove redundant CONFIG_PROVE_RCU #if condition
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:
- Store AP Query Configuration Information in a static buffer
- Rework the AP initialization and add missing cleanups to the error
path
- Swap IRQ and AP bus/device registration to avoid race conditions
- Export prot_virt_guest symbol
- Introduce AP configuration changes notifier interface to facilitate
modularization of the AP bus
- Add CONFIG_AP kernel configuration option to allow modularization of
the AP bus
- Rework CONFIG_ZCRYPT_DEBUG kernel configuration option description
and dependency and rename it to CONFIG_AP_DEBUG
- Convert sprintf() and snprintf() to sysfs_emit() in CIO code
- Adjust indentation of RELOCS command build step
- Make crypto performance counters upward compatible
- Convert make_page_secure() and gmap_make_secure() to use folio
- Rework channel-utilization-block (CUB) handling in preparation of
introducing additional CUBs
- Use attribute groups to simplify registration, removal and extension
of measurement-related channel-path sysfs attributes
- Add a per-channel-path binary "ext_measurement" sysfs attribute that
provides access to extended channel-path measurement data
- Export measurement data for all channel-measurement-groups (CMG), not
only for a specific ones. This enables support of new CMG data
formats in userspace without the need for kernel changes
- Add a per-channel-path sysfs attribute "speed_bps" that provides the
operating speed in bits per second or 0 if the operating speed is not
available
- The CIO tracepoint subchannel-type field "st" is incorrectly set to
the value of subchannel-enabled SCHIB "ena" field. Fix that
- Do not forcefully limit vmemmap starting address to MAX_PHYSMEM_BITS
- Consider the maximum physical address available to a DCSS segment
(512GB) when memory layout is set up
- Simplify the virtual memory layout setup by reducing the size of
identity mapping vs vmemmap overlap
- Swap vmalloc and Lowcore/Real Memory Copy areas in virtual memory.
This will allow to place the kernel image next to kernel modules
- Move everyting KASLR related from <asm/setup.h> to <asm/page.h>
- Put virtual memory layout information into a structure to improve
code generation
- Currently __kaslr_offset is the kernel offset in both physical and
virtual memory spaces. Uncouple these offsets to allow uncoupling of
the addresses spaces
- Currently the identity mapping base address is implicit and is always
set to zero. Make it explicit by putting into __identity_base
persistent boot variable and use it in proper context
- Introduce .amode31 section start and end macros AMODE31_START and
AMODE31_END
- Introduce OS_INFO entries that do not reference any data in memory,
but rather provide only values
- Store virtual memory layout in OS_INFO. It is read out by
makedumpfile, crash and other tools
- Store virtual memory layout in VMCORE_INFO. It is read out by crash
and other tools when /proc/kcore device is used
- Create additional PT_LOAD ELF program header that covers kernel image
only, so that vmcore tools could locate kernel text and data when
virtual and physical memory spaces are uncoupled
- Uncouple physical and virtual address spaces
- Map kernel at fixed location when KASLR mode is disabled. The
location is defined by CONFIG_KERNEL_IMAGE_BASE kernel configuration
value.
- Rework deployment of kernel image for both compressed and
uncompressed variants as defined by CONFIG_KERNEL_UNCOMPRESSED kernel
configuration value
- Move .vmlinux.relocs section in front of the compressed kernel. The
interim section rescue step is avoided as result
- Correct modules thunk offset calculation when branch target is more
than 2GB away
- Kernel modules contain their own set of expoline thunks. Now that the
kernel modules area is less than 4GB away from kernel expoline
thunks, make modules use kernel expolines. Also make EXPOLINE_EXTERN
the default if the compiler supports it
- userfaultfd can insert shared zeropages into processes running VMs,
but that is not allowed for s390. Fallback to allocating a fresh
zeroed anonymous folio and insert that instead
- Re-enable shared zeropages for non-PV and non-skeys KVM guests
- Rename hex2bitmap() to ap_hex2bitmap() and export it for external use
- Add ap_config sysfs attribute to provide the means for setting or
displaying adapters, domains and control domains assigned to a
vfio-ap mediated device in a single operation
- Make vfio_ap_mdev_link_queue() ignore duplicate link requests
- Add write support to ap_config sysfs attribute to allow atomic update
a vfio-ap mediated device state
- Document ap_config sysfs attribute
- Function os_info_old_init() is expected to be called only from a
regular kdump kernel. Enable it to be called from a stand-alone dump
kernel
- Address gcc -Warray-bounds warning and fix array size in struct
os_info
- s390 does not support SMBIOS, so drop unneeded CONFIG_DMI checks
- Use unwinder instead of __builtin_return_address() with ftrace to
prevent returning of undefined values
- Sections .hash and .gnu.hash are only created when CONFIG_PIE_BUILD
kernel is enabled. Drop these for the case CONFIG_PIE_BUILD is
disabled
- Compile kernel with -fPIC and link with -no-pie to allow kpatch
feature always succeed and drop the whole CONFIG_PIE_BUILD
option-enabled code
- Add missing virt_to_phys() converter for VSIE facility and crypto
control blocks
* tag 's390-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (54 commits)
Revert "s390: Relocate vmlinux ELF data to virtual address space"
KVM: s390: vsie: Use virt_to_phys for crypto control block
s390: Relocate vmlinux ELF data to virtual address space
s390: Compile kernel with -fPIC and link with -no-pie
s390: vmlinux.lds.S: Drop .hash and .gnu.hash for !CONFIG_PIE_BUILD
s390/ftrace: Use unwinder instead of __builtin_return_address()
s390/pci: Drop unneeded reference to CONFIG_DMI
s390/os_info: Fix array size in struct os_info
s390/os_info: Initialize old os_info in standalone dump kernel
docs: Update s390 vfio-ap doc for ap_config sysfs attribute
s390/vfio-ap: Add write support to sysfs attr ap_config
s390/vfio-ap: Ignore duplicate link requests in vfio_ap_mdev_link_queue
s390/vfio-ap: Add sysfs attr, ap_config, to export mdev state
s390/ap: Externalize AP bus specific bitmap reading function
s390/mm: Re-enable the shared zeropage for !PV and !skeys KVM guests
mm/userfaultfd: Do not place zeropages when zeropages are disallowed
s390/expoline: Make modules use kernel expolines
s390/nospec: Correct modules thunk offset calculation
s390/boot: Do not rescue .vmlinux.relocs section
s390/boot: Rework deployment of the kernel image
...
|
|
into next
|
|
This will add eBPF JIT support to the 32-bit ARCv2 processors. The
implementation is qualified by running the BPF tests on a Synopsys HSDK
board with "ARC HS38 v2.1c at 500 MHz" as the 4-core CPU.
The test_bpf.ko reports 2-10 fold improvements in execution time of its
tests. For instance:
test_bpf: #33 tcpdump port 22 jited:0 704 1766 2104 PASS
test_bpf: #33 tcpdump port 22 jited:1 120 224 260 PASS
test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:0 238 PASS
test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:1 23 PASS
test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:0 2034681 PASS
test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:1 1020022 PASS
Deployment and structure
------------------------
The related codes are added to "arch/arc/net":
- bpf_jit.h -- The interface that a back-end translator must provide
- bpf_jit_core.c -- Knows how to handle the input eBPF byte stream
- bpf_jit_arcv2.c -- The back-end code that knows the translation logic
The bpf_int_jit_compile() at the end of bpf_jit_core.c is the entrance
to the whole process. Normally, the translation is done in one pass,
namely the "normal pass". In case some relocations are not known during
this pass, some data (arc_jit_data) is allocated for the next pass to
come. This possible next (and last) pass is called the "extra pass".
1. Normal pass # The necessary pass
1a. Dry run # Get the whole JIT length, epilogue offset, etc.
1b. Emit phase # Allocate memory and start emitting instructions
2. Extra pass # Only needed if there are relocations to be fixed
2a. Patch relocations
Support status
--------------
The JIT compiler supports BPF instructions up to "cpu=v4". However, it
does not yet provide support for:
- Tail calls
- Atomic operations
- 64-bit division/remainder
- BPF_PROBE_MEM* (exception table)
The result of "test_bpf" test suite on an HSDK board is:
hsdk-lnx# insmod test_bpf.ko test_suite=test_bpf
test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]
All the failing test cases are due to the ones that were not JIT'ed.
Categorically, they can be represented as:
.-----------.------------.-------------.
| test type | opcodes | # of cases |
|-----------+------------+-------------|
| atomic | 0xC3, 0xDB | 149 |
| div64 | 0x37, 0x3F | 22 |
| mod64 | 0x97, 0x9F | 15 |
`-----------^------------+-------------|
| (total) 186 |
`-------------'
Setup: build config
-------------------
The following configs must be set to have a working JIT test:
CONFIG_BPF_JIT=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_TEST_BPF=m
The following options are not necessary for the tests module,
but are good to have:
CONFIG_DEBUG_INFO=y # prerequisite for below
CONFIG_DEBUG_INFO_BTF=y # so bpftool can generate vmlinux.h
CONFIG_FTRACE=y #
CONFIG_BPF_SYSCALL=y # all these options lead to
CONFIG_KPROBE_EVENTS=y # having CONFIG_BPF_EVENTS=y
CONFIG_PERF_EVENTS=y #
Some BPF programs provide data through /sys/kernel/debug:
CONFIG_DEBUG_FS=y
arc# mount -t debugfs debugfs /sys/kernel/debug
Setup: elfutils
---------------
The libdw.{so,a} library that is used by pahole for processing
the final binary must come from elfutils 0.189 or newer. The
support for ARCv2 [1] has been added since that version.
[1]
https://sourceware.org/git/?p=elfutils.git;a=commit;h=de3d46b3e7
Setup: pahole
-------------
The line below in linux/scripts/Makefile.btf must be commented out:
pahole-flags-$(call test-ge, $(pahole-ver), 121) += --btf_gen_floats
Or else, the build will fail:
$ make V=1
...
BTF .btf.vmlinux.bin.o
pahole -J --btf_gen_floats \
-j --lang_exclude=rust \
--skip_encoding_btf_inconsistent_proto \
--btf_gen_optimized .tmp_vmlinux.btf
Complex, interval and imaginary float types are not supported
Encountered error while encoding BTF.
...
BTFIDS vmlinux
./tools/bpf/resolve_btfids/resolve_btfids vmlinux
libbpf: failed to find '.BTF' ELF section in vmlinux
FAILED: load BTF from vmlinux: No data available
This is due to the fact that the ARC toolchains generate
"complex float" DIE entries in libgcc and at the moment, pahole
can't handle such entries.
Running the tests
-----------------
host$ scp /bld/linux/lib/test_bpf.ko arc:
arc # sysctl net.core.bpf_jit_enable=1
arc # insmod test_bpf.ko test_suite=test_bpf
...
test_bpf: #1048 Staggered jumps: JMP32_JSLE_X jited:1 697811 PASS
test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]
Acknowledgments
---------------
- Claudiu Zissulescu for his unwavering support
- Yuriy Kolerov for testing and troubleshooting
- Vladimir Isaev for the pahole workaround
- Sergey Matyukevich for paving the road by adding the interpreter support
Signed-off-by: Shahab Vahedi <shahab@synopsys.com>
Link: https://lore.kernel.org/r/20240430145604.38592-1-list+bpf@vahedi.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
command
To update effective size quota of DAMOS schemes on DAMON sysfs file
interface, user should write 'update_schemes_effective_quotas' to the
kdamond 'state' file. But the document is mistakenly saying the input
string as 'update_schemes_effective_bytes'. Fix it (s/bytes/quotas/).
Link: https://lkml.kernel.org/r/20240503180318.72798-8-sj@kernel.org
Fixes: a6068d6dfa2f ("Docs/admin-guide/mm/damon/usage: document effective_bytes file")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [6.9.x]
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
sysfs file
The example usage of DAMOS filter sysfs files, specifically the part of
'matching' file writing for memcg type filter, is wrong. The intention is
to exclude pages of a memcg that already getting enough care from a given
scheme, but the example is setting the filter to apply the scheme to only
the pages of the memcg. Fix it.
Link: https://lkml.kernel.org/r/20240503180318.72798-7-sj@kernel.org
Fixes: 9b7f9322a530 ("Docs/admin-guide/mm/damon/usage: document DAMOS filters of sysfs")
Closes: https://lore.kernel.org/r/20240317191358.97578-1-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [6.3.x]
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This includes zswpin, zswpout and zswpwb.
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20240502185307.3942173-2-usamaarif642@gmail.com>
|
|
Document the kernel parameters trusted.dcp_use_otp_key
and trusted.dcp_skip_zk_test for DCP-backed trusted keys.
Co-developed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Co-developed-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
Signed-off-by: David Gstir <david@sigma-star.at>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
|
|
* for-next/perf: (41 commits)
arm64: Add USER_STACKTRACE support
drivers/perf: hisi: hns3: Actually use devm_add_action_or_reset()
drivers/perf: hisi: hns3: Fix out-of-bound access when valid event group
drivers/perf: hisi_pcie: Fix out-of-bound access when valid event group
perf/arm-spe: Assign parents for event_source device
perf/arm-smmuv3: Assign parents for event_source device
perf/arm-dsu: Assign parents for event_source device
perf/arm-dmc620: Assign parents for event_source device
perf/arm-ccn: Assign parents for event_source device
perf/arm-cci: Assign parents for event_source device
perf/alibaba_uncore: Assign parents for event_source device
perf/arm_pmu: Assign parents for event_source devices
perf/imx_ddr: Assign parents for event_source devices
perf/qcom: Assign parents for event_source devices
Documentation: qcom-pmu: Use /sys/bus/event_source/devices paths
perf/riscv: Assign parents for event_source devices
perf/thunderx2: Assign parents for event_source devices
Documentation: thunderx2-pmu: Use /sys/bus/event_source/devices paths
perf/xgene: Assign parents for event_source devices
Documentation: xgene-pmu: Use /sys/bus/event_source/devices paths
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:
- Update kernel-parameters doc to describe "pcie_aspm=off" more
accurately (Bjorn Helgaas)
- Restore the parent's (not the child's) ASPM state to the parent
during resume, which fixes a reboot during resume (Kai-Heng Feng)
* tag 'pci-v6.9-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI/ASPM: Restore parent state to parent, child state to child
PCI/ASPM: Clarify that pcie_aspm=off means leave ASPM untouched
|
|
NMI watchdog permanently consumes one hardware counters per CPU on the
system. For systems that use many hardware counters, this causes more
aggressive time multiplexing of perf events.
OTOH, some CPUs (mostly Intel) support "ref-cycles" event, which is rarely
used. Add kernel cmdline arg nmi_watchdog=rNNN to configure the watchdog
to use raw event. For example, on Intel CPUs, we can use "r300" to
configure the watchdog to use ref-cycles event.
If the raw event does not work, fall back to use "cycles".
[akpm@linux-foundation.org: fix kerneldoc]
Link: https://lkml.kernel.org/r/20240430060236.1878002-2-song@kernel.org
Signed-off-by: Song Liu <song@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Update DAMON usage document for the newly added DAMOS filter type, 'young
page'.
Link: https://lkml.kernel.org/r/20240426195247.100306-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Honggyu Kim <honggyu.kim@sk.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We want to limit the use of page_mapcount() to places where absolutely
required, to prepare for kernel configs where we won't keep track of
per-page mapcounts in large folios.
khugepaged is one of the remaining "more challenging" page_mapcount()
users, but we might be able to move away from page_mapcount() without
resulting in a significant behavior change that would warrant
special-casing based on kernel configs.
In 2020, we first added support to khugepaged for collapsing COW-shared
pages via commit 9445689f3b61 ("khugepaged: allow to collapse a page
shared across fork"), followed by support for collapsing PTE-mapped THP in
commit 5503fbf2b0b8 ("khugepaged: allow to collapse PTE-mapped compound
pages") and limiting the memory waste via the "page_count() > 1" check in
commit 71a2c112a0f6 ("khugepaged: introduce 'max_ptes_shared' tunable").
As a default, khugepaged will allow up to half of the PTEs to map shared
pages: where page_mapcount() > 1. MADV_COLLAPSE ignores the khugepaged
setting.
khugepaged does currently not care about swapcache page references, and
does not check under folio lock: so in some corner cases the "shared vs.
exclusive" detection might be a bit off, making us detect "exclusive" when
it's actually "shared".
Most of our anonymous folios in the system are usually exclusive. We
frequently see sharing of anonymous folios for a short period of time,
after which our short-lived suprocesses either quit or exec().
There are some famous examples, though, where child processes exist for a
long time, and where memory is COW-shared with a lot of processes
(webservers, webbrowsers, sshd, ...) and COW-sharing is crucial for
reducing the memory footprint. We don't want to suddenly change the
behavior to result in a significant increase in memory waste.
Interestingly, khugepaged will only collapse an anonymous THP if at least
one PTE is writable. After fork(), that means that something (usually a
page fault) populated at least a single exclusive anonymous THP in that
PMD range.
So ... what happens when we switch to "is this folio mapped shared"
instead of "is this page mapped shared" by using
folio_likely_mapped_shared()?
For "not-COW-shared" folios, small folios and for THPs (large folios) that
are completely mapped into at least one process, switching to
folio_likely_mapped_shared() will not result in a change.
We'll only see a change for COW-shared PTE-mapped THPs that are partially
mapped into all involved processes.
There are two cases to consider:
(A) folio_likely_mapped_shared() returns "false" for a PTE-mapped THP
If the folio is detected as exclusive, and it actually is exclusive,
there is no change: page_mapcount() == 1. This is the common case
without fork() or with short-lived child processes.
folio_likely_mapped_shared() might currently still detect a folio as
exclusive although it is shared (false negatives): if the first page is
not mapped multiple times and if the average per-page mapcount is smaller
than 1, implying that (1) the folio is partially mapped and (2) if we are
responsible for many mapcounts by mapping many pages others can't
("mostly exclusive") (3) if we are not responsible for many mapcounts by
mapping little pages ("mostly shared") it won't make a big impact on the
end result.
So while we might now detect a page as "exclusive" although it isn't,
it's not expected to make a big difference in common cases.
(B) folio_likely_mapped_shared() returns "true" for a PTE-mapped THP
folio_likely_mapped_shared() will never detect a large anonymous folio
as shared although it is exclusive: there are no false positives.
If we detect a THP as shared, at least one page of the THP is mapped by
another process. It could well be that some pages are actually exclusive.
For example, our child processes could have unmapped/COW'ed some pages
such that they would now be exclusive to out process, which we now
would treat as still-shared.
Examples:
(1) Parent maps all pages of a THP, child maps some pages. We detect
all pages in the parent as shared although some are actually
exclusive.
(2) Parent maps all but some page of a THP, child maps the remainder.
We detect all pages of the THP that the parent maps as shared
although they are all exclusive.
In (1) we wouldn't collapse a THP right now already: no PTE
is writable, because a write fault would have resulted in COW of a
single page and the parent would no longer map all pages of that THP.
For (2) we would have collapsed a THP in the parent so far, now we
wouldn't as long as the child process is still alive: unless the child
process unmaps the remaining THP pages or we decide to split that THP.
Possibly, the child COW'ed many pages, meaning that it's likely that
we can populate a THP for our child first, and then for our parent.
For (2), we are making really bad use of the THP in the first
place (not even mapped completely in at least one process). If the
THP would be completely partially mapped, it would be on the deferred
split queue where we would split it lazily later.
For short-running child processes, we don't particularly care. For
long-running processes, the expectation is that such scenarios are
rather rare: further, a THP might be best placed if most data in the
PMD range is actually written, implying that we'll have to COW more
pages first before khugepaged would collapse it.
To summarize, in the common case, this change is not expected to matter
much. The more common application of khugepaged operates on exclusive
pages, either before fork() or after a child quit.
Can we improve (A)? Yes, if we implement more precise tracking of "mapped
shared" vs. "mapped exclusively", we could get rid of the false negatives
completely.
Can we improve (B)? We could count how many pages of a large folio we map
inside the current page table and detect that we are responsible for most
of the folio mapcount and conclude "as good as exclusive", which might
help in some cases. ... but likely, some other mechanism should detect
that the THP is not a good use in the scenario (not even mapped completely
in a single process) and try splitting that folio lazily etc.
We'll move the folio_test_anon() check before our "shared" check, so we
might get more expressive results for SCAN_EXCEED_SHARED_PTE: this order
of checks now matches the one in __collapse_huge_page_isolate(). Extend
documentation.
Link: https://lkml.kernel.org/r/20240424122630.495788-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
These knobs offer more fine-grained control to userspace than needed and
directly expose/influence kernel implementation; remove them.
For disabling same_filled handling, there is no logical reason to refuse
storing same-filled pages more efficiently and opt for compression.
Scanning pages for patterns may be an argument, but the page contents will
be read into the CPU cache anyway during compression. Also, removing the
same_filled handling code does not move the needle significantly in terms
of performance anyway [1].
For disabling non_same_filled handling, it was added when the compressed
pages in zswap were not being properly charged to memcgs, as workloads
could escape the accounting with compression [2]. This is no longer the
case after commit f4840ccfca25 ("zswap: memcg accounting"), and using
zswap without compression does not make much sense.
[1]https://lore.kernel.org/lkml/CAJD7tkaySFP2hBQw4pnZHJJwe3bMdjJ1t9VC2VJd=khn1_TXvA@mail.gmail.com/
[2]https://lore.kernel.org/lkml/19d5cdee-2868-41bd-83d5-6da75d72e940@maciej.szmigiero.name/
[yosryahmed@google.com: remove same_filled_pages from docs]
Link: https://lkml.kernel.org/r/ZhxFVggdyvCo79jc@google.com
Link: https://lkml.kernel.org/r/20240413022407.785696-5-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Cc: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The documentation does not align with the code. In
__do_huge_pmd_anonymous_page(), THP_FAULT_FALLBACK is incremented when
mem_cgroup_charge() fails, despite the allocation succeeding, whereas
THP_FAULT_ALLOC is only incremented after a successful charge.
Link: https://lkml.kernel.org/r/20240412114858.407208-5-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This patch includes documentation for mTHP counters and an ABI file for
sys-kernel-mm-transparent-hugepage, which appears to have been missing for
some time.
[v-songbaohua@oppo.com: fix the name and unexpected indentation]
Link: https://lkml.kernel.org/r/20240415054538.17071-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20240412114858.407208-4-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Let's stop talking about page_mapcount().
Link: https://lkml.kernel.org/r/20240409192301.907377-19-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Richard Chang <richardycc@google.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yin Fengwei <fengwei.yin@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Previously we claimed "pcie_aspm=off" meant that ASPM would be disabled,
which is wrong.
Correct this to say that with "pcie_aspm=off", Linux doesn't touch any ASPM
configuration at all. ASPM may have been enabled by firmware, and that
will be left unchanged. See "aspm_support_enabled".
Link: https://lore.kernel.org/r/20240429191821.691726-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: David E. Box <david.e.box@linux.intel.com>
|
|
Introducing the field 'el0' to the idreg-override for register
ID_AA64PFR0_EL1. This field is also aliased to the new kernel
command line option 'arm64.no32bit_el0' as a more recognizable
and mnemonic name to disable the execution of 32 bit userspace
applications (i.e. avoid Aarch32 execution state in EL0) from
kernel command line.
Link: https://lore.kernel.org/all/20240207105847.7739-1-andrea.porta@suse.com/
Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
Link: https://lore.kernel.org/r/20240429102833.6426-1-andrea.porta@suse.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Fix spelling and grammar in Docs descriptions
Signed-off-by: Remington Brasga <rbrasga@uci.edu>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240429225527.2329-1-rbrasga@uci.edu
|
|
Add a command line opt-in option for posted MSI if CONFIG_X86_POSTED_MSI=y.
Also introduce a helper function for testing if posted MSI is supported on
the platform.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240423174114.526704-12-jacob.jun.pan@linux.intel.com
|