Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull set_user() update from Christian Brauner:
"This contains a single fix to set_user() which aligns permission
checks with the corresponding fork() codepath. No one involved in this
could come up with a reason for the difference.
A capable caller can already circumvent the check when they fork where
the permission checks are already for the relevant capabilities in
addition to also allowing to exceed nproc when it is the init user.
So apply the same logic to set_user()"
* tag 'kernel.sys.v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds
|
|
Use depends on instead of select for DMA_RESTRICTED_POOL; otherwise it
will make SWIOTLB user configurable and cause compile errors for some
arch (e.g. mips).
Fixes: 0b84e4f8b793 ("swiotlb: Add restricted DMA pool initialization")
Acked-by: Will Deacon <will@kernel.org>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Claire Chang <tientzu@chromium.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
Pull nfsd updates from Chuck Lever:
"New features:
- Support for server-side disconnect injection via debugfs
- Protocol definitions for new RPC_AUTH_TLS authentication flavor
Performance improvements:
- Reduce page allocator traffic in the NFSD splice read actor
- Reduce CPU utilization in svcrdma's Send completion handler
Notable bug fixes:
- Stabilize lockd operation when re-exporting NFS mounts
- Fix the use of %.*s in NFSD tracepoints
- Fix /proc/sys/fs/nfs/nsm_use_hostnames"
* tag 'nfsd-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (31 commits)
nfsd: fix crash on LOCKT on reexported NFSv3
nfs: don't allow reexport reclaims
lockd: don't attempt blocking locks on nfs reexports
nfs: don't atempt blocking locks on nfs reexports
Keep read and write fds with each nlm_file
lockd: update nlm_lookup_file reexport comment
nlm: minor refactoring
nlm: minor nlm_lookup_file argument change
lockd: lockd server-side shouldn't set fl_ops
SUNRPC: Add documentation for the fail_sunrpc/ directory
SUNRPC: Server-side disconnect injection
SUNRPC: Move client-side disconnect injection
SUNRPC: Add a /sys/kernel/debug/fail_sunrpc/ directory
svcrdma: xpt_bc_xprt is already clear in __svc_rdma_free()
nfsd4: Fix forced-expiry locking
rpc: fix gss_svc_init cleanup on failure
SUNRPC: Add RPC_AUTH_TLS protocol numbers
lockd: change the proc_handler for nsm_use_hostnames
sysctl: introduce new proc handler proc_dobool
SUNRPC: Fix a NULL pointer deref in trace_svc_stats_latency()
...
|
|
Pull io_uring updates from Jens Axboe:
- cancellation cleanups (Hao, Pavel)
- io-wq accounting cleanup (Hao)
- io_uring submit locking fix (Hao)
- io_uring link handling fixes (Hao)
- fixed file improvements (wangyangbo, Pavel)
- allow updates of linked timeouts like regular timeouts (Pavel)
- IOPOLL fix (Pavel)
- remove batched file get optimization (Pavel)
- improve reference handling (Pavel)
- IRQ task_work batching (Pavel)
- allow pure fixed file, and add support for open/accept (Pavel)
- GFP_ATOMIC RT kernel fix
- multiple CQ ring waiter improvement
- funnel IRQ completions through task_work
- add support for limiting async workers explicitly
- add different clocksource support for timeouts
- io-wq wakeup race fix
- lots of cleanups and improvement (Pavel et al)
* tag 'for-5.15/io_uring-2021-08-30' of git://git.kernel.dk/linux-block: (87 commits)
io-wq: fix wakeup race when adding new work
io-wq: wqe and worker locks no longer need to be IRQ safe
io-wq: check max_worker limits if a worker transitions bound state
io_uring: allow updating linked timeouts
io_uring: keep ltimeouts in a list
io_uring: support CLOCK_BOOTTIME/REALTIME for timeouts
io-wq: provide a way to limit max number of workers
io_uring: add build check for buf_index overflows
io_uring: clarify io_req_task_cancel() locking
io_uring: add task-refs-get helper
io_uring: fix failed linkchain code logic
io_uring: remove redundant req_set_fail()
io_uring: don't free request to slab
io_uring: accept directly into fixed file table
io_uring: hand code io_accept() fd installing
io_uring: openat directly into fixed fd table
net: add accept helper not installing fd
io_uring: fix io_try_cancel_userdata race for iowq
io_uring: IRQ rw completion batching
io_uring: batch task work locking
...
|
|
Pull block driver updates from Jens Axboe:
"Sitting on top of the core block changes, here are the driver changes
for the 5.15 merge window:
- NVMe updates via Christoph:
- suspend improvements for devices with an HMB (Keith Busch)
- handle double completions more gacefull (Sagi Grimberg)
- cleanup the selects for the nvme core code a bit (Sagi Grimberg)
- don't update queue count when failing to set io queues (Ruozhu Li)
- various nvmet connect fixes (Amit Engel)
- cleanup lightnvm leftovers (Keith Busch, me)
- small cleanups (Colin Ian King, Hou Pu)
- add tracing for the Set Features command (Hou Pu)
- CMB sysfs cleanups (Keith Busch)
- add a mutex_destroy call (Keith Busch)
- remove lightnvm subsystem. It's served its purpose and ultimately
led to zoned nvme support, we no longer need it (Christoph)
- revert floppy O_NDELAY fix (Denis)
- nbd fixes (Hou, Pavel, Baokun)
- nbd locking fixes (Tetsuo)
- nbd device removal fixes (Christoph)
- raid10 rcu warning fix (Xiao)
- raid1 write behind fix (Guoqing)
- rnbd fixes (Gioh, Md Haris)
- misc fixes (Colin)"
* tag 'for-5.15/drivers-2021-08-30' of git://git.kernel.dk/linux-block: (42 commits)
Revert "floppy: reintroduce O_NDELAY fix"
raid1: ensure write behind bio has less than BIO_MAX_VECS sectors
md/raid10: Remove unnecessary rcu_dereference in raid10_handle_discard
nbd: remove nbd->destroy_complete
nbd: only return usable devices from nbd_find_unused
nbd: set nbd->index before releasing nbd_index_mutex
nbd: prevent IDR lookups from finding partially initialized devices
nbd: reset NBD to NULL when restarting in nbd_genl_connect
nbd: add missing locking to the nbd_dev_add error path
nvme: remove the unused NVME_NS_* enum
nvme: remove nvm_ndev from ns
nvme: Have NVME_FABRICS select NVME_CORE instead of transport drivers
block: nbd: add sanity check for first_minor
nvmet: check that host sqsize does not exceed ctrl MQES
nvmet: avoid duplicate qid in connect cmd
nvmet: pass back cntlid on successful completion
nvme-rdma: don't update queue count when failing to set io queues
nvme-tcp: don't update queue count when failing to set io queues
nvme-tcp: pair send_mutex init with destroy
nvme: allow user toggling hmb usage
...
|
|
Daniel Borkmann says:
====================
bpf-next 2021-08-31
We've added 116 non-merge commits during the last 17 day(s) which contain
a total of 126 files changed, 6813 insertions(+), 4027 deletions(-).
The main changes are:
1) Add opaque bpf_cookie to perf link which the program can read out again,
to be used in libbpf-based USDT library, from Andrii Nakryiko.
2) Add bpf_task_pt_regs() helper to access userspace pt_regs, from Daniel Xu.
3) Add support for UNIX stream type sockets for BPF sockmap, from Jiang Wang.
4) Allow BPF TCP congestion control progs to call bpf_setsockopt() e.g. to switch
to another congestion control algorithm during init, from Martin KaFai Lau.
5) Extend BPF iterator support for UNIX domain sockets, from Kuniyuki Iwashima.
6) Allow bpf_{set,get}sockopt() calls from setsockopt progs, from Prankur Gupta.
7) Add bpf_get_netns_cookie() helper for BPF_PROG_TYPE_{SOCK_OPS,CGROUP_SOCKOPT}
progs, from Xu Liu and Stanislav Fomichev.
8) Support for __weak typed ksyms in libbpf, from Hao Luo.
9) Shrink struct cgroup_bpf by 504 bytes through refactoring, from Dave Marchevsky.
10) Fix a smatch complaint in verifier's narrow load handling, from Andrey Ignatov.
11) Fix BPF interpreter's tail call count limit, from Daniel Borkmann.
12) Big batch of improvements to BPF selftests, from Magnus Karlsson, Li Zhijian,
Yucong Sun, Yonghong Song, Ilya Leoshkevich, Jussi Maki, Ilya Leoshkevich, others.
13) Another big batch to revamp XDP samples in order to give them consistent look
and feel, from Kumar Kartikeya Dwivedi.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (116 commits)
MAINTAINERS: Remove self from powerpc BPF JIT
selftests/bpf: Fix potential unreleased lock
samples: bpf: Fix uninitialized variable in xdp_redirect_cpu
selftests/bpf: Reduce more flakyness in sockmap_listen
bpf: Fix bpf-next builds without CONFIG_BPF_EVENTS
bpf: selftests: Add dctcp fallback test
bpf: selftests: Add connect_to_fd_opts to network_helpers
bpf: selftests: Add sk_state to bpf_tcp_helpers.h
bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt
selftests: xsk: Preface options with opt
selftests: xsk: Make enums lower case
selftests: xsk: Generate packets from specification
selftests: xsk: Generate packet directly in umem
selftests: xsk: Simplify cleanup of ifobjects
selftests: xsk: Decrease sending speed
selftests: xsk: Validate tx stats on tx thread
selftests: xsk: Simplify packet validation in xsk tests
selftests: xsk: Rename worker_* functions that are not thread entry points
selftests: xsk: Disassociate umem size with packets sent
selftests: xsk: Remove end-of-test packet
...
====================
Link: https://lore.kernel.org/r/20210830225618.11634-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"Updates for timekeeping, timers and related drivers:
Core code:
- Cure a couple of correctness issues in the posix CPU timer code to
prevent that the tick dependency for NOHZ full is kept alive for no
reason.
- Avoid expensive double reprogramming of the clockevent device in
hrtimer_start_range_ns().
- Avoid pointless SMP function calls when the clock was set to avoid
disturbing CPUs which do not have any affected timers queued.
- Make the clocksource watchdog test work correctly when CONFIG_HZ is
less than 100.
Drivers:
- Prefer the ARM architected timer over the Exynos timer which is way
more expensive to access.
- Add device tree bindings for new Ingenic SoCs
- The usual improvements and cleanups all over the place"
* tag 'timers-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
clocksource: Make clocksource watchdog test safe for slow-HZ systems
dt-bindings: timer: Add ABIs for new Ingenic SoCs
clocksource/drivers/fttmr010: Pass around less pointers
clocksource/drivers/mediatek: Optimize systimer irq clear flow on shutdown
clocksource/drivers/ingenic: Use bitfield macro helpers
clocksource/drivers/sh_cmt: Fix wrong setting if don't request IRQ for clock source channel
dt-bindings: timer: convert rockchip,rk-timer.txt to YAML
clocksource/drivers/exynos_mct: Mark MCT device as CLOCK_EVT_FEAT_PERCPU
clocksource/drivers/exynos_mct: Prioritise Arm arch timer on arm64
hrtimer: Unbreak hrtimer_force_reprogram()
hrtimer: Use raw_cpu_ptr() in clock_was_set()
hrtimer: Avoid more SMP function calls in clock_was_set()
hrtimer: Avoid unnecessary SMP function calls in clock_was_set()
hrtimer: Add bases argument to clock_was_set()
time/timekeeping: Avoid invoking clock_was_set() twice
timekeeping: Distangle resume and clock-was-set events
timerfd: Provide timerfd_resume()
hrtimer: Force clock_was_set() handling for the HIGHRES=n, NOHZ=y case
hrtimer: Ensure timerfd notification for HIGHRES=n
hrtimer: Consolidate reprogramming code
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Updates to the interrupt core and driver subsystems:
Core changes:
- The usual set of small fixes and improvements all over the place,
but nothing stands out
MSI changes:
- Further consolidation of the PCI/MSI interrupt chip code
- Make MSI sysfs code independent of PCI/MSI and expose the MSI
interrupts of platform devices in the same way as PCI exposes them.
Driver changes:
- Support for ARM GICv3 EPPI partitions
- Treewide conversion to generic_handle_domain_irq() for all chained
interrupt controllers
- Conversion to bitmap_zalloc() throughout the irq chip drivers
- The usual set of small fixes and improvements"
* tag 'irq-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
platform-msi: Add ABI to show msi_irqs of platform devices
genirq/msi: Move MSI sysfs handling from PCI to MSI core
genirq/cpuhotplug: Demote debug printk to KERN_DEBUG
irqchip/qcom-pdc: Trim unused levels of the interrupt hierarchy
irqdomain: Export irq_domain_disconnect_hierarchy()
irqchip/gic-v3: Fix priority comparison when non-secure priorities are used
irqchip/apple-aic: Fix irq_disable from within irq handlers
pinctrl/rockchip: drop the gpio related codes
gpio/rockchip: drop irq_gc_lock/irq_gc_unlock for irq set type
gpio/rockchip: support next version gpio controller
gpio/rockchip: use struct rockchip_gpio_regs for gpio controller
gpio/rockchip: add driver for rockchip gpio
dt-bindings: gpio: change items restriction of clock for rockchip,gpio-bank
pinctrl/rockchip: add pinctrl device to gpio bank struct
pinctrl/rockchip: separate struct rockchip_pin_bank to a head file
pinctrl/rockchip: always enable clock for gpio controller
genirq: Fix kernel doc indentation
EDAC/altera: Convert to generic_handle_domain_irq()
powerpc: Bulk conversion to generic_handle_domain_irq()
nios2: Bulk conversion to generic_handle_domain_irq()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking and atomics updates from Thomas Gleixner:
"The regular pile:
- A few improvements to the mutex code
- Documentation updates for atomics to clarify the difference between
cmpxchg() and try_cmpxchg() and to explain the forward progress
expectations.
- Simplification of the atomics fallback generator
- The addition of arch_atomic_long*() variants and generic arch_*()
bitops based on them.
- Add the missing might_sleep() invocations to the down*() operations
of semaphores.
The PREEMPT_RT locking core:
- Scheduler updates to support the state preserving mechanism for
'sleeping' spin- and rwlocks on RT.
This mechanism is carefully preserving the state of the task when
blocking on a 'sleeping' spin- or rwlock and takes regular wake-ups
targeted at the same task into account. The preserved or updated
(via a regular wakeup) state is restored when the lock has been
acquired.
- Restructuring of the rtmutex code so it can be utilized and
extended for the RT specific lock variants.
- Restructuring of the ww_mutex code to allow sharing of the ww_mutex
specific functionality for rtmutex based ww_mutexes.
- Header file disentangling to allow substitution of the regular lock
implementations with the PREEMPT_RT variants without creating an
unmaintainable #ifdef mess.
- Shared base code for the PREEMPT_RT specific rw_semaphore and
rwlock implementations.
Contrary to the regular rw_semaphores and rwlocks the PREEMPT_RT
implementation is writer unfair because it is infeasible to do
priority inheritance on multiple readers. Experience over the years
has shown that real-time workloads are not the typical workloads
which are sensitive to writer starvation.
The alternative solution would be to allow only a single reader
which has been tried and discarded as it is a major bottleneck
especially for mmap_sem. Aside of that many of the writer
starvation critical usage sites have been converted to a writer
side mutex/spinlock and RCU read side protections in the past
decade so that the issue is less prominent than it used to be.
- The actual rtmutex based lock substitutions for PREEMPT_RT enabled
kernels which affect mutex, ww_mutex, rw_semaphore, spinlock_t and
rwlock_t. The spin/rw_lock*() functions disable migration across
the critical section to preserve the existing semantics vs per-CPU
variables.
- Rework of the futex REQUEUE_PI mechanism to handle the case of
early wake-ups which interleave with a re-queue operation to
prevent the situation that a task would be blocked on both the
rtmutex associated to the outer futex and the rtmutex based hash
bucket spinlock.
While this situation cannot happen on !RT enabled kernels the
changes make the underlying concurrency problems easier to
understand in general. As a result the difference between !RT and
RT kernels is reduced to the handling of waiting for the critical
section. !RT kernels simply spin-wait as before and RT kernels
utilize rcu_wait().
- The substitution of local_lock for PREEMPT_RT with a spinlock which
protects the critical section while staying preemptible. The CPU
locality is established by disabling migration.
The underlying concepts of this code have been in use in PREEMPT_RT for
way more than a decade. The code has been refactored several times over
the years and this final incarnation has been optimized once again to be
as non-intrusive as possible, i.e. the RT specific parts are mostly
isolated.
It has been extensively tested in the 5.14-rt patch series and it has
been verified that !RT kernels are not affected by these changes"
* tag 'locking-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (92 commits)
locking/rtmutex: Return success on deadlock for ww_mutex waiters
locking/rtmutex: Prevent spurious EDEADLK return caused by ww_mutexes
locking/rtmutex: Dequeue waiter on ww_mutex deadlock
locking/rtmutex: Dont dereference waiter lockless
locking/semaphore: Add might_sleep() to down_*() family
locking/ww_mutex: Initialize waiter.ww_ctx properly
static_call: Update API documentation
locking/local_lock: Add PREEMPT_RT support
locking/spinlock/rt: Prepare for RT local_lock
locking/rtmutex: Add adaptive spinwait mechanism
locking/rtmutex: Implement equal priority lock stealing
preempt: Adjust PREEMPT_LOCK_OFFSET for RT
locking/rtmutex: Prevent lockdep false positive with PI futexes
futex: Prevent requeue_pi() lock nesting issue on RT
futex: Simplify handle_early_requeue_pi_wakeup()
futex: Reorder sanity checks in futex_requeue()
futex: Clarify comment in futex_requeue()
futex: Restructure futex_requeue()
futex: Correct the number of requeued waiters for PI
futex: Remove bogus condition for requeue PI
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP core updates from Thomas Gleixner:
- Replace get/put_online_cpus() in various places. The final removal
will happen shortly before v5.15-rc1 when the rest of the patches
have been merged.
- Add debug code to help the analysis of CPU hotplug failures
- A set of kernel doc updates
* tag 'smp-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
mm: Replace deprecated CPU-hotplug functions.
md/raid5: Replace deprecated CPU-hotplug functions.
Documentation: Replace deprecated CPU-hotplug functions.
smp: Fix all kernel-doc warnings
cpu/hotplug: Add debug printks for hotplug callback failures
cpu/hotplug: Use DEVICE_ATTR_*() macro
cpu/hotplug: Eliminate all kernel-doc warnings
cpu/hotplug: Fix kernel doc warnings for __cpuhp_setup_state_cpuslocked()
cpu/hotplug: Fix comment typo
smpboot: Replace deprecated CPU-hotplug functions.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf event updates from Ingo Molnar:
- Add support for Intel Sapphire Rapids server CPU uncore events
- Allow the AMD uncore driver to be built as a module
- Misc cleanups and fixes
* tag 'perf-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
perf/x86/amd/ibs: Add bitfield definitions in new <asm/amd-ibs.h> header
perf/amd/uncore: Allow the driver to be built as a module
x86/cpu: Add get_llc_id() helper function
perf/amd/uncore: Clean up header use, use <linux/ include paths instead of <asm/
perf/amd/uncore: Simplify code, use free_percpu()'s built-in check for NULL
perf/hw_breakpoint: Replace deprecated CPU-hotplug functions
perf/x86/intel: Replace deprecated CPU-hotplug functions
perf/x86: Remove unused assignment to pointer 'e'
perf/x86/intel/uncore: Fix IIO cleanup mapping procedure for SNR/ICX
perf/x86/intel/uncore: Support IMC free-running counters on Sapphire Rapids server
perf/x86/intel/uncore: Support IIO free-running counters on Sapphire Rapids server
perf/x86/intel/uncore: Factor out snr_uncore_mmio_map()
perf/x86/intel/uncore: Add alias PMU name
perf/x86/intel/uncore: Add Sapphire Rapids server MDF support
perf/x86/intel/uncore: Add Sapphire Rapids server M3UPI support
perf/x86/intel/uncore: Add Sapphire Rapids server UPI support
perf/x86/intel/uncore: Add Sapphire Rapids server M2M support
perf/x86/intel/uncore: Add Sapphire Rapids server IMC support
perf/x86/intel/uncore: Add Sapphire Rapids server PCU support
perf/x86/intel/uncore: Add Sapphire Rapids server M2PCIe support
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- The biggest change in this cycle is scheduler support for asymmetric
scheduling affinity, to support the execution of legacy 32-bit tasks
on AArch32 systems that also have 64-bit-only CPUs.
Architectures can fill in this functionality by defining their own
task_cpu_possible_mask(p). When this is done, the scheduler will make
sure the task will only be scheduled on CPUs that support it.
(The actual arm64 specific changes are not part of this tree.)
For other architectures there will be no change in functionality.
- Add cgroup SCHED_IDLE support
- Increase node-distance flexibility & delay determining it until a CPU
is brought online. (This enables platforms where node distance isn't
final until the CPU is only.)
- Deadline scheduler enhancements & fixes
- Misc fixes & cleanups.
* tag 'sched-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
eventfd: Make signal recursion protection a task bit
sched/fair: Mark tg_is_idle() an inline in the !CONFIG_FAIR_GROUP_SCHED case
sched: Introduce dl_task_check_affinity() to check proposed affinity
sched: Allow task CPU affinity to be restricted on asymmetric systems
sched: Split the guts of sched_setaffinity() into a helper function
sched: Introduce task_struct::user_cpus_ptr to track requested affinity
sched: Reject CPU affinity changes based on task_cpu_possible_mask()
cpuset: Cleanup cpuset_cpus_allowed_fallback() use in select_fallback_rq()
cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()
cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1
sched: Introduce task_cpu_possible_mask() to limit fallback rq selection
sched: Cgroup SCHED_IDLE support
sched/topology: Skip updating masks for non-online nodes
sched: Replace deprecated CPU-hotplug functions.
sched: Skip priority checks with SCHED_FLAG_KEEP_PARAMS
sched: Fix UCLAMP_FLAG_IDLE setting
sched/deadline: Fix missing clock update in migrate_task_rq_dl()
sched/fair: Avoid a second scan of target in select_idle_cpu
sched/fair: Use prev instead of new target as recent_used_cpu
sched: Don't report SCHED_FLAG_SUGOV in sched_getattr()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
- Improve ftrace code patching so that stop_machine is not required
anymore. This requires a small common code patch acked by Steven
Rostedt:
https://lore.kernel.org/linux-s390/20210730220741.4da6fdf6@oasis.local.home/
- Enable KCSAN for s390. This comes with a small common code change to
fix a compile warning. Acked by Marco Elver:
https://lore.kernel.org/r/20210729142811.1309391-1-hca@linux.ibm.com
- Add KFENCE support for s390. This also comes with a minimal x86 patch
from Marco Elver who said also this can be carried via the s390 tree:
https://lore.kernel.org/linux-s390/YQJdarx6XSUQ1tFZ@elver.google.com/
- More changes to prepare the decompressor for relocation.
- Enable DAT also for CPU restart path.
- Final set of register asm removal patches; leaving only three
locations where needed and sane.
- Add NNPA, Vector-Packed-Decimal-Enhancement Facility 2, PCI MIO
support to hwcaps flags.
- Cleanup hwcaps implementation.
- Add new instructions to in-kernel disassembler.
- Various QDIO cleanups.
- Add SCLP debug feature.
- Various other cleanups and improvements all over the place.
* tag 's390-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (105 commits)
s390: remove SCHED_CORE from defconfigs
s390/smp: do not use nodat_stack for secondary CPU start
s390/smp: enable DAT before CPU restart callback is called
s390: update defconfigs
s390/ap: fix state machine hang after failure to enable irq
KVM: s390: generate kvm hypercall functions
s390/sclp: add tracing of SCLP interactions
s390/debug: add early tracing support
s390/debug: fix debug area life cycle
s390/debug: keep debug data on resize
s390/diag: make restart_part2 a local label
s390/mm,pageattr: fix walk_pte_level() early exit
s390: fix typo in linker script
s390: remove do_signal() prototype and do_notify_resume() function
s390/crypto: fix all kernel-doc warnings in vfio_ap_ops.c
s390/pci: improve DMA translation init and exit
s390/pci: simplify CLP List PCI handling
s390/pci: handle FH state mismatch only on disable
s390/pci: fix misleading rc in clp_set_pci_fn()
s390/boot: factor out offset_vmlinux_info() function
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"Algorithms:
- Add AES-NI/AVX/x86_64 implementation of SM4.
Drivers:
- Add Arm SMCCC TRNG based driver"
[ And obviously a lot of random fixes and updates - Linus]
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (84 commits)
crypto: sha512 - remove imaginary and mystifying clearing of variables
crypto: aesni - xts_crypt() return if walk.nbytes is 0
padata: Remove repeated verbose license text
crypto: ccp - Add support for new CCP/PSP device ID
crypto: x86/sm4 - add AES-NI/AVX2/x86_64 implementation
crypto: x86/sm4 - export reusable AESNI/AVX functions
crypto: rmd320 - remove rmd320 in Makefile
crypto: skcipher - in_irq() cleanup
crypto: hisilicon - check _PS0 and _PR0 method
crypto: hisilicon - change parameter passing of debugfs function
crypto: hisilicon - support runtime PM for accelerator device
crypto: hisilicon - add runtime PM ops
crypto: hisilicon - using 'debugfs_create_file' instead of 'debugfs_create_regset32'
crypto: tcrypt - add GCM/CCM mode test for SM4 algorithm
crypto: testmgr - Add GCM/CCM mode test of SM4 algorithm
crypto: tcrypt - Fix missing return value check
crypto: hisilicon/sec - modify the hardware endian configuration
crypto: hisilicon/sec - fix the abnormal exiting process
crypto: qat - store vf.compatible flag
crypto: qat - do not export adf_iov_putmsg()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:
"RCU changes for this cycle were:
- Documentation updates
- Miscellaneous fixes
- Offloaded-callbacks updates
- Updates to the nolibc library
- Tasks-RCU updates
- In-kernel torture-test updates
- Torture-test scripting, perhaps most notably the pinning of
torture-test guest OSes so as to force differences in memory
latency. For example, in a two-socket system, a four-CPU guest OS
will have one pair of its CPUs pinned to threads in a single core
on one socket and the other pair pinned to threads in a single core
on the other socket. This approach proved able to force race
conditions that earlier testing missed. Some of these race
conditions are still being tracked down"
* 'core-rcu.2021.08.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (61 commits)
torture: Replace deprecated CPU-hotplug functions.
rcu: Replace deprecated CPU-hotplug functions
rcu: Print human-readable message for schedule() in RCU reader
rcu: Explain why rcu_all_qs() is a stub in preemptible TREE RCU
rcu: Use per_cpu_ptr to get the pointer of per_cpu variable
rcu: Remove useless "ret" update in rcu_gp_fqs_loop()
rcu: Mark accesses in tree_stall.h
rcu: Make rcu_gp_init() and rcu_gp_fqs_loop noinline to conserve stack
rcu: Mark lockless ->qsmask read in rcu_check_boost_fail()
srcutiny: Mark read-side data races
rcu: Start timing stall repetitions after warning complete
rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()
rcu/tree: Handle VM stoppage in stall detection
rculist: Unify documentation about missing list_empty_rcu()
rcu: Mark accesses to ->rcu_read_lock_nesting
rcu: Weaken ->dynticks accesses and updates
rcu: Remove special bit at the bottom of the ->dynticks counter
rcu: Fix stall-warning deadlock due to non-release of rcu_node ->lock
rcu: Fix to include first blocked task in stall warning
torture: Make kvm-test-1-run-qemu.sh check for reboot loops
...
|
|
As done before in commit cb4a31675270 ("cgroup: use bitmask to filter
for_each_subsys"), avoid compiler warnings for the pathological case of
having no subsystems (i.e. CGROUP_SUBSYS_COUNT == 0). This condition is
hit for the arm multi_v7_defconfig config under -Wzero-length-bounds:
In file included from ./arch/arm/include/generated/asm/rwonce.h:1,
from include/linux/compiler.h:264,
from include/uapi/linux/swab.h:6,
from include/linux/swab.h:5,
from arch/arm/include/asm/opcodes.h:86,
from arch/arm/include/asm/bug.h:7,
from include/linux/bug.h:5,
from include/linux/thread_info.h:13,
from include/asm-generic/current.h:5,
from ./arch/arm/include/generated/asm/current.h:1,
from include/linux/sched.h:12,
from include/linux/cgroup.h:12,
from kernel/cgroup/cgroup-internal.h:5,
from kernel/cgroup/cgroup.c:31:
kernel/cgroup/cgroup.c: In function 'of_css':
kernel/cgroup/cgroup.c:651:42: warning: array subscript '<unknown>' is outside the bounds of an
interior zero-length array 'struct cgroup_subsys_state *[0]' [-Wzero-length-bounds]
651 | return rcu_dereference_raw(cgrp->subsys[cft->ss->id]);
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: cgroups@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
* pm-pci:
PCI: PM: Enable PME if it can be signaled from D3cold
PCI: PM: Avoid forcing PCI_D0 for wakeup reasons inconsistently
PCI: Use pci_update_current_state() in pci_enable_device_flags()
* pm-sleep:
PM: sleep: unmark 'state' functions as kernel-doc
PM: sleep: check RTC features instead of ops in suspend_test
PM: sleep: s2idle: Replace deprecated CPU-hotplug functions
* pm-domains:
PM: domains: Fix domain attach for CONFIG_PM_OPP=n
arm64: dts: sc7180: Add required-opps for i2c
PM: domains: Add support for 'required-opps' to set default perf state
opp: Don't print an error if required-opps is missing
* powercap:
powercap: Add Power Limit4 support for Alder Lake SoC
powercap: intel_rapl: Replace deprecated CPU-hotplug functions
|
|
* pm-cpufreq:
cpufreq: intel_pstate: Process HWP Guaranteed change notification
thermal: intel: Allow processing of HWP interrupt
cpufreq: schedutil: Use kobject release() method to free sugov_tunables
cpufreq: Replace deprecated CPU-hotplug functions
* pm-cpu:
notifier: Remove atomic_notifier_call_chain_robust()
PM: cpu: Make notifier chain use a raw_spinlock_t
* pm-em:
PM: EM: Increase energy calculation precision
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
"fsnotify speedups when notification actually isn't used and support
for identifying processes which caused fanotify events through pidfd
instead of normal pid"
* tag 'fsnotify_for_v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: optimize the case of no marks of any type
fsnotify: count all objects with attached connectors
fsnotify: count s_fsnotify_inode_refs for attached connectors
fsnotify: replace igrab() with ihold() on attach connector
fanotify: add pidfd support to the fanotify API
fanotify: introduce a generic info record copying helper
fanotify: minor cosmetic adjustments to fid labels
kernel/pid.c: implement additional checks upon pidfd_create() parameters
kernel/pid.c: remove static qualifier from pidfd_create()
|
|
|
|
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates from Marc Zyngier:
- API updates:
- Treewide conversion to generic_handle_domain_irq() for anything
that looks like a chained interrupt controller
- Update the irqdomain documentation
- Use of bitmap_zalloc() throughout the tree
- New functionalities:
- Support for GICv3 EPPI partitions
- Fixes:
- Qualcomm PDC hierarchy fixes
- Yet another priority decoding fix for the GICv3 pseudo-NMIs
- Fix the apple-aic driver irq_eoi() callback to always unmask
the interrupt
- Properly handle edge interrupts on loongson-pch-pic
- Let the mtk-sysirq driver advertise IRQCHIP_SKIP_SET_WAKE
Link: https://lore.kernel.org/r/20210828121013.2647964-1-maz@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
- Have get_push_task() check whether current has migration disabled and
thus avoid useless invocations of the migration thread
- Rework initialization flow so that all rq->core's are initialized,
even of CPUs which have not been onlined yet, so that iterating over
them all works as expected
* tag 'sched_urgent_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix get_push_task() vs migrate_disable()
sched: Fix Core-wide rq->lock for uninitialized CPUs
|
|
The clocksource watchdog test sets a local JIFFIES_SHIFT macro and assumes
that HZ is >= 100. For smaller HZ values this shift value is too large and
causes undefined behaviour.
Move the HZ-based definitions of JIFFIES_SHIFT from kernel/time/jiffies.c
to kernel/time/tick-internal.h so the clocksource watchdog test can utilize
them, which makes it work correctly with all HZ values.
[ tglx: Resolved conflicts and massaged changelog ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20210812000133.GA402890@paulmck-ThinkPad-P17-Gen-1/
|
|
If CONFIG_MODULES is n, we got this:
kernel/printk/index.c:146:13: warning: ‘pi_remove_file’ defined but not used [-Wunused-function]
Move it inside #ifdef block to fix this warning.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210804130105.18732-1-yuehaibing@huawei.com
|
|
ww_mutexes can legitimately cause a deadlock situation in the lock graph
which is resolved afterwards by the wait/wound mechanics. The rtmutex chain
walk can detect such a deadlock and returns EDEADLK which in turn skips the
wait/wound mechanism and returns EDEADLK to the caller. That's wrong
because both lock chains might get EDEADLK or the wrong waiter would back
out.
Detect that situation and return 'success' in case that the waiter which
initiated the chain walk is a ww_mutex with context. This allows the
wait/wound mechanics to resolve the situation according to the rules.
[ tglx: Split it apart and added changelog ]
Reported-by: Sebastian Siewior <bigeasy@linutronix.de>
Fixes: add461325ec5 ("locking/rtmutex: Extend the rtmutex core to support ww_mutex")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/YSeWjCHoK4v5OcOt@hirez.programming.kicks-ass.net
|
|
rtmutex based ww_mutexes can legitimately create a cycle in the lock graph
which can be observed by a blocker which didn't cause the problem:
P1: A, ww_A, ww_B
P2: ww_B, ww_A
P3: A
P3 might therefore be trapped in the ww_mutex induced cycle and run into
the lock depth limitation of rt_mutex_adjust_prio_chain() which returns
-EDEADLK to the caller.
Disable the deadlock detection walk when the chain walk observes a
ww_mutex to prevent this looping.
[ tglx: Split it apart and added changelog ]
Reported-by: Sebastian Siewior <bigeasy@linutronix.de>
Fixes: add461325ec5 ("locking/rtmutex: Extend the rtmutex core to support ww_mutex")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/YSeWjCHoK4v5OcOt@hirez.programming.kicks-ass.net
|
|
remove it because SPDX-License-Identifier is already used
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
drivers/net/wwan/mhi_wwan_mbim.c - drop the extra arg.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Replace get_nr_threads with atomic_read(¤t->signal->live) as
that is a more accurate number that is decremented sooner.
Acked-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/87lf6z6qbd.fsf_-_@disp2133
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes, including fixes from can and bpf.
Closing three hw-dependent regressions. Any fixes of note are in the
'old code' category. Nothing blocking release from our perspective.
Current release - regressions:
- stmmac: revert "stmmac: align RX buffers"
- usb: asix: ax88772: move embedded PHY detection as early as
possible
- usb: asix: do not call phy_disconnect() for ax88178
- Revert "net: really fix the build...", from Kalle to fix QCA6390
Current release - new code bugs:
- phy: mediatek: add the missing suspend/resume callbacks
Previous releases - regressions:
- qrtr: fix another OOB Read in qrtr_endpoint_post
- stmmac: dwmac-rk: fix unbalanced pm_runtime_enable warnings
Previous releases - always broken:
- inet: use siphash in exception handling
- ip_gre: add validation for csum_start
- bpf: fix ringbuf helper function compatibility
- rtnetlink: return correct error on changing device netns
- e1000e: do not try to recover the NVM checksum on Tiger Lake"
* tag 'net-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits)
Revert "net: really fix the build..."
net: hns3: fix get wrong pfc_en when query PFC configuration
net: hns3: fix GRO configuration error after reset
net: hns3: change the method of getting cmd index in debugfs
net: hns3: fix duplicate node in VLAN list
net: hns3: fix speed unknown issue in bond 4
net: hns3: add waiting time before cmdq memory is released
net: hns3: clear hardware resource when loading driver
net: fix NULL pointer reference in cipso_v4_doi_free
rtnetlink: Return correct error on changing device netns
net: dsa: hellcreek: Adjust schedule look ahead window
net: dsa: hellcreek: Fix incorrect setting of GCL
cxgb4: dont touch blocked freelist bitmap after free
ipv4: use siphash instead of Jenkins in fnhe_hashfun()
ipv6: use siphash in rt6_exception_hash()
can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters
net: usb: asix: ax88772: fix boolconv.cocci warnings
net/sched: ets: fix crash when flipping from 'strict' to 'quantum'
qede: Fix memset corruption
net: stmmac: fix kernel panic due to NULL pointer dereference of buf->xdp
...
|
|
push_rt_task() attempts to move the currently running task away if the
next runnable task has migration disabled and therefore is pinned on the
current CPU.
The current task is retrieved via get_push_task() which only checks for
nr_cpus_allowed == 1, but does not check whether the task has migration
disabled and therefore cannot be moved either. The consequence is a
pointless invocation of the migration thread which correctly observes
that the task cannot be moved.
Return NULL if the task has migration disabled and cannot be moved to
another CPU.
Fixes: a7c81556ec4d3 ("sched: Fix migrate_disable() vs rt/dl balancing")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210826133738.yiotqbtdaxzjsnfj@linutronix.de
|
|
Factor out force_sig_seccomp from the seccomp signal generation and
place it in kernel/signal.c. The function force_sig_seccomp takes a
parameter force_coredump to indicate that the sigaction field should
be reset to SIGDFL so that a coredump will be generated when the
signal is delivered.
force_sig_seccomp is then used to replace both seccomp_send_sigsys
and seccomp_init_siginfo.
force_sig_info_to_task gains an extra parameter to force using
the default signal action.
With this change seccomp is no longer a special case and there
becomes exactly one place do_coredump is called from.
Further it no longer becomes necessary for __seccomp_filter
to call do_group_exit.
Acked-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/87r1gr6qc4.fsf_-_@disp2133
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
It's not actually used in the !CONFIG_FAIR_GROUP_SCHED case:
kernel/sched/fair.c:488:12: warning: ‘tg_is_idle’ defined but not used [-Wunused-function]
Keep around a placeholder nevertheless, for API completeness. Mark it inline,
so the compiler doesn't think it must be used.
Fixes: 304000390f88: ("sched: Cgroup SCHED_IDLE support")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Josh Don <joshdon@google.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
|
|
The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().
Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210803141621.780504-12-bigeasy@linutronix.de
|
|
This commit fixes linker errors along the lines of:
s390-linux-ld: task_iter.c:(.init.text+0xa4): undefined reference to `btf_task_struct_ids'`
Fix by defining btf_task_struct_ids unconditionally in kernel/bpf/btf.c
since there exists code that unconditionally uses btf_task_struct_ids.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/05d94748d9f4b3eecedc4fddd6875418a396e23c.1629942444.git.dxu@dxuuu.xyz
|
|
This patch allows the bpf-tcp-cc to call bpf_setsockopt. One use
case is to allow a bpf-tcp-cc switching to another cc during init().
For example, when the tcp flow is not ecn ready, the bpf_dctcp
can switch to another cc by calling setsockopt(TCP_CONGESTION).
During setsockopt(TCP_CONGESTION), the new tcp-cc's init() will be
called and this could cause a recursion but it is stopped by the
current trampoline's logic (in the prog->active counter).
While retiring a bpf-tcp-cc (e.g. in tcp_v[46]_destroy_sock()),
the tcp stack calls bpf-tcp-cc's release(). To avoid the retiring
bpf-tcp-cc making further changes to the sk, bpf_setsockopt is not
available to the bpf-tcp-cc's release(). This will avoid release()
making setsockopt() call that will potentially allocate new resources.
Although the bpf-tcp-cc already has a more powerful way to read tcp_sock
from the PTR_TO_BTF_ID, it is usually expected that bpf_getsockopt and
bpf_setsockopt are available together. Thus, bpf_getsockopt() is also
added to all tcp_congestion_ops except release().
When the old bpf-tcp-cc is calling setsockopt(TCP_CONGESTION)
to switch to a new cc, the old bpf-tcp-cc will be released by
bpf_struct_ops_put(). Thus, this patch also puts the bpf_struct_ops_map
after a rcu grace period because the trampoline's image cannot be freed
while the old bpf-tcp-cc is still running.
bpf-tcp-cc can only access icsk_ca_priv as SCALAR. All kernel's
tcp-cc is also accessing the icsk_ca_priv as SCALAR. The size
of icsk_ca_priv has already been raised a few times to avoid
extra kmalloc and memory referencing. The only exception is the
kernel's tcp_cdg.c that stores a kmalloc()-ed pointer in icsk_ca_priv.
To avoid the old bpf-tcp-cc accidentally overriding this tcp_cdg's pointer
value stored in icsk_ca_priv after switching and without over-complicating
the bpf's verifier for this one exception in tcp_cdg, this patch does not
allow switching to tcp_cdg. If there is a need, bpf_tcp_cdg can be
implemented and then use the bpf_sk_storage as the extended storage.
bpf_sk_setsockopt proto has only been recently added and used
in bpf-sockopt and bpf-iter-tcp, so impose the tcp_cdg limitation in the
same proto instead of adding a new proto specifically for bpf-tcp-cc.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210824173007.3976921-1-kafai@fb.com
|
|
The motivation behind this helper is to access userspace pt_regs in a
kprobe handler.
uprobe's ctx is the userspace pt_regs. kprobe's ctx is the kernelspace
pt_regs. bpf_task_pt_regs() allows accessing userspace pt_regs in a
kprobe handler. The final case (kernelspace pt_regs in uprobe) is
pretty rare (usermode helper) so I think that can be solved later if
necessary.
More concretely, this helper is useful in doing BPF-based DWARF stack
unwinding. Currently the kernel can only do framepointer based stack
unwinds for userspace code. This is because the DWARF state machines are
too fragile to be computed in kernelspace [0]. The idea behind
DWARF-based stack unwinds w/ BPF is to copy a chunk of the userspace
stack (while in prog context) and send it up to userspace for unwinding
(probably with libunwind) [1]. This would effectively enable profiling
applications with -fomit-frame-pointer using kprobes and uprobes.
[0]: https://lkml.org/lkml/2012/2/10/356
[1]: https://github.com/danobi/bpf-dwarf-walk
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/e2718ced2d51ef4268590ab8562962438ab82815.1629772842.git.dxu@dxuuu.xyz
|
|
bpf_get_current_task() is already supported so it's natural to also
include the _btf() variant for btf-powered helpers.
This is required for non-tracing progs to use bpf_task_pt_regs() in the
next commit.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/f99870ed5f834c9803d73b3476f8272b1bb987c0.1629772842.git.dxu@dxuuu.xyz
|
|
No need to have it defined 5 times. Once is enough.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/6dcefa5bed26fe1226f26683f36819bb53ec19a2.1629772842.git.dxu@dxuuu.xyz
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ucount fixes from Eric Biederman:
"This branch fixes a regression that made it impossible to increase
rlimits that had been converted to the ucount infrastructure, and also
fixes a reference counting bug where the reference was not incremented
soon enough.
The fixes are trivial and the bugs have been encountered in the wild,
and the fixes have been tested"
* 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ucounts: Increase ucounts reference counter before the security hook
ucounts: Fix regression preventing increasing of rlimits in init_user_ns
|
|
With the introduction of ee9707e8593d ("cgroup/cpuset: Enable memory
migration for cpuset v2") attaching a process to a different cgroup will
trigger a memory migration regardless of whether it's really needed.
Memory migration is an expensive operation, so bypass it if the
nodemasks passed to cpuset_migrate_mm() are equal.
Note that we're not only avoiding the migration work itself, but also a
call to lru_cache_disable(), which triggers and flushes an LRU drain
work on every online CPU.
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
The rt_mutex based ww_mutex variant queues the new waiter first in the
lock's rbtree before evaluating the ww_mutex specific conditions which
might decide that the waiter should back out. This check and conditional
exit happens before the waiter is enqueued into the PI chain.
The failure handling at the call site assumes that the waiter, if it is the
top most waiter on the lock, is queued in the PI chain and then proceeds to
adjust the unmodified PI chain, which results in RB tree corruption.
Dequeue the waiter from the lock waiter list in the ww_mutex error exit
path to prevent this.
Fixes: add461325ec5 ("locking/rtmutex: Extend the rtmutex core to support ww_mutex")
Reported-by: Sebastian Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210825102454.042280541@linutronix.de
|
|
The new rt_mutex_spin_on_onwer() loop checks whether the spinning waiter is
still the top waiter on the lock by utilizing rt_mutex_top_waiter(), which
is broken because that function contains a sanity check which dereferences
the top waiter pointer to check whether the waiter belongs to the
lock. That's wrong in the lockless spinwait case:
CPU 0 CPU 1
rt_mutex_lock(lock) rt_mutex_lock(lock);
queue(waiter0)
waiter0 == rt_mutex_top_waiter(lock)
rt_mutex_spin_on_onwer(lock, waiter0) { queue(waiter1)
waiter1 == rt_mutex_top_waiter(lock)
...
top_waiter = rt_mutex_top_waiter(lock)
leftmost = rb_first_cached(&lock->waiters);
-> signal
dequeue(waiter1)
destroy(waiter1)
w = rb_entry(leftmost, ....)
BUG_ON(w->lock != lock) <- UAF
The BUG_ON() is correct for the case where the caller holds lock->wait_lock
which guarantees that the leftmost waiter entry cannot vanish. For the
lockless spinwait case it's broken.
Create a new helper function which avoids the pointer dereference and just
compares the leftmost entry pointer with current's waiter pointer to
validate that currrent is still elegible for spinning.
Fixes: 992caf7f1724 ("locking/rtmutex: Add adaptive spinwait mechanism")
Reported-by: Sebastian Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210825102453.981720644@linutronix.de
|
|
AUDIT_TRIM is expected to be idempotent, but multiple executions resulted
in a refcount underflow and use-after-free.
git bisect fingered commit fb041bb7c0a9 ("locking/refcount: Consolidate
implementations of refcount_t") but this patch with its more thorough
checking that wasn't in the x86 assembly code merely exposed a previously
existing tree refcount imbalance in the case of tree trimming code that
was refactored with prune_one() to remove a tree introduced in
commit 8432c7006297 ("audit: Simplify locking around untag_chunk()")
Move the put_tree() to cover only the prune_one() case.
Passes audit-testsuite and 3 passes of "auditctl -t" with at least one
directory watch.
Cc: Jan Kara <jack@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Seiji Nishikawa <snishika@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 8432c7006297 ("audit: Simplify locking around untag_chunk()")
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
[PM: reformatted/cleaned-up the commit description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
Fix a verifier bug found by smatch static checker in [0].
This problem has never been seen in prod to my best knowledge. Fixing it
still seems to be a good idea since it's hard to say for sure whether
it's possible or not to have a scenario where a combination of
convert_ctx_access() and a narrow load would lead to an out of bound
write.
When narrow load is handled, one or two new instructions are added to
insn_buf array, but before it was only checked that
cnt >= ARRAY_SIZE(insn_buf)
And it's safe to add a new instruction to insn_buf[cnt++] only once. The
second try will lead to out of bound write. And this is what can happen
if `shift` is set.
Fix it by making sure that if the BPF_RSH instruction has to be added in
addition to BPF_AND then there is enough space for two more instructions
in insn_buf.
The full report [0] is below:
kernel/bpf/verifier.c:12304 convert_ctx_accesses() warn: offset 'cnt' incremented past end of array
kernel/bpf/verifier.c:12311 convert_ctx_accesses() warn: offset 'cnt' incremented past end of array
kernel/bpf/verifier.c
12282
12283 insn->off = off & ~(size_default - 1);
12284 insn->code = BPF_LDX | BPF_MEM | size_code;
12285 }
12286
12287 target_size = 0;
12288 cnt = convert_ctx_access(type, insn, insn_buf, env->prog,
12289 &target_size);
12290 if (cnt == 0 || cnt >= ARRAY_SIZE(insn_buf) ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Bounds check.
12291 (ctx_field_size && !target_size)) {
12292 verbose(env, "bpf verifier is misconfigured\n");
12293 return -EINVAL;
12294 }
12295
12296 if (is_narrower_load && size < target_size) {
12297 u8 shift = bpf_ctx_narrow_access_offset(
12298 off, size, size_default) * 8;
12299 if (ctx_field_size <= 4) {
12300 if (shift)
12301 insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH,
^^^^^
increment beyond end of array
12302 insn->dst_reg,
12303 shift);
--> 12304 insn_buf[cnt++] = BPF_ALU32_IMM(BPF_AND, insn->dst_reg,
^^^^^
out of bounds write
12305 (1 << size * 8) - 1);
12306 } else {
12307 if (shift)
12308 insn_buf[cnt++] = BPF_ALU64_IMM(BPF_RSH,
12309 insn->dst_reg,
12310 shift);
12311 insn_buf[cnt++] = BPF_ALU64_IMM(BPF_AND, insn->dst_reg,
^^^^^^^^^^^^^^^
Same.
12312 (1ULL << size * 8) - 1);
12313 }
12314 }
12315
12316 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
12317 if (!new_prog)
12318 return -ENOMEM;
12319
12320 delta += cnt - 1;
12321
12322 /* keep walking new program and skip insns we just inserted */
12323 env->prog = new_prog;
12324 insn = new_prog->insnsi + i + delta;
12325 }
12326
12327 return 0;
12328 }
[0] https://lore.kernel.org/bpf/20210817050843.GA21456@kili/
v1->v2:
- clarify that problem was only seen by static checker but not in prod;
Fixes: 46f53a65d2de ("bpf: Allow narrow loads with offset > 0")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210820163935.1902398-1-rdna@fb.com
|
|
Move PCI's MSI sysfs code to the irq core so that other busses such as
platform can reuse it.
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210813035628.6844-2-21cnbao@gmail.com
|
|
This sort of information is only generally useful when debugging.
No need to have these sprinkled through the kernel log otherwise.
Real world problem:
During pre-release testing these have an affect on performance on
real products. To the point where so much logging builds up, that
it sets off the watchdog(s) on some high profile consumer devices.
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210816134817.1503661-1-lee.jones@linaro.org
|
|
Add an enum (cgroup_bpf_attach_type) containing only valid cgroup_bpf
attach types and a function to map bpf_attach_type values to the new
enum. Inspired by netns_bpf_attach_type.
Then, migrate cgroup_bpf to use cgroup_bpf_attach_type wherever
possible. Functionality is unchanged as attach_type_to_prog_type
switches in bpf/syscall.c were preventing non-cgroup programs from
making use of the invalid cgroup_bpf array slots.
As a result struct cgroup_bpf uses 504 fewer bytes relative to when its
arrays were sized using MAX_BPF_ATTACH_TYPE.
bpf_cgroup_storage is notably not migrated as struct
bpf_cgroup_storage_key is part of uapi and contains a bpf_attach_type
member which is not meant to be opaque. Similarly, bpf_cgroup_link
continues to report its bpf_attach_type member to userspace via fdinfo
and bpf_link_info.
To ease disambiguation, bpf_attach_type variables are renamed from
'type' to 'atype' when changed to cgroup_bpf_attach_type.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210819092420.1984861-2-davemarchevsky@fb.com
|