Age | Commit message (Collapse) | Author | Files | Lines |
|
The commit d583d360a620 ("psi: Fix psi state corruption when schedule()
races with cgroup move") fixed a race problem by making cgroup_move_task()
use task->psi_flags instead of looking at the scheduler state.
We can extend task->psi_flags usage to CPU migration, which should be
a minor optimization for performance and code simplicity.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20220926081931.45420-1-zhouchengming@bytedance.com
|
|
Psi polling mechanism is trying to minimize the number of wakeups to
run psi_poll_work and is currently relying on timer_pending() to detect
when this work is already scheduled. This provides a window of opportunity
for psi_group_change to schedule an immediate psi_poll_work after
poll_timer_fn got called but before psi_poll_work could reschedule itself.
Below is the depiction of this entire window:
poll_timer_fn
wake_up_interruptible(&group->poll_wait);
psi_poll_worker
wait_event_interruptible(group->poll_wait, ...)
psi_poll_work
psi_schedule_poll_work
if (timer_pending(&group->poll_timer)) return;
...
mod_timer(&group->poll_timer, jiffies + delay);
Prior to 461daba06bdc we used to rely on poll_scheduled atomic which was
reset and set back inside psi_poll_work and therefore this race window
was much smaller.
The larger window causes increased number of wakeups and our partners
report visible power regression of ~10mA after applying 461daba06bdc.
Bring back the poll_scheduled atomic and make this race window even
narrower by resetting poll_scheduled only when we reach polling expiration
time. This does not completely eliminate the possibility of extra wakeups
caused by a race with psi_group_change however it will limit it to the
worst case scenario of one extra wakeup per every tracking window (0.5s
in the worst case).
This patch also ensures correct ordering between clearing poll_scheduled
flag and obtaining changed_states using memory barrier. Correct ordering
between updating changed_states and setting poll_scheduled is ensured by
atomic_xchg operation.
By tracing the number of immediate rescheduling attempts performed by
psi_group_change and the number of these attempts being blocked due to
psi monitor being already active, we can assess the effects of this change:
Before the patch:
Run#1 Run#2 Run#3
Immediate reschedules attempted: 684365 1385156 1261240
Immediate reschedules blocked: 682846 1381654 1258682
Immediate reschedules (delta): 1519 3502 2558
Immediate reschedules (% of attempted): 0.22% 0.25% 0.20%
After the patch:
Run#1 Run#2 Run#3
Immediate reschedules attempted: 882244 770298 426218
Immediate reschedules blocked: 881996 769796 426074
Immediate reschedules (delta): 248 502 144
Immediate reschedules (% of attempted): 0.03% 0.07% 0.03%
The number of non-blocked immediate reschedules dropped from 0.22-0.25%
to 0.03-0.07%. The drop is attributed to the decrease in the race window
size and the fact that we allow this race only when psi monitors reach
polling window expiration time.
Fixes: 461daba06bdc ("psi: eliminate kthread_worker from psi trigger scheduling mechanism")
Reported-by: Kathleen Chang <yt.chang@mediatek.com>
Reported-by: Wenju Xu <wenju.xu@mediatek.com>
Reported-by: Jonathan Chen <jonathan.jmchen@mediatek.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: SH Chen <show-hong.chen@mediatek.com>
Link: https://lore.kernel.org/r/20221028194541.813985-1-surenb@google.com
|
|
Pavan reported a problem that PSI avgs_work idle shutoff is not
working at all. Because PSI_NONIDLE condition would be observed in
psi_avgs_work()->collect_percpu_times()->get_recent_times() even if
only the kworker running avgs_work on the CPU.
Although commit 1b69ac6b40eb ("psi: fix aggregation idle shut-off")
avoided the ping-pong wake problem when the worker sleep, psi_avgs_work()
still will always re-arm the avgs_work, so shutoff is not working.
This patch changes to use PSI_STATE_RESCHEDULE to flag whether to
re-arm avgs_work in get_recent_times(). For the current CPU, we re-arm
avgs_work only when (NR_RUNNING > 1 || NR_IOWAIT > 0 || NR_MEMSTALL > 0),
for other CPUs we can just check PSI_NONIDLE delta. The new flag
is only used in psi_avgs_work(), so we check in get_recent_times()
that current_work() is avgs_work.
One potential problem is that the brief period of non-idle time
incurred between the aggregation run and the kworker's dequeue will
be stranded in the per-cpu buckets until avgs_work run next time.
The buckets can hold 4s worth of time, and future activity will wake
the avgs_work with a 2s delay, giving us 2s worth of data we can leave
behind when shut off the avgs_work. If the kworker run other works after
avgs_work shut off and doesn't have any scheduler activities for 2s,
this maybe a problem.
Reported-by: Pavan Kondeti <quic_pkondeti@quicinc.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Chengming Zhou <zhouchengming@bytedance.com>
Link: https://lore.kernel.org/r/20221014110551.22695-1-zhouchengming@bytedance.com
|
|
When a pending event exists and growth is less than the threshold, the
current logic is to skip this trigger without generating event. However,
from e6df4ead85d9 ("psi: fix possible trigger missing in the window"),
our purpose is to generate event as long as pending event exists and the
rate meets the limit, no matter what growth is.
This patch handles this case properly.
Fixes: e6df4ead85d9 ("psi: fix possible trigger missing in the window")
Signed-off-by: Hao Lee <haolee.swjtu@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Link: https://lore.kernel.org/r/20220919072356.GA29069@haolee.io
|
|
The do_set_cpus_allowed() function is used by either kthread_bind() or
select_fallback_rq(). In both cases the user affinity (if any) should be
destroyed too.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220922180041.1768141-6-longman@redhat.com
|
|
It was found that the user requested affinity via sched_setaffinity()
can be easily overwritten by other kernel subsystems without an easy way
to reset it back to what the user requested. For example, any change
to the current cpuset hierarchy may reset the cpumask of the tasks in
the affected cpusets to the default cpuset value even if those tasks
have pre-existing user requested affinity. That is especially easy to
trigger under a cgroup v2 environment where writing "+cpuset" to the
root cgroup's cgroup.subtree_control file will reset the cpus affinity
of all the processes in the system.
That is problematic in a nohz_full environment where the tasks running
in the nohz_full CPUs usually have their cpus affinity explicitly set
and will behave incorrectly if cpus affinity changes.
Fix this problem by looking at user_cpus_ptr in __set_cpus_allowed_ptr()
and use it to restrcit the given cpumask unless there is no overlap. In
that case, it will fallback to the given one. The SCA_USER flag is
reused to indicate intent to set user_cpus_ptr and so user_cpus_ptr
masking should be skipped. In addition, masking should also be skipped
if any of the SCA_MIGRATE_* flag is set.
All callers of set_cpus_allowed_ptr() will be affected by this change.
A scratch cpumask is added to percpu runqueues structure for doing
additional masking when user_cpus_ptr is set.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220922180041.1768141-4-longman@redhat.com
|
|
Unconditionally preserve the user requested cpumask on
sched_setaffinity() calls. This allows using it outside of the fairly
narrow restrict_cpus_allowed_ptr() use-case and fix some cpuset issues
that currently suffer destruction of cpumasks.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220922180041.1768141-3-longman@redhat.com
|
|
In order to prepare for passing through additional data through the
affinity call-chains, convert the mask and flags argument into a
structure.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220922180041.1768141-5-longman@redhat.com
|
|
affine_move_task() assumes task_rq_lock() has been called and it does
an implicit task_rq_unlock() before returning. Add the appropriate
__releases annotations to make this clear.
A typo error in comment is also fixed.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220922180041.1768141-2-longman@redhat.com
|
|
When evaluating the CPU candidates in the perf domain (pd) containing
the previously used CPU (prev_cpu), find_energy_efficient_cpu()
evaluates the energy of the pd:
- without the task (base_energy)
- with the task placed on prev_cpu (if the task fits)
- with the task placed on the CPU with the highest spare capacity,
prev_cpu being excluded from this set
If prev_cpu is already the CPU with the highest spare capacity,
max_spare_cap_cpu will be the CPU with the second highest spare
capacity.
On an Arm64 Juno-r2, with a workload of 10 tasks at a 10% duty cycle,
when prev_cpu and max_spare_cap_cpu are both valid candidates,
prev_spare_cap > max_spare_cap at ~82%.
Thus the energy of the pd when placing the task on max_spare_cap_cpu
is computed with no possible positive outcome 82% most of the time.
Do not consider max_spare_cap_cpu as a valid candidate if
prev_spare_cap > max_spare_cap.
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20221006081052.3862167-2-pierre.gondois@arm.com
|
|
We do consider thermal pressure in util_fits_cpu() for uclamp_min only.
With the exception of the biggest cores which by definition are the max
performance point of the system and all tasks by definition should fit.
Even under thermal pressure, the capacity of the biggest CPU is the
highest in the system and should still fit every task. Except when it
reaches capacity inversion point, then this is no longer true.
We can handle this by using the inverted capacity as capacity_orig in
util_fits_cpu(). Which not only addresses the problem above, but also
ensure uclamp_max now considers the inverted capacity. Force fitting
a task when a CPU is in this adverse state will contribute to making the
thermal throttling last longer.
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-10-qais.yousef@arm.com
|
|
Check each performance domain to see if thermal pressure is causing its
capacity to be lower than another performance domain.
We assume that each performance domain has CPUs with the same
capacities, which is similar to an assumption made in energy_model.c
We also assume that thermal pressure impacts all CPUs in a performance
domain equally.
If there're multiple performance domains with the same capacity_orig, we
will trigger a capacity inversion if the domain is under thermal
pressure.
The new cpu_in_capacity_inversion() should help users to know when
information about capacity_orig are not reliable and can opt in to use
the inverted capacity as the 'actual' capacity_orig.
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-9-qais.yousef@arm.com
|
|
condition
If the utilization of the woken up task is 0, we skip the energy
calculation because it has no impact.
But if the task is boosted (uclamp_min != 0) will have an impact on task
placement and frequency selection. Only skip if the util is truly
0 after applying uclamp values.
Change uclamp_task_cpu() signature to avoid unnecessary additional calls
to uclamp_eff_get(). feec() is the only user now.
Fixes: 732cd75b8c920 ("sched/fair: Select an energy-efficient CPU on task wake-up")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-8-qais.yousef@arm.com
|
|
So that it is now uclamp aware.
This fixes a major problem of busy tasks capped with UCLAMP_MAX keeping
the system in overutilized state which disables EAS and leads to wasting
energy in the long run.
Without this patch running a busy background activity like JIT
compilation on Pixel 6 causes the system to be in overutilized state
74.5% of the time.
With this patch this goes down to 9.79%.
It also fixes another problem when long running tasks that have their
UCLAMP_MIN changed while running such that they need to upmigrate to
honour the new UCLAMP_MIN value. The upmigration doesn't get triggered
because overutilized state never gets set in this state, hence misfit
migration never happens at tick in this case until the task wakes up
again.
Fixes: af24bde8df202 ("sched/uclamp: Add uclamp support to energy_compute()")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-7-qais.yousef@arm.com
|
|
Use the new util_fits_cpu() to ensure migration margin and capacity
pressure are taken into account correctly when uclamp is being used
otherwise we will fail to consider CPUs as fitting in scenarios where
they should.
s/asym_fits_capacity/asym_fits_cpu/ to better reflect what it does now.
Fixes: b4c9c9f15649 ("sched/fair: Prefer prev cpu in asymmetric wakeup path")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-6-qais.yousef@arm.com
|
|
Use the new util_fits_cpu() to ensure migration margin and capacity
pressure are taken into account correctly when uclamp is being used
otherwise we will fail to consider CPUs as fitting in scenarios where
they should.
Fixes: b4c9c9f15649 ("sched/fair: Prefer prev cpu in asymmetric wakeup path")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-5-qais.yousef@arm.com
|
|
As reported by Yun Hsiang [1], if a task has its uclamp_min >= 0.8 * 1024,
it'll always pick the previous CPU because fits_capacity() will always
return false in this case.
The new util_fits_cpu() logic should handle this correctly for us beside
more corner cases where similar failures could occur, like when using
UCLAMP_MAX.
We open code uclamp_rq_util_with() except for the clamp() part,
util_fits_cpu() needs the 'raw' values to be passed to it.
Also introduce uclamp_rq_{set, get}() shorthand accessors to get uclamp
value for the rq. Makes the code more readable and ensures the right
rules (use READ_ONCE/WRITE_ONCE) are respected transparently.
[1] https://lists.linaro.org/pipermail/eas-dev/2020-July/001488.html
Fixes: 1d42509e475c ("sched/fair: Make EAS wakeup placement consider uclamp restrictions")
Reported-by: Yun Hsiang <hsiang023167@gmail.com>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-4-qais.yousef@arm.com
|
|
So that the new uclamp rules in regard to migration margin and capacity
pressure are taken into account correctly.
Fixes: a7008c07a568 ("sched/fair: Make task_fits_capacity() consider uclamp restrictions")
Co-developed-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-3-qais.yousef@arm.com
|
|
fits_capacity() verifies that a util is within 20% margin of the
capacity of a CPU, which is an attempt to speed up upmigration.
But when uclamp is used, this 20% margin is problematic because for
example if a task is boosted to 1024, then it will not fit on any CPU
according to fits_capacity() logic.
Or if a task is boosted to capacity_orig_of(medium_cpu). The task will
end up on big instead on the desired medium CPU.
Similar corner cases exist for uclamp and usage of capacity_of().
Slightest irq pressure on biggest CPU for example will make a 1024
boosted task look like it can't fit.
What we really want is for uclamp comparisons to ignore the migration
margin and capacity pressure, yet retain them for when checking the
_actual_ util signal.
For example, task p:
p->util_avg = 300
p->uclamp[UCLAMP_MIN] = 1024
Will fit a big CPU. But
p->util_avg = 900
p->uclamp[UCLAMP_MIN] = 1024
will not, this should trigger overutilized state because the big CPU is
now *actually* being saturated.
Similar reasoning applies to capping tasks with UCLAMP_MAX. For example:
p->util_avg = 1024
p->uclamp[UCLAMP_MAX] = capacity_orig_of(medium_cpu)
Should fit the task on medium cpus without triggering overutilized
state.
Inlined comments expand more on desired behavior in more scenarios.
Introduce new util_fits_cpu() function which encapsulates the new logic.
The new function is not used anywhere yet, but will be used to update
various users of fits_capacity() in later patches.
Fixes: af24bde8df202 ("sched/uclamp: Add uclamp support to energy_compute()")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-2-qais.yousef@arm.com
|
|
|
|
Pull kvm fixes from Paolo Bonzini:
"RISC-V:
- Fix compilation without RISCV_ISA_ZICBOM
- Fix kvm_riscv_vcpu_timer_pending() for Sstc
ARM:
- Fix a bug preventing restoring an ITS containing mappings for very
large and very sparse device topology
- Work around a relocation handling error when compiling the nVHE
object with profile optimisation
- Fix for stage-2 invalidation holding the VM MMU lock for too long
by limiting the walk to the largest block mapping size
- Enable stack protection and branch profiling for VHE
- Two selftest fixes
x86:
- add compat implementation for KVM_X86_SET_MSR_FILTER ioctl
selftests:
- synchronize includes between include/uapi and tools/include/uapi"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
tools: include: sync include/api/linux/kvm.h
KVM: x86: Add compat handler for KVM_X86_SET_MSR_FILTER
KVM: x86: Copy filter arg outside kvm_vm_ioctl_set_msr_filter()
kvm: Add support for arch compat vm ioctls
RISC-V: KVM: Fix kvm_riscv_vcpu_timer_pending() for Sstc
RISC-V: Fix compilation without RISCV_ISA_ZICBOM
KVM: arm64: vgic: Fix exit condition in scan_its_table()
KVM: arm64: nvhe: Fix build with profile optimization
KVM: selftests: Fix number of pages for memory slot in memslot_modification_stress_test
KVM: arm64: selftests: Fix multiple versions of GIC creation
KVM: arm64: Enable stack protection and branch profiling for VHE
KVM: arm64: Limit stage2_apply_range() batch size to largest block
KVM: arm64: Work out supported block level at compile time
|
|
This reverts commit 72a95859728a7866522e6633818bebc1c2519b17.
It broke reboots on big-endian MIPS and MIPS64 malta QEMU instances,
which use the syscon driver. Little-endian is not effected, which means
likely it's important to handle regmap_get_val_endian() in this function
after all.
Fixes: 72a95859728a ("mfd: syscon: Remove repetition of the regmap_get_val_endian()")
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Lee Jones <lee@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit bfca3dd3d068 ("kernel/utsname_sysctl.c: print kernel arch") added
a new entry to the uts_kern_table[] array, but didn't update the
UTS_PROC_xyz enumerators of older entries, breaking anything that used
them.
Which is admittedly not many cases: it's really just the two uses of
uts_proc_notify() in kernel/sys.c. But apparently journald-systemd
actually uses this to detect hostname changes.
Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Fixes: bfca3dd3d068 ("kernel/utsname_sysctl.c: print kernel arch")
Link: https://lore.kernel.org/lkml/0c2b92a6-0f25-9538-178f-eee3b06da23f@secunet.com/
Link: https://linux-regtracking.leemhuis.info/regzbot/regression/0c2b92a6-0f25-9538-178f-eee3b06da23f@secunet.com/
Cc: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Fix raw data handling when perf events are used in bpf
- Rework how SIGTRAPs get delivered to events to address a bunch of
problems with it. Add a selftest for that too
* tag 'perf_urgent_for_v6.1_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
bpf: Fix sample_flags for bpf_perf_event_output
selftests/perf_events: Add a SIGTRAP stress test with disables
perf: Fix missing SIGTRAPs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
- Adjust code to not trip up CFI
- Fix sched group cookie matching
* tag 'sched_urgent_for_v6.1_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Introduce struct balance_callback to avoid CFI mismatches
sched/core: Fix comparison in sched_group_cookie_match()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fix from Borislav Petkov:
- Fix ORC stack unwinding when GCOV is enabled
* tag 'objtool_urgent_for_v6.1_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/unwind/orc: Fix unreliable stack dump with gcov
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
"As usually the case, right after a major release, the tip urgent
branches accumulate a couple more fixes than normal. And here is the
x86, a bit bigger, urgent pile.
- Use the correct CPU capability clearing function on the error path
in Intel perf LBR
- A CFI fix to ftrace along with a simplification
- Adjust handling of zero capacity bit mask for resctrl cache
allocation on AMD
- A fix to the AMD microcode loader to attempt patch application on
every logical thread
- A couple of topology fixes to handle CPUID leaf 0x1f enumeration
info properly
- Drop a -mabi=ms compiler option check as both compilers support it
now anyway
- A couple of fixes to how the initial, statically allocated FPU
buffer state is setup and its interaction with dynamic states at
runtime"
* tag 'x86_urgent_for_v6.0_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly
perf/x86/intel/lbr: Use setup_clear_cpu_cap() instead of clear_cpu_cap()
ftrace,kcfi: Separate ftrace_stub() and ftrace_stub_graph()
x86/ftrace: Remove ftrace_epilogue()
x86/resctrl: Fix min_cbm_bits for AMD
x86/microcode/AMD: Apply the patch early on every logical thread
x86/topology: Fix duplicated core ID within a package
x86/topology: Fix multiple packages shown on a single-package system
hwmon/coretemp: Handle large core ID value
x86/Kconfig: Drop check for -mabi=ms for CONFIG_EFI_STUB
x86/fpu: Exclude dynamic states from init_fpstate
x86/fpu: Fix the init_fpstate size check with the actual size
x86/fpu: Configure init_fpstate attributes orderly
|
|
Pull io_uring follow-up from Jens Axboe:
"Currently the zero-copy has automatic fallback to normal transmit, and
it was decided that it'd be cleaner to return an error instead if the
socket type doesn't support it.
Zero-copy does work with UDP and TCP, it's more of a future proofing
kind of thing (eg for samba)"
* tag 'io_uring-6.1-2022-10-22' of git://git.kernel.dk/linux:
io_uring/net: fail zc sendmsg when unsupported by socket
io_uring/net: fail zc send when unsupported by socket
net: flag sockets supporting msghdr originated zerocopy
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- corsair-psu: Fix typo in USB id description, and add USB ID for new
PSU
- pwm-fan: Fix fan power handling when disabling fan control
* tag 'hwmon-for-v6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (corsair-psu) Add USB id of the new HX1500i psu
hwmon: (pwm-fan) Explicitly switch off fan power when setting pwm1_enable to 0
hwmon: (corsair-psu) fix typo in USB id description
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"RPM fix for qcom-cci, platform module alias for xiic, build warning
fix for mlxbf, typo fixes in comments"
* tag 'i2c-for-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mlxbf: depend on ACPI; clean away ifdeffage
i2c: fix spelling typos in comments
i2c: qcom-cci: Fix ordering of pm_runtime_xx and i2c_add_adapter
i2c: xiic: Add platform module alias
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci fixes from Bjorn Helgaas:
- Revert a simplification that broke pci-tegra due to a masking error
- Update MAINTAINERS for Kishon's email address change and TI
DRA7XX/J721E maintainer change
* tag 'pci-v6.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
MAINTAINERS: Update Kishon's email address in PCI endpoint subsystem
MAINTAINERS: Add Vignesh Raghavendra as maintainer of TI DRA7XX/J721E PCI driver
Revert "PCI: tegra: Use PCI_CONF1_EXT_ADDRESS() macro"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull missed media updates from Mauro Carvalho Chehab:
"It seems I screwed-up my previous pull request: it ends up that only
half of the media patches that were in linux-next got merged in -rc1.
The script which creates the signed tags silently failed due to
5.19->6.0 so it ended generating a tag with incomplete stuff.
So here are the missing parts:
- a DVB core security fix
- lots of fixes and cleanups for atomisp staging driver
- old drivers that are VB1 are being moved to staging to be
deprecated
- several driver updates - mostly for embedded systems, but there are
also some things addressing issues with some PC webcams, in the UVC
video driver"
* tag 'media/v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (163 commits)
media: sun6i-csi: Move csi buffer definition to main header file
media: sun6i-csi: Introduce and use video helper functions
media: sun6i-csi: Add media ops with link notify callback
media: sun6i-csi: Remove controls handler from the driver
media: sun6i-csi: Register the media device after creation
media: sun6i-csi: Pass and store csi device directly in video code
media: sun6i-csi: Tidy up video code
media: sun6i-csi: Tidy up v4l2 code
media: sun6i-csi: Tidy up Kconfig
media: sun6i-csi: Use runtime pm for clocks and reset
media: sun6i-csi: Define and use variant to get module clock rate
media: sun6i-csi: Always set exclusive module clock rate
media: sun6i-csi: Tidy up platform code
media: sun6i-csi: Refactor main driver data structures
media: sun6i-csi: Define and use driver name and (reworked) description
media: cedrus: Add a Kconfig dependency on RESET_CONTROLLER
media: sun8i-rotate: Add a Kconfig dependency on RESET_CONTROLLER
media: sun8i-di: Add a Kconfig dependency on RESET_CONTROLLER
media: sun4i-csi: Add a Kconfig dependency on RESET_CONTROLLER
media: sun6i-csi: Add a Kconfig dependency on RESET_CONTROLLER
...
|
|
The previous patch fails zerocopy send requests for protocols that don't
support it, do the same for zerocopy sendmsg.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0854e7bb4c3d810a48ec8b5853e2f61af36a0467.1666346426.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If a protocol doesn't support zerocopy it will silently fall back to
copying. This type of behaviour has always been a source of troubles
so it's better to fail such requests instead.
Cc: <stable@vger.kernel.org> # 6.0
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2db3c7f16bb6efab4b04569cd16e6242b40c5cb3.1666346426.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We need an efficient way in io_uring to check whether a socket supports
zerocopy with msghdr provided ubuf_info. Add a new flag into the struct
socket flags fields.
Cc: <stable@vger.kernel.org> # 6.0
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/3dafafab822b1c66308bb58a0ac738b1e3f53f74.1666346426.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Also update the documentation accordingly.
Signed-off-by: Wilken Gottwalt <wilken.gottwalt@posteo.net>
Link: https://lore.kernel.org/r/Y0FghqQCHG/cX5Jz@monster.localdomain
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Provide a definition of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
Fixes: 17601bfed909 ("KVM: Add KVM_CAP_DIRTY_LOG_RING_ACQ_REL capability and config option")
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The KVM_X86_SET_MSR_FILTER ioctls contains a pointer in the passed in
struct which means it has a different struct size depending on whether
it gets called from 32bit or 64bit code.
This patch introduces compat code that converts from the 32bit struct to
its 64bit counterpart which then gets used going forward internally.
With this applied, 32bit QEMU can successfully set MSR bitmaps when
running on 64bit kernels.
Reported-by: Andrew Randrianasulu <randrianasulu@gmail.com>
Fixes: 1a155254ff937 ("KVM: x86: Introduce MSR filtering")
Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20221017184541.2658-4-graf@amazon.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
In the next patch we want to introduce a second caller to
set_msr_filter() which constructs its own filter list on the stack.
Refactor the original function so it takes it as argument instead of
reading it through copy_from_user().
Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20221017184541.2658-3-graf@amazon.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
We will introduce the first architecture specific compat vm ioctl in the
next patch. Add all necessary boilerplate to allow architectures to
override compat vm ioctls when necessary.
Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20221017184541.2658-2-graf@amazon.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
HEAD
KVM/riscv fixes for 6.1, take #1
- Fix compilation without RISCV_ISA_ZICBOM
- Fix kvm_riscv_vcpu_timer_pending() for Sstc
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.1, take #2
- Fix a bug preventing restoring an ITS containing mappings
for very large and very sparse device topology
- Work around a relocation handling error when compiling
the nVHE object with profile optimisation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.1, take #1
- Fix for stage-2 invalidation holding the VM MMU lock
for too long by limiting the walk to the largest
block mapping size
- Enable stack protection and branch profiling for VHE
- Two selftest fixes
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fix from Rafael Wysocki:
"This fixes the control CPU selection in the intel_powerclamp thermal
driver"
* tag 'thermal-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: intel_powerclamp: Use first online CPU as control_cpu
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix some issues and clean up code in ARM cpufreq drivers.
Specifics:
- Fix module loading in the Tegra124 cpufreq driver (Jon Hunter)
- Fix memory leak and update to read-only region in the qcom cpufreq
driver (Fabien Parent)
- Miscellaneous minor cleanups to cpufreq drivers (Fabien Parent,
Yang Yingliang)"
* tag 'pm-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: sun50i: Switch to use dev_err_probe() helper
cpufreq: qcom-nvmem: Switch to use dev_err_probe() helper
cpufreq: imx6q: Switch to use dev_err_probe() helper
cpufreq: dt: Switch to use dev_err_probe() helper
cpufreq: qcom: remove unused parameter in function definition
cpufreq: qcom: fix writes in read-only memory region
cpufreq: qcom: fix memory leak in error path
cpufreq: tegra194: Fix module loading
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix issues introduced during this merge window (ACPI/PCI, device
enumeration and documentation) and some other ones found recently.
Specifics:
- Add missing device reference counting to acpi_get_pci_dev() after
changing it recently (Rafael Wysocki)
- Fix resource list walk in acpi_dma_get_range() (Robin Murphy)
- Add IRQ override quirk for LENOVO IdeaPad and extend the IRQ
override warning message (Jiri Slaby)
- Fix integer overflow in ghes_estatus_pool_init() (Ashish Kalra)
- Fix multiple error records handling in one of the ACPI extlog
driver code paths (Tony Luck)
- Prune DSDT override documentation from index after dropping it
(Bagas Sanjaya)"
* tag 'acpi-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: scan: Fix DMA range assignment
ACPI: PCI: Fix device reference counting in acpi_get_pci_dev()
ACPI: resource: note more about IRQ override
ACPI: resource: do IRQ override on LENOVO IdeaPad
ACPI: extlog: Handle multiple records
ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
Documentation: ACPI: Prune DSDT override documentation from index
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
- fixes for the EFI variable store refactor that landed in v6.0
- fixes for issues that were introduced during the merge window
- back out some changes related to EFI zboot signing - we'll add a
better solution for this during the next cycle
* tag 'efi-fixes-for-v6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: runtime: Don't assume virtual mappings are missing if VA == PA == 0
efi: libstub: Fix incorrect payload size in zboot header
efi: libstub: Give efi_main() asmlinkage qualification
efi: efivars: Fix variable writes without query_variable_store()
efi: ssdt: Don't free memory if ACPI table was loaded successfully
efi: libstub: Remove zboot signing from build options
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
"Intel VT-d fixes:
- Fix a lockdep splat issue in intel_iommu_init()
- Allow NVS regions to pass RMRR check
- Domain cleanup in error path"
* tag 'iommu-fixes-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Clean up si_domain in the init_dmars() error path
iommu/vt-d: Allow NVS regions in arch_rmrr_sanity_check()
iommu/vt-d: Use rcu_lock in get_resv_regions
iommu: Add gfp parameter to iommu_alloc_resv_region
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Benjamin Tissoires:
- a 12 year old bug fix for the Apple Magic Trackpad v1 (José Expósito)
- a fix for a potential crash on removal of the Playstation controllers
(Roderick Colenbrander)
- a few new device IDs and device-specific quirks, most notably support
of the new Playstation DualSense Edge controller
* tag 'for-linus-2022102101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: lenovo: Make array tp10ubkbd_led static const
HID: saitek: add madcatz variant of MMO7 mouse device ID
HID: playstation: support updated DualSense rumble mode.
HID: playstation: add initial DualSense Edge controller support
HID: playstation: stop DualSense output work on remove.
HID: magicmouse: Do not set BTN_MOUSE on double report
|
|
Pull cifs fixes from Steve French:
- memory leak fixes
- fixes for directory leases, including an important one which fixes a
problem noticed by git functional tests
- fixes relating to missing free_xid calls (helpful for
tracing/debugging of entry/exit into cifs.ko)
- a multichannel fix
- a small cleanup fix (use of list_move instead of list_del/list_add)
* tag '6.1-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal module number
cifs: fix memory leaks in session setup
cifs: drop the lease for cached directories on rmdir or rename
smb3: interface count displayed incorrectly
cifs: Fix memory leak when build ntlmssp negotiate blob failed
cifs: set rc to -ENOENT if we can not get a dentry for the cached dir
cifs: use LIST_HEAD() and list_move() to simplify code
cifs: Fix xid leak in cifs_get_file_info_unix()
cifs: Fix xid leak in cifs_ses_add_channel()
cifs: Fix xid leak in cifs_flock()
cifs: Fix xid leak in cifs_copy_file_range()
cifs: Fix xid leak in cifs_create()
|