|
kvm_arch_vcpu_ioctl_run() disables preemption when updating
the vgic and timer states to prevent the calling task from migrating to
another CPU. It does so to prevent the task from writing to the
incorrect per-CPU GIC distributor registers.
On -rt kernels, it's possible to maintain the same guarantee with the
use of migrate_{disable,enable}(), with the added benefit that the
migrate-disabled region is preemptible. Update
kvm_arch_vcpu_ioctl_run() to do so.
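A minimal sketch of the pattern change, assuming the vgic/timer flush calls
shown stand in for the state updates mentioned above (the real function body
is much larger):

    migrate_disable();              /* was: preempt_disable(); */
    kvm_vgic_flush_hwstate(vcpu);   /* per-CPU GIC distributor state */
    kvm_timer_flush_hwstate(vcpu);  /* per-CPU timer state */
    /* ... enter the guest ... */
    migrate_enable();               /* was: preempt_enable(); */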
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Reported-by: Manish Jaggi <Manish.Jaggi@caviumnetworks.com>
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
This probably happens on all ARM, with
CONFIG_PREEMPT_RT
CONFIG_DEBUG_ATOMIC_SLEEP
This simple program triggers it:
int main(void)
{
    *((char *)0xc0001000) = 0;
    return 0;
}
[ 512.742724] BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
[ 512.743000] in_atomic(): 0, irqs_disabled(): 128, pid: 994, name: a
[ 512.743217] INFO: lockdep is turned off.
[ 512.743360] irq event stamp: 0
[ 512.743482] hardirqs last enabled at (0): [< (null)>] (null)
[ 512.743714] hardirqs last disabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744013] softirqs last enabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744303] softirqs last disabled at (0): [< (null)>] (null)
[ 512.744631] [<c041872c>] (unwind_backtrace+0x0/0x104)
[ 512.745001] [<c09af0c4>] (dump_stack+0x20/0x24)
[ 512.745355] [<c0462490>] (__might_sleep+0x1dc/0x1e0)
[ 512.745717] [<c09b6770>] (rt_spin_lock+0x34/0x6c)
[ 512.746073] [<c0441bf0>] (do_force_sig_info+0x34/0xf0)
[ 512.746457] [<c0442668>] (force_sig_info+0x18/0x1c)
[ 512.746829] [<c041d880>] (__do_user_fault+0x9c/0xd8)
[ 512.747185] [<c041d938>] (do_bad_area+0x7c/0x94)
[ 512.747536] [<c041d990>] (do_sect_fault+0x40/0x48)
[ 512.747898] [<c040841c>] (do_DataAbort+0x40/0xa0)
[ 512.748181] Exception stack(0xecaa1fb0 to 0xecaa1ff8)
0xc0000000 belongs to the kernel address space, so a user task must not be
allowed to access it. Under the above conditions the correct result is that
the test case receives a segmentation fault and exits, rather than producing
the splat above.
The root cause is commit 02fe2845d6a8 ("avoid enabling interrupts in
prefetch/data abort handlers"): it removes the irq-enable block from the data
abort assembly code and moves it into the page/breakpoint/alignment fault
handlers instead, but does not enable irqs in the translation/section
permission fault handlers. ARM disables irqs when it enters exception/
interrupt mode, so if the kernel does not re-enable them they stay disabled
throughout the translation/section permission fault.
We see the above splat because do_force_sig_info is still called with
IRQs off, and that code eventually does a:
spin_lock_irqsave(&t->sighand->siglock, flags);
As this is architecture-independent code, and no other architecture has
needed the siglock converted to a raw lock, we can conclude that we should
enable irqs in the ARM translation/section permission fault handlers.
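A hedged sketch of the kind of fix this implies in arch/arm/mm/fault.c (the
exact hunk and helpers may differ):

    static int
    do_sect_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
    {
        /* The fault was taken with irqs enabled, so it is safe and
         * necessary to re-enable them before signalling the task. */
        if (interrupts_enabled(regs))
            local_irq_enable();
        do_bad_area(addr, fsr, regs);
        return 0;
    }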
Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
arm64 is missing support for PREEMPT_RT. The main feature which is
lacking is support for lazy preemption. The arch-specific entry code,
thread information structure definitions, and associated data tables
have to be extended to provide this support. Then the Kconfig file has
to be extended to indicate the support is available, and also to
indicate that support for full RT preemption is now available.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Implement the powerpc pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Implement the arm pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Common code needs common defines....
Fixes: f2f9e496208c ("x86: Support for lazy preemption")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Implement the x86 pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The TIF_NEED_RESCHED bit is folded on x86 into the preemption counter.
By using should_resched(0) instead of need_resched(), the same check can
be performed with a single read of the variable that the preceding
preempt_count() check already used.
Use should_resched(0) instead of need_resched().
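A minimal sketch of the resulting check; the surrounding cond-resched call
site is only illustrative:

    /* should_resched(0) is true when preempt_count() is zero and a
     * reschedule is needed -- one read instead of two separate checks. */
    if (should_resched(0))
        preempt_schedule_irq();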
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Allow selecting RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
A non-constant TSC is a nightmare on bare metal already, but with
virtualization it becomes a complete disaster because the workarounds
are horrible latency-wise. That's also a prerequisite for running RT in
a guest on top of an RT host.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
On x86_64 we must disable preemption before we enable interrupts
for stack faults, int3 and debugging, because the current task is using
a per CPU debug stack defined by the IST. If we schedule out, another task
can come in and use the same stack and cause the stack to be corrupted
and crash the kernel on return.
When CONFIG_PREEMPT_RT is enabled, spin_locks become mutexes, and
one of these is the spin lock used in signal handling.
Some of the debug code (int3) causes do_trap() to send a signal.
This function takes a spin lock that has been converted to a mutex
and may sleep. If this happens, the stack corruption described above
becomes possible.
Instead of delivering the signal right away, for PREEMPT_RT and x86_64
the signal information is stored in the task's task_struct and
TIF_NOTIFY_RESUME is set. On exit from the trap, the signal-resume
code then sends the signal once preemption is enabled.
[ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
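A hedged sketch of the deferral described above; the forced_info field name
follows the patch description, and the signal helper signatures vary across
kernel versions:

    /* In the trap path, instead of forcing the signal with irqs off: */
    #ifdef ARCH_RT_DELAYS_SIGNAL_SEND
        if (in_atomic()) {
            current->forced_info = *info;   /* stash the siginfo for later */
            set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
            return 0;                       /* delivered on exit from the trap */
        }
    #endif

    /* In the notify-resume path, with preemption enabled again: */
    if (unlikely(current->forced_info.si_signo)) {
        force_sig_info(&current->forced_info);
        current->forced_info.si_signo = 0;
    }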
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: also needed on 32bit as per Yang Shi <yang.shi@linaro.org>]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Add a /sys/kernel entry to indicate that the kernel is a
realtime kernel.
Clark says that he needs this for udev rules: udev needs to evaluate
whether it's a PREEMPT_RT kernel a few thousand times, and parsing uname
output is too slow.
Are there better solutions? Should it exist and return 0 on !-rt?
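A minimal sketch of such an entry in kernel/ksysfs.c, using the stock
kobj_attribute helpers; the attribute name and placement follow the
description above and are not confirmed:

    #ifdef CONFIG_PREEMPT_RT
    static ssize_t realtime_show(struct kobject *kobj,
                                 struct kobj_attribute *attr, char *buf)
    {
        return sprintf(buf, "%d\n", 1);
    }
    KERNEL_ATTR_RO(realtime);
    #endif

    /* plus &realtime_attr.attr added to the kernel_attrs[] array */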
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Allow selecting RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
It has become an obsession to mitigate the determinism vs. throughput
loss of RT. Looking at the mainline semantics of preemption points
gives a hint why RT sucks throughput-wise for ordinary SCHED_OTHER
tasks. One major issue is the wakeup of tasks which right away
preempt the waking task while the waking task holds a lock on which
the woken task will block right after having preempted the waker. In
mainline this is prevented by the implicit preemption disable of
spin/rw_lock held regions. On RT this is not possible due to the fully
preemptible nature of sleeping spinlocks.
Though for a SCHED_OTHER task preempting another SCHED_OTHER task this
is really not a correctness issue. RT folks are concerned about
SCHED_FIFO/RR task preemption and not about the purely fairness-driven
SCHED_OTHER preemption latencies.
So I introduced a lazy preemption mechanism which only applies to
SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside from the
existing preempt_count, each task now sports a preempt_lazy_count
which is manipulated on lock acquisition and release. This is slightly
incorrect, as for laziness reasons I coupled it to
migrate_disable/enable, so some other mechanisms get the same treatment
(e.g. get_cpu_light).
Now on the scheduler side, instead of setting NEED_RESCHED this sets
NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and
therefore allows the waking task to exit the lock-held region before
the woken task preempts it. That also works better for cross-CPU wakeups
as the other side can stay in the adaptive spinning loop.
For RT class preemption there is no change. This simply sets
NEED_RESCHED and forgoes the lazy preemption counter.
Initial tests do not expose any observable latency increase, but
history shows that I've been proven wrong before :)
The lazy preemption mode is on by default, but with
CONFIG_SCHED_DEBUG enabled it can be disabled via:
# echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features
and reenabled via
# echo PREEMPT_LAZY >/sys/kernel/debug/sched_features
The test results so far are very machine- and workload-dependent, but
there is a clear trend that it improves non-RT workload performance.
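A hedged sketch of the scheduler-side decision described above;
resched_curr_lazy() is the helper the description implies, not necessarily
the real name:

    /* On a wakeup that wants to preempt the current task: */
    if (rt_task(p) || !sched_feat(PREEMPT_LAZY))
        resched_curr(rq);       /* RT class (or lazy mode off): preempt right away */
    else
        resched_curr_lazy(rq);  /* SCHED_OTHER vs. SCHED_OTHER: only set NEED_RESCHED_LAZY */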
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
PREEMPT_RT preempts softirqs and the current implementation avoids
do_softirq_own_stack() and only uses __do_softirq().
Disable the unused softirq stacks on PREEMPT_RT to save some memory and
to ensure that do_softirq_own_stack() is not used, as it is not expected to be.
[bigeasy: commit description.]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
PREEMPT_RT preempts softirqs and the current implementation avoids
do_softirq_own_stack() and only uses __do_softirq().
Disable the unused softirq stacks on PREEMPT_RT to save some memory and
to ensure that do_softirq_own_stack() is not used, as it is not expected to be.
[bigeasy: commit description.]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
The CPU trigger is invoked on ARM from CPU-idle. That trigger later
invokes led_trigger_event() which may invoke the callback of the actual driver.
That driver can acquire a spinlock_t, which is okay on a kernel without
PREEMPT_RT. On a PREEMPT_RT enabled kernel this lock is turned into a sleeping
lock and must not be acquired with interrupts disabled.
Disable the CPU trigger on PREEMPT_RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/20210924111501.m57cwwn7ahiyxxdd@linutronix.de
|
|
They're nondeterministic, and lead to ___might_sleep() splats in -rt.
OTOH, they're a lot less wasteful than an rtmutex per page.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
For efficiency reasons, zsmalloc is using a slim `handle'. The value is
the address of a memory allocation of 4 or 8 bytes depending on the size
of the long data type. The lowest bit in that allocated memory is used
as a bit spin lock.
The usage of the bit spin lock is problematic because with the bit spin
lock held zsmalloc acquires a rwlock_t and spinlock_t which are both
sleeping locks on PREEMPT_RT and therefore must not be acquired with
disabled preemption.
Extend the handle to struct zsmalloc_handle which holds the old handle as
addr and a spinlock_t which replaces the bit spinlock. Replace all the
wrapper functions accordingly.
The usage of get_cpu_var() in zs_map_object() is problematic because
it disables preemption and makes it impossible to acquire any sleeping
lock on PREEMPT_RT such as a spinlock_t.
Replace the get_cpu_var() usage with a local_lock_t which is embedded in
struct mapping_area. It ensures that access to the struct is
synchronized against all users on the same CPU.
This survived LTP testing.
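A sketch of the replacement data structures described above; member names
follow the description, surrounding members are omitted:

    struct zsmalloc_handle {
        unsigned long addr;     /* the old slim handle value */
        spinlock_t lock;        /* replaces the bit spin lock in bit 0 */
    };

    struct mapping_area {
        local_lock_t lock;      /* replaces the get_cpu_var() preemption disable */
        /* existing members unchanged */
    };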
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: replace the bitspin_lock() with a mutex, get_locked_var() and
patch description. Mike then fixed the size magic and made handle lock
spinlock_t.]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
ioread8() operations to TPM MMIO addresses can stall the cpu when
immediately following a sequence of iowrite*()'s to the same region.
For example, cyclictest measures ~400us latency spikes when a non-RT
usermode application communicates with an SPI-based TPM chip (Intel Atom
E3940 system, PREEMPT_RT kernel). The spikes are caused by a
stalling ioread8() operation following a sequence of 30+ iowrite8()s to
the same address. I believe this happens because the write sequence is
buffered (in cpu or somewhere along the bus), and gets flushed on the
first LOAD instruction (ioread*()) that follows.
The enclosed change appears to fix this issue: read the TPM chip's
access register (status code) after every iowrite*() operation to
amortize the cost of flushing data to the chip across multiple instructions.
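A hedged sketch of the described workaround as a write wrapper; the register
offset macro follows tpm_tis conventions and the wrapper names are
illustrative:

    static inline void tpm_tis_flush(void __iomem *iobase)
    {
        /* A dummy read forces the buffered writes out to the chip. */
        ioread8(iobase + TPM_ACCESS(0));
    }

    static inline void tpm_tis_iowrite8(u8 b, void __iomem *iobase, u32 addr)
    {
        iowrite8(b, iobase + addr);
        tpm_tis_flush(iobase);
    }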
Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
acrn_irqfds_mutex is not used, never was.
Remove acrn_irqfds_mutex.
Fixes: aa3b483ff1d71 ("virt: acrn: Introduce irqfd")
Cc: Fei Li <fei1.li@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
The mutex smack_ipv6_lock is only used with the SMACK_IPV6_PORT_LABELING
block but its definition is outside of the block. This leads to a
defined-but-not-used warning on PREEMPT_RT.
Moving smack_ipv6_lock down to the block where it is used raises the
question why smk_ipv6_port_list is read if nothing is added to it.
It turns out that only smk_ipv6_port_check() uses it outside of an ifdef
SMACK_IPV6_PORT_LABELING block. However, two of the three callers invoke
smk_ipv6_port_check() from an ifdef block, and only one uses the
__is_defined() macro, which requires the function and smk_ipv6_port_list
to be around.
Put the lock and list inside an ifdef SMACK_IPV6_PORT_LABELING block to
avoid the warning about the unused mutex. Extend the ifdef block to also
cover smk_ipv6_port_check(). Make smack_socket_connect() use ifdef
instead of __is_defined() to avoid complaints about a missing function.
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
irqs_lock is not used, never was.
Remove irqs_lock.
Fixes: 283b612429a27 ("ASoC: mediatek: implement mediatek common structure")
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
On -rt kernels, the use of migrate_disable()/migrate_enable() is
sufficient to guarantee a task isn't moved to another CPU. Update the
irq_set_irqchip_state() documentation to reflect this.
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/20210917103055.92150-1-bigeasy@linutronix.de
|
|
The !irqs_disabled() check triggers on PREEMPT_RT even with
i915_sched_engine::lock acquired. The reason is the lock is transformed
into a sleeping lock on PREEMPT_RT and does not disable interrupts.
There is no need to check for disabled interrupts. The lockdep
annotation below already checks whether the lock has been acquired by the
caller and will yell if interrupts are not disabled.
Remove the !irqs_disabled() check.
Reported-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
execlists_dequeue() is invoked from a function which uses
local_irq_disable() to disable interrupts so the spin_lock() behaves
like spin_lock_irq().
This breaks PREEMPT_RT because local_irq_disable() + spin_lock() is not
the same as spin_lock_irq().
execlists_dequeue_irq() and execlists_dequeue() each have only one caller.
If intel_engine_cs::active::lock is acquired and released with the
_irq suffix then it behaves almost as if execlists_dequeue() were
invoked with interrupts disabled. The difference is the last part of the
function, which is then invoked with interrupts enabled.
I can't tell if this makes a difference. From looking at it, it might
work to move the last unlock at the end of the function as I didn't find
anything that would acquire the lock again.
Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
|
|
Disabling interrupts and invoking the irq_work function directly breaks
on PREEMPT_RT.
PREEMPT_RT does not invoke all irq_work from hardirq context because
some of the users have spinlock_t locking in the callback function.
These locks are then turned into sleeping locks which cannot be
acquired with interrupts disabled.
Using irq_work_queue() has the benefit that the irqwork will be invoked
in the regular context. In general there is "no" delay between enqueuing
the callback and its invocation because the interrupt is raised right
away on architectures which support it (which includes x86).
Use irq_work_queue() + irq_work_sync() instead of invoking the callback
directly.
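A minimal sketch of the before/after pattern with the stock irq_work API;
the work item and callback names are illustrative:

    /* before: run the callback by hand with interrupts off */
    local_irq_disable();
    signal_work_fn(&w);
    local_irq_enable();

    /* after: let the irq_work machinery run it in its regular context */
    irq_work_queue(&w);     /* raises the irq_work interrupt right away on x86 */
    irq_work_sync(&w);      /* wait until the callback has completed */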
Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
|
|
The order of the header files is important. If this header file is
included after tracepoint.h was included then the NOTRACE here becomes a
nop. Currently this happens for two .c files which use the tracepoints
behind DRM_I915_LOW_LEVEL_TRACEPOINTS.
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Luca Abeni reported this:
| BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003
| CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
| Call Trace:
| rt_spin_lock+0x3f/0x50
| gen6_read32+0x45/0x1d0 [i915]
| g4x_get_vblank_counter+0x36/0x40 [i915]
| trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]
The tracing events, trace_i915_pipe_update_start() among others, use
functions which acquire spinlock_t locks that are transformed into
sleeping locks on PREEMPT_RT. A few trace points use
intel_get_crtc_scanline(), others use ->get_vblank_counter(), which may
also acquire a sleeping lock on PREEMPT_RT.
At the time the arguments are evaluated within a trace point, preemption
is disabled, so the locks must not be acquired on PREEMPT_RT.
Based on this I don't see any other way than to disable the trace points on
PREEMPT_RT.
Reported-by: Luca Abeni <lucabe72@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
The !in_atomic() check in _wait_for_atomic() triggers on PREEMPT_RT
because the uncore::lock is a spinlock_t and does not disable
preemption or interrupts.
Changing the uncore::lock to a raw_spinlock_t doubles the worst case
latency on an otherwise idle testbox during testing. Therefore I'm
currently unsure about changing this.
Link: https://lore.kernel.org/all/20211006164628.s2mtsdd2jdbfyf7g@linutronix.de/
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
Commit
8d7849db3eab7 ("drm/i915: Make sprite updates atomic")
started disabling interrupts across atomic updates. This breaks on PREEMPT_RT
because within this section the code attempts to acquire spinlock_t locks
which are sleeping locks on PREEMPT_RT.
According to the comment the interrupts are disabled to avoid random delays,
not for protection or synchronisation.
If this needs to happen with disabled interrupts on PREEMPT_RT, and the
whole section is restricted to register access, then all sleeping locks
need to be acquired before interrupts are disabled and some functions may
have to be moved until after interrupts are enabled again.
This includes:
- prepare_to_wait() + finish_wait() due to its wake queue.
- drm_crtc_vblank_put() -> vblank_disable_fn() drm_device::vbl_lock.
- skl_pfit_enable(), intel_update_plane(), vlv_atomic_update_fifo() and
maybe others due to intel_uncore::lock
- drm_crtc_arm_vblank_event() due to drm_device::event_lock and
drm_device::vblank_time_lock.
Don't disable interrupts on PREEMPT_RT during atomic updates.
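A minimal sketch of what the change amounts to, assuming the irq handling
simply becomes conditional (the update code in between is omitted):

    if (!IS_ENABLED(CONFIG_PREEMPT_RT))
        local_irq_disable();    /* delay avoidance only where spinning locks exist */

    /* ... atomic plane/register update ... */

    if (!IS_ENABLED(CONFIG_PREEMPT_RT))
        local_irq_enable();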
[bigeasy: drop local locks, commit message]
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
Mario Kleiner suggests in commit
ad3543ede630f ("drm/intel: Push get_scanout_position() timestamping into kms driver.")
spots where preemption should be disabled. The difference on PREEMPT_RT
is that the intel_uncore::lock disables neither preemption nor interrupts,
and so the region remains preemptible.
The area covers only register reads and writes. The part that worries me
is:
- __intel_get_crtc_scanline() the worst case is 100us if no match is
found.
- intel_crtc_scanlines_since_frame_timestamp() not sure how long this
may take in the worst case.
It was in the RT queue for a while and nobody complained.
Disable preemption on PREEMPT_RT during timestamping.
[bigeasy: patch description.]
Cc: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
__timeline_mark_lock().
This is a revert of commits
d67739268cf0e ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe")
6c69a45445af9 ("drm/i915/gt: Mark context->active_count as protected by timeline->mutex")
The existing code leads to a different behaviour depending on whether
lockdep is enabled or not. Any following lock that is acquired without
disabling interrupts (but needs to) will not be noticed by lockdep.
This is not just a lockdep annotation: there is an actual mutex that is
properly used as a lock elsewhere, but in the case of __timeline_mark_lock()
lockdep is only told that it is acquired while no lock is actually taken.
It appears that its purpose is just to satisfy the lockdep_assert_held()
check in intel_context_mark_active(). The other problem with disabling
interrupts is that on PREEMPT_RT interrupts are also disabled which
leads to problems for instance later during memory allocation.
Add a CONTEXT_IS_PARKED bit to intel_engine_cs and set_bit/clear_bit it
instead of mutex_acquire/mutex_release. Use test_bit in the two
identified spots which relied on the lockdep annotation.
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
Delegate the random insertion to the forced threaded interrupt
handler. Store the return IP of the hard interrupt handler in the irq
descriptor and feed it into the random generator as a source of
entropy.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
CPU bringup calls into the random pool to initialize the stack
canary. During boot that works nicely even on RT as the might sleep
checks are disabled. During CPU hotplug the might sleep checks
trigger. Making the locks in random raw is a major PITA, so avoiding the
call on RT is the only sensible solution. This is basically the same
randomness which we get during boot where the random pool has no
entropy and we rely on the TSC randomness.
Reported-by: Carsten Emde <carsten.emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Disable on -RT. If this is invoked from irq context we will have problems
acquiring the sleeping lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The root-lock is dropped before dev_hard_start_xmit() is invoked and after
setting the __QDISC___STATE_RUNNING bit. If this task is now pushed away
by a task with a higher priority, then the higher-priority task won't be
able to submit packets to the NIC directly; instead they will be
enqueued into the Qdisc. The NIC will remain idle until the task(s) with
higher priority leave the CPU and the task with lower priority gets back
and finishes the job.
If we always take the busylock we ensure that the RT task can boost the
low-priority task and submit the packet.
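A hedged sketch of the resulting condition in __dev_xmit_skb(); member and
helper names follow mainline, the exact diff may differ:

    /* Always contend on PREEMPT_RT so an RT sender takes q->busylock and
     * can PI-boost the lower-priority owner of the qdisc. */
    bool contended = IS_ENABLED(CONFIG_PREEMPT_RT) || qdisc_is_running(q);

    if (contended)
        spin_lock(&q->busylock);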
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Upstream uses skb_dequeue() to acquire the lock of `input_pkt_queue'. The
reason is to synchronize against a remote CPU which, still thinking that this
CPU is online, enqueues packets to it.
There is no guarantee that the packet is enqueued before the callback is run;
it is just hoped.
RT however complains about an uninitialized lock because it uses another lock
for `input_pkt_queue' due to the IRQ-off nature of the context.
Use the unlocked dequeue version for `input_pkt_queue'.
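A minimal sketch of the lockless variant in the CPU-dead callback; the
surrounding calls are indicative of mainline and not verified against this
exact tree:

    /* was: skb_dequeue(&oldsd->input_pkt_queue) */
    while ((skb = __skb_dequeue(&oldsd->input_pkt_queue))) {
        netif_rx(skb);
        input_queue_head_incr(oldsd);
    }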
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Use the rps lock as a raw lock so we can keep the irq-off regions. It looks low
latency. However we can't kfree() from this context, therefore we defer it
to the softirq and use the tofree_queue list for it (similar to process_queue).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
In 2004 netif_rx_ni() gained a preempt_disable() section around
netif_rx() and its do_softirq() + testing for it. The do_softirq() part
is required because netif_rx() raises the softirq but does not invoke
it. The preempt_disable() is required to remain on the same CPU which added the
skb to the per-CPU list.
All this can be avoided by putting this into a local_bh_disable()ed
section. The local_bh_enable() part will invoke do_softirq() if
required.
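A minimal sketch of the described replacement, assuming netif_rx_internal()
as the inner helper per mainline naming:

    int err;

    local_bh_disable();             /* keeps us on this CPU ...           */
    err = netif_rx_internal(skb);   /* ... while the softirq is raised    */
    local_bh_enable();              /* runs do_softirq() here if needed   */
    return err;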
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Delay RCU-selftests until ksoftirqd is up and running.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
i_dir_seq is a sequence counter with a lock which is represented by the lowest
bit. The writer atomically updates the counter which ensures that it can be
modified by only one writer at a time.
The commit introducing this change claims that the lock has been integrated
into the counter for space reasons within the inode struct. The i_dir_seq
member is within a union which it shares with a pointer. That means that by
using a seqlock_t we would have a sequence counter and a lock without
increasing the size of the data structure on 64bit, while 32bit would grow by
4 bytes. With lockdep enabled the size would grow, and on PREEMPT_RT the
spinlock_t is also larger.
In order to keep this construct working on PREEMPT_RT, the writer needs to
disable preemption while obtaining the lock on the sequence counter / starting
the write critical section. The writer acquires an otherwise unrelated
spinlock_t which serves the same purpose on !PREEMPT_RT. With preemption
enabled, a high-priority reader could preempt the writer and livelock the
system while waiting for the locked bit to disappear.
Another solution would be to have global spinlock_t which is always acquired
by the writer. The reader would then acquire the lock if the sequence count is
odd and by doing so force the writer out of the critical section. The global
spinlock_t could be replaced by a hashed lock based on the address of the inode
to lower the lock contention.
For now, manually disable preemption on PREEMPT_RT to avoid live locks.
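A hedged sketch of the writer side with the RT-only guard described above,
based on the mainline start_dir_add() shape; the exact guard helper may
differ:

    static inline unsigned start_dir_add(struct inode *dir)
    {
        /* On PREEMPT_RT the caller's spin_lock() does not disable preemption,
         * so do it explicitly: a preempting reader would otherwise spin on
         * the locked bit. The matching enable happens when the count is
         * bumped back to even at the end of the write section. */
        if (IS_ENABLED(CONFIG_PREEMPT_RT))
            preempt_disable();
        for (;;) {
            unsigned n = dir->i_dir_seq;
            if (!(n & 1) && cmpxchg(&dir->i_dir_seq, n, n + 1) == n)
                return n;
            cpu_relax();
        }
    }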
Reported-by: Oleg.Karfich@wago.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
__d_lookup_done() invokes wake_up_all() while holding a hlist_bl_lock()
which disables preemption. As a workaround convert it to swait.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
As explained by Alexander Fyodorov <halcy@yandex.ru>:
|read_lock(&tasklist_lock) in ptrace_stop() is converted to mutex on RT kernel,
|and it can remove __TASK_TRACED from task->state (by moving it to
|task->saved_state). If parent does wait() on child followed by a sys_ptrace
|call, the following race can happen:
|
|- child sets __TASK_TRACED in ptrace_stop()
|- parent does wait() which eventually calls wait_task_stopped() and returns
| child's pid
|- child blocks on read_lock(&tasklist_lock) in ptrace_stop() and moves
| __TASK_TRACED flag to saved_state
|- parent calls sys_ptrace, which calls ptrace_check_attach() and wait_task_inactive()
The patch is based on his initial patch, with an additional check added
for the case that __TASK_TRACED has moved to ->saved_state. The pi_lock is
taken in case the caller is interrupted between looking into ->state and
->saved_state.
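A hedged sketch of the kind of check this adds; the helper name is
illustrative and the task-state field naming differs across kernel versions:

    static inline bool task_is_traced_rt(struct task_struct *task)
    {
        bool traced = false;

        /* __TASK_TRACED may have moved to ->saved_state while the task
         * sleeps on the converted tasklist_lock; check both under pi_lock. */
        raw_spin_lock_irq(&task->pi_lock);
        if ((task->state & __TASK_TRACED) ||
            (task->saved_state & __TASK_TRACED))
            traced = true;
        raw_spin_unlock_irq(&task->pi_lock);

        return traced;
    }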
[ Fix for ptrace_unfreeze_traced() by Oleg Nesterov ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Upstream commit '53da1d9456fe7f8 fix ptrace slowness' is nothing more
than a bandaid around the ptrace design trainwreck. It's not a
correctness issue, it's merely a cosmetic bandaid.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
559271146efc ("mm/memcg: optimize user context object stock access") is a
classic example of optimizing for the cpu local BKL serialization without a
clear protection scope.
Disable MEMCG on RT for now.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
On RT the seqcount_t is required even on UP because the softirq can be
preempted. The IRQ handler is threaded so it is also preemptible.
Disable preemption on 32bit-RT during value updates. There is no need to
disable interrupts on RT because the handler runs threaded. Therefore
disabling preemption is enough to guarantee that the update is not
interrupted.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
raise_softirq_irqoff() disables interrupts and wakes the softirq
daemon, but after reenabling interrupts there is no preemption check,
so the execution of the softirq thread might be delayed arbitrarily.
In principle we could add that check to local_irq_enable/restore, but
that's overkill as the raise_softirq_irqoff() sections are the only
ones which show this behaviour.
Reported-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|