summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2021-03-19kernel: debug: Ordinary typo fixes in the file gdbstub.cBhaskar Chowdhury1-2/+2
s/overwitten/overwritten/ s/procesing/processing/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Link: https://lore.kernel.org/r/20210317104658.4053473-1-unixbhaskar@gmail.com Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2021-03-19kdb: Simplify kdb commands registrationSumit Garg3-213/+311
Simplify kdb commands registration via using linked list instead of static array for commands storage. Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Link: https://lore.kernel.org/r/20210224070827.408771-1-sumit.garg@linaro.org Reviewed-by: Douglas Anderson <dianders@chromium.org> [daniel.thompson@linaro.org: Removed a bunch of .cmd_minline = 0 initializers] Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2021-03-19kdb: Remove redundant function definitions/prototypesSumit Garg2-20/+0
Cleanup kdb code to get rid of unused function definitions/prototypes. Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Link: https://lore.kernel.org/r/20210224071653.409150-1-sumit.garg@linaro.org Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2021-03-19static_call: Fix static_call_update() sanity checkPeter Zijlstra2-1/+18
Sites that match init_section_contains() get marked as INIT. For built-in code init_sections contains both __init and __exit text. OTOH kernel_text_address() only explicitly includes __init text (and there are no __exit text markers). Match what jump_label already does and ignore the warning for INIT sites. Also see the excellent changelog for commit: 8f35eaa5f2de ("jump_label: Don't warn on __exit jump entries") Fixes: 9183c3f9ed710 ("static_call: Add inline static call infrastructure") Reported-by: Sumit Garg <sumit.garg@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Sumit Garg <sumit.garg@linaro.org> Link: https://lkml.kernel.org/r/20210318113610.739542434@infradead.org
2021-03-19static_call: Align static_call_is_init() patching conditionPeter Zijlstra1-10/+4
The intent is to avoid writing init code after init (because the text might have been freed). The code is needlessly different between jump_label and static_call and not obviously correct. The existing code relies on the fact that the module loader clears the init layout, such that within_module_init() always fails, while jump_label relies on the module state which is more obvious and matches the kernel logic. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Sumit Garg <sumit.garg@linaro.org> Link: https://lkml.kernel.org/r/20210318113610.636651340@infradead.org
2021-03-19static_call: Fix static_call_set_init()Peter Zijlstra1-7/+10
It turns out that static_call_set_init() does not preserve the other flags; IOW. it clears TAIL if it was set. Fixes: 9183c3f9ed710 ("static_call: Add inline static call infrastructure") Reported-by: Sumit Garg <sumit.garg@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Sumit Garg <sumit.garg@linaro.org> Link: https://lkml.kernel.org/r/20210318113610.519406371@infradead.org
2021-03-19locking/locktorture: Fix incorrect use of ww_acquire_ctx in ww_mutex testWaiman Long1-12/+27
The ww_acquire_ctx structure for ww_mutex needs to persist for a complete lock/unlock cycle. In the ww_mutex test in locktorture, however, both ww_acquire_init() and ww_acquire_fini() are called within the lock function only. This causes a lockdep splat of "WARNING: Nested lock was not taken" when lockdep is enabled in the kernel. To fix this problem, we need to move the ww_acquire_fini() after the ww_mutex_unlock() in torture_ww_mutex_unlock(). This is done by allocating a global array of ww_acquire_ctx structures. Each locking thread is associated with its own ww_acquire_ctx via the unique thread id it has so that both the lock and unlock functions can access the same ww_acquire_ctx structure. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210318172814.4400-6-longman@redhat.com
2021-03-19locking/locktorture: Pass thread id to lock/unlock functionsWaiman Long1-36/+58
To allow the lock and unlock functions in locktorture to access per-thread information, we need to pass some hint on how to locate those information. One way to do this is to pass in a unique thread id which can then be used to access a global array for thread specific information. Change the lock and unlock method to add a thread id parameter which can be determined by the offset of the lwsp/lrsp pointer from the global lwsa/lrsa array. There is no other functional change in this patch. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210318172814.4400-5-longman@redhat.com
2021-03-19locking/locktorture: Fix false positive circular locking splat in ww_mutex testWaiman Long1-3/+14
In order to avoid false positive circular locking lockdep splat when runnng the ww_mutex torture test, we need to make sure that the ww_mutexes have the same lock class as the acquire_ctx. This means the ww_mutexes must have the same lockdep key as the acquire_ctx. Unfortunately the current DEFINE_WW_MUTEX() macro fails to do that. As a result, we add an init method for the ww_mutex test to do explicit ww_mutex_init()'s of the ww_mutexes to avoid the false positive warning. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210318172814.4400-3-longman@redhat.com
2021-03-19Merge branch 'locking/urgent' into locking/core, to pick up dependent commitsIngo Molnar11-144/+232
We are applying further, lower-prio fixes on top of two ww_mutex fixes in locking/urgent. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-03-19swiotlb: remove swiotlb_nr_tblChristoph Hellwig1-6/+1
All callers just use it to check if swiotlb is active at all, for which they can just use is_swiotlb_active. In the longer run drivers need to stop using is_swiotlb_active as well, but let's do the simple step first. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2021-03-19swiotlb: dynamically allocate io_tlb_default_memChristoph Hellwig1-207/+99
Instead of allocating ->list and ->orig_addr separately just do one dynamic allocation for the actual io_tlb_mem structure. This simplifies a lot of the initialization code, and also allows to just check io_tlb_default_mem to see if swiotlb is in use. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2021-03-19swiotlb: move global variables into a new io_tlb_mem structureClaire Chang1-190/+164
Added a new struct, io_tlb_mem, as the IO TLB memory pool descriptor and moved relevant global variables into that struct. This will be useful later to allow for restricted DMA pool. Signed-off-by: Claire Chang <tientzu@chromium.org> [hch: rebased] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2021-03-18cpufreq: schedutil: Call sugov_update_next_freq() before check to ↵Yue Hu1-17/+12
fast_switch_enabled Note that sugov_update_next_freq() may return false, that means the caller sugov_fast_switch() will do nothing except fast switch check. Similarly, sugov_deferred_update() also has unnecessary operations of raw_spin_{lock,unlock} in sugov_update_single_freq() for that case. So, let's call sugov_update_next_freq() before the fast switch check to avoid unnecessary behaviors above. Accordingly, update interface definition to sugov_deferred_update() and remove sugov_fast_switch() since we will call cpufreq_driver_fast_switch() directly instead. Signed-off-by: Yue Hu <huyue2@yulong.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-18tracing: Add a verifier to check string pointers for trace eventsSteven Rostedt (VMware)4-1/+215
It is a common mistake for someone writing a trace event to save a pointer to a string in the TP_fast_assign() and then display that string pointer in the TP_printk() with %s. The problem is that those two events may happen a long time apart, where the source of the string may no longer exist. The proper way to handle displaying any string that is not guaranteed to be in the kernel core rodata section, is to copy it into the ring buffer via the __string(), __assign_str() and __get_str() helper macros. Add a check at run time while displaying the TP_printk() of events to make sure that every %s referenced is safe to dereference, and if it is not, trigger a warning and only show the address of the pointer, and the dereferenced string if it can be safely retrieved with a strncpy_from_kernel_nofault() call. In order to not have to copy the parsing of vsnprintf() formats, or even exporting its code, the verifier relies on vsnprintf() being able to modify the va_list that is passed to it, and it remains modified after it is called. This is the case for some architectures like x86_64, but other architectures like x86_32 pass the va_list to vsnprintf() as a value not a reference, and the verifier can not use it to parse the non string arguments. Thus, at boot up, it is checked if vsnprintf() modifies the passed in va_list or not, and a static branch will disable the verifier if it's not compatible. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-18tracing: Add check of trace event print fmts for dereferencing pointersSteven Rostedt (VMware)1-0/+210
Trace events record data into the ring buffer at the time of the event. The trace event has a printf logic to display the recorded data at a much later time when the user reads the trace file. This makes using dereferencing pointers unsafe if the dereferenced pointer points to the original source. The safe way to handle this is to create an array within the trace event and copy the source into the array. Then the dereference pointer may point to that array. As this is a easy mistake to make, a check is added to examine all trace event print fmts to make sure that they are safe to use. This only checks the various %p* dereferenced pointers like %pB, %pR, etc. It does not handle dereferencing of strings, as there are some use cases that are OK to dereference the source. That will be dealt with differently. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-18tracing: Add tracing_event_time_stamp() APISteven Rostedt (VMware)2-0/+9
Add a tracing_event_time_stamp() API that checks if the event passed in is not on the ring buffer but a pointer to the per CPU trace_buffered_event which does not have its time stamp set yet. If it is a pointer to the trace_buffered_event, then just return the current time stamp that the ring buffer would produce. Otherwise, return the time stamp from the event. Link: https://lkml.kernel.org/r/20210316164114.131996180@goodmis.org Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-18ring-buffer: Add verifier for using ring_buffer_event_time_stamp()Steven Rostedt (VMware)1-4/+52
The ring_buffer_event_time_stamp() must be only called by an event that has not been committed yet, and is on the buffer that is passed in. This was used to help debug converting the histogram logic over to using the new time stamp code, and was proven to be very useful. Add a verifier that can check that this is the case, and extra WARN_ONs to catch unexpected use cases. Link: https://lkml.kernel.org/r/20210316164113.987294354@goodmis.org Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-18tracing: Use a no_filter_buffering_ref to stop using the filter bufferSteven Rostedt (VMware)3-21/+17
Currently, the trace histograms relies on it using absolute time stamps to trigger the tracing to not use the temp buffer if filters are set. That's because the histograms need the full timestamp that is saved in the ring buffer. That is no longer the case, as the ring_buffer_event_time_stamp() can now return the time stamp for all events without all triggering a full absolute time stamp. Now that the absolute time stamp is an unrelated dependency to not using the filters. There's nothing about having absolute timestamps to keep from using the filter buffer. Instead, change the interface to explicitly state to disable filter buffering that the histogram logic can use. Link: https://lkml.kernel.org/r/20210316164113.847886563@goodmis.org Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-18ring-buffer: Allow ring_buffer_event_time_stamp() to return time stamp of ↵Steven Rostedt (VMware)2-16/+46
all events Currently, ring_buffer_event_time_stamp() only returns an accurate time stamp of the event if it has an absolute extended time stamp attached to it. To make it more robust, use the event_stamp() in case the event does not have an absolute value attached to it. This will allow ring_buffer_event_time_stamp() to be used in more cases than just histograms, and it will also allow histograms to not require including absolute values all the time. Link: https://lkml.kernel.org/r/20210316164113.704830885@goodmis.org Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-18tracing: Pass buffer of event to trigger operationsSteven Rostedt (VMware)4-51/+92
The ring_buffer_event_time_stamp() is going to be updated to extract the time stamp for the event without needing it to be set to have absolute values for all events. But to do so, it needs the buffer that the event is on as the buffer saves information for the event before it is committed to the buffer. If the trace buffer is disabled, a temporary buffer is used, and there's no access to this buffer from the current histogram triggers, even though it is passed to the trace event code. Pass the buffer that the event is on all the way down to the histogram triggers. Link: https://lkml.kernel.org/r/20210316164113.542448131@goodmis.org Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-18ring-buffer: Add a event_stamp to cpu_buffer for each level of nestingSteven Rostedt (VMware)1-1/+10
Add a place to save the current event time stamp for each level of nesting. This will be used to retrieve the time stamp of the current event before it is committed. Link: https://lkml.kernel.org/r/20210316164113.399089673@goodmis.org Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-18ring-buffer: Separate out internal use of ring_buffer_event_time_stamp()Steven Rostedt (VMware)1-18/+23
The exported use of ring_buffer_event_time_stamp() is going to become different than how it is used internally. Move the internal logic out into a static function called rb_event_time_stamp(), and have the internal callers call that instead. Link: https://lkml.kernel.org/r/20210316164113.257790481@goodmis.org Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-18Revert "PM: ACPI: reboot: Use S5 for reboot"Josef Bacik1-2/+0
This reverts commit d60cd06331a3566d3305b3c7b566e79edf4e2095. This patch causes a panic when rebooting my Dell Poweredge r440. I do not have the full panic log as it's lost at that stage of the reboot and I do not have a serial console. Reverting this patch makes my system able to reboot again. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-18bpf, devmap: Move drop error path to devmap for XDP_REDIRECTLorenzo Bianconi1-18/+12
We want to change the current ndo_xdp_xmit drop semantics because it will allow us to implement better queue overflow handling. This is working towards the larger goal of a XDP TX queue-hook. Move XDP_REDIRECT error path handling from each XDP ethernet driver to devmap code. According to the new APIs, the driver running the ndo_xdp_xmit pointer, will break tx loop whenever the hw reports a tx error and it will just return to devmap caller the number of successfully transmitted frames. It will be devmap responsibility to free dropped frames. Move each XDP ndo_xdp_xmit capable driver to the new APIs: - veth - virtio-net - mvneta - mvpp2 - socionext - amazon ena - bnxt - freescale (dpaa2, dpaa) - xen-frontend - qede - ice - igb - ixgbe - i40e - mlx5 - ti (cpsw, cpsw-new) - tun - sfc Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Reviewed-by: Camelia Groza <camelia.groza@nxp.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Shay Agroskin <shayagr@amazon.com> Link: https://lore.kernel.org/bpf/ed670de24f951cfd77590decf0229a0ad7fd12f6.1615201152.git.lorenzo@kernel.org
2021-03-18time/debug: Remove dentry pointer for debugfsGreg Kroah-Hartman1-4/+3
There is no need to keep the dentry pointer around for the created debugfs file, as it is only needed when removing it from the system. When it is to be removed, ask debugfs itself for the pointer, to save on storage and make things a bit simpler. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210216155020.1012407-1-gregkh@linuxfoundation.org
2021-03-18bpf: Fix fexit trampoline.Alexei Starovoitov3-53/+171
The fexit/fmod_ret programs can be attached to kernel functions that can sleep. The synchronize_rcu_tasks() will not wait for such tasks to complete. In such case the trampoline image will be freed and when the task wakes up the return IP will point to freed memory causing the crash. Solve this by adding percpu_ref_get/put for the duration of trampoline and separate trampoline vs its image life times. The "half page" optimization has to be removed, since first_half->second_half->first_half transition cannot be guaranteed to complete in deterministic time. Every trampoline update becomes a new image. The image with fmod_ret or fexit progs will be freed via percpu_ref_kill and call_rcu_tasks. Together they will wait for the original function and trampoline asm to complete. The trampoline is patched from nop to jmp to skip fexit progs. They are freed independently from the trampoline. The image with fentry progs only will be freed via call_rcu_tasks_trace+call_rcu_tasks which will wait for both sleepable and non-sleepable progs to complete. Fixes: fec56f5890d9 ("bpf: Introduce BPF trampoline") Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Paul E. McKenney <paulmck@kernel.org> # for RCU Link: https://lore.kernel.org/bpf/20210316210007.38949-1-alexei.starovoitov@gmail.com
2021-03-17bpf: Add sanity check for upper ptr_limitPiotr Krysiuk1-3/+8
Given we know the max possible value of ptr_limit at the time of retrieving the latter, add basic assertions, so that the verifier can bail out if anything looks odd and reject the program. Nothing triggered this so far, but it also does not hurt to have these. Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-03-17irq: Simplify condition in irq_matrix_reserve()Juergen Gross1-3/+2
The if condition in irq_matrix_reserve() can be much simpler. While at it fix a typo in the comment. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210211070953.5914-1-jgross@suse.com
2021-03-17bpf: Simplify alu_limit masking for pointer arithmeticPiotr Krysiuk1-5/+5
Instead of having the mov32 with aux->alu_limit - 1 immediate, move this operation to retrieve_ptr_limit() instead to simplify the logic and to allow for subsequent sanity boundary checks inside retrieve_ptr_limit(). This avoids in future that at the time of the verifier masking rewrite we'd run into an underflow which would not sign extend due to the nature of mov32 instruction. Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-03-17bpf: Fix off-by-one for area size in creating mask to leftPiotr Krysiuk1-2/+2
retrieve_ptr_limit() computes the ptr_limit for registers with stack and map_value type. ptr_limit is the size of the memory area that is still valid / in-bounds from the point of the current position and direction of the operation (add / sub). This size will later be used for masking the operation such that attempting out-of-bounds access in the speculative domain is redirected to remain within the bounds of the current map value. When masking to the right the size is correct, however, when masking to the left, the size is off-by-one which would lead to an incorrect mask and thus incorrect arithmetic operation in the non-speculative domain. Piotr found that if the resulting alu_limit value is zero, then the BPF_MOV32_IMM() from the fixup_bpf_calls() rewrite will end up loading 0xffffffff into AX instead of sign-extending to the full 64 bit range, and as a result, this allows abuse for executing speculatively out-of- bounds loads against 4GB window of address space and thus extracting the contents of kernel memory via side-channel. Fixes: 979d63d50c0c ("bpf: prevent out of bounds speculation on pointer arithmetic") Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-03-17bpf: Prohibit alu ops for pointer types not defining ptr_limitPiotr Krysiuk1-6/+10
The purpose of this patch is to streamline error propagation and in particular to propagate retrieve_ptr_limit() errors for pointer types that are not defining a ptr_limit such that register-based alu ops against these types can be rejected. The main rationale is that a gap has been identified by Piotr in the existing protection against speculatively out-of-bounds loads, for example, in case of ctx pointers, unprivileged programs can still perform pointer arithmetic. This can be abused to execute speculatively out-of-bounds loads without restrictions and thus extract contents of kernel memory. Fix this by rejecting unprivileged programs that attempt any pointer arithmetic on unprotected pointer types. The two affected ones are pointer to ctx as well as pointer to map. Field access to a modified ctx' pointer is rejected at a later point in time in the verifier, and 7c6967326267 ("bpf: Permit map_ptr arithmetic with opcode add and offset 0") only relevant for root-only use cases. Risk of unprivileged program breakage is considered very low. Fixes: 7c6967326267 ("bpf: Permit map_ptr arithmetic with opcode add and offset 0") Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation") Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-03-17tick/sched: Prevent false positive softirq pending warnings on RTThomas Gleixner2-1/+16
On RT a task which has soft interrupts disabled can block on a lock and schedule out to idle while soft interrupts are pending. This triggers the warning in the NOHZ idle code which complains about going idle with pending soft interrupts. But as the task is blocked soft interrupt processing is temporarily blocked as well which means that such a warning is a false positive. To prevent that check the per CPU state which indicates that a scheduled out task has soft interrupts disabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210309085727.527563866@linutronix.de
2021-03-17softirq: Make softirq control and processing RT awareThomas Gleixner1-7/+181
Provide a local lock based serialization for soft interrupts on RT which allows the local_bh_disabled() sections and servicing soft interrupts to be preemptible. Provide the necessary inline helpers which allow to reuse the bulk of the softirq processing code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210309085727.426370483@linutronix.de
2021-03-17softirq: Move various protections into inline helpersThomas Gleixner1-7/+32
To allow reuse of the bulk of softirq processing code for RT and to avoid #ifdeffery all over the place, split protections for various code sections out into inline helpers so the RT variant can just replace them in one go. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210309085727.310118772@linutronix.de
2021-03-17irqtime: Make accounting correct on RTThomas Gleixner1-2/+2
vtime_account_irq and irqtime_account_irq() base checks on preempt_count() which fails on RT because preempt_count() does not contain the softirq accounting which is seperate on RT. These checks do not need the full preempt count as they only operate on the hard and softirq sections. Use irq_count() instead which provides the correct value on both RT and non RT kernels. The compiler is clever enough to fold the masking for !RT: 99b: 65 8b 05 00 00 00 00 mov %gs:0x0(%rip),%eax - 9a2: 25 ff ff ff 7f and $0x7fffffff,%eax + 9a2: 25 00 ff ff 00 and $0xffff00,%eax Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210309085727.153926793@linutronix.de
2021-03-17tasklets: Prevent tasklet_unlock_spin_wait() deadlock on RTThomas Gleixner1-1/+27
tasklet_unlock_spin_wait() spin waits for the TASKLET_STATE_SCHED bit in the tasklet state to be cleared. This works on !RT nicely because the corresponding execution can only happen on a different CPU. On RT softirq processing is preemptible, therefore a task preempting the softirq processing thread can spin forever. Prevent this by invoking local_bh_disable()/enable() inside the loop. In case that the softirq processing thread was preempted by the current task, current will block on the local lock which yields the CPU to the preempted softirq processing thread. If the tasklet is processed on a different CPU then the local_bh_disable()/enable() pair is just a waste of processor cycles. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210309084241.988908275@linutronix.de
2021-03-17tasklets: Replace spin wait in tasklet_kill()Peter Zijlstra1-9/+9
tasklet_kill() spin waits for TASKLET_STATE_SCHED to be cleared invoking yield() from inside the loop. yield() is an ill defined mechanism and the result might still be wasting CPU cycles in a tight loop which is especially painful in a guest when the CPU running the tasklet is scheduled out. tasklet_kill() is used in teardown paths and not performance critical at all. Replace the spin wait with wait_var_event(). Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210309084241.890532921@linutronix.de
2021-03-17tasklets: Replace spin wait in tasklet_unlock_wait()Peter Zijlstra1-0/+18
tasklet_unlock_wait() spin waits for TASKLET_STATE_RUN to be cleared. This is wasting CPU cycles in a tight loop which is especially painful in a guest when the CPU running the tasklet is scheduled out. tasklet_unlock_wait() is invoked from tasklet_kill() which is used in teardown paths and not performance critical at all. Replace the spin wait with wait_var_event(). There are no users of tasklet_unlock_wait() which are invoked from atomic contexts. The usage in tasklet_disable() has been replaced temporarily with the spin waiting variant until the atomic users are fixed up and will be converted to the sleep wait variant later. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210309084241.783936921@linutronix.de
2021-03-17rseq, ptrace: Add PTRACE_GET_RSEQ_CONFIGURATION requestPiotr Figiel1-0/+25
For userspace checkpoint and restore (C/R) a way of getting process state containing RSEQ configuration is needed. There are two ways this information is going to be used: - to re-enable RSEQ for threads which had it enabled before C/R - to detect if a thread was in a critical section during C/R Since C/R preserves TLS memory and addresses RSEQ ABI will be restored using the address registered before C/R. Detection whether the thread is in a critical section during C/R is needed to enforce behavior of RSEQ abort during C/R. Attaching with ptrace() before registers are dumped itself doesn't cause RSEQ abort. Restoring the instruction pointer within the critical section is problematic because rseq_cs may get cleared before the control is passed to the migrated application code leading to RSEQ invariants not being preserved. C/R code will use RSEQ ABI address to find the abort handler to which the instruction pointer needs to be set. To achieve above goals expose the RSEQ ABI address and the signature value with the new ptrace request PTRACE_GET_RSEQ_CONFIGURATION. This new ptrace request can also be used by debuggers so they are aware of stops within restartable sequences in progress. Signed-off-by: Piotr Figiel <figiel@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michal Miroslaw <emmir@google.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20210226135156.1081606-1-figiel@google.com
2021-03-17quota: wire up quotactl_pathSascha Hauer1-0/+1
Wire up the quotactl_path syscall added in the previous patch. Link: https://lore.kernel.org/r/20210304123541.30749-3-s.hauer@pengutronix.de Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2021-03-17softirq: s/BUG/WARN_ONCE/ on tasklet SCHED state not setDirk Behme1-7/+18
Replace BUG() with WARN_ONCE() on wrong tasklet state, in order to: - increase the verbosity / aid in debugging - avoid fatal/unrecoverable state Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com> Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210317102012.32399-1-erosca@de.adit-jv.com
2021-03-17locking/ww_mutex: Simplify use_ww_ctx & ww_ctx handlingWaiman Long1-11/+14
The use_ww_ctx flag is passed to mutex_optimistic_spin(), but the function doesn't use it. The frequent use of the (use_ww_ctx && ww_ctx) combination is repetitive. In fact, ww_ctx should not be used at all if !use_ww_ctx. Simplify ww_mutex code by dropping use_ww_ctx from mutex_optimistic_spin() an clear ww_ctx if !use_ww_ctx. In this way, we can replace (use_ww_ctx && ww_ctx) by just (ww_ctx). Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Davidlohr Bueso <dbueso@suse.de> Link: https://lore.kernel.org/r/20210316153119.13802-2-longman@redhat.com
2021-03-17locking/rwsem: Fix comment typoBhaskar Chowdhury1-1/+1
s/folowing/following/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20210317041806.4096156-1-unixbhaskar@gmail.com
2021-03-17swiotlb: lift the double initialization protection from xen-swiotlbChristoph Hellwig1-0/+8
Lift the double initialization protection from xen-swiotlb to the core code to avoid exposing too many swiotlb internals. Also upgrade the check to a warning as it should not happen. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2021-03-17swiotlb: split swiotlb_tbl_sync_singleChristoph Hellwig3-25/+21
Split swiotlb_tbl_sync_single into two separate funtions for the to device and to cpu synchronization. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2021-03-17swiotlb: move orig addr and size validation into swiotlb_bounceChristoph Hellwig1-36/+23
Move the code to find and validate the original buffer address and size from the callers into swiotlb_bounce. This means a tiny bit of extra work in the swiotlb_map path, but avoids code duplication and a leads to a better code structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2021-03-17swiotlb: remove the alloc_size parameter to swiotlb_tbl_unmap_singleChristoph Hellwig2-23/+24
Now that swiotlb remembers the allocation size there is no need to pass it back to swiotlb_tbl_unmap_single. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2021-03-17ftrace: Fix modify_ftrace_direct.Alexei Starovoitov1-5/+38
The following sequence of commands: register_ftrace_direct(ip, addr1); modify_ftrace_direct(ip, addr1, addr2); unregister_ftrace_direct(ip, addr2); will cause the kernel to warn: [ 30.179191] WARNING: CPU: 2 PID: 1961 at kernel/trace/ftrace.c:5223 unregister_ftrace_direct+0x130/0x150 [ 30.180556] CPU: 2 PID: 1961 Comm: test_progs W O 5.12.0-rc2-00378-g86bc10a0a711-dirty #3246 [ 30.182453] RIP: 0010:unregister_ftrace_direct+0x130/0x150 When modify_ftrace_direct() changes the addr from old to new it should update the addr stored in ftrace_direct_funcs. Otherwise the final unregister_ftrace_direct() won't find the address and will cause the splat. Fixes: 0567d6809182 ("ftrace: Add modify_ftrace_direct()") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/bpf/20210316195815.34714-1-alexei.starovoitov@gmail.com
2021-03-17kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data()Oleg Nesterov4-5/+4
Preparation for fixing get_nr_restart_syscall() on X86 for COMPAT. Add a new helper which sets restart_block->fn and calls a dummy arch_set_restart_data() helper. Fixes: 609c19a385c8 ("x86/ptrace: Stop setting TS_COMPAT in ptrace code") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210201174641.GA17871@redhat.com