path: root/kernel
2018-12-04  bpf: add per-insn complexity limit  (Alexei Starovoitov, 1 file, +6/-1)
A malicious bpf program may try to force the verifier to remember a large number of distinct verifier states. Put a limit on the number of per-insn 'struct bpf_verifier_state' entries. Note that hitting the limit doesn't reject the program; it potentially makes the verifier do more steps to analyze it. It means that malicious programs will hit BPF_COMPLEXITY_LIMIT_INSNS sooner instead of spending cpu time walking a long linked list. The limit of BPF_COMPLEXITY_LIMIT_STATES==64 affects cilium progs with a slight increase in the number of "steps" it takes to successfully verify the programs:
                          before   after
  bpf_lb-DLB_L3.o           1940    1940
  bpf_lb-DLB_L4.o           3089    3089
  bpf_lb-DUNKNOWN.o         1065    1065
  bpf_lxc-DDROP_ALL.o      28052 | 28162
  bpf_lxc-DUNKNOWN.o       35487 | 35541
  bpf_netdev.o             10864   10864
  bpf_overlay.o             6643    6643
  bpf_lcx_jit.o            38437   38437
But it also makes a malicious program be rejected in 0.4 seconds instead of 6.5. Hence apply this limit to unprivileged programs only.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
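For illustration, a minimal sketch of the per-insn state-count check described above, assuming a verifier environment with an allow_ptr_leaks privilege flag and a per-insn linked list of remembered states (names and placement are illustrative, not the exact mainline diff):

    #define BPF_COMPLEXITY_LIMIT_STATES	64

    static bool too_many_states(struct bpf_verifier_env *env,
                                struct bpf_verifier_state_list *head)
    {
            struct bpf_verifier_state_list *sl;
            int states_cnt = 0;

            /* Count the states already remembered for this instruction. */
            for (sl = head; sl; sl = sl->next)
                    states_cnt++;

            /*
             * For unprivileged programs, stop adding new states once the
             * per-insn limit is reached; verification continues, it just
             * explores more paths instead of walking an ever-longer list.
             */
            return !env->allow_ptr_leaks &&
                   states_cnt > BPF_COMPLEXITY_LIMIT_STATES;
    }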
2018-12-04  bpf: improve verifier branch analysis  (Alexei Starovoitov, 1 file, +80/-13)
Pathological bpf programs may try to force the verifier to explode in the number of branch states:
  20: (d5) if r1 s<= 0x24000028 goto pc+0
  21: (b5) if r0 <= 0xe1fa20 goto pc+2
  22: (d5) if r1 s<= 0x7e goto pc+0
  23: (b5) if r0 <= 0xe880e000 goto pc+0
  24: (c5) if r0 s< 0x2100ecf4 goto pc+0
  25: (d5) if r1 s<= 0xe880e000 goto pc+1
  26: (c5) if r0 s< 0xf4041810 goto pc+0
  27: (d5) if r1 s<= 0x1e007e goto pc+0
  28: (b5) if r0 <= 0xe86be000 goto pc+0
  29: (07) r0 += 16614
  30: (c5) if r0 s< 0x6d0020da goto pc+0
  31: (35) if r0 >= 0x2100ecf4 goto pc+0
Teach the verifier to recognize always-taken and always-not-taken branches. This analysis is already done for the == and != comparisons; expand it to all other branches. It also helps real bpf programs to be verified faster:
                          before   after
  bpf_lb-DLB_L3.o           2003    1940
  bpf_lb-DLB_L4.o           3173    3089
  bpf_lb-DUNKNOWN.o         1080    1065
  bpf_lxc-DDROP_ALL.o      29584   28052
  bpf_lxc-DUNKNOWN.o       36916   35487
  bpf_netdev.o             11188   10864
  bpf_overlay.o             6679    6643
  bpf_lcx_jit.o            39555   38437
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
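As a rough illustration of the direction analysis described above, here is a sketch of deciding whether an unsigned "reg <= imm" branch is always or never taken from the register's tracked bounds (illustrative only; the in-tree helper covers all opcodes and the signed variants):

    /*
     * Return 1 if the branch is always taken, 0 if it is never taken,
     * and -1 if the outcome cannot be determined from the known bounds.
     */
    static int branch_taken_jle(const struct bpf_reg_state *reg, u64 imm)
    {
            if (reg->umax_value <= imm)
                    return 1;   /* every possible value satisfies reg <= imm */
            if (reg->umin_value > imm)
                    return 0;   /* no possible value satisfies reg <= imm */
            return -1;          /* bounds overlap the immediate: unknown */
    }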
2018-12-04  bpf: check pending signals while verifying programs  (Alexei Starovoitov, 1 file, +3/-0)
Malicious user space may try to force the verifier to use as much cpu time and memory as possible. Hence check for pending signals while verifying the program. Note that suspend of sys_bpf(PROG_LOAD) syscall will lead to EAGAIN, since the kernel has to release the resources used for program verification. Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
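The check itself is the standard pattern for long-running, restartable kernel work; roughly (a sketch, not the exact hunk):

    /* Inside the verifier's main instruction-processing loop. */
    if (signal_pending(current))
            return -EAGAIN;  /* drop partial verification state; user space may retry */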
2018-12-04  Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu  (Ingo Molnar, 28 files, +824/-583)
Pull RCU changes from Paul E. McKenney:
- Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar.
- Replace calls of RCU-bh and RCU-sched update-side functions to their vanilla RCU counterparts. This series is a step towards complete removal of the RCU-bh and RCU-sched update-side functions. (Note that some of these conversions are going upstream via their respective maintainers.)
- Documentation updates, including a number of flavor-consolidation updates from Joel Fernandes.
- Miscellaneous fixes.
- Automate generation of the initrd filesystem used for rcutorture testing.
- Convert spin_is_locked() assertions to instead use lockdep. (Note that some of these conversions are going upstream via their respective maintainers.)
- SRCU updates, especially including a fix from Dennis Krein for a bag-on-head-class bug.
- RCU torture-test updates.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-04  audit: shorten PATH cap values when zero  (Richard Guy Briggs, 1 file, +6/-4)
Since the vast majority of files (99.993% on a typical system) have no fcaps, display "0" instead of the full zero-padded 16 hex digits in the two PATH record cap_f* fields to save netlink bandwidth and disk space. Simply changing the format to %x won't work since the value is two (or possibly more in the future) 32-bit hexadecimal values concatenated and bits in higher order values will be misrepresented. Passes audit-testsuite and userspace tools already work fine. Please see the github issue tracker for more details https://github.com/linux-audit/audit-kernel/issues/101 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Acked-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
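For illustration, the shape of such a check, assuming the existing audit_log_format() helper and the cap_isclear() capability test (a sketch, not the literal patch):

    static void audit_log_cap(struct audit_buffer *ab, char *prefix,
                              kernel_cap_t *cap)
    {
            int i;

            if (cap_isclear(*cap)) {
                    /* The common case: a single "0" instead of 16 hex digits. */
                    audit_log_format(ab, " %s=0", prefix);
                    return;
            }
            /* Otherwise keep the zero-padded concatenation of 32-bit words. */
            audit_log_format(ab, " %s=", prefix);
            CAP_FOR_EACH_U32(i)
                    audit_log_format(ab, "%08x", cap->cap[CAP_LAST_U32 - i]);
    }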
2018-12-03  cpuset: Remove set but not used variable 'cs'  (YueHaibing, 1 file, +0/-2)
Fixes the gcc '-Wunused-but-set-variable' warning:
  kernel/cgroup/cpuset.c: In function 'cpuset_cancel_attach':
  kernel/cgroup/cpuset.c:2167:17: warning: variable 'cs' set but not used [-Wunused-but-set-variable]
The variable has never been used since its introduction in commit 1f7dd3e5a6e4 ("cgroup: fix handling of multi-destination migration from subtree_control enabling").
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2018-12-03  sched: Fix various typos in comments  (Ingo Molnar, 6 files, +16/-16)
Go over the scheduler source code and fix common typos in comments - and a typo in an actual variable name. No change in functionality intended. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03  Merge tag 'v4.20-rc5' into irq/core, to pick up fixes  (Ingo Molnar, 32 files, +380/-188)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03  Merge tag 'v4.20-rc5' into sched/core, to pick up fixes  (Ingo Molnar, 67 files, +1938/-1307)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03  perf: Fix typos in comments  (Ingo Molnar, 2 files, +2/-2)
Fix two typos in kernel/events/*. No change in functionality intended. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-02  bpf: Fix memleak in aux->func_info and aux->btf  (Martin KaFai Lau, 1 file, +2/-0)
The aux->func_info and aux->btf are leaked in the error out cases during bpf_prog_load(). This patch fixes it. Fixes: ba64e7d85252 ("bpf: btf: support proper non-jit func info") Cc: Yonghong Song <yhs@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
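A sketch of the kind of cleanup such a fix adds on the bpf_prog_load() error path, assuming the existing kvfree() and btf_put() helpers (the exact label and placement are illustrative):

    /* Error path of bpf_prog_load(): release BTF-related allocations
     * before freeing the program, so an early failure does not leak
     * aux->func_info or the reference held on aux->btf.
     */
    kvfree(prog->aux->func_info);
    btf_put(prog->aux->btf);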
2018-12-01  rcutorture: Don't do busted forward-progress testing  (Paul E. McKenney, 1 file, +2/-1)
The "busted" rcutorture type is an intentionally broken implementation of RCU. Doing forward-progress testing on this implementation is not particularly meaningful on the one hand and can result in fatal abuse of the memory allocator on the other. This commit therefore disables forward-progress testing of the "busted" rcutorture type. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01  rcutorture: Use 100ms buckets for forward-progress callback histograms  (Paul E. McKenney, 1 file, +5/-3)
This commit narrows the scope of each bucket of the forward-progress callback-invocation histograms from one second to 100 milliseconds, which aids debugging of forward-progress problems by making shorter-duration callback-invocation stalls visible. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01  rcutorture: Recover from OOM during forward-progress tests  (Paul E. McKenney, 1 file, +49/-11)
This commit causes the OOM handler to do rcu_barrier() calls and to free up forward-progress callbacks in order to recover from OOM events. The current test is terminated, but subsequent forward-progress tests can proceed. This allows a long test to result in multiple forward-progress failures, greatly reducing the required testing time. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01  rcutorture: Print forward-progress test age upon failure  (Paul E. McKenney, 1 file, +2/-1)
This commit prints the age of the forward-progress test in jiffies, in order to allow better interpretation of the callback-invocation histograms. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01  rcutorture: Print time since GP end upon forward-progress failure  (Paul E. McKenney, 2 files, +6/-1)
If rcutorture's forward-progress tests fail while a grace period is not in progress, it is useful to print the time since the last grace period ended as a way to detect failure to launch a new grace period. This commit therefore makes this change. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01  rcutorture: Print histogram of CB invocation at OOM time  (Paul E. McKenney, 1 file, +16/-8)
One reason why a forward-progress test might fail would be if something prevented or delayed callback invocation. This commit therefore adds a callback-invocation histogram printout when OOM is reported to rcutorture. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01  rcutorture: Print GP age upon forward-progress failure  (Paul E. McKenney, 1 file, +2/-0)
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01  rcu: Print per-CPU callback counts for forward-progress failures  (Paul E. McKenney, 1 file, +18/-0)
This commit prints out the non-zero per-CPU callback counts when a forward-progress error (OOM event) occurs.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Fix a pair of uninitialized locals spotted by kbuild test robot. ]
2018-12-01  rcu: Account for nocb-CPU callback counts in RCU CPU stall warnings  (Paul E. McKenney, 3 files, +35/-9)
The RCU CPU stall warnings print an estimate of the total number of RCU callbacks queued in the system, but this estimate leaves out the callbacks queued for nocbs CPUs. This commit therefore introduces rcu_get_n_cbs_cpu(), which gives an accurate callback estimate for both nocbs and normal CPUs, and uses this new function as needed. This commit also introduces a rcu_get_n_cbs_nocb_cpu() helper function that returns the number of callbacks for nocbs CPUs or zero otherwise, and also uses this function in place of direct access to ->nocb_q_count while in the area (fewer characters, you see). Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01  rcutorture: Dump grace-period diagnostics upon forward-progress OOM  (Paul E. McKenney, 3 files, +50/-3)
This commit adds an OOM notifier during rcutorture forward-progress testing. If this notifier is invoked, it dumps out some grace-period state to help debug the forward-progress problem. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01  rcutorture: Prepare for asynchronous access to rcu_fwd_startat  (Paul E. McKenney, 1 file, +2/-2)
Because rcutorture's forward-progress checking will trigger from an OOM notifier, this notifier will introduce asynchronous concurrent access to the rcu_fwd_startat variable. This commit therefore prepares for this by converting updates to WRITE_ONCE(). Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
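The conversion itself is the usual one-line idiom, roughly (a sketch of the pattern, not the exact hunks):

    /* Before: a plain store that the compiler may tear or merge. */
    rcu_fwd_startat = jiffies;

    /* After: an annotated store, paired with READ_ONCE() on the reader
     * side in the OOM notifier that accesses it concurrently. */
    WRITE_ONCE(rcu_fwd_startat, jiffies);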
2018-12-01  torture: Remove unnecessary "ret" variables  (Pierce Griffiths, 1 file, +8/-14)
Remove return variables (declared as "ret") in cases where the result of a function call can be returned immediately when a condition holds, instead of first being stored in the variable; when the condition does not hold, the constant that the variable was initialized with is returned directly.
Signed-off-by: Pierce Griffiths <pierceagriffiths@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
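For illustration, the pattern being simplified looks roughly like this (function and variable names here are made up):

    /* Before: the result takes a detour through "ret". */
    static bool example_enabled(void)
    {
            bool ret = false;

            if (example_condition)
                    ret = example_check();
            return ret;
    }

    /* After: return the call result (or the constant) directly. */
    static bool example_enabled(void)
    {
            if (example_condition)
                    return example_check();
            return false;
    }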
2018-12-01  rcutorture: Affinity forward-progress test to avoid housekeeping CPUs  (Paul E. McKenney, 3 files, +14/-0)
This commit sets the affinity of the forward-progress tests so that they avoid hogging a housekeeping CPU, on the theory that the offloaded callbacks will be running on those housekeeping CPUs.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Fix NULL-pointer issue located by kbuild test robot. ]
Tested-by: Rong Chen <rong.a.chen@intel.com>
2018-12-01  rcutorture: Break up too-long rcu_torture_fwd_prog() function  (Paul E. McKenney, 1 file, +135/-119)
This commit splits rcu_torture_fwd_prog_nr() and rcu_torture_fwd_prog_cr() functions out of rcu_torture_fwd_prog() in order to reduce indentation pain and because rcu_torture_fwd_prog() was getting a bit too long. In addition, this will enable easier conditional execution of the rcu_torture_fwd_prog_cr() function, which can give false-positive failures in some NO_HZ_FULL configurations due to overloading the housekeeping CPUs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-12-01  rcutorture: Remove cbflood facility  (Paul E. McKenney, 1 file, +1/-85)
Now that the forward-progress code does a full-bore continuous callback flood lasting multiple seconds, there is little point in also posting a mere 60,000 callbacks every second or so. This commit therefore removes the old cbflood testing. Over time, it may be desirable to concurrently do full-bore continuous callback floods on all CPUs simultaneously, but one dragon at a time. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-12-01  torture: Bring any extra CPUs online during kernel startup  (Paul E. McKenney, 1 file, +12/-0)
Currently, the torture scripts rely on the initrd/init script to bring any extra CPUs online, for example, in the case where the kernel and qemu have different ideas about how many CPUs are present. This works, but is an unnecessary dependency on initrd, which needs to vary depending on the distro. This commit therefore causes torture_onoff() to check for additional CPUs, attempting to bring any found online. Errors are ignored, just as they are by the initrd/init script. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-12-01  rcutorture: Add call_rcu() flooding forward-progress tests  (Paul E. McKenney, 1 file, +127/-2)
This commit adds a call_rcu() flooding loop to the forward-progress test. This emulates tight userspace loops that force call_rcu() invocations, for example, the infamous loop containing close(open()) that instigated the addition of blimit. If RCU does not make sufficient forward progress in invoking the resulting flood of callbacks, rcutorture emits a warning. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-12-01  Merge branches 'bug.2018.11.12a', 'consolidate.2018.12.01a', 'doc.2018.11.12a', 'fixes.2018.11.12a', 'initrd.2018.11.08b', 'sil.2018.11.12a' and 'srcu.2018.11.27a' into HEAD  (Paul E. McKenney, 25 files, +430/-418)
bug.2018.11.12a: Get rid of BUG_ON() and friends
consolidate.2018.12.01a: Continued RCU flavor-consolidation cleanup
doc.2018.11.12a: Documentation updates
fixes.2018.11.12a: Miscellaneous fixes
initrd.2018.11.08b: Automate creation of rcutorture initrd
sil.2018.11.12a: Remove more spin_unlock_wait() calls
2018-12-01  livepatch: Replace synchronize_sched() with synchronize_rcu()  (Paul E. McKenney, 2 files, +4/-4)
Now that synchronize_rcu() waits for preempt-disable regions of code as well as RCU read-side critical sections, synchronize_sched() can be replaced by synchronize_rcu(). This commit therefore makes this change, even though it is but a comment. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
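To illustrate why this substitution is now safe: with the consolidated RCU flavors, a preempt-disable region is also an RCU read-side critical section, so a single synchronize_rcu() waits for both kinds of readers. A conceptual sketch (not code from this commit, which only touched comments):

    /* Reader side: either form is now waited on by synchronize_rcu(). */
    preempt_disable();
    /* ... access data that must not be freed out from under us ... */
    preempt_enable();

    rcu_read_lock();
    /* ... classic RCU read-side critical section ... */
    rcu_read_unlock();

    /* Updater side: the old synchronize_sched() call becomes simply: */
    synchronize_rcu();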
2018-12-01  cgroups: Replace synchronize_sched() with synchronize_rcu()  (Paul E. McKenney, 1 file, +1/-1)
Now that synchronize_rcu() waits for preempt-disable regions of code as well as RCU read-side critical sections, synchronize_sched() can be replaced by synchronize_rcu(). This commit therefore makes this change, even though it is but a comment. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Dennis Zhou <dennis@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Dennis Zhou (Facebook)" <dennisszhou@gmail.com> Acked-by: Tejun Heo <tj@kernel.org>
2018-12-01  Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 4 files, +21/-27)
Pull STIBP fallout fixes from Thomas Gleixner:
"The performance destruction department finally got its act together and came up with a cure for the STIBP regression:
- Provide a command line option to control the spectre v2 user space mitigations. Default is either seccomp or prctl (if seccomp is disabled in Kconfig). prctl allows mitigation opt-in, seccomp enables the mitigation for sandboxed processes.
- Rework the code to handle the conditional STIBP/IBPB control and remove the now unused ptrace_may_access_sched() optimization attempt
- Disable STIBP automatically when SMT is disabled
- Optimize the switch_to() logic to avoid MSR writes and invocations of __switch_to_xtra().
- Make the asynchronous speculation TIF updates synchronous to prevent stale mitigation state.
As a general cleanup this also makes retpoline directly depend on compiler support and removes the 'minimal retpoline' option which just pretended to provide some form of security while providing none"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
  x86/speculation: Provide IBPB always command line options
  x86/speculation: Add seccomp Spectre v2 user space protection mode
  x86/speculation: Enable prctl mode for spectre_v2_user
  x86/speculation: Add prctl() control for indirect branch speculation
  x86/speculation: Prepare arch_smt_update() for PRCTL mode
  x86/speculation: Prevent stale SPEC_CTRL msr content
  x86/speculation: Split out TIF update
  ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS
  x86/speculation: Prepare for conditional IBPB in switch_mm()
  x86/speculation: Avoid __switch_to_xtra() calls
  x86/process: Consolidate and simplify switch_to_xtra() code
  x86/speculation: Prepare for per task indirect branch speculation control
  x86/speculation: Add command line control for indirect branch speculation
  x86/speculation: Unify conditional spectre v2 print functions
  x86/speculataion: Mark command line parser data __initdata
  x86/speculation: Mark string arrays const correctly
  x86/speculation: Reorder the spec_v2 code
  x86/l1tf: Show actual SMT state
  x86/speculation: Rework SMT state change
  sched/smt: Expose sched_smt_present static key
  ...
2018-12-01  dma-remap: support DMA_ATTR_NO_KERNEL_MAPPING  (Christoph Hellwig, 1 file, +9/-2)
Do not waste vmalloc space on allocations that do not require a mapping into the kernel address space. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
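A rough sketch of what honoring this attribute looks like in a remap-based allocator: when the caller passes DMA_ATTR_NO_KERNEL_MAPPING, hand the pages back as an opaque cookie and skip the remap step entirely (illustrative only, not the exact mainline code):

    page = __dma_direct_alloc_pages(dev, size, dma_handle, gfp, attrs);
    if (!page)
            return NULL;

    if (attrs & DMA_ATTR_NO_KERNEL_MAPPING) {
            /* The caller never touches the buffer through the CPU, so do
             * not burn vmalloc space on a kernel mapping. */
            return page;    /* opaque cookie, not a kernel virtual address */
    }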
2018-12-01  dma-mapping: support highmem in the generic remap allocator  (Christoph Hellwig, 1 file, +7/-7)
By using __dma_direct_alloc_pages we can deal entirely with struct page instead of having to derive a kernel virtual address. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2018-12-01  dma-mapping: move the arm64 noncoherent alloc/free support to common code  (Christoph Hellwig, 2 files, +162/-1)
The arm64 code implementing coherent dma allocation for architectures with non-coherent DMA is a good start for a generic implementation, given that it uses the generic remap helpers, provides the atomic pool for allocations that can't sleep, and is still relatively simple and well tested. Move it to kernel/dma and allow architectures to opt into it using a config symbol. Architectures just need to provide a new arch_dma_prep_coherent helper to write back and invalidate the caches for any memory that gets remapped for uncached access.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2018-12-01  dma-mapping: move the remap helpers to a separate file  (Christoph Hellwig, 4 files, +93/-85)
The dma remap code only makes sense for non-cache-coherent architectures (or possibly the corner case of highmem CMA allocations) and is currently only used by arm, arm64, csky and xtensa. Split it out into a separate file with a separate Kconfig symbol, which gets the right copyright notice given that this code was written by Laura Abbott working for Code Aurora at that point.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Laura Abbott <labbott@redhat.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2018-12-01  dma-direct: reject highmem pages from dma_alloc_from_contiguous  (Christoph Hellwig, 1 file, +12/-0)
dma_alloc_from_contiguous can return highmem pages depending on the setup, which a plain non-remapping DMA allocator can't handle. Detect this case and fail the allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
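For illustration, the kind of check this adds after a CMA allocation, assuming the page and allocation count are already in hand (PageHighMem() and dma_release_from_contiguous() are existing helpers; placement is illustrative):

    if (IS_ENABLED(CONFIG_HIGHMEM) && PageHighMem(page)) {
            /* A plain non-remapping allocator has no kernel mapping for
             * highmem, so give the pages back and fail cleanly. */
            dma_release_from_contiguous(dev, page, count);
            return NULL;
    }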
2018-12-01  dma-direct: provide page based alloc/free helpers  (Christoph Hellwig, 1 file, +22/-10)
Some architectures support remapping highmem into DMA coherent allocations. To use the common code for them we need variants of dma_direct_{alloc,free}_pages that do not use kernel virtual addresses. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2018-12-01  bpf: Add BPF_F_ANY_ALIGNMENT.  (David Miller, 2 files, +8/-1)
Often we want to write test cases that check things like bad context offset accesses, and one way to do this is to use an odd offset on, for example, a 32-bit load. This unfortunately triggers the alignment checks first on platforms that do not set CONFIG_EFFICIENT_UNALIGNED_ACCESS, so the test case sees the alignment failure rather than what it was testing for. It is often not completely possible to respect the original intention of the test, or even test the same exact thing, while solving the alignment issue. Another option could have been to check the alignment after the context and other validations are performed by the verifier, but that is a non-trivial change to the verifier.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
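For illustration, how a test harness might opt out of the strict alignment checks when loading a program. This is a sketch: the ptr_to_u64() helper, the program type, and the instruction array are placeholders, while BPF_F_ANY_ALIGNMENT and the prog_flags field are the uapi names involved here.

    #include <linux/bpf.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static __u64 ptr_to_u64(const void *p)
    {
            return (__u64)(unsigned long)p;
    }

    int load_test_prog(const struct bpf_insn *insns, unsigned int insn_cnt)
    {
            union bpf_attr attr = {};

            attr.prog_type  = BPF_PROG_TYPE_SOCKET_FILTER;
            attr.insns      = ptr_to_u64(insns);
            attr.insn_cnt   = insn_cnt;
            attr.license    = ptr_to_u64("GPL");
            /* Let the test exercise odd offsets without tripping the
             * verifier's alignment checks on strict-alignment platforms. */
            attr.prog_flags = BPF_F_ANY_ALIGNMENT;

            return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
    }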
2018-12-01  Merge branch 'akpm' (patches from Andrew)  (Linus Torvalds, 3 files, +27/-15)
Merge misc fixes from Andrew Morton: "31 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (31 commits)
  ocfs2: fix potential use after free
  mm/khugepaged: fix the xas_create_range() error path
  mm/khugepaged: collapse_shmem() do not crash on Compound
  mm/khugepaged: collapse_shmem() without freezing new_page
  mm/khugepaged: minor reorderings in collapse_shmem()
  mm/khugepaged: collapse_shmem() remember to clear holes
  mm/khugepaged: fix crashes due to misaccounted holes
  mm/khugepaged: collapse_shmem() stop if punched or truncated
  mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
  mm/huge_memory: splitting set mapping+index before unfreeze
  mm/huge_memory: rename freeze_page() to unmap_page()
  initramfs: clean old path before creating a hardlink
  kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace
  psi: make disabling/enabling easier for vendor kernels
  proc: fixup map_files test on arm
  debugobjects: avoid recursive calls with kmemleak
  userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set
  userfaultfd: shmem: add i_size checks
  userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas
  userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem
  ...
2018-12-01  Merge tag 'gcc-plugins-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux  (Linus Torvalds, 1 file, +3/-1)
Pull stackleak plugin fix from Kees Cook: "Fix crash by not allowing kprobing of stackleak_erase() (Alexander Popov)"
* tag 'gcc-plugins-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  stackleak: Disable function tracing and kprobes for stackleak_erase()
2018-12-01  kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace  (Anders Roxell, 1 file, +2/-2)
Since __sanitizer_cov_trace_pc() is marked as notrace, function calls in __sanitizer_cov_trace_pc() shouldn't be traced either. ftrace_graph_caller() gets called for each function that isn't marked 'notrace', like canonicalize_ip(). This is the call trace from a run:
  [ 139.644550] ftrace_graph_caller+0x1c/0x24
  [ 139.648352] canonicalize_ip+0x18/0x28
  [ 139.652313] __sanitizer_cov_trace_pc+0x14/0x58
  [ 139.656184] sched_clock+0x34/0x1e8
  [ 139.659759] trace_clock_local+0x40/0x88
  [ 139.663722] ftrace_push_return_trace+0x8c/0x1f0
  [ 139.667767] prepare_ftrace_return+0xa8/0x100
  [ 139.671709] ftrace_graph_caller+0x1c/0x24
Rework so that check_kcov_mode() and canonicalize_ip(), which are called from __sanitizer_cov_trace_pc(), are also marked as notrace.
Link: http://lkml.kernel.org/r/20181128081239.18317-1-anders.roxell@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
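The change itself amounts to adding the notrace attribute to the two helpers, roughly (a sketch of the idiom; the function bodies are unchanged and elided here):

    static notrace bool check_kcov_mode(enum kcov_mode needed_mode,
                                        struct task_struct *t)
    {
            /* ... unchanged: decide whether coverage should be recorded ... */
    }

    static notrace unsigned long canonicalize_ip(unsigned long ip)
    {
            /* ... unchanged: normalize the instruction pointer ... */
    }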
2018-12-01  psi: make disabling/enabling easier for vendor kernels  (Johannes Weiner, 2 files, +25/-13)
Mel Gorman reports a hackbench regression with psi that would prohibit shipping the suse kernel with it default-enabled, but he'd still like users to be able to opt in at little to no cost to others. With the current combination of CONFIG_PSI and the psi_disabled bool set from the commandline, this is a challenge. Do the following things to make it easier:
1. Add a config option CONFIG_PSI_DEFAULT_DISABLED that allows distros to enable CONFIG_PSI in their kernel but leave the feature disabled unless a user requests it at boot-time. To avoid double negatives, rename psi_disabled= to psi=.
2. Make psi_disabled a static branch to eliminate any branch costs when the feature is disabled.
In terms of numbers before and after this patch, Mel says:
: The following is a comparison using CONFIG_PSI=n as a baseline against
: your patch and a vanilla kernel
:
:                    4.20.0-rc4            4.20.0-rc4            4.20.0-rc4
:           kconfigdisable-v1r1               vanilla       psidisable-v1r1
: Amean  1     1.3100 (  0.00%)      1.3923 ( -6.28%)      1.3427 ( -2.49%)
: Amean  3     3.8860 (  0.00%)      4.1230 * -6.10%*      3.8860 ( -0.00%)
: Amean  5     6.8847 (  0.00%)      8.0390 * -16.77%*     6.7727 (  1.63%)
: Amean  7     9.9310 (  0.00%)     10.8367 * -9.12%*      9.9910 ( -0.60%)
: Amean 12    16.6577 (  0.00%)     18.2363 * -9.48%*     17.1083 ( -2.71%)
: Amean 18    26.5133 (  0.00%)     27.8833 * -5.17%*     25.7663 (  2.82%)
: Amean 24    34.3003 (  0.00%)     34.6830 ( -1.12%)     32.0450 (  6.58%)
: Amean 30    40.0063 (  0.00%)     40.5800 ( -1.43%)     41.5087 ( -3.76%)
: Amean 32    40.1407 (  0.00%)     41.2273 ( -2.71%)     39.9417 (  0.50%)
:
: It's showing that the vanilla kernel takes a hit (as the bisection
: indicated it would) and that disabling PSI by default is reasonably
: close in terms of performance for this particular workload on this
: particular machine so;
Link: http://lkml.kernel.org/r/20181127165329.GA29728@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
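For illustration, the static-branch pattern that point 2 above refers to, sketched with an example fast-path hook (the key and hook names are illustrative; DEFINE_STATIC_KEY_FALSE() and static_branch_likely() are the standard jump-label API):

    #include <linux/jump_label.h>

    /* Defaults to false, i.e. the feature is on; a boot parameter handler
     * can call static_branch_enable() to turn every hook into a cheap,
     * patched-out jump. */
    DEFINE_STATIC_KEY_FALSE(example_psi_disabled);

    void example_memstall_enter(unsigned long *flags)
    {
            if (static_branch_likely(&example_psi_disabled))
                    return;     /* disabled: skip accounting at near-zero cost */

            /* ... pressure-stall accounting work ... */
    }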
2018-11-30  Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 1 file, +10/-2)
Pull perf fixes from Ingo Molnar:
"Misc fixes:
- counter freezing related regression fix
- uprobes race fix
- Intel PMU unusual event combination fix
- .. and diverse tooling fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  uprobes: Fix handle_swbp() vs. unregister() + register() race once more
  perf/x86/intel: Disallow precise_ip on BTS events
  perf/x86/intel: Add generic branch tracing check to intel_pmu_has_bts()
  perf/x86/intel: Move branch tracing setup to the Intel-specific source file
  perf/x86/intel: Fix regression by default disabling perfmon v4 interrupt handling
  perf tools beauty ioctl: Support new ISO7816 commands
  tools uapi asm-generic: Synchronize ioctls.h
  tools arch x86: Update tools's copy of cpufeatures.h
  tools headers uapi: Synchronize i915_drm.h
  perf tools: Restore proper cwd on return from mnt namespace
  tools build feature: Check if get_current_dir_name() is available
  perf tools: Fix crash on synthesizing the unit
2018-11-30  Merge tag 'trace-v4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace  (Linus Torvalds, 4 files, +62/-3)
Pull more tracing fixes from Steven Rostedt:
"Two more fixes:
- Change idx variable in DO_TRACE macro to __idx to avoid name conflicts. A kvm event had "idx" as a parameter and it confused the macro.
- Fix a race where interrupts would be traced when set_graph_function was set. The previous patch set increased a race window that tricked the function graph tracer to think it should trace interrupts when it really should not have. The bug has been there before, but was seldom hit. Only the last patch series made it more common"
* tag 'trace-v4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/fgraph: Fix set_graph_function from showing interrupts
  tracepoint: Use __idx instead of idx in DO_TRACE macro to make it unique
2018-11-30  Merge tag 'trace-v4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace  (Linus Torvalds, 2 files, +43/-13)
Pull tracing fixes from Steven Rostedt:
"While rewriting the function graph tracer, I discovered a design flaw that was introduced by a patch that tried to fix one bug, but by doing so created another bug. As both bugs corrupt the output (but they do not crash the kernel), I decided to fix the design such that it could have both bugs fixed.
The original fix fixed time reporting of the function graph tracer when doing a max_depth of one. This was code that can test how much the kernel interferes with userspace. But in doing so, it could corrupt the time keeping of the function profiler.
The issue is that the curr_ret_stack variable was being used for two different meanings. One was to keep track of the stack pointer on the ret_stack (shadow stack used by the function graph tracer), and the other use case was the graph call depth. Although the two may be closely related, where they got updated was the issue that led to the two different bugs that required the two use cases to be updated differently.
The big issue with this fix is that it requires changing each architecture. The good news is, I was able to remove a lot of code that was duplicated within the architectures and place it into a single location. Then I could make the fix in one place.
I pushed this code into linux-next to let it settle over a week, and before doing so, I cross compiled all the affected architectures to make sure that they built fine.
In the mean time, I also pulled in a patch that fixes the sched_switch previous tasks state output, that was not actually correct"
* tag 'trace-v4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  sched, trace: Fix prev_state output in sched_switch tracepoint
  function_graph: Have profiler use curr_ret_stack and not depth
  function_graph: Reverse the order of pushing the ret_stack and the callback
  function_graph: Move return callback before update of curr_ret_stack
  function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack
  function_graph: Make ftrace_push_return_trace() static
  sparc/function_graph: Simplify with function_graph_enter()
  sh/function_graph: Simplify with function_graph_enter()
  s390/function_graph: Simplify with function_graph_enter()
  riscv/function_graph: Simplify with function_graph_enter()
  powerpc/function_graph: Simplify with function_graph_enter()
  parisc: function_graph: Simplify with function_graph_enter()
  nds32: function_graph: Simplify with function_graph_enter()
  MIPS: function_graph: Simplify with function_graph_enter()
  microblaze: function_graph: Simplify with function_graph_enter()
  arm64: function_graph: Simplify with function_graph_enter()
  ARM: function_graph: Simplify with function_graph_enter()
  x86/function_graph: Simplify with function_graph_enter()
  function_graph: Create function_graph_enter() to consolidate architecture code
2018-11-30  stackleak: Disable function tracing and kprobes for stackleak_erase()  (Alexander Popov, 1 file, +3/-1)
The stackleak_erase() function is called on the trampoline stack at the end of syscall. This stack is not big enough for ftrace and kprobes operations, e.g. it can be exhausted if we use kprobe_events for stackleak_erase(). So let's disable function tracing and kprobes of stackleak_erase(). Reported-by: kernel test robot <lkp@intel.com> Fixes: 10e9ae9fabaf ("gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack") Signed-off-by: Alexander Popov <alex.popov@linux.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
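The fix boils down to two annotations on the function, roughly (a sketch of the idiom; the erase logic itself is unchanged and elided):

    #include <linux/kprobes.h>

    asmlinkage void notrace stackleak_erase(void)
    {
            /* ... walk the thread stack and overwrite used slots with the
             * poison value; must not be interrupted by ftrace or kprobes,
             * since it runs on the small trampoline stack ... */
    }
    NOKPROBE_SYMBOL(stackleak_erase);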
2018-11-30  fgraph: Have set_graph_notrace only affect function_graph tracer  (Steven Rostedt (VMware), 3 files, +29/-21)
In order to make the function graph infrastructure more generic, there can not be code specific for the function_graph tracer in the generic code. This includes the set_graph_notrace logic, that stops all graph calls when a function in the set_graph_notrace is hit. By using the trace_recursion mask, we can use a bit in the current task_struct to implement the notrace code, and move the logic out of fgraph.c and into trace_functions_graph.c and keeps it affecting only the tracer and not all call graph callbacks. Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-11-30  fgraph: Create a fgraph.c file to store function graph infrastructure  (Steven Rostedt (VMware), 3 files, +233/-220)
As the function graph infrastructure can be used by things other than tracing, moving the code to its own file out of the trace_functions_graph.c code makes more sense. The fgraph.c file will only contain the infrastructure required to hook into functions and their return code.
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-11-30  tracing: Do not line wrap short line in function_graph_enter()  (Steven Rostedt (VMware), 1 file, +1/-2)
Commit 588ca1786f2dd ("function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack") removed a parameter from the call ftrace_push_return_trace() that made it so that the entire call was under 80 characters, but it did not remove the line break. There's no reason to break that line up, so make it a single line. Link: http://lkml.kernel.org/r/20181122100322.GN2131@hirez.programming.kicks-ass.net Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>