summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2015-08-15percpu-rwsem: introduce percpu_down_read_trylock()Oleg Nesterov1-0/+13
Add percpu_down_read_trylock(), it will have the user soon. Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2015-08-14devres: add devm_memremapChristoph Hellwig1-0/+39
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-14Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2-28/+73
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes: PMU driver corner cases, tooling fixes, and an 'AUX' (Intel PT) race related core fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/cqm: Do not access cpu_data() from CPU_UP_PREPARE handler perf/x86/intel: Fix memory leak on hot-plug allocation fail perf: Fix PERF_EVENT_IOC_PERIOD migration race perf: Fix double-free of the AUX buffer perf: Fix fasync handling on inherited events perf tools: Fix test build error when bindir contains double slash perf stat: Fix transaction lenght metrics perf: Fix running time accounting
2015-08-14Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds1-1/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Ingo Molnar: "A single fix for a locking self-test crash" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/pvqspinlock: Fix kernel panic in locking-selftest
2015-08-14arch: introduce memremap()Dan Williams2-0/+100
Existing users of ioremap_cache() are mapping memory that is known in advance to not have i/o side effects. These users are forced to cast away the __iomem annotation, or otherwise neglect to fix the sparse errors thrown when dereferencing pointers to this memory. Provide memremap() as a non __iomem annotated ioremap_*() in the case when ioremap is otherwise a pointer to cacheable memory. Empirically, ioremap_<cacheable-type>() call sites are seeking memory-like semantics (e.g. speculative reads, and prefetching permitted). memremap() is a break from the ioremap implementation pattern of adding a new memremap_<type>() for each mapping type and having silent compatibility fall backs. Instead, the implementation defines flags that are passed to the central memremap() and if a mapping type is not supported by an arch memremap returns NULL. We introduce a memremap prototype as a trivial wrapper of ioremap_cache() and ioremap_wt(). Later, once all ioremap_cache() and ioremap_wt() usage has been removed from drivers we teach archs to implement arch_memremap() with the ability to strictly enforce the mapping type. Cc: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-14Move certificate handling to its own directoryDavid Howells3-323/+0
Move certificate handling out of the kernel/ directory and into a certs/ directory to get all the weird stuff in one place and move the generated signing keys into this directory. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: David Woodhouse <David.Woodhouse@intel.com>
2015-08-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller3-7/+18
Conflicts: drivers/net/ethernet/cavium/Kconfig The cavium conflict was overlapping dependency changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13fixup: audit: implement audit by executableRichard Guy Briggs1-3/+9
The Intel build-bot detected a sparse warning with with a patch I posted a couple of days ago that was accepted in the audit/next tree: Subject: [linux-next:master 6689/6751] kernel/audit_watch.c:543:36: sparse: dereference of noderef expression Date: Friday, August 07, 2015, 06:57:55 PM From: kbuild test robot <fengguang.wu@intel.com> tree: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master head: e6455bc5b91f41f842f30465c9193320f0568707 commit: 2e3a8aeb63e5335d4f837d453787c71bcb479796 [6689/6751] Merge remote- tracking branch 'audit/next' sparse warnings: (new ones prefixed by >>) >> kernel/audit_watch.c:543:36: sparse: dereference of noderef expression kernel/audit_watch.c:544:28: sparse: dereference of noderef expression 34d99af5 Richard Guy Briggs 2015-08-05 541 int audit_exe_compare(struct task_struct *tsk, struct audit_fsnotify_mark *mark) 34d99af5 Richard Guy Briggs 2015-08-05 542 { 34d99af5 Richard Guy Briggs 2015-08-05 @543 unsigned long ino = tsk->mm- >exe_file->f_inode->i_ino; 34d99af5 Richard Guy Briggs 2015-08-05 544 dev_t dev = tsk->mm->exe_file- >f_inode->i_sb->s_dev; :::::: The code at line 543 was first introduced by commit :::::: 34d99af52ad40bd498ba66970579a5bc1fb1a3bc audit: implement audit by executable tsk->mm->exe_file requires RCU access. The warning was reproduceable by adding "C=1 CF=-D__CHECK_ENDIAN__" to the build command, and verified eliminated with this patch. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com>
2015-08-13bpf: fix bpf_perf_event_read() loop upper boundWei-Chun Chao1-1/+1
Verifier rejects programs incorrectly. Fixes: 35578d798400 ("bpf: Implement function bpf_perf_event_read()") Cc: Kaixu Xia <xiakaixu@huawei.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Wei-Chun Chao <weichunc@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-12userns,pidns: Force thread group sharing, not signal handler sharing.Eric W. Biederman2-6/+6
The code that places signals in signal queues computes the uids, gids, and pids at the time the signals are enqueued. Which means that tasks that share signal queues must be in the same pid and user namespaces. Sharing signal handlers is fine, but bizarre. So make the code in fork and userns_install clearer by only testing for what is functionally necessary. Also update the comment in unshare about unsharing a user namespace to be a little more explicit and make a little more sense. Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-08-12unshare: Unsharing a thread does not require unsharing a vmEric W. Biederman1-10/+18
In the logic in the initial commit of unshare made creating a new thread group for a process, contingent upon creating a new memory address space for that process. That is wrong. Two separate processes in different thread groups can share a memory address space and clone allows creation of such proceses. This is significant because it was observed that mm_users > 1 does not mean that a process is multi-threaded, as reading /proc/PID/maps temporarily increments mm_users, which allows other processes to (accidentally) interfere with unshare() calls. Correct the check in check_unshare_flags() to test for !thread_group_empty() for CLONE_THREAD, CLONE_SIGHAND, and CLONE_VM. For sighand->count > 1 for CLONE_SIGHAND and CLONE_VM. For !current_is_single_threaded instead of mm_users > 1 for CLONE_VM. By using the correct checks in unshare this removes the possibility of an accidental denial of service attack. Additionally using the correct checks in unshare ensures that only an explicit unshare(CLONE_VM) can possibly trigger the slow path of current_is_single_threaded(). As an explict unshare(CLONE_VM) is pointless it is not expected there are many applications that make that call. Cc: stable@vger.kernel.org Fixes: b2e0d98705e60e45bbb3c0032c48824ad7ae0704 userns: Implement unshare of the user namespace Reported-by: Ricky Zhou <rickyz@chromium.org> Reported-by: Kees Cook <keescook@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-08-12PKCS#7: Appropriately restrict authenticated attributes and content typeDavid Howells2-3/+6
A PKCS#7 or CMS message can have per-signature authenticated attributes that are digested as a lump and signed by the authorising key for that signature. If such attributes exist, the content digest isn't itself signed, but rather it is included in a special authattr which then contributes to the signature. Further, we already require the master message content type to be pkcs7_signedData - but there's also a separate content type for the data itself within the SignedData object and this must be repeated inside the authattrs for each signer [RFC2315 9.2, RFC5652 11.1]. We should really validate the authattrs if they exist or forbid them entirely as appropriate. To this end: (1) Alter the PKCS#7 parser to reject any message that has more than one signature where at least one signature has authattrs and at least one that does not. (2) Validate authattrs if they are present and strongly restrict them. Only the following authattrs are permitted and all others are rejected: (a) contentType. This is checked to be an OID that matches the content type in the SignedData object. (b) messageDigest. This must match the crypto digest of the data. (c) signingTime. If present, we check that this is a valid, parseable UTCTime or GeneralTime and that the date it encodes fits within the validity window of the matching X.509 cert. (d) S/MIME capabilities. We don't check the contents. (e) Authenticode SP Opus Info. We don't check the contents. (f) Authenticode Statement Type. We don't check the contents. The message is rejected if (a) or (b) are missing. If the message is an Authenticode type, the message is rejected if (e) is missing; if not Authenticode, the message is rejected if (d) - (f) are present. The S/MIME capabilities authattr (d) unfortunately has to be allowed to support kernels already signed by the pesign program. This only affects kexec. sign-file suppresses them (CMS_NOSMIMECAP). The message is also rejected if an authattr is given more than once or if it contains more than one element in its set of values. (3) Add a parameter to pkcs7_verify() to select one of the following restrictions and pass in the appropriate option from the callers: (*) VERIFYING_MODULE_SIGNATURE This requires that the SignedData content type be pkcs7-data and forbids authattrs. sign-file sets CMS_NOATTR. We could be more flexible and permit authattrs optionally, but only permit minimal content. (*) VERIFYING_FIRMWARE_SIGNATURE This requires that the SignedData content type be pkcs7-data and requires authattrs. In future, this will require an attribute holding the target firmware name in addition to the minimal set. (*) VERIFYING_UNSPECIFIED_SIGNATURE This requires that the SignedData content type be pkcs7-data but allows either no authattrs or only permits the minimal set. (*) VERIFYING_KEXEC_PE_SIGNATURE This only supports the Authenticode SPC_INDIRECT_DATA content type and requires at least an SpcSpOpusInfo authattr in addition to the minimal set. It also permits an SPC_STATEMENT_TYPE authattr (and an S/MIME capabilities authattr because the pesign program doesn't remove these). (*) VERIFYING_KEY_SIGNATURE (*) VERIFYING_KEY_SELF_SIGNATURE These are invalid in this context but are included for later use when limiting the use of X.509 certs. (4) The pkcs7_test key type is given a module parameter to select between the above options for testing purposes. For example: echo 1 >/sys/module/pkcs7_test_key/parameters/usage keyctl padd pkcs7_test foo @s </tmp/stuff.pkcs7 will attempt to check the signature on stuff.pkcs7 as if it contains a firmware blob (1 being VERIFYING_FIRMWARE_SIGNATURE). Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marcel Holtmann <marcel@holtmann.org> Reviewed-by: David Woodhouse <David.Woodhouse@intel.com>
2015-08-12modsign: Use extract-cert to process CONFIG_SYSTEM_TRUSTED_KEYSDavid Woodhouse2-13/+15
Fix up the dependencies somewhat too, while we're at it. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: David Howells <dhowells@redhat.com>
2015-08-12Merge branch 'for-mingo' of ↵Ingo Molnar14-498/+620
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU changes from Paul E. McKenney: - The combination of tree geometry-initialization simplifications and OS-jitter-reduction changes to expedited grace periods. These two are stacked due to the large number of conflicts that would otherwise result. [ With one addition, a temporary commit to silence a lockdep false positive. Additional changes to the expedited grace-period primitives (queued for 4.4) remove the cause of this false positive, and therefore include a revert of this temporary commit. ] - Documentation updates. - Torture-test updates. - Miscellaneous fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-12sched/deadline: Fix comment in enqueue_task_dl()Andrea Parri1-1/+1
The "dl_boosted" flag is set by comparing *absolute* deadlines (c.f., rt_mutex_setprio()). Signed-off-by: Andrea Parri <parri.andrea@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438782979-9057-2-git-send-email-parri.andrea@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-12sched/deadline: Fix comment in push_dl_tasks()Andrea Parri1-1/+1
The comment is "misleading"; fix it by adapting a comment from push_rt_tasks(). Signed-off-by: Andrea Parri <parri.andrea@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438782979-9057-1-git-send-email-parri.andrea@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-12sched: Change the sched_class::set_cpus_allowed() calling contextPeter Zijlstra3-81/+26
Change the calling context of sched_class::set_cpus_allowed() such that we can assume the task is inactive. This allows us to easily make changes that affect accounting done by enqueue/dequeue. This does in fact completely remove set_cpus_allowed_rt() and greatly reduces set_cpus_allowed_dl(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dedekind1@gmail.com Cc: juri.lelli@arm.com Cc: mgorman@suse.de Cc: riel@redhat.com Cc: rostedt@goodmis.org Link: http://lkml.kernel.org/r/20150515154833.667516139@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-12sched: Make sched_class::set_cpus_allowed() unconditionalPeter Zijlstra7-18/+36
Give every class a set_cpus_allowed() method, this enables some small optimization in the RT,DL implementation by avoiding a double cpumask_weight() call. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dedekind1@gmail.com Cc: juri.lelli@arm.com Cc: mgorman@suse.de Cc: riel@redhat.com Cc: rostedt@goodmis.org Link: http://lkml.kernel.org/r/20150515154833.614517487@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-12sched: Fix a race between __kthread_bind() and sched_setaffinity()Peter Zijlstra3-11/+51
Because sched_setscheduler() checks p->flags & PF_NO_SETAFFINITY without locks, a caller might observe an old value and race with the set_cpus_allowed_ptr() call from __kthread_bind() and effectively undo it: __kthread_bind() do_set_cpus_allowed() <SYSCALL> sched_setaffinity() if (p->flags & PF_NO_SETAFFINITIY) set_cpus_allowed_ptr() p->flags |= PF_NO_SETAFFINITY Fix the bug by putting everything under the regular scheduler locks. This also closes a hole in the serialization of task_struct::{nr_,}cpus_allowed. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dedekind1@gmail.com Cc: juri.lelli@arm.com Cc: mgorman@suse.de Cc: riel@redhat.com Cc: rostedt@goodmis.org Link: http://lkml.kernel.org/r/20150515154833.545640346@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-12sched: Ensure a task has a non-normalized vruntime when returning back to CFSByungchul Park1-2/+17
Current code ensures that a task has a normalized vruntime when switching away from the fair class, but it does not ensure the task has a non-normalized vruntime when switching back to the fair class. This is an example breaking this consistency: 1. a task is in fair class and !queued 2. changes its class to RT class (still !queued) 3. changes its class to fair class again (still !queued) Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1439197375-27927-1-git-send-email-byungchul.park@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-12sched/numa: Fix NUMA_DIRECT topology identificationAravind Gopalakrishnan1-1/+3
Systems which have all nodes at a distance of at most 1 hop should be identified as 'NUMA_DIRECT'. However, the scheduler incorrectly identifies it as 'NUMA_BACKPLANE'. This is because 'n' is assigned to sched_max_numa_distance but the code (mis)interprets it to mean 'number of hops'. Rik had actually used sched_domains_numa_levels for detecting a 'NUMA_DIRECT' topology: http://marc.info/?l=linux-kernel&m=141279712429834&w=2 But that was changed when he removed the hops table in the subsequent version: http://marc.info/?l=linux-kernel&m=141353106106771&w=2 Fixing the issue here. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1439256048-3748-1-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-12locking/qrwlock: Make use of _{acquire|release|relaxed}() atomicsWill Deacon1-12/+12
The qrwlock implementation is slightly heavy in its use of memory barriers, mainly through the use of _cmpxchg() and _return() atomics, which imply full barrier semantics. This patch modifies the qrwlock code to use the more relaxed atomic routines so that we can reduce the unnecessary barrier overhead on weakly-ordered architectures. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman.Long@hp.com Cc: paulmck@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1438880084-18856-7-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-12perf/ring-buffer: Clarify the use of page::private for high-order AUX ↵Alexander Shishkin1-1/+4
allocations A question [1] was raised about the use of page::private in AUX buffer allocations, so let's add a clarification about its intended use. The private field and flag are used by perf's rb_alloc_aux() path to tell the pmu driver the size of each high-order allocation, so that the driver can program those appropriately into its hardware. This only matters for PMUs that don't support hardware scatter tables. Otherwise, every page in the buffer is just a page. This patch adds a comment about the private field to the AUX buffer allocation path. [1] http://marc.info/?l=linux-kernel&m=143803696607968 Reported-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438063204-665-1-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-12Merge branch 'perf/urgent' into perf/core, to pick up fixes before applying ↵Ingo Molnar2-26/+71
new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-12perf: Fix PERF_EVENT_IOC_PERIOD migration racePeter Zijlstra1-20/+55
I ran the perf fuzzer, which triggered some WARN()s which are due to trying to stop/restart an event on the wrong CPU. Use the normal IPI pattern to ensure we run the code on the correct CPU. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: bad7192b842c ("perf: Fix PERF_EVENT_IOC_PERIOD to force-reset the period") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-12perf: Fix double-free of the AUX bufferBen Hutchings1-4/+6
If rb->aux_refcount is decremented to zero before rb->refcount, __rb_free_aux() may be called twice resulting in a double free of rb->aux_pages. Fix this by adding a check to __rb_free_aux(). Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 57ffc5ca679f ("perf: Fix AUX buffer refcounting") Link: http://lkml.kernel.org/r/1437953468.12842.17.camel@decadent.org.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-12tracing: Allow triggers to filter for CPU ids and process namesDaniel Wagner2-2/+77
By extending the filter rules by more generic fields we can write triggers filters like echo 'stacktrace if cpu == 1' > \ /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger or echo 'stacktrace if comm == sshd' > \ /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger CPU and COMM are not part of struct trace_entry. We could add the two new fields to ftrace_common_field list and fix up all depending sides. But that looks pretty ugly. Another thing I would like to avoid that the 'format' file contents changes. All this can be avoided by introducing another list which contains non field members of struct trace_entry. Link: http://lkml.kernel.org/r/1439210146-24707-1-git-send-email-daniel.wagner@bmw-carit.de Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-08-11mm: enhance region_is_ram() to region_intersects()Dan Williams1-25/+36
region_is_ram() is used to prevent the establishment of aliased mappings to physical "System RAM" with incompatible cache settings. However, it uses "-1" to indicate both "unknown" memory ranges (ranges not described by platform firmware) and "mixed" ranges (where the parameters describe a range that partially overlaps "System RAM"). Fix this up by explicitly tracking the "unknown" vs "mixed" resource cases and returning REGION_INTERSECTS, REGION_MIXED, or REGION_DISJOINT. This re-write also adds support for detecting when the requested region completely eclipses all of a resource. Note, the implementation treats overlaps between "unknown" and the requested memory type as REGION_INTERSECTS. Finally, other memory types can be passed in by name, for now the only usage "System RAM". Suggested-by: Luis R. Rodriguez <mcgrof@suse.com> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-10cpuset: use trialcs->mems_allowed as a temp variableAlban Crequy1-1/+1
The comment says it's using trialcs->mems_allowed as a temp variable but it didn't match the code. Change the code to match the comment. This fixes an issue when writing in cpuset.mems when a sub-directory exists: we need to write several times for the information to persist: | root@alban:/sys/fs/cgroup/cpuset# mkdir footest9 | root@alban:/sys/fs/cgroup/cpuset# cd footest9 | root@alban:/sys/fs/cgroup/cpuset/footest9# mkdir aa | root@alban:/sys/fs/cgroup/cpuset/footest9# cat cpuset.mems | | root@alban:/sys/fs/cgroup/cpuset/footest9# echo 0 > cpuset.mems | root@alban:/sys/fs/cgroup/cpuset/footest9# cat cpuset.mems | | root@alban:/sys/fs/cgroup/cpuset/footest9# echo 0 > cpuset.mems | root@alban:/sys/fs/cgroup/cpuset/footest9# cat cpuset.mems | 0 | root@alban:/sys/fs/cgroup/cpuset/footest9# cat aa/cpuset.mems | | root@alban:/sys/fs/cgroup/cpuset/footest9# echo 0 > aa/cpuset.mems | root@alban:/sys/fs/cgroup/cpuset/footest9# cat aa/cpuset.mems | 0 | root@alban:/sys/fs/cgroup/cpuset/footest9# This should help to fix the following issue in Docker: https://github.com/opencontainers/runc/issues/133 In some conditions, a Docker container needs to be started twice in order to work. Signed-off-by: Alban Crequy <alban@endocode.com> Tested-by: Iago López Galeiras <iago@endocode.com> Cc: <stable@vger.kernel.org> # 3.17+ Acked-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2015-08-10kernel: broadcast-hrtimer: Migrate to new 'set-state' interfaceViresh Kumar1-29/+20
Migrate broadcast-hrtimer driver to the new 'set-state' interface provided by clockevents core, the earlier 'set-mode' interface is marked obsolete now. Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2015-08-10bpf: Implement function bpf_perf_event_read() that get the selected hardware ↵Kaixu Xia2-15/+64
PMU conuter According to the perf_event_map_fd and index, the function bpf_perf_event_read() can convert the corresponding map value to the pointer to struct perf_event and return the Hardware PMU counter value. Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10bpf: Add new bpf map type to store the pointer to struct perf_eventKaixu Xia1-0/+57
Introduce a new bpf map type 'BPF_MAP_TYPE_PERF_EVENT_ARRAY'. This map only stores the pointer to struct perf_event. The user space event FDs from perf_event_open() syscall are converted to the pointer to struct perf_event and stored in map. Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10bpf: Make the bpf_prog_array_map more genericWang Nan3-33/+51
All the map backends are of generic nature. In order to avoid adding much special code into the eBPF core, rewrite part of the bpf_prog_array map code and make it more generic. So the new perf_event_array map type can reuse most of code with bpf_prog_array map and add fewer lines of special code. Signed-off-by: Wang Nan <wangnan0@huawei.com> Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10perf: add the necessary core perf APIs when accessing events counters in ↵Kaixu Xia1-0/+78
eBPF programs This patch add three core perf APIs: - perf_event_attrs(): export the struct perf_event_attr from struct perf_event; - perf_event_get(): get the struct perf_event from the given fd; - perf_event_read_local(): read the events counters active on the current CPU; These APIs are needed when accessing events counters in eBPF programs. The API perf_event_read_local() comes from Peter and I add the corresponding SOB. Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10Merge 4.2-rc6 into char-misc-nextGreg Kroah-Hartman3-7/+18
We want the fixes in Linus's tree in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-07modsign: Add explicit CONFIG_SYSTEM_TRUSTED_KEYS optionDavid Woodhouse1-60/+65
Let the user explicitly provide a file containing trusted keys, instead of just automatically finding files matching *.x509 in the build tree and trusting whatever we find. This really ought to be an *explicit* configuration, and the build rules for dealing with the files were fairly painful too. Fix applied from James Morris that removes an '=' from a macro definition in kernel/Makefile as this is a feature that only exists from GNU make 3.82 onwards. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: David Howells <dhowells@redhat.com>
2015-08-07modsign: Use single PEM file for autogenerated keyDavid Woodhouse1-8/+7
The current rule for generating signing_key.priv and signing_key.x509 is a classic example of a bad rule which has a tendency to break parallel make. When invoked to create *either* target, it generates the other target as a side-effect that make didn't predict. So let's switch to using a single file signing_key.pem which contains both key and certificate. That matches what we do in the case of an external key specified by CONFIG_MODULE_SIG_KEY anyway, so it's also slightly cleaner. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: David Howells <dhowells@redhat.com>
2015-08-07modsign: Extract signing cert from CONFIG_MODULE_SIG_KEY if neededDavid Woodhouse1-0/+38
Where an external PEM file or PKCS#11 URI is given, we can get the cert from it for ourselves instead of making the user drop signing_key.x509 in place for us. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: David Howells <dhowells@redhat.com>
2015-08-07modsign: Allow external signing key to be specifiedDavid Woodhouse1-0/+5
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: David Howells <dhowells@redhat.com>
2015-08-07MODSIGN: Extract the blob PKCS#7 signature verifier from module signingDavid Howells2-43/+51
Extract the function that drives the PKCS#7 signature verification given a data blob and a PKCS#7 blob out from the module signing code and lump it with the system keyring code as it's generic. This makes it independent of module config options and opens it to use by the firmware loader. Signed-off-by: David Howells <dhowells@redhat.com> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Ming Lei <ming.lei@canonical.com> Cc: Seth Forshee <seth.forshee@canonical.com> Cc: Kyle McMartin <kyle@kernel.org>
2015-08-07system_keyring.c doesn't need to #include module-internal.hDavid Howells1-1/+0
system_keyring.c doesn't need to #include module-internal.h as it doesn't use the one thing that exports. Remove the inclusion. Signed-off-by: David Howells <dhowells@redhat.com>
2015-08-07MODSIGN: Use PKCS#7 messages as module signaturesDavid Howells1-176/+44
Move to using PKCS#7 messages as module signatures because: (1) We have to be able to support the use of X.509 certificates that don't have a subjKeyId set. We're currently relying on this to look up the X.509 certificate in the trusted keyring list. (2) PKCS#7 message signed information blocks have a field that supplies the data required to match with the X.509 certificate that signed it. (3) The PKCS#7 certificate carries fields that specify the digest algorithm used to generate the signature in a standardised way and the X.509 certificates specify the public key algorithm in a standardised way - so we don't need our own methods of specifying these. (4) We now have PKCS#7 message support in the kernel for signed kexec purposes and we can make use of this. To make this work, the old sign-file script has been replaced with a program that needs compiling in a previous patch. The rules to build it are added here. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Vivek Goyal <vgoyal@redhat.com>
2015-08-07kernel: exit: fix typo in commentFrans Klaver1-1/+1
s,critiera,criteria, While at it, add a comma, because it makes sense grammatically. Signed-off-by: Frans Klaver <fransklaver@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
2015-08-07kthread: export kthread functionsDavid Kershner1-0/+4
The s-Par visornic driver, currently in staging, processes a queue being serviced by the an s-Par service partition. We can get a message that something has happened with the Service Partition, when that happens, we must not access the channel until we get a message that the service partition is back again. The visornic driver has a thread for processing the channel, when we get the message, we need to be able to park the thread and then resume it when the problem clears. We can do this with kthread_park and unpark but they are not exported from the kernel, this patch exports the needed functions. Signed-off-by: David Kershner <david.kershner@unisys.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Weinberger <richard.weinberger@gmail.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07signal: fix information leak in copy_siginfo_to_userAmanieu d'Antras1-3/+6
This function may copy the si_addr_lsb, si_lower and si_upper fields to user mode when they haven't been initialized, which can leak kernel stack data to user mode. Just checking the value of si_code is insufficient because the same si_code value is shared between multiple signals. This is solved by checking the value of si_signo in addition to si_code. Signed-off-by: Amanieu d'Antras <amanieu@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07signal: fix information leak in copy_siginfo_from_user32Amanieu d'Antras1-2/+2
This function can leak kernel stack data when the user siginfo_t has a positive si_code value. The top 16 bits of si_code descibe which fields in the siginfo_t union are active, but they are treated inconsistently between copy_siginfo_from_user32, copy_siginfo_to_user32 and copy_siginfo_to_user. copy_siginfo_from_user32 is called from rt_sigqueueinfo and rt_tgsigqueueinfo in which the user has full control overthe top 16 bits of si_code. This fixes the following information leaks: x86: 8 bytes leaked when sending a signal from a 32-bit process to itself. This leak grows to 16 bytes if the process uses x32. (si_code = __SI_CHLD) x86: 100 bytes leaked when sending a signal from a 32-bit process to a 64-bit process. (si_code = -1) sparc: 4 bytes leaked when sending a signal from a 32-bit process to a 64-bit process. (si_code = any) parsic and s390 have similar bugs, but they are not vulnerable because rt_[tg]sigqueueinfo have checks that prevent sending a positive si_code to a different process. These bugs are also fixed for consistency. Signed-off-by: Amanieu d'Antras <amanieu@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-06audit: implement audit by executableRichard Guy Briggs5-1/+92
This adds the ability audit the actions of a not-yet-running process. This patch implements the ability to filter on the executable path. Instead of just hard coding the ino and dev of the executable we care about at the moment the rule is inserted into the kernel, use the new audit_fsnotify infrastructure to manage this dynamically. This means that if the filename does not yet exist but the containing directory does, or if the inode in question is unlinked and creat'd (aka updated) the rule will just continue to work. If the containing directory is moved or deleted or the filesystem is unmounted, the rule is deleted automatically. A future enhancement would be to have the rule survive across directory disruptions. This is a heavily modified version of a patch originally submitted by Eric Paris with some ideas from Peter Moody. Cc: Peter Moody <peter@hda3.com> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> [PM: minor whitespace clean to satisfy ./scripts/checkpatch] Signed-off-by: Paul Moore <pmoore@redhat.com>
2015-08-06audit: clean simple fsnotify implementationRichard Guy Briggs4-2/+232
This is to be used to audit by executable path rules, but audit watches should be able to share this code eventually. At the moment the audit watch code is a lot more complex. That code only creates one fsnotify watch per parent directory. That 'audit_parent' in turn has a list of 'audit_watches' which contain the name, ino, dev of the specific object we care about. This just creates one fsnotify watch per object we care about. So if you watch 100 inodes in /etc this code will create 100 fsnotify watches on /etc. The audit_watch code will instead create 1 fsnotify watch on /etc (the audit_parent) and then 100 individual watches chained from that fsnotify mark. We should be able to convert the audit_watch code to do one fsnotify mark per watch and simplify things/remove a whole lot of code. After that conversion we should be able to convert the audit_fsnotify code to support that hierarchy if the optimization is necessary. Move the access to the entry for audit_match_signal() to the beginning of the audit_del_rule() function in case the entry found is the same one passed in. This will enable it to be used by audit_autoremove_mark_rule(), kill_rules() and audit_remove_parent_watches(). This is a heavily modified and merged version of two patches originally submitted by Eric Paris. Cc: Peter Moody <peter@hda3.com> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> [PM: added a space after a declaration to keep ./scripts/checkpatch happy] Signed-off-by: Paul Moore <pmoore@redhat.com>
2015-08-06audit: use macros for unset inode and device valuesRichard Guy Briggs3-8/+8
Clean up a number of places were casted magic numbers are used to represent unset inode and device numbers in preparation for the audit by executable path patch set. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> [PM: enclosed the _UNSET macros in parentheses for ./scripts/checkpatch] Signed-off-by: Paul Moore <pmoore@redhat.com>
2015-08-06tracing, perf: Implement BPF programs attached to uprobesWang Nan3-3/+8
By copying BPF related operation to uprobe processing path, this patch allow users attach BPF programs to uprobes like what they are already doing on kprobes. After this patch, users are allowed to use PERF_EVENT_IOC_SET_BPF on a uprobe perf event. Which make it possible to profile user space programs and kernel events together using BPF. Because of this patch, CONFIG_BPF_EVENTS should be selected by CONFIG_UPROBE_EVENT to ensure trace_call_bpf() is compiled even if KPROBE_EVENT is not set. Signed-off-by: Wang Nan <wangnan0@huawei.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@gmail.com> Cc: He Kuang <hekuang@huawei.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kaixu Xia <xiakaixu@huawei.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1435716878-189507-3-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>