summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2016-01-12Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds28-244/+194
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: "The main changes in this cycle were: - vDSO and asm entry improvements (Andy Lutomirski) - Xen paravirt entry enhancements (Boris Ostrovsky) - asm entry labels enhancement (Borislav Petkov) - and other misc changes (Thomas Gleixner, me)" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n Revert "x86/kvm: On KVM re-enable (e.g. after suspend), update clocks" x86/entry/64_compat: Make labels local x86/platform/uv: Include clocksource.h for clocksource_touch_watchdog() x86/vdso: Enable vdso pvclock access on all vdso variants x86/vdso: Remove pvclock fixmap machinery x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader x86/kvm: On KVM re-enable (e.g. after suspend), update clocks x86/entry/64: Bypass enter_from_user_mode on non-context-tracking boots x86/asm: Add asm macros for static keys/jump labels x86/asm: Error out if asm/jump_label.h is included inappropriately context_tracking: Switch to new static_branch API x86/entry, x86/paravirt: Remove the unused usergs_sysret32 PV op x86/paravirt: Remove the unused irq_enable_sysexit pv op x86/xen: Avoid fast syscall path for Xen PV guests
2016-01-12Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds26-42/+284
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Ingo Molnar: "The main changes in this cycle were: - introduce optimized single IPI sending methods on modern APICs (Linus Torvalds, Thomas Gleixner) - kexec/crash APIC handling fixes and enhancements (Hidehiro Kawai) - extend lapic vector saving/restoring to the CMCI (MCE) vector as well (Juergen Gross) - various fixes and enhancements (Jake Oshins, Len Brown)" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/irq: Export functions to allow MSI domains in modules Documentation: Document kernel.panic_on_io_nmi sysctl x86/nmi: Save regs in crash dump on external NMI x86/apic: Introduce apic_extnmi command line parameter kexec: Fix race between panic() and crash_kexec() panic, x86: Allow CPUs to save registers even if looping in NMI context panic, x86: Fix re-entrance problem due to panic on NMI x86/apic: Fix the saving and restoring of lapic vectors during suspend/resume x86/smpboot: Re-enable init_udelay=0 by default on modern CPUs x86/smp: Remove single IPI wrapper x86/apic: Use default send single IPI wrapper x86/apic: Provide default send single IPI wrapper x86/apic: Implement single IPI for apic_noop x86/apic: Wire up single IPI for apic_numachip x86/apic: Wire up single IPI for x2apic_uv x86/apic: Implement single IPI for x2apic_phys x86/apic: Wire up single IPI for bigsmp_apic x86/apic: Remove pointless indirections from bigsmp_apic x86/apic: Wire up single IPI for apic_physflat x86/apic: Remove pointless indirections from apic_physflat ...
2016-01-12Merge branch 'sched-core-for-linus' of ↵Linus Torvalds17-306/+485
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - tickless load average calculation enhancements (Byungchul Park) - vtime handling enhancements (Frederic Weisbecker) - scalability improvement via properly aligning a key structure field (Jiri Olsa) - various stop_machine() fixes (Oleg Nesterov) - sched/numa enhancement (Rik van Riel) - various fixes and improvements (Andi Kleen, Dietmar Eggemann, Geliang Tang, Hiroshi Shimamoto, Joonwoo Park, Peter Zijlstra, Waiman Long, Wanpeng Li, Yuyang Du)" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits) sched/fair: Fix new task's load avg removed from source CPU in wake_up_new_task() sched/core: Move sched_entity::avg into separate cache line x86/fpu: Properly align size in CHECK_MEMBER_AT_END_OF() macro sched/deadline: Fix the earliest_dl.next logic sched/fair: Disable the task group load_avg update for the root_task_group sched/fair: Move the cache-hot 'load_avg' variable into its own cacheline sched/fair: Avoid redundant idle_cpu() call in update_sg_lb_stats() sched/core: Move the sched_to_prio[] arrays out of line sched/cputime: Convert vtime_seqlock to seqcount sched/cputime: Introduce vtime accounting check for readers sched/cputime: Rename vtime_accounting_enabled() to vtime_accounting_cpu_enabled() sched/cputime: Correctly handle task guest time on housekeepers sched/cputime: Clarify vtime symbols and document them sched/cputime: Remove extra cost in task_cputime() sched/fair: Make it possible to account fair load avg consistently sched/fair: Modify the comment about lock assumptions in migrate_task_rq_fair() stop_machine: Clean up the usage of the preemption counter in cpu_stopper_thread() stop_machine: Shift the 'done != NULL' check from cpu_stop_signal_done() to callers stop_machine: Kill cpu_stop_done->executed stop_machine: Change __stop_cpus() to rely on cpu_stop_queue_work() ...
2016-01-12Merge branch 'ras-core-for-linus' of ↵Linus Torvalds2-41/+43
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Ingo Molnar: "Various x86 MCE fixes and small enhancements" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Make usable address checks Intel-only x86/mce: Add the missing memory error check on AMD x86/RAS: Remove mce.usable_addr x86/mce: Do not enter deferred errors into the generic pool twice
2016-01-12Merge branch 'perf-core-for-linus' of ↵Linus Torvalds240-1876/+9147
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Kernel side changes: - Intel Knights Landing support. (Harish Chegondi) - Intel Broadwell-EP uncore PMU support. (Kan Liang) - Core code improvements. (Peter Zijlstra.) - Event filter, LBR and PEBS fixes. (Stephane Eranian) - Enable cycles:pp on Intel Atom. (Stephane Eranian) - Add cycles:ppp support for Skylake. (Andi Kleen) - Various x86 NMI overhead optimizations. (Andi Kleen) - Intel PT enhancements. (Takao Indoh) - AMD cache events fix. (Vince Weaver) Tons of tooling changes: - Show random perf tool tips in the 'perf report' bottom line (Namhyung Kim) - perf report now defaults to --group if the perf.data file has grouped events, try it with: # perf record -e '{cycles,instructions}' -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.093 MB perf.data (1247 samples) ] # perf report # Samples: 1K of event 'anon group { cycles, instructions }' # Event count (approx.): 1955219195 # # Overhead Command Shared Object Symbol 2.86% 0.22% swapper [kernel.kallsyms] [k] intel_idle 1.05% 0.33% firefox libxul.so [.] js::SetObjectElement 1.05% 0.00% kworker/0:3 [kernel.kallsyms] [k] gen6_ring_get_seqno 0.88% 0.17% chrome chrome [.] 0x0000000000ee27ab 0.65% 0.86% firefox libxul.so [.] js::ValueToId<(js::AllowGC)1> 0.64% 0.23% JS Helper libxul.so [.] js::SplayTree<js::jit::LiveRange*, js::jit::LiveRange>::splay 0.62% 1.27% firefox libxul.so [.] js::GetIterator 0.61% 1.74% firefox libxul.so [.] js::NativeSetProperty 0.61% 0.31% firefox libxul.so [.] js::SetPropertyByDefining - Introduce the 'perf stat record/report' workflow: Generate perf.data files from 'perf stat', to tap into the scripting capabilities perf has instead of defining a 'perf stat' specific scripting support to calculate event ratios, etc. Simple example: $ perf stat record -e cycles usleep 1 Performance counter stats for 'usleep 1': 1,134,996 cycles 0.000670644 seconds time elapsed $ perf stat report Performance counter stats for '/home/acme/bin/perf stat record -e cycles usleep 1': 1,134,996 cycles 0.000670644 seconds time elapsed $ It generates PERF_RECORD_ userspace records to store the details: $ perf report -D | grep PERF_RECORD 0xf0 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 27637 0x118 [0x12]: PERF_RECORD_CPU_MAP nr: 1 cpu: 65535 0x12a [0x40]: PERF_RECORD_STAT_CONFIG 0x16a [0x30]: PERF_RECORD_STAT -1 -1 0x19a [0x40]: PERF_RECORD_MMAP -1/0: [0xffffffff81000000(0x1f000000) @ 0xffffffff81000000]: x [kernel.kallsyms]_text 0x1da [0x18]: PERF_RECORD_STAT_ROUND [acme@ssdandy linux]$ An effort was made to make perf.data files generated like this to not generate cryptic messages when processed by older tools. The 'perf script' bits need rebasing, will go up later. - Make command line options always available, even when they depend on some feature being enabled, warning the user about use of such options (Wang Nan) - Support hw breakpoint events (mem:0xAddress) in the default output mode in 'perf script' (Wang Nan) - Fixes and improvements for supporting annotating ARM binaries, support ARM call and jump instructions, more work needed to have arch specific stuff separated into tools/perf/arch/*/annotate/ (Russell King) - Add initial 'perf config' command, for now just with a --list command to the contents of the configuration file in use and a basic man page describing its format, commands for doing edits and detailed documentation are being reviewed and proof-read. (Taeung Song) - Allows BPF scriptlets specify arguments to be fetched using DWARF info, using a prologue generated at compile/build time (He Kuang, Wang Nan) - Allow attaching BPF scriptlets to module symbols (Wang Nan) - Allow attaching BPF scriptlets to userspace code using uprobe (Wang Nan) - BPF programs now can specify 'perf probe' tunables via its section name, separating key=val values using semicolons (Wang Nan) Testing some of these new BPF features: Use case: get callchains when receiving SSL packets, filter then in the kernel, at arbitrary place. # cat ssl.bpf.c #define SEC(NAME) __attribute__((section(NAME), used)) struct pt_regs; SEC("func=__inet_lookup_established hnum") int func(struct pt_regs *ctx, int err, unsigned short port) { return err == 0 && port == 443; } char _license[] SEC("license") = "GPL"; int _version SEC("version") = LINUX_VERSION_CODE; # # perf record -a -g -e ssl.bpf.c ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.787 MB perf.data (3 samples) ] # perf script | head -30 swapper 0 [000] 58783.268118: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb 8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux) 896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux) 8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux) 855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux) 8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux) 8572a8 process_backlog (/lib/modules/4.3.0+/build/vmlinux) 856b11 net_rx_action (/lib/modules/4.3.0+/build/vmlinux) 2a284b __do_softirq (/lib/modules/4.3.0+/build/vmlinux) 2a2ba3 irq_exit (/lib/modules/4.3.0+/build/vmlinux) 96b7a4 do_IRQ (/lib/modules/4.3.0+/build/vmlinux) 969807 ret_from_intr (/lib/modules/4.3.0+/build/vmlinux) 2dede5 cpu_startup_entry (/lib/modules/4.3.0+/build/vmlinux) 95d5bc rest_init (/lib/modules/4.3.0+/build/vmlinux) 1163ffa start_kernel ([kernel.vmlinux].init.text) 11634d7 x86_64_start_reservations ([kernel.vmlinux].init.text) 1163623 x86_64_start_kernel ([kernel.vmlinux].init.text) qemu-system-x86 9178 [003] 58785.792417: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb 8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux) 896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux) 8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux) 855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux) 8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux) 856660 netif_receive_skb_internal (/lib/modules/4.3.0+/build/vmlinux) 8566ec netif_receive_skb_sk (/lib/modules/4.3.0+/build/vmlinux) 430a br_handle_frame_finish ([bridge]) 48bc br_handle_frame ([bridge]) 855f44 __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux) 8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux) # - Use 'perf probe' various options to list functions, see what variables can be collected at any given point, experiment first collecting without a filter, then filter, use it together with 'perf trace', 'perf top', with or without callchains, if it explodes, please tell us! - Introduce a new callchain mode: "folded", that will list per line representations of all callchains for a give histogram entry, facilitating 'perf report' output processing by other tools, such as Brendan Gregg's flamegraph tools (Namhyung Kim) E.g: # perf report | grep -v ^# | head 18.37% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry | ---cpu_startup_entry | |--12.07%--start_secondary | --6.30%--rest_init start_kernel x86_64_start_reservations x86_64_start_kernel # Becomes, in "folded" mode: # perf report -g folded | grep -v ^# | head -5 18.37% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry 12.07% cpu_startup_entry;start_secondary 6.30% cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel 16.90% 0.00% swapper [kernel.kallsyms] [k] call_cpuidle 11.23% call_cpuidle;cpu_startup_entry;start_secondary 5.67% call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel 16.90% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter 11.23% cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 5.67% cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel 15.12% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter_state # The user can also select one of "count", "period" or "percent" as the first column. ... and lots of infrastructure enhancements, plus fixes and other changes, features I failed to list - see the shortlog and the git log for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (271 commits) perf evlist: Add --trace-fields option to show trace fields perf record: Store data mmaps for dwarf unwind perf libdw: Check for mmaps also in MAP__VARIABLE tree perf unwind: Check for mmaps also in MAP__VARIABLE tree perf unwind: Use find_map function in access_dso_mem perf evlist: Remove perf_evlist__(enable|disable)_event functions perf evlist: Make perf_evlist__open() open evsels with their cpus and threads (like perf record does) perf report: Show random usage tip on the help line perf hists: Export a couple of hist functions perf diff: Use perf_hpp__register_sort_field interface perf tools: Add overhead/overhead_children keys defaults via string perf tools: Remove list entry from struct sort_entry perf tools: Include all tools/lib directory for tags/cscope/TAGS targets perf script: Align event name properly perf tools: Add missing headers in perf's MANIFEST perf tools: Do not show trace command if it's not compiled in perf report: Change default to use event group view perf top: Decay periods in callchains tools lib: Move bitmap.[ch] from tools/perf/ to tools/{lib,include}/ tools lib: Sync tools/lib/find_bit.c with the kernel ...
2016-01-12Merge branch 'locking-core-for-linus' of ↵Linus Torvalds20-146/+904
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "So we have a laundry list of locking subsystem changes: - continuing barrier API and code improvements - futex enhancements - atomics API improvements - pvqspinlock enhancements: in particular lock stealing and adaptive spinning - qspinlock micro-enhancements" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op futex: Cleanup the goto confusion in requeue_pi() futex: Remove pointless put_pi_state calls in requeue() futex: Document pi_state refcounting in requeue code futex: Rename free_pi_state() to put_pi_state() futex: Drop refcount if requeue_pi() acquired the rtmutex locking/barriers, arch: Remove ambiguous statement in the smp_store_mb() documentation lcoking/barriers, arch: Use smp barriers in smp_store_release() locking/cmpxchg, arch: Remove tas() definitions locking/pvqspinlock: Queue node adaptive spinning locking/pvqspinlock: Allow limited lock stealing locking/pvqspinlock: Collect slowpath lock statistics sched/core, locking: Document Program-Order guarantees locking, sched: Introduce smp_cond_acquire() and use it locking/pvqspinlock, x86: Optimize the PV unlock code path locking/qspinlock: Avoid redundant read of next pointer locking/qspinlock: Prefetch the next node cacheline locking/qspinlock: Use _acquire/_release() versions of cmpxchg() & xchg() atomics: Add test for atomic operations with _relaxed variants
2016-01-12Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds34-287/+7552
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The changes in this cycle were: - Adding transitivity uniformly to rcu_node structure ->lock acquisitions. (This is implemented by the first two commits on top of v4.4-rc2 due to the pervasive nature of this change.) - Documentation updates, including RCU requirements. - Expedited grace-period changes. - Miscellaneous fixes. - Linked-list fixes, courtesy of KTSAN. - Torture-test updates. - Late-breaking fix to sysrq-generated crash. One thing I should note is that these pieces of documentation are fairly large files: .../RCU/Design/Requirements/Requirements.html | 2897 ++++++++++++++++++++ .../RCU/Design/Requirements/Requirements.htmlx | 2741 ++++++++++++++++++ and are written in HTML, not the usual .txt style. I hope they are fine" Paul McKenney explains the html docs: "For whatever it is worth, the reason for this unconventional choice was that attempts to do the diagrams in ASCII art failed miserably. And attempts to do ASCII art for the upcoming documentation of the data structures failed even more miserably" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits) sysrq: Fix warning in sysrq generated crash. list: Add lockless list traversal primitives rcu: Make rcu_gp_init() be bool rather than int rcu: Move wakeup out from under rnp->lock rcu: Fix comment for rcu_dereference_raw_notrace rcu: Don't redundantly disable irqs in rcu_irq_{enter,exit}() rcu: Make cpu_needs_another_gp() be bool rcu: Eliminate unused rcu_init_one() argument rcu: Remove TINY_RCU bloat from pointless boot parameters torture: Place console.log files correctly from the get-go torture: Abbreviate console error dump rcutorture: Print symbolic name for ->gp_state rcutorture: Print symbolic name for rcu_torture_writer_state rcutorture: Remove CONFIG_RCU_USER_QS from rcutorture selftest doc rcutorture: Default grace period to three minutes, allow override rcutorture: Dump stack when GP kthread stalls rcutorture: Flag nonexistent RCU GP kthread rcutorture: Add batch number to script printout Documentation/memory-barriers.txt: Fix ACCESS_ONCE thinko documentation: Update RCU requirements based on expedited changes ...
2016-01-12Merge branch 'work.xattr' of ↵Linus Torvalds49-1054/+516
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs xattr updates from Al Viro: "Andreas' xattr cleanup series. It's a followup to his xattr work that went in last cycle; -0.5KLoC" * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: xattr handlers: Simplify list operation ocfs2: Replace list xattr handler operations nfs: Move call to security_inode_listsecurity into nfs_listxattr xfs: Change how listxattr generates synthetic attributes tmpfs: listxattr should include POSIX ACL xattrs tmpfs: Use xattr handler infrastructure btrfs: Use xattr handler infrastructure vfs: Distinguish between full xattr names and proper prefixes posix acls: Remove duplicate xattr name definitions gfs2: Remove gfs2_xattr_acl_chmod vfs: Remove vfs_xattr_cmp
2016-01-12Merge branch 'work.symlinks' of ↵Linus Torvalds95-470/+570
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs RCU symlink updates from Al Viro: "Replacement of ->follow_link/->put_link, allowing to stay in RCU mode even if the symlink is not an embedded one. No changes since the mailbomb on Jan 1" * 'work.symlinks' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: switch ->get_link() to delayed_call, kill ->put_link() kill free_page_put_link() teach nfs_get_link() to work in RCU mode teach proc_self_get_link()/proc_thread_self_get_link() to work in RCU mode teach shmem_get_link() to work in RCU mode teach page_get_link() to work in RCU mode replace ->follow_link() with new method that could stay in RCU mode don't put symlink bodies in pagecache into highmem namei: page_getlink() and page_follow_link_light() are the same thing ufs: get rid of ->setattr() for symlinks udf: don't duplicate page_symlink_inode_operations logfs: don't duplicate page_symlink_inode_operations switch befs long symlinks to page_symlink_operations
2016-01-11Merge branch 'for-linus' of ↵Linus Torvalds4-120/+146
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs compat_ioctl fixes from Al Viro: "This is basically Jann's patches from last week. I have _not_ included the stuff like switching i2c to ->compat_ioctl() into this one - those need more testing. Ideally I would like fs/compat_ioctl.c shrunk a lot, but that's a separate story" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: compat_ioctl: don't call do_ioctl under set_fs(KERNEL_DS) compat_ioctl: don't pass fd around when not needed compat_ioctl: don't look up the fd twice
2016-01-11Merge branch 'patchwork' into v4l_for_linusMauro Carvalho Chehab528-4096/+4540
* patchwork: (204 commits) [media] rc: sunxi-cir: Initialize the spinlock properly [media] rtl2832: do not filter out slave TS null packets [media] rtl2832: print reg number on error case [media] rtl28xxu: return demod reg page from driver cache [media] coda: enable MPEG-2 ES decoding [media] coda: don't start streaming without queued buffers [media] coda: hook up vidioc_prepare_buf [media] coda: relax coda_jpeg_check_buffer for trailing bytes [media] coda: make to_coda_video_device static [media] s5p-mfc: remove volatile attribute from MFC register addresses [media] s5p-mfc: merge together s5p_mfc_hw_call and s5p_mfc_hw_call_void [media] s5p-mfc: use spinlock to protect MFC context [media] s5p-mfc: remove unnecessary callbacks [media] s5p-mfc: make queue cleanup code common [media] s5p-mfc: use one implementation of s5p_mfc_get_new_ctx [media] s5p-mfc: constify s5p_mfc_codec_ops structures [media] au8522: Avoid memory leak for device config data [media] ir-lirc-codec.c: don't leak lirc->drv-rbuf [media] uvcvideo: small cleanup in uvc_video_clock_update() [media] uvcvideo: Fix reading the current exposure value of UVC ...
2016-01-11Linux 4.4Linus Torvalds1-1/+1
2016-01-10um: Use race-free temporary file creationMickaël Salaün1-0/+11
Open the memory mapped file with the O_TMPFILE flag when available. Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Acked-by: Tristan Schmelcher <tschmelcher@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10um: Do not set unsecure permission for temporary fileMickaël Salaün1-6/+0
Remove the insecure 0777 mode for temporary file to prohibit other users to change the executable mapped code. An attacker could gain access to the mapped file descriptor from the temporary file (before it is unlinked) in a read-only mode but it should not be accessible in write mode to avoid arbitrary code execution. To not change the hostfs behavior, the temporary file creation permission now depends on the current umask(2) and the implementation of mkstemp(3). Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Acked-by: Tristan Schmelcher <tschmelcher@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10um: Fix build error and kconfig for i386Mickaël Salaün1-1/+1
Fix build error by generating elfcore.o only when ELF_CORE (depending on COREDUMP) is selected: arch/x86/um/built-in.o: In function `elf_core_write_extra_phdrs': (.text+0x3e62): undefined reference to `dump_emit' arch/x86/um/built-in.o: In function `elf_core_write_extra_data': (.text+0x3eef): undefined reference to `dump_emit' Fixes: 5d2acfc7b974 ("kconfig: make allnoconfig disable options behind EMBEDDED and EXPERT") Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Richard Weinberger <richard@nod.at> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2016-01-10um: Add seccomp supportMickaël Salaün5-1/+25
This brings SECCOMP_MODE_STRICT and SECCOMP_MODE_FILTER support through prctl(2) and seccomp(2) to User-mode Linux for i386 and x86_64 subarchitectures. secure_computing() is called first in handle_syscall() so that the syscall emulation will be aborted quickly if matching a seccomp rule. This is inspired from Meredydd Luff's patch (https://gerrit.chromium.org/gerrit/21425). Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: James Hogan <james.hogan@imgtec.com> Cc: Meredydd Luff <meredydd@senatehouse.org> Cc: David Drysdale <drysdale@google.com> Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Kees Cook <keescook@chromium.org>
2016-01-10um: Add full asm/syscall.h supportMickaël Salaün2-0/+139
Add subarchitecture-independent implementation of asm-generic/syscall.h allowing access to user system call parameters and results: * syscall_get_nr() * syscall_rollback() * syscall_get_error() * syscall_get_return_value() * syscall_set_return_value() * syscall_get_arguments() * syscall_set_arguments() * syscall_get_arch() provided by arch/x86/um/asm/syscall.h This provides the necessary syscall helpers needed by HAVE_ARCH_SECCOMP_FILTER plus syscall_get_error(). This is inspired from Meredydd Luff's patch (https://gerrit.chromium.org/gerrit/21425). Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Meredydd Luff <meredydd@senatehouse.org> Cc: David Drysdale <drysdale@google.com> Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Kees Cook <keescook@chromium.org>
2016-01-10selftests/seccomp: Remove the need for HAVE_ARCH_TRACEHOOKMickaël Salaün1-3/+24
Some architectures do not implement PTRACE_GETREGSET nor PTRACE_SETREGSET (required by HAVE_ARCH_TRACEHOOK) but only implement PTRACE_GETREGS and PTRACE_SETREGS (e.g. User-mode Linux). This improve seccomp selftest portability for architectures without HAVE_ARCH_TRACEHOOK support by defining a new trigger HAVE_GETREGS. For now, this is only enabled for i386 and x86_64 architectures. This is required to be able to run this tests on User-mode Linux. Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Meredydd Luff <meredydd@senatehouse.org> Cc: David Drysdale <drysdale@google.com> Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Kees Cook <keescook@chromium.org>
2016-01-10um: Fix ptrace GETREGS/SETREGS bugsMickaël Salaün4-25/+17
This fix two related bugs: * PTRACE_GETREGS doesn't get the right orig_ax (syscall) value * PTRACE_SETREGS can't set the orig_ax value (erased by initial value) Get rid of the now useless and error-prone get_syscall(). Fix inconsistent behavior in the ptrace implementation for i386 when updating orig_eax automatically update the syscall number as well. This is now updated in handle_syscall(). Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Thomas Meyer <thomas@m3y3r.de> Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: Anton Ivanov <aivanov@brocade.com> Cc: Meredydd Luff <meredydd@senatehouse.org> Cc: David Drysdale <drysdale@google.com> Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Kees Cook <keescook@chromium.org>
2016-01-10um: link with -lpthreadVegard Nossum1-1/+1
Similarly to commit fb1770aa78a43530940d0c2dd161e77bc705bdac, with gcc 5 on Ubuntu and CONFIG_STATIC_LINK=y I was seeing these linker errors: /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/librt.a(timer_create.o): In function `__timer_create_new': (.text+0xcd): undefined reference to `pthread_once' /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/librt.a(timer_create.o): In function `__timer_create_new': (.text+0x126): undefined reference to `pthread_attr_init' /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/librt.a(timer_create.o): In function `__timer_create_new': (.text+0x168): undefined reference to `pthread_attr_setdetachstate' [...] Obviously we also need -lpthread for librt.a. Cc: stable@vger.kernel.org # 4.4 Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10um: Update UBD to use pread/pwrite family of functionsAnton Ivanov3-22/+26
This decreases the number of syscalls per read/write by half. Signed-off-by: Anton Ivanov <aivanov@brocade.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10um: Do not change hard IRQ flags in soft IRQ processingAnton Ivanov1-0/+23
Software IRQ processing in generic architectures assumes that the exit out of hard IRQ may have re-enabled interrupts (some architectures may have an implicit EOI). It presumes them enabled and toggles the flags once more just in case unless this is turned off in the architecture specific hardirq.h by setting __ARCH_IRQ_EXIT_IRQS_DISABLED This patch adds this to UML where due to the way IRQs are handled it is an optimization (it works fine without it too). Signed-off-by: Anton Ivanov <aivanov@brocade.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10um: Prevent IRQ handler reentrancyAnton Ivanov1-1/+15
The existing IRQ handler design in UML does not prevent reentrancy This is mitigated by fd-enable/fd-disable semantics for the IO portion of the UML subsystem. The timer, however, can and is re-entered resulting in very deep stack usage and occasional stack exhaustion. This patch prevents this by checking if there is a timer interrupt in-flight before processing any pending timer interrupts. Signed-off-by: Anton Ivanov <aivanov@brocade.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10uml: flush stdout before forkingVegard Nossum1-0/+2
I was seeing some really weird behaviour where piping UML's output somewhere would cause output to get duplicated: $ ./vmlinux | head -n 40 Checking that ptrace can change system call numbers...Core dump limits : soft - 0 hard - NONE OK Checking syscall emulation patch for ptrace...Core dump limits : soft - 0 hard - NONE OK Checking advanced syscall emulation patch for ptrace...Core dump limits : soft - 0 hard - NONE OK Core dump limits : soft - 0 hard - NONE This is because these tests do a fork() which duplicates the non-empty stdout buffer, then glibc flushes the duplicated buffer as each child exits. A simple workaround is to flush before forking. Cc: stable@vger.kernel.org Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10uml: fix hostfs mknod()Vegard Nossum1-3/+1
An inverted return value check in hostfs_mknod() caused the function to return success after handling it as an error (and cleaning up). It resulted in the following segfault when trying to bind() a named unix socket: Pid: 198, comm: a.out Not tainted 4.4.0-rc4 RIP: 0033:[<0000000061077df6>] RSP: 00000000daae5d60 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 000000006092a460 RCX: 00000000dfc54208 RDX: 0000000061073ef1 RSI: 0000000000000070 RDI: 00000000e027d600 RBP: 00000000daae5de0 R08: 00000000da980ac0 R09: 0000000000000000 R10: 0000000000000003 R11: 00007fb1ae08f72a R12: 0000000000000000 R13: 000000006092a460 R14: 00000000daaa97c0 R15: 00000000daaa9a88 Kernel panic - not syncing: Kernel mode fault at addr 0x40, ip 0x61077df6 CPU: 0 PID: 198 Comm: a.out Not tainted 4.4.0-rc4 #1 Stack: e027d620 dfc54208 0000006f da981398 61bee000 0000c1ed daae5de0 0000006e e027d620 dfcd4208 00000005 6092a460 Call Trace: [<60dedc67>] SyS_bind+0xf7/0x110 [<600587be>] handle_syscall+0x7e/0x80 [<60066ad7>] userspace+0x3e7/0x4e0 [<6006321f>] ? save_registers+0x1f/0x40 [<6006c88e>] ? arch_prctl+0x1be/0x1f0 [<60054985>] fork_handler+0x85/0x90 Let's also get rid of the "cosmic ray protection" while we're at it. Fixes: e9193059b1b3 "hostfs: fix races in dentry_name() and inode_name()" Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10m68k: Provide __phys_to_pfn() and __pfn_to_phys()Sudip Mukherjee1-0/+3
The defconfig build of m68k was failing with the error: implicit declaration of function '__pfn_to_phys' Other architectures have added <asm/memory.h>, but if we do so here then we will also get redeclaration of some other functions. So it is better to copy these macros into page.h. Fixes: 0a3c3bf11240 ("x86, mm: introduce vmem_altmap to augment vmemmap_populate()") Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Cc: Dan Williams <dan.j.williams@intel.com> Reported-by: Guenter Roeck <linux@roeck-us.net> (m68knommu) [geert: Apply to page.h instead of page_mm.h to cover nommu, reword] Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2016-01-10m68k/atari, m68k/sun3: Fix SCSI platform device registration when driver is ↵Finn Thain2-3/+3
modular Fixes: 3ff228af84b5 ("atari_scsi: Convert to platform device") Fixes: 0d31f8759109 ("sun3_scsi: Convert to platform device") Reported-by: Michael Schmitz <schmitzmic@gmail.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2016-01-10ubifs: Use XATTR_*_PREFIX_LENRichard Weinberger1-2/+2
...instead of open coding it. Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10UBIFS: add a comment in key.h for unused parameterDongsheng Yang1-0/+6
Add a comment in key.h to explain why we keep an unused parameter in key helpers. Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10mtd: ubi: wl: avoid erasing a PEB which is emptySebastian Siewior1-3/+18
wear_leveling_worker() currently unconditionally puts a PEB on erase in the error case even it just been taken from the free_list and never used. In case the PEB was never used it can be put back on the free list saving a precious erase cycle. v1…v2: - to_leb_clean -> dst_leb_clean - use the nested option for ensure_wear_leveling() - do_sync_erase() can't go -ENOMEM so we can just go into RO-mode now. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10Merge tag 'scsi-fixes' of ↵Linus Torvalds1-3/+6
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fix from James Bottomley: "A single fix for machines with pages > 4k (PPC mostly). There's a bug in our optimal transfer size code where we don't account for pages > 4k and can set the transfer size to be less than the page size causing nasty failures" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: sd: Reject optimal transfer length smaller than page size
2016-01-10Merge tag 'pci-v4.4-fixes-4' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixlet from Bjorn Helgaas: "This marks the TI DRA7xx host bridge driver as broken. Apparently it has never worked without some additional out-of-tree code, so I'm going to mark it broken now and remove it completely next cycle unless it's fixed" * tag 'pci-v4.4-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: dra7xx: Mark driver as broken
2016-01-09Merge tag 'perf-core-for-mingo' of ↵Ingo Molnar65-347/+1556
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: New features: - Allow using trace events fields as sort order keys, making 'perf evlist --trace_fields' show those, and then the user can select a subset and use like: perf top -e sched:sched_switch -s prev_comm,next_comm That works as well in 'perf report' when handling files containing tracepoints. The default when just tracepoint events are found in a perf.data file is to format it like ftrace, using the libtraceevent formatters, plugins, etc (Namhyung Kim) - Add support in 'perf script' to process 'perf stat record' generated files, culminating in a python perf script that calculates CPI (Cycles per Instruction) (Jiri Olsa) - Show random perf tool tips in the 'perf report' bottom line (Namhyung Kim) - perf report now defaults to --group if the perf.data file has grouped events, try it with: # perf record -e '{cycles,instructions}' -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.093 MB perf.data (1247 samples) ] # perf report # Samples: 1K of event 'anon group { cycles, instructions }' # Event count (approx.): 1955219195 # # Overhead Command Shared Object Symbol 2.86% 0.22% swapper [kernel.kallsyms] [k] intel_idle 1.05% 0.33% firefox libxul.so [.] js::SetObjectElement 1.05% 0.00% kworker/0:3 [kernel.kallsyms] [k] gen6_ring_get_seqno 0.88% 0.17% chrome chrome [.] 0x0000000000ee27ab 0.65% 0.86% firefox libxul.so [.] js::ValueToId<(js::AllowGC)1> 0.64% 0.23% JS Helper libxul.so [.] js::SplayTree<js::jit::LiveRange*, js::jit::LiveRange>::splay 0.62% 1.27% firefox libxul.so [.] js::GetIterator 0.61% 1.74% firefox libxul.so [.] js::NativeSetProperty 0.61% 0.31% firefox libxul.so [.] js::SetPropertyByDefining User visible fixes: - Coect data mmaps so that the DWARF unwinder can handle usecases needing them, like softice (Jiri Olsa) - Decay callchains in fractal mode, fixing up cases where 'perf top -g' would show entries with more than 100% (Namhyung Kim) Infrastructure changes: - Sync tools/lib with the lib/ in the kernel sources for find_bit.c and move bitmap.[ch] from tools/perf/util/ to tools/lib/ (Arnaldo Carvalho de Melo) - No need to set attr.sample_freq in some 'perf test' entries that only want to deal with PERF_RECORD_ meta-events, improve a bit error output for CQM test (Arnaldo Carvalho de Melo) - Fix python binding build, adding some missing object files now required due to cpumap using find_bit stuff (Arnaldo Carvalho de Melo) - tools/build improvemnts (Jiri Olsa) - Add more files to cscope/ctags databases (Jiri Olsa) - Do not show 'trace' in 'perf help' if it is not compiled in (Jiri Olsa) - Make perf_evlist__open() open evsels with their cpus and threads, like perf record does, making them consistent (Adrian Hunter) - Fix pmu snapshot initialization bug (Stephane Eranian) - Add missing headers in perf's MANIFEST (Wang Nan) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-09hwmon: (nct6683) Add basic support for NCT6683 on Mitac boardsGuenter Roeck1-17/+61
Mitac microcode differs from Intel microcode. One key difference is that pwm values can be written. Detect vendor from customer ID field and no longer use DMI data to identify which microcode is running on the chip. Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2016-01-09nfsd: don't hold i_mutex over userspace upcallsNeilBrown5-19/+86
We need information about exports when crossing mountpoints during lookup or NFSv4 readdir. If we don't already have that information cached, we may have to ask (and wait for) rpc.mountd. In both cases we currently hold the i_mutex on the parent of the directory we're asking rpc.mountd about. We've seen situations where rpc.mountd performs some operation on that directory that tries to take the i_mutex again, resulting in deadlock. With some care, we may be able to avoid that in rpc.mountd. But it seems better just to avoid holding a mutex while waiting on userspace. It appears that lookup_one_len is pretty much the only operation that needs the i_mutex. So we could just drop the i_mutex elsewhere and do something like mutex_lock() lookup_one_len() mutex_unlock() In many cases though the lookup would have been cached and not required the i_mutex, so it's more efficient to create a lookup_one_len() variant that only takes the i_mutex when necessary. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09fs:affs:Replace time_t with time64_tDengChao3-8/+9
The affs code uses "time_t" and "get_seconds()". This will cause problems on 32-bit architectures in 2038 when time_t overflows. This patch replaces them with "time64_t" and "ktime_get_real_seconds()". This patch introduces expensive 64-bit divsion in "secs_to_datestamp()", considering this function is not called so often, the cost should be acceptable. Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: DengChao <chao.deng@linaro.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09fs/9p: use fscache mutex rather than spinlockSasha Levin3-6/+6
We may sleep inside a the lock, so use a mutex rather than spinlock. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09proc: add a reschedule point in proc_readfd_common()Eric Dumazet1-0/+1
User can pass an arbitrary large buffer to getdents(). It is typically a 32KB buffer used by libc scandir() implementation. When scanning /proc/{pid}/fd, we can hold cpu way too long, so add a cond_resched() to be kind with other tasks. We've seen latencies of more than 50ms on real workloads. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09logfs: constify logfs_block_ops structuresJulia Lawall3-5/+5
The logfs_block_ops structures are never modified, so declare them as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09fcntl: allow to set O_DIRECT flag on pipeStanislav Kinsburskiy1-1/+2
With packetized mode for pipes, it's not possible to set O_DIRECT on pipe file via sys_fcntl, because of unsupported sanity checks. Ability to set this flag will be used by CRIU to migrate packetized pipes. v2: Fixed typos and mode variable to check. Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGEAbhi Das1-5/+3
During testing, I discovered that __generic_file_splice_read() returns 0 (EOF) when aops->readpage fails with AOP_TRUNCATED_PAGE on the first page of a single/multi-page splice read operation. This EOF return code causes the userspace test to (correctly) report a zero-length read error when it was expecting otherwise. The current strategy of returning a partial non-zero read when ->readpage returns AOP_TRUNCATED_PAGE works only when the failed page is not the first of the lot being processed. This patch attempts to retry lookup and call ->readpage again on pages that had previously failed with AOP_TRUNCATED_PAGE. With this patch, my tests pass and I haven't noticed any unwanted side effects. This version removes the thrice-retry loop and instead indefinitely retries lookups on AOP_TRUNCATED_PAGE errors from ->readpage. This behavior is now similar to do_generic_file_read(). Signed-off-by: Abhi Das <adas@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09fs: xattr: Use kvfree()Richard Weinberger1-24/+14
... instead of open coding it. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09vmstat: allocate vmstat_wq before it is usedMichal Hocko1-1/+1
kernel test robot has reported the following crash: BUG: unable to handle kernel NULL pointer dereference at 00000100 IP: [<c1074df6>] __queue_work+0x26/0x390 *pdpt = 0000000000000000 *pde = f000ff53f000ff53 *pde = f000ff53f000ff53 Oops: 0000 [#1] PREEMPT PREEMPT SMP SMP CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 4.4.0-rc4-00139-g373ccbe #1 Workqueue: events vmstat_shepherd task: cb684600 ti: cb7ba000 task.ti: cb7ba000 EIP: 0060:[<c1074df6>] EFLAGS: 00010046 CPU: 0 EIP is at __queue_work+0x26/0x390 EAX: 00000046 EBX: cbb37800 ECX: cbb37800 EDX: 00000000 ESI: 00000000 EDI: 00000000 EBP: cb7bbe68 ESP: cb7bbe38 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 CR0: 8005003b CR2: 00000100 CR3: 01fd5000 CR4: 000006b0 Stack: Call Trace: __queue_delayed_work+0xa1/0x160 queue_delayed_work_on+0x36/0x60 vmstat_shepherd+0xad/0xf0 process_one_work+0x1aa/0x4c0 worker_thread+0x41/0x440 kthread+0xb0/0xd0 ret_from_kernel_thread+0x21/0x40 The reason is that start_shepherd_timer schedules the shepherd work item which uses vmstat_wq (vmstat_shepherd) before setup_vmstat allocates that workqueue so if the further initialization takes more than HZ we might end up scheduling on a NULL vmstat_wq. This is really unlikely but not impossible. Fixes: 373ccbe59270 ("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress") Reported-by: kernel test robot <ying.huang@linux.intel.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: stable@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-09[s390] page_to_phys() always returns a multiple of PAGE_SIZEAl Viro1-2/+1
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09nbd: use ->compat_ioctl()Al Viro2-11/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09Merge branch 'for-linus' into work.miscAl Viro77-568/+855
2016-01-09compat_ioctl: don't call do_ioctl under set_fs(KERNEL_DS)Jann Horn1-62/+68
This replaces all code in fs/compat_ioctl.c that translated ioctl arguments into a in-kernel structure, then performed do_ioctl under set_fs(KERNEL_DS), with code that allocates data on the user stack and can call the VFS ioctl handler under USER_DS. This is done as a hardening measure because the caller does not know what kind of ioctl handler will be invoked, only that no corresponding compat_ioctl handler exists and what the ioctl command number is. The accidental invocation of an unlocked_ioctl handler that unexpectedly calls copy_to_user could be a severe security issue. Signed-off-by: Jann Horn <jann@thejh.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09compat_ioctl: don't pass fd around when not neededAl Viro4-55/+61
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09compat_ioctl: don't look up the fd twiceJann Horn1-54/+68
In code in fs/compat_ioctl.c that translates ioctl arguments into a in-kernel structure, then performs sys_ioctl, possibly under set_fs(KERNEL_DS), this commit changes the sys_ioctl calls to do_ioctl calls. do_ioctl is a new function that does the same thing as sys_ioctl, but doesn't look up the fd again. This change is made to avoid (potential) security issues because of ioctl handlers that accept one of the ioctl commands I2C_FUNCS, VIDEO_GET_EVENT, MTIOCPOS, MTIOCGET, TIOCGSERIAL, TIOCSSERIAL, RTC_IRQP_READ, RTC_EPOCH_READ. This can happen for multiple reasons: - The ioctl command number could be reused. - The ioctl handler might not check the full ioctl command. This is e.g. true for drm_ioctl. - The ioctl handler is very special, e.g. cuse_file_ioctl The real issue is that set_fs(KERNEL_DS) is used here, but that's fixed in a separate commit "compat_ioctl: don't call do_ioctl under set_fs(KERNEL_DS)". This change mitigates potential security issues by preventing a race that permits invocation of unlocked_ioctl handlers under KERNEL_DS through compat code even if a corresponding compat_ioctl handler exists. So far, no way has been identified to use this to damage kernel memory without having CAP_SYS_ADMIN in the init ns (with the capability, doing reads/writes at arbitrary kernel addresses should be easy through CUSE's ioctl handler with FUSE_IOCTL_UNRESTRICTED set). [AV: two missed sys_ioctl() taken care of] Signed-off-by: Jann Horn <jann@thejh.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09dm snapshot: fix hung bios when copy error occursMikulas Patocka4-19/+12
When there is an error copying a chunk dm-snapshot can incorrectly hold associated bios indefinitely, resulting in hung IO. The function copy_callback sets pe->error if there was error copying the chunk, and then calls complete_exception. complete_exception calls pending_complete on error, otherwise it calls commit_exception with commit_callback (and commit_callback calls complete_exception). The persistent exception store (dm-snap-persistent.c) assumes that calls to prepare_exception and commit_exception are paired. persistent_prepare_exception increases ps->pending_count and persistent_commit_exception decreases it. If there is a copy error, persistent_prepare_exception is called but persistent_commit_exception is not. This results in the variable ps->pending_count never returning to zero and that causes some pending exceptions (and their associated bios) to be held forever. Fix this by unconditionally calling commit_exception regardless of whether the copy was successful. A new "valid" parameter is added to commit_exception -- when the copy fails this parameter is set to zero so that the chunk that failed to copy (and all following chunks) is not recorded in the snapshot store. Also, remove commit_callback now that it is merely a wrapper around pending_complete. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org