summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2016-12-15arch/microblaze: add option to skip DMA sync as a part of map and unmapAlexander Duyck1-2/+8
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Link: http://lkml.kernel.org/r/20161110113508.76501.77583.stgit@ahduyck-blue-test.jf.intel.com Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Michal Simek <monstr@monstr.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15arch/metag: add option to skip DMA sync as a part of map and unmapAlexander Duyck1-3/+13
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Link: http://lkml.kernel.org/r/20161110113503.76501.80809.stgit@ahduyck-blue-test.jf.intel.com Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Cc: James Hogan <james.hogan@imgtec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15arch/m68k: add option to skip DMA sync as a part of mappingAlexander Duyck1-1/+7
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it later via a sync_for_cpu or sync_for_device call. Link: http://lkml.kernel.org/r/20161110113457.76501.77603.stgit@ahduyck-blue-test.jf.intel.com Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15arch/hexagon: Add option to skip DMA sync as a part of mappingAlexander Duyck1-1/+5
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it later via a sync_for_cpu or sync_for_device call. Link: http://lkml.kernel.org/r/20161110113452.76501.45864.stgit@ahduyck-blue-test.jf.intel.com Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Richard Kuo <rkuo@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15arch/frv: add option to skip sync on DMA mapAlexander Duyck2-6/+17
The use of DMA_ATTR_SKIP_CPU_SYNC was not consistent across all of the DMA APIs in the arch/arm folder. This change is meant to correct that so that we get consistent behavior. Link: http://lkml.kernel.org/r/20161110113447.76501.93160.stgit@ahduyck-blue-test.jf.intel.com Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15arch/c6x: add option to skip sync on DMA map and unmapAlexander Duyck1-4/+10
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it later via a sync_for_cpu or sync_for_device call. Link: http://lkml.kernel.org/r/20161110113442.76501.7673.stgit@ahduyck-blue-test.jf.intel.com Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Mark Salter <msalter@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15arch/blackfin: add option to skip sync on DMA mapAlexander Duyck1-1/+7
The use of DMA_ATTR_SKIP_CPU_SYNC was not consistent across all of the DMA APIs in the arch/arm folder. This change is meant to correct that so that we get consistent behavior. Link: http://lkml.kernel.org/r/20161110113436.76501.13386.stgit@ahduyck-blue-test.jf.intel.com Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Steven Miao <realmz6@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15arch/avr32: add option to skip sync on DMA mapAlexander Duyck1-1/+6
The use of DMA_ATTR_SKIP_CPU_SYNC was not consistent across all of the DMA APIs in the arch/arm folder. This change is meant to correct that so that we get consistent behavior. Link: http://lkml.kernel.org/r/20161110113430.76501.79737.stgit@ahduyck-blue-test.jf.intel.com Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15arch/arm: add option to skip sync on DMA map and unmapAlexander Duyck1-6/+10
The use of DMA_ATTR_SKIP_CPU_SYNC was not consistent across all of the DMA APIs in the arch/arm folder. This change is meant to correct that so that we get consistent behavior. Link: http://lkml.kernel.org/r/20161110113424.76501.2715.stgit@ahduyck-blue-test.jf.intel.com Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Russell King <linux@armlinux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15arch/arc: add option to skip sync on DMA mappingAlexander Duyck1-1/+4
Patch series "Add support for DMA writable pages being writable by the network stack", v3. The first 19 patches in the set add support for the DMA attribute DMA_ATTR_SKIP_CPU_SYNC on multiple platforms/architectures. This is needed so that we can flag the calls to dma_map/unmap_page so that we do not invalidate cache lines that do not currently belong to the device. Instead we have to take care of this in the driver via a call to sync_single_range_for_cpu prior to freeing the Rx page. Patch 20 adds support for dma_map_page_attrs and dma_unmap_page_attrs so that we can unmap and map a page using the DMA_ATTR_SKIP_CPU_SYNC attribute. Patch 21 adds support for freeing a page that has multiple references being held by a single caller. This way we can free page fragments that were allocated by a given driver. The last 2 patches use these updates in the igb driver, and lay the groundwork to allow for us to reimplement the use of build_skb. This patch (of 23): This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it later via a sync_for_cpu or sync_for_device call. Link: http://lkml.kernel.org/r/20161110113419.76501.38491.stgit@ahduyck-blue-test.jf.intel.com Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15sysctl: add KERN_CONT to deprecated_sysctl_warning()Tetsuo Handa1-2/+2
Do not break lines while printk()ing values. kernel: warning: process `tomoyo_file_tes' used the deprecated sysctl system call with kernel: 3. kernel: 5. kernel: 56. kernel: Link: http://lkml.kernel.org/r/1480814833-4976-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15kexec: add cond_resched into kimage_alloc_crash_control_pageszhong jiang1-0/+2
A soft lookup will occur when I run trinity in syscall kexec_load. the corresponding stack information is as follows. BUG: soft lockup - CPU#6 stuck for 22s! [trinity-c6:13859] Kernel panic - not syncing: softlockup: hung tasks CPU: 6 PID: 13859 Comm: trinity-c6 Tainted: G O L ----V------- 3.10.0-327.28.3.35.zhongjiang.x86_64 #1 Hardware name: Huawei Technologies Co., Ltd. Tecal BH622 V2/BC01SRSA0, BIOS RMIBV386 06/30/2014 Call Trace: <IRQ> dump_stack+0x19/0x1b panic+0xd8/0x214 watchdog_timer_fn+0x1cc/0x1e0 __hrtimer_run_queues+0xd2/0x260 hrtimer_interrupt+0xb0/0x1e0 ? call_softirq+0x1c/0x30 local_apic_timer_interrupt+0x37/0x60 smp_apic_timer_interrupt+0x3f/0x60 apic_timer_interrupt+0x6d/0x80 <EOI> ? kimage_alloc_control_pages+0x80/0x270 ? kmem_cache_alloc_trace+0x1ce/0x1f0 ? do_kimage_alloc_init+0x1f/0x90 kimage_alloc_init+0x12a/0x180 SyS_kexec_load+0x20a/0x260 system_call_fastpath+0x16/0x1b the first time allocation of control pages may take too much time because crash_res.end can be set to a higher value. we need to add cond_resched to avoid the issue. The patch have been tested and above issue is not appear. Link: http://lkml.kernel.org/r/1481164674-42775-1-git-send-email-zhongjiang@huawei.com Signed-off-by: zhong jiang <zhongjiang@huawei.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Xunlei Pang <xpang@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15kexec: export the value of phys_base instead of symbol addressBaoquan He2-4/+2
Currently in x86_64, the symbol address of phys_base is exported to vmcoreinfo. Dave Anderson complained this is really useless for his Crash implementation. Because in user-space utility Crash and Makedumpfile which exported vmcore information is mainly used for, value of phys_base is needed to covert virtual address of exported kernel symbol to physical address. Especially init_level4_pgt, if we want to access and go over the page table to look up a PA corresponding to VA, firstly we need calculate page_dir = SYMBOL(init_level4_pgt) - __START_KERNEL_map + phys_base; Now in Crash and Makedumpfile, we have to analyze the vmcore elf program header to get value of phys_base. As Dave said, it would be preferable if it were readily availabl in vmcoreinfo rather than depending upon the PT_LOAD semantics. Hence in this patch change to export the value of phys_base instead of its virtual address. And people also complained that KERNEL_IMAGE_SIZE exporting is x86_64 only, should be moved into arch dependent function arch_crash_save_vmcoreinfo. Do the moving in this patch. Link: http://lkml.kernel.org/r/1478568596-30060-2-git-send-email-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Thomas Garnier <thgarnie@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Xunlei Pang <xlpang@redhat.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Kees Cook <keescook@chromium.org> Cc: Eugene Surovegin <surovegin@google.com> Cc: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com> Cc: Dave Anderson <anderson@redhat.com> Cc: Pratyush Anand <panand@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15Revert "kdump, vmcoreinfo: report memory sections virtual addresses"Baoquan He2-9/+0
This reverts commit 0549a3c02efb ("kdump, vmcoreinfo: report memory sections virtual addresses"). Commit 0549a3c02efb tells the userspace utility makedumpfile the randomized base address of these memmory sections when mm kaslr is enabled. However the following patch "kexec: export the value of phys_base instead of symbol address" makes makedumpfile not need these addresses any more. Besides we should use VMCOREINFO_NUMBER to export the value of the variable so that we can use the existing number_table mechanism of Makedumpfile to fetch it. So revert it now. If needed we can add it later. http://lists.infradead.org/pipermail/kexec/2016-October/017540.html Link: http://lkml.kernel.org/r/1478568596-30060-1-git-send-email-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Thomas Garnier <thgarnie@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Xunlei Pang <xlpang@redhat.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Kees Cook <keescook@chromium.org> Cc: Eugene Surovegin <surovegin@google.com> Cc: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com> Cc: Dave Anderson <anderson@redhat.com> Cc: Pratyush Anand <panand@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15coredump: clarify "unsafe core_pattern" warningAlexey Dobriyan1-3/+5
I was amused to find "unsafe core_pattern" warning having these lines in /etc/sysctl.conf: fs.suid_dumpable=2 kernel.core_pattern=/core/core-%e-%p-%E kernel.core_uses_pid=0 Turns out kernel is formally right. Default core_pattern is just "core", which doesn't qualify for secure path while setting suid.dumpable. Hint admins about solution, clarify sysctl names, delete unnecessary '\' characters (string literals are concatenated regardless) and reformat for easier grepping. Link: http://lkml.kernel.org/r/20161029152124.GA1258@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15signals: avoid unnecessary taking of sighand->siglockWaiman Long2-0/+24
When running certain database workload on a high-end system with many CPUs, it was found that spinlock contention in the sigprocmask syscalls became a significant portion of the overall CPU cycles as shown below. 9.30% 9.30% 905387 dataserver /proc/kcore 0x7fff8163f4d2 [k] _raw_spin_lock_irq | ---_raw_spin_lock_irq | |--99.34%-- __set_current_blocked | sigprocmask | sys_rt_sigprocmask | system_call_fastpath | | | |--50.63%-- __swapcontext | | | | | |--99.91%-- upsleepgeneric | | | |--49.36%-- __setcontext | | ktskRun Looking further into the swapcontext function in glibc, it was found that the function always call sigprocmask() without checking if there are changes in the signal mask. A check was added to the __set_current_blocked() function to avoid taking the sighand->siglock spinlock if there is no change in the signal mask. This will prevent unneeded spinlock contention when many threads are trying to call sigprocmask(). With this patch applied, the spinlock contention in sigprocmask() was gone. Link: http://lkml.kernel.org/r/1474979209-11867-1-git-send-email-Waiman.Long@hpe.com Signed-off-by: Waiman Long <Waiman.Long@hpe.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Stas Sergeev <stsp@list.ru> Cc: Scott J Norton <scott.norton@hpe.com> Cc: Douglas Hatch <doug.hatch@hpe.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15mm, compaction: allow compaction for GFP_NOFS requestsMichal Hocko1-3/+14
compaction has been disabled for GFP_NOFS and GFP_NOIO requests since the direct compaction was introduced by commit 56de7263fcf3 ("mm: compaction: direct compact when a high-order allocation fails"). The main reason is that the migration of page cache pages might recurse back to fs/io layer and we could potentially deadlock. This is overly conservative because all the anonymous memory is migrateable in the GFP_NOFS context just fine. This might be a large portion of the memory in many/most workkloads. Remove the GFP_NOFS restriction and make sure that we skip all fs pages (those with a mapping) while isolating pages to be migrated. We cannot consider clean fs pages because they might need a metadata update so only isolate pages without any mapping for nofs requests. The effect of this patch will be probably very limited in many/most workloads because higher order GFP_NOFS requests are quite rare, although different configurations might lead to very different results. David Chinner has mentioned a heavy metadata workload with 64kB block which to quote him: : Unfortunately, there was an era of cargo cult configuration tweaks in the : Ceph community that has resulted in a large number of production machines : with XFS filesystems configured this way. And a lot of them store large : numbers of small files and run under significant sustained memory : pressure. : : I slowly working towards getting rid of these high order allocations and : replacing them with the equivalent number of single page allocations, but : I haven't got that (complex) change working yet. We can do the following to simulate that workload: $ mkfs.xfs -f -n size=64k <dev> $ mount <dev> /mnt/scratch $ time ./fs_mark -D 10000 -S0 -n 100000 -s 0 -L 32 \ -d /mnt/scratch/0 -d /mnt/scratch/1 \ -d /mnt/scratch/2 -d /mnt/scratch/3 \ -d /mnt/scratch/4 -d /mnt/scratch/5 \ -d /mnt/scratch/6 -d /mnt/scratch/7 \ -d /mnt/scratch/8 -d /mnt/scratch/9 \ -d /mnt/scratch/10 -d /mnt/scratch/11 \ -d /mnt/scratch/12 -d /mnt/scratch/13 \ -d /mnt/scratch/14 -d /mnt/scratch/15 and indeed is hammers the system with many high order GFP_NOFS requests as per a simle tracepoint during the load: $ echo '!(gfp_flags & 0x80) && (gfp_flags &0x400000)' > $TRACE_MNT/events/kmem/mm_page_alloc/filter I am getting 5287609 order=0 37 order=1 1594905 order=2 3048439 order=3 6699207 order=4 66645 order=5 My testing was done in a kvm guest so performance numbers should be taken with a grain of salt but there seems to be a difference when the patch is applied: * Original kernel FSUse% Count Size Files/sec App Overhead 1 1600000 0 4300.1 20745838 3 3200000 0 4239.9 23849857 5 4800000 0 4243.4 25939543 6 6400000 0 4248.4 19514050 8 8000000 0 4262.1 20796169 9 9600000 0 4257.6 21288675 11 11200000 0 4259.7 19375120 13 12800000 0 4220.7 22734141 14 14400000 0 4238.5 31936458 16 16000000 0 4231.5 23409901 18 17600000 0 4045.3 23577700 19 19200000 0 2783.4 58299526 21 20800000 0 2678.2 40616302 23 22400000 0 2693.5 83973996 and xfs complaining about memory allocation not making progress [ 2304.372647] XFS: fs_mark(3289) possible memory allocation deadlock size 65624 in kmem_alloc (mode:0x2408240) [ 2304.443323] XFS: fs_mark(3285) possible memory allocation deadlock size 65728 in kmem_alloc (mode:0x2408240) [ 4796.772477] XFS: fs_mark(3424) possible memory allocation deadlock size 46936 in kmem_alloc (mode:0x2408240) [ 4796.775329] XFS: fs_mark(3423) possible memory allocation deadlock size 51416 in kmem_alloc (mode:0x2408240) [ 4797.388808] XFS: fs_mark(3424) possible memory allocation deadlock size 65728 in kmem_alloc (mode:0x2408240) * Patched kernel FSUse% Count Size Files/sec App Overhead 1 1600000 0 4289.1 19243934 3 3200000 0 4241.6 32828865 5 4800000 0 4248.7 32884693 6 6400000 0 4314.4 19608921 8 8000000 0 4269.9 24953292 9 9600000 0 4270.7 33235572 11 11200000 0 4346.4 40817101 13 12800000 0 4285.3 29972397 14 14400000 0 4297.2 20539765 16 16000000 0 4219.6 18596767 18 17600000 0 4273.8 49611187 19 19200000 0 4300.4 27944451 21 20800000 0 4270.6 22324585 22 22400000 0 4317.6 22650382 24 24000000 0 4065.2 22297964 So the dropdown at Count 19200000 didn't happen and there was only a single warning about allocation not making progress [ 3063.815003] XFS: fs_mark(3272) possible memory allocation deadlock size 65624 in kmem_alloc (mode:0x2408240) This suggests that the patch has helped even though there is not all that much of anonymous memory as the workload mostly generates fs metadata. I assume the success rate would be higher with more anonymous memory which should be the case in many workloads. [akpm@linux-foundation.org: fix comment] Link: http://lkml.kernel.org/r/20161012114721.31853-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15kernel/watchdog: use nmi registers snapshot in hardlockup handlerKonstantin Khlebnikov1-1/+0
NMI handler doesn't call set_irq_regs(), it's set only by normal IRQ. Thus get_irq_regs() returns NULL or stale registers snapshot with IP/SP pointing to the code interrupted by IRQ which was interrupted by NMI. NULL isn't a problem: in this case watchdog calls dump_stack() and prints full stack trace including NMI. But if we're stuck in IRQ handler then NMI watchlog will print stack trace without IRQ part at all. This patch uses registers snapshot passed into NMI handler as arguments: these registers point exactly to the instruction interrupted by NMI. Fixes: 55537871ef66 ("kernel/watchdog.c: perform all-CPU backtrace in case of hard lockup") Link: http://lkml.kernel.org/r/146771764784.86724.6006627197118544150.stgit@buzz Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: <stable@vger.kernel.org> [4.4+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15btrfs: better handle btrfs_printk() defaultsPetr Mladek1-9/+3
Commit 262c5e86fec7 ("printk/btrfs: handle more message headers") triggers: warning: `ratelimit' may be used uninitialized in this function with gcc (4.1.2) and probably many other versions. The code actually is correct but a bit twisted. Let's make it more straightforward and set the default values at the beginning. Link: http://lkml.kernel.org/r/20161213135246.GQ3506@pathway.suse.cz Signed-off-by: Petr Mladek <pmladek@suse.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: David Sterba <dsterba@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15Merge branch 'for-linus' of ↵Linus Torvalds19-43/+139
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace updates from Eric Biederman: "After a lot of discussion and work we have finally reachanged a basic understanding of what is necessary to make unprivileged mounts safe in the presence of EVM and IMA xattrs which the last commit in this series reflects. While technically it is a revert the comments it adds are important for people not getting confused in the future. Clearing up that confusion allows us to seriously work on unprivileged mounts of fuse in the next development cycle. The rest of the fixes in this set are in the intersection of user namespaces, ptrace, and exec. I started with the first fix which started a feedback cycle of finding additional issues during review and fixing them. Culiminating in a fix for a bug that has been present since at least Linux v1.0. Potentially these fixes were candidates for being merged during the rc cycle, and are certainly backport candidates but enough little things turned up during review and testing that I decided they should be handled as part of the normal development process just to be certain there were not any great surprises when it came time to backport some of these fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: Revert "evm: Translate user/group ids relative to s_user_ns when computing HMAC" exec: Ensure mm->user_ns contains the execed files ptrace: Don't allow accessing an undumpable mm ptrace: Capture the ptracer's creds not PT_PTRACE_CAP mm: Add a user_ns owner to mm_struct and fix ptrace permission checks
2016-12-15Merge branch 'stable-4.10' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds9-243/+361
Pull audit updates from Paul Moore: "After the small number of patches for v4.9, we've got a much bigger pile for v4.10. The bulk of these patches involve a rework of the audit backlog queue to enable us to move the netlink multicasting out of the task/thread that generates the audit record and into the kernel thread that emits the record (just like we do for the audit unicast to auditd). While we were playing with the backlog queue(s) we fixed a number of other little problems with the code, and from all the testing so far things look to be in much better shape now. Doing this also allowed us to re-enable disabling IRQs for some netns operations ("netns: avoid disabling irq for netns id"). The remaining patches fix some small problems that are well documented in the commit descriptions, as well as adding session ID filtering support" * 'stable-4.10' of git://git.infradead.org/users/pcmoore/audit: audit: use proper refcount locking on audit_sock netns: avoid disabling irq for netns id audit: don't ever sleep on a command record/message audit: handle a clean auditd shutdown with grace audit: wake up kauditd_thread after auditd registers audit: rework audit_log_start() audit: rework the audit queue handling audit: rename the queues and kauditd related functions audit: queue netlink multicast sends just like we do for unicast sends audit: fixup audit_init() audit: move kaudit thread start from auditd registration to kaudit init (#2) audit: add support for session ID user filter audit: fix formatting of AUDIT_CONFIG_CHANGE events audit: skip sessionid sentinel value when auto-incrementing audit: tame initialization warning len_abuf in audit_log_execve_info audit: less stack usage for /proc/*/loginuid
2016-12-15Merge branch 'next' of ↵Linus Torvalds43-567/+832
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "Generally pretty quiet for this release. Highlights: Yama: - allow ptrace access for original parent after re-parenting TPM: - add documentation - many bugfixes & cleanups - define a generic open() method for ascii & bios measurements Integrity: - Harden against malformed xattrs SELinux: - bugfixes & cleanups Smack: - Remove unnecessary smack_known_invalid label - Do not apply star label in smack_setprocattr hook - parse mnt opts after privileges check (fixes unpriv DoS vuln)" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (56 commits) Yama: allow access for the current ptrace parent tpm: adjust return value of tpm_read_log tpm: vtpm_proxy: conditionally call tpm_chip_unregister tpm: Fix handling of missing event log tpm: Check the bios_dir entry for NULL before accessing it tpm: return -ENODEV if np is not set tpm: cleanup of printk error messages tpm: replace of_find_node_by_name() with dev of_node property tpm: redefine read_log() to handle ACPI/OF at runtime tpm: fix the missing .owner in tpm_bios_measurements_ops tpm: have event log use the tpm_chip tpm: drop tpm1_chip_register(/unregister) tpm: replace dynamically allocated bios_dir with a static array tpm: replace symbolic permission with octal for securityfs files char: tpm: fix kerneldoc tpm2_unseal_trusted name typo tpm_tis: Allow tpm_tis to be bound using DT tpm, tpm_vtpm_proxy: add kdoc comments for VTPM_PROXY_IOC_NEW_DEV tpm: Only call pm_runtime_get_sync if device has a parent tpm: define a generic open() method for ascii & bios measurements Documentation: tpm: add the Physical TPM device tree binding documentation ...
2016-12-15Merge branch 'linus' of ↵Linus Torvalds151-4462/+15711
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "Here is the crypto update for 4.10: API: - add skcipher walk interface - add asynchronous compression (acomp) interface - fix algif_aed AIO handling of zero buffer Algorithms: - fix unaligned access in poly1305 - fix DRBG output to large buffers Drivers: - add support for iMX6UL to caam - fix givenc descriptors (used by IPsec) in caam - accelerated SHA256/SHA512 for ARM64 from OpenSSL - add SSE CRCT10DIF and CRC32 to ARM/ARM64 - add AEAD support to Chelsio chcr - add Armada 8K support to omap-rng" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (148 commits) crypto: testmgr - fix overlap in chunked tests again crypto: arm/crc32 - accelerated support based on x86 SSE implementation crypto: arm64/crc32 - accelerated support based on x86 SSE implementation crypto: arm/crct10dif - port x86 SSE implementation to ARM crypto: arm64/crct10dif - port x86 SSE implementation to arm64 crypto: testmgr - add/enhance test cases for CRC-T10DIF crypto: testmgr - avoid overlap in chunked tests crypto: chcr - checking for IS_ERR() instead of NULL crypto: caam - check caam_emi_slow instead of re-lookup platform crypto: algif_aead - fix AIO handling of zero buffer crypto: aes-ce - Make aes_simd_algs static crypto: algif_skcipher - set error code when kcalloc fails crypto: caam - make aamalg_desc a proper module crypto: caam - pass key buffers with typesafe pointers crypto: arm64/aes-ce-ccm - Fix AEAD decryption length MAINTAINERS: add crypto headers to crypto entry crypt: doc - remove misleading mention of async API crypto: doc - fix header file name crypto: api - fix comment typo crypto: skcipher - Add separate walker for AEAD decryption ..
2016-12-14blk-mq: Fix failed allocation path when mapping queuesGabriel Krisman Bertazi1-5/+21
In blk_mq_map_swqueue, there is a memory optimization that frees the tags of a queue that has gone unmapped. Later, if that hctx is remapped after another topology change, the tags need to be reallocated. If this allocation fails, a simple WARN_ON triggers, but the block layer ends up with an active hctx without any corresponding set of tags. Then, any income IO to that hctx can trigger an Oops. I can reproduce it consistently by running IO, flipping CPUs on and off and eventually injecting a memory allocation failure in that path. In the fix below, if the system experiences a failed allocation of any hctx's tags, we remap all the ctxs of that queue to the hctx_0, which should always keep it's tags. There is a minor performance hit, since our mapping just got worse after the error path, but this is the simplest solution to handle this error path. The performance hit will disappear after another successful remap. I considered dropping the memory optimization all together, but it seemed a bad trade-off to handle this very specific error case. This should apply cleanly on top of Jens' for-next branch. The Oops is the one below: SP (3fff935ce4d0) is in userspace 1:mon> e cpu 0x1: Vector: 300 (Data Access) at [c000000fe99eb110] pc: c0000000005e868c: __sbitmap_queue_get+0x2c/0x180 lr: c000000000575328: __bt_get+0x48/0xd0 sp: c000000fe99eb390 msr: 900000010280b033 dar: 28 dsisr: 40000000 current = 0xc000000fe9966800 paca = 0xc000000007e80300 softe: 0 irq_happened: 0x01 pid = 11035, comm = aio-stress Linux version 4.8.0-rc6+ (root@bean) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.2) ) #3 SMP Mon Oct 10 20:16:53 CDT 2016 1:mon> s [c000000fe99eb3d0] c000000000575328 __bt_get+0x48/0xd0 [c000000fe99eb400] c000000000575838 bt_get.isra.1+0x78/0x2d0 [c000000fe99eb480] c000000000575cb4 blk_mq_get_tag+0x44/0x100 [c000000fe99eb4b0] c00000000056f6f4 __blk_mq_alloc_request+0x44/0x220 [c000000fe99eb500] c000000000570050 blk_mq_map_request+0x100/0x1f0 [c000000fe99eb580] c000000000574650 blk_mq_make_request+0xf0/0x540 [c000000fe99eb640] c000000000561c44 generic_make_request+0x144/0x230 [c000000fe99eb690] c000000000561e00 submit_bio+0xd0/0x200 [c000000fe99eb740] c0000000003ef740 ext4_io_submit+0x90/0xb0 [c000000fe99eb770] c0000000003e95d8 ext4_writepages+0x588/0xdd0 [c000000fe99eb910] c00000000025a9f0 do_writepages+0x60/0xc0 [c000000fe99eb940] c000000000246c88 __filemap_fdatawrite_range+0xf8/0x180 [c000000fe99eb9e0] c000000000246f90 filemap_write_and_wait_range+0x70/0xf0 [c000000fe99eba20] c0000000003dd844 ext4_sync_file+0x214/0x540 [c000000fe99eba80] c000000000364718 vfs_fsync_range+0x78/0x130 [c000000fe99ebad0] c0000000003dd46c ext4_file_write_iter+0x35c/0x430 [c000000fe99ebb90] c00000000038c280 aio_run_iocb+0x3b0/0x450 [c000000fe99ebce0] c00000000038dc28 do_io_submit+0x368/0x730 [c000000fe99ebe30] c000000000009404 system_call+0x38/0xec Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: Douglas Miller <dougmill@linux.vnet.ibm.com> Cc: linux-block@vger.kernel.org Cc: linux-scsi@vger.kernel.org Reviewed-by: Douglas Miller <dougmill@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-12-14vfs,mm: fix return value of read() at s_maxbytesLinus Torvalds1-1/+1
We truncated the possible read iterator to s_maxbytes in commit c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()"), but our end condition handling was wrong: it's not an error to try to read at the end of the file. Reading past the end should return EOF (0), not EINVAL. See for example https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1649342 http://lists.gnu.org/archive/html/bug-coreutils/2016-12/msg00008.html where a md5sum of a maximally sized file fails because the final read is exactly at s_maxbytes. Fixes: c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()") Reported-by: Joseph Salisbury <joseph.salisbury@canonical.com> Cc: Wei Fang <fangwei1@huawei.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14Merge tag 'acpi-urgent-4.10-rc1' of ↵Linus Torvalds1-4/+8
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull two ACPI CPPC fixes from Rafael Wysocki: "One of them fixes a crash in KVM encountered by Sebastian in linux-next and introduced by a recent intel_pstate change that caused the driver to use the ACPI CPPC code and uncovered a missing NULL pointer check in it. The other one fixes a possible use-after-free in the same code area. Summary: - Fix a crash in KVM encountered in linux-next and introduced by a recent intel_pstate change that caused the driver to use the ACPI CPPC code and uncovered a missing NULL pointer check in it (Sebastian Andrzej Siewior). - Fix a possible use-after-free in the same code area (Rafael Wysocki)" * tag 'acpi-urgent-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / CPPC: Fix per-CPU pointer management in acpi_cppc_processor_probe() ACPI / CPPC: Fix crash in acpi_cppc_processor_exit()
2016-12-14Merge tag 'for-v4.10' of ↵Linus Torvalds21-65/+217
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply Pull power supply and reset updates from Sebastian Reichel: - new driver for Intel PIIX4 - lots of module autoload fixes - misc fixes * tag 'for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: power_supply: wm97xx_battery: use power_supply_get_drvdata wm8350_power: use permission-specific DEVICE_ATTR variants power: ipaq_micro_battery: fix alias power: supply: bq27xxx_battery: Fix register map for BQ27510 and BQ27520 bq24190_charger: Fix PM runtime use for bq24190_battery_set_property power: supply: lp8788: remove an unneeded NULL check power: reset: zx-reboot: Fix module autoload power: reset: syscon-reboot-mode: Fix module autoload power: reset: at91-poweroff: Fix module autoload power: reset: at91-reset: Fix module autoload power: supply: axp288_fuel_gauge: Fix module autoload power: supply: max8997_charger: Fix module autoload power: supply: max17040: Change register transaction length from 8 bits to 16 bits power: supply: bq27xxx_battery: don't update poll_interval param if same power: supply: improve function-level documentation power: reset: Add Intel PIIX4 poweroff driver
2016-12-14Merge tag 'sound-4.10-rc1' of ↵Linus Torvalds269-4562/+22270
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "No dramatic changes are found in this development cycle, but as usual, many commits are applied in a wide range of drivers. Most of big changes are in ASoC, where a few bits of framework work and quite a lot of cleanups and improvements to existing code have been done. The rest are usual stuff, a few HD-audio and USB-audio quirks and fixes, as well as the drop of kthread usages in the whole subsystem. Below are some highlights: ASoC: - support for stereo DAPM controls - some initial work on the of-graph sound card - regmap conversions of the remaining AC'97 drivers - a new version of the topology ABI; this should be backward compatible - updates / cleanups of rsnd, sunxi, sti, nau8825, samsung, arizona, Intel skylake, atom-sst - new drivers for Cirrus Logic CS42L42, Qualcomm MSM8916-WCD, and Realtek RT5665 USB-audio: - yet another race fix at disconnection - tolerated packet size calculation for some Android devices - quirks for Axe-Fx II, QuickCam, TEAC 501/503 HD-audio: - improvement of Dell pin fixup mapping - quirks for HP Z1 Gen3, Alienware 15 R2 2016 and ALC622 headset mic Misc: - replace all kthread usages with simple works" * tag 'sound-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (296 commits) ALSA: hiface: Fix M2Tech hiFace driver sampling rate change ALSA: usb-audio: Eliminate noise at the start of DSD playback. ALSA: usb-audio: Add native DSD support for TEAC 501/503 DAC ASoC: wm_adsp: wm_adsp_buf_alloc should use kfree in error path ASoC: topology: avoid uninitialized kcontrol_type ALSA: usb-audio: Add QuickCam Communicate Deluxe/S7500 to volume_control_quirks ALSA: usb-audio: add implicit fb quirk for Axe-Fx II ASoC: zte: spdif: correct ZX_SPDIF_CLK_RAT define ASoC: zte: spdif and i2s drivers are not zx296702 specific ASoC: rsnd: setup BRGCKR/BRRA/BRRB when starting ASoC: rsnd: enable/disable ADG when suspend/resume timing ASoC: rsnd: tidyup ssi->usrcnt counter check in hw_params ALSA: cs46xx: add a new line ASoC: Intel: update bxt_da7219_max98357a to support quad ch dmic capture ASoC: nau8825: disable sinc filter for high THD of ADC ALSA: usb-audio: more tolerant packetsize ALSA: usb-audio: avoid setting of sample rate multiple times on bus ASoC: cs35l34: Simplify the logic to set CS35L34_MCLK_CTL setting ALSA: hda - Gate the mic jack on HP Z1 Gen3 AiO ALSA: hda: when comparing pin configurations, ignore assoc in addition to seq ...
2016-12-14Merge branch 'for-linus' of ↵Linus Torvalds31-39/+36
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: NTB: correct ntb_spad_count comment typo misc: ibmasm: fix typo in error message Remove references to dead make variable LINUX_INCLUDE Remove last traces of ikconfig.h treewide: Fix printk() message errors Documentation/device-mapper: s/getsize/getsz/
2016-12-14Merge branch 'for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatch update from Jiri Kosina: "This is just a small documentation update (as the work on the hybrid model is still underway)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: Documentation/livepatch: Fix stale link to gmame
2016-12-14Merge branch 'for-linus' of ↵Linus Torvalds26-484/+2440
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID updates from Jiri Kosina: - support for new Wacom "MobileStudio Pro" class of tablets from Jason Gerecke - Microsoft Surface 3 support from Benjamin Tissoires and Microsoft Surface 4 support from Daniel Keller - uDraw PS3 tablet support from Bastien Nocera - timeout scheduling fixes for intel-ish-hid from Even Xu - HID_QUIRK_MULTI_INPUT in order to simplify LED handling from Benjamin Tissoires - support for Sony DS4 dongle and various other fixes to Sony driver from Roderick Colenbrander - other assorted smaller fixes and device ID additions * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (63 commits) HID: fix missing irq field HID: i2c-hid: fix build HID: i2c-hid: Disable IRQ before freeing buffers HID: usbhid: fix improper return value HID: wacom: generic: Don't sync input on empty input packets HID: wacom: generic: Pad supports more than buttons HID: wacom: generic: Send data only when the interface is defined HID: wacom: generic: Don't return a value for wacom_wac_event HID: intel_ish-hid: use %pUL for uuid formatting HID: cp2112: explicitly require irqchip support in gpiolib HID: asus: Add i2c touchpad support HID: intel-ish-hid: Fix potential race condition HID: sony: Support DS4 dongle HID: sony: Comply to Linux gamepad spec for DS4 HID: sony: Make the DS4 touchpad a separate device HID: sony: Fix memory issue when connecting device using both Bluetooth and USB HID: cp2112: add IRQ chip handling HID: i2c-hid: force the IRQ level trigger only when not set HID: multitouch: do not retrieve all reports for all devices HID: multitouch: enable the Surface 3 Type Cover to report multitouch data ...
2016-12-14Merge tag 'dm-4.10-changes' of ↵Linus Torvalds22-200/+461
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - various fixes and improvements to request-based DM and DM multipath - some locking improvements in DM bufio - add Kconfig option to disable the DM block manager's extra locking which mainly serves as a developer tool - a few bug fixes to DM's persistent-data - a couple changes to prepare for multipage biovec support in the block layer - various improvements and cleanups in the DM core, DM cache, DM raid and DM crypt - add ability to have DM crypt use keys from the kernel key retention service - add a new "error_writes" feature to the DM flakey target, reads are left unchanged in this mode * tag 'dm-4.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (40 commits) dm flakey: introduce "error_writes" feature dm cache policy smq: use hash_32() instead of hash_32_generic() dm crypt: reject key strings containing whitespace chars dm space map: always set ev if sm_ll_mutate() succeeds dm space map metadata: skip useless memcpy in metadata_ll_init_index() dm space map metadata: fix 'struct sm_metadata' leak on failed create Documentation: dm raid: define data_offset status field dm raid: fix discard support regression dm raid: don't allow "write behind" with raid4/5/6 dm mpath: use hw_handler_params if attached hw_handler is same as requested dm crypt: add ability to use keys from the kernel key retention service dm array: remove a dead assignment in populate_ablock_with_values() dm ioctl: use offsetof() instead of open-coding it dm rq: simplify use_blk_mq initialization dm: use blk_set_queue_dying() in __dm_destroy() dm bufio: drop the lock when doing GFP_NOIO allocation dm bufio: don't take the lock in dm_bufio_shrink_count dm bufio: avoid sleeping while holding the dm_bufio lock dm table: simplify dm_table_determine_type() dm table: an 'all_blk_mq' table must be loaded for a blk-mq DM device ...
2016-12-14Merge branch 'for-linus' of ↵Linus Torvalds16-1236/+3403
git://git.kernel.org/pub/scm/linux/kernel/git/shli/md Pull MD updates from Shaohua Li: - a raid5 writeback cache feature. The goal is to aggregate writes to make full stripe write and reduce read-modify-write. It's helpful for workload which does sequential write and follows fsync for example. This feature is experimental and off by default right now. - FAILFAST support. This fails IOs to broken raid disks quickly, so can improve latency. It's mainly for DASD storage, but some patches help normal raid array too. - support bad block for raid array with external metadata - AVX2 instruction support for raid6 parity calculation - normalize MD info output - add missing blktrace - other bug fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: (66 commits) md: separate flags for superblock changes md: MD_RECOVERY_NEEDED is set for mddev->recovery md: takeover should clear unrelated bits md/r5cache: after recovery, increase journal seq by 10000 md/raid5-cache: fix crc in rewrite_data_only_stripes() md/raid5-cache: no recovery is required when create super-block md: fix refcount problem on mddev when stopping array. md/r5cache: do r5c_update_log_state after log recovery md/raid5-cache: adjust the write position of the empty block if no data blocks md/r5cache: run_no_space_stripes() when R5C_LOG_CRITICAL == 0 md/raid5: limit request size according to implementation limits md/raid5-cache: do not need to set STRIPE_PREREAD_ACTIVE repeatedly md/raid5-cache: remove the unnecessary next_cp_seq field from the r5l_log md/raid5-cache: release the stripe_head at the appropriate location md/raid5-cache: use ring add to prevent overflow md/raid5-cache: remove unnecessary function parameters raid5-cache: don't set STRIPE_R5C_PARTIAL_STRIPE flag while load stripe into cache raid5-cache: add another check conditon before replaying one stripe md/r5cache: enable IRQs on error path md/r5cache: handle alloc_page failure ...
2016-12-14Merge tag 'mmc-v4.10-2' of ↵Linus Torvalds12-88/+71
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull another MMC update from Ulf Hansson: "Here's a second pull request for MMC for v4.10. As a matter of fact it's only one change that moves some mmc files around. I thought it was a good idea to get this into v4.10, as it gives us a nice and fresh base for v4.11. Summary: MMC core: - Move files from the card directory to the core directory to enable future clean-ups of the generic mmc header files and interfaces" * tag 'mmc-v4.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: block: Move files to core
2016-12-14Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds142-4470/+5358
Pull SCSI updates from James Bottomley: "This update includes the usual round of major driver updates (ncr5380, lpfc, hisi_sas, megaraid_sas, ufs, ibmvscsis, mpt3sas). There's also an assortment of minor fixes, mostly in error legs or other not very user visible stuff. The major change is the pci_alloc_irq_vectors replacement for the old pci_msix_.. calls; this effectively makes IRQ mapping generic for the drivers and allows blk_mq to use the information" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (256 commits) scsi: qla4xxx: switch to pci_alloc_irq_vectors scsi: hisi_sas: support deferred probe for v2 hw scsi: megaraid_sas: switch to pci_alloc_irq_vectors scsi: scsi_devinfo: remove synchronous ALUA for NETAPP devices scsi: be2iscsi: set errno on error path scsi: be2iscsi: set errno on error path scsi: hpsa: fallback to use legacy REPORT PHYS command scsi: scsi_dh_alua: Fix RCU annotations scsi: hpsa: use %phN for short hex dumps scsi: hisi_sas: fix free'ing in probe and remove scsi: isci: switch to pci_alloc_irq_vectors scsi: ipr: Fix runaway IRQs when falling back from MSI to LSI scsi: dpt_i2o: double free on error path scsi: cxlflash: Migrate scsi command pointer to AFU command scsi: cxlflash: Migrate IOARRIN specific routines to function pointers scsi: cxlflash: Cleanup queuecommand() scsi: cxlflash: Cleanup send_tmf() scsi: cxlflash: Remove AFU command lock scsi: cxlflash: Wait for active AFU commands to timeout upon tear down scsi: cxlflash: Remove private command pool ...
2016-12-14Merge tag 'configfs-for-4.10' of git://git.infradead.org/users/hch/configfsLinus Torvalds6-37/+17
Pull configfs update from Christoph Hellwig: "Just one simple change from Andrzej to drop the pointless return value from the ->drop_link method" * tag 'configfs-for-4.10' of git://git.infradead.org/users/hch/configfs: fs: configfs: don't return anything from drop_link
2016-12-14audit: use proper refcount locking on audit_sockRichard Guy Briggs1-6/+24
Resetting audit_sock appears to be racy. audit_sock was being copied and dereferenced without using a refcount on the source sock. Bump the refcount on the underlying sock when we store a refrence in audit_sock and release it when we reset audit_sock. audit_sock modification needs the audit_cmd_mutex. See: https://lkml.org/lkml/2016/11/26/232 Thanks to Eric Dumazet <edumazet@google.com> and Cong Wang <xiyou.wangcong@gmail.com> on ideas how to fix it. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> [PM: fixed the comment block text formatting for auditd_reset()] Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-14netns: avoid disabling irq for netns idPaul Moore1-20/+15
Bring back commit bc51dddf98c9 ("netns: avoid disabling irq for netns id") now that we've fixed some audit multicast issues that caused problems with original attempt. Additional information, and history, can be found in the links below: * https://github.com/linux-audit/audit-kernel/issues/22 * https://github.com/linux-audit/audit-kernel/issues/23 Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-14audit: don't ever sleep on a command record/messagePaul Moore1-5/+13
Sleeping on a command record/message in audit_log_start() could slow something, e.g. auditd, from doing something important, e.g. clean shutdown, which could present problems on a heavily loaded system. This patch allows tasks to bypass any queue restrictions if they are logging a command record/message. Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-14audit: handle a clean auditd shutdown with gracePaul Moore1-0/+6
When auditd stops cleanly it sets 'auditd_pid' to 0 with an AUDIT_SET message, in this case we should reset our backlog queues via the auditd_reset() function. This patch also adds a 'auditd_pid' check to the top of kauditd_send_unicast_skb() so we can fail quicker. Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-14audit: wake up kauditd_thread after auditd registersPaul Moore1-0/+1
This patch was suggested by Richard Briggs back in 2015, see the link to the mail archive below. Unfortunately, that patch is no longer even remotely valid due to other changes to the code. * https://www.redhat.com/archives/linux-audit/2015-October/msg00075.html Suggested-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-14audit: rework audit_log_start()Paul Moore1-56/+36
The backlog queue handling in audit_log_start() is a little odd with some questionable design decisions, this patch attempts to rectify this with the following changes: * Never make auditd wait, ignore any backlog limits as we need auditd awake so it can drain the backlog queue. * When we hit a backlog limit and start dropping records, don't wake all the tasks sleeping on the backlog, that's silly. Instead, let kauditd_thread() take care of waking everyone once it has had a chance to drain the backlog queue. * Don't keep a global backlog timeout countdown, make it per-task. A per-task timer means we won't have all the sleeping tasks waking at the same time and hammering on an already stressed backlog queue. Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-14audit: rework the audit queue handlingPaul Moore1-121/+226
The audit record backlog queue has always been a bit of a mess, and the moving the multicast send into kauditd_thread() from audit_log_end() only makes things worse. This patch attempts to fix the backlog queue with a better design that should hold up better under load and have less of a performance impact at syscall invocation time. While it looks like there is a log going on in this patch, the main change is the move from a single backlog queue to three queues: * A queue for holding records generated from audit_log_end() that haven't been consumed by kauditd_thread() (audit_queue). * A queue for holding records that have been sent via multicast but had a temporary failure when sending via unicast and need a resend (audit_retry_queue). * A queue for holding records that haven't been sent via unicast because no one is listening (audit_hold_queue). Special care is taken in this patch to ensure that the proper record ordering is preserved, e.g. we send everything in the hold queue first, then the retry queue, and finally the main queue. Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-14audit: rename the queues and kauditd related functionsPaul Moore1-20/+20
The audit queue names can be shortened and the record sending helpers associated with the kauditd task could be named better, do these small cleanups now to make life easier once we start reworking the queues and kauditd code. Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-14audit: queue netlink multicast sends just like we do for unicast sendsPaul Moore1-35/+35
Sending audit netlink multicast messages is bad for all the same reasons that sending audit netlink unicast messages is bad, so this patch reworks things so that we don't do the multicast send in audit_log_end(), we do it from the dedicated kauditd_thread thread just as we do for unicast messages. See the GitHub issues below for more information/history: * https://github.com/linux-audit/audit-kernel/issues/23 * https://github.com/linux-audit/audit-kernel/issues/22 Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-14audit: fixup audit_init()Paul Moore1-6/+8
Make sure everything is initialized before we start the kauditd_thread and don't emit the "initialized" record until everything is finished. We also panic with a descriptive message if we can't start the kauditd_thread. Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-14audit: move kaudit thread start from auditd registration to kaudit init (#2)Richard Guy Briggs1-10/+4
Richard made this change some time ago but Eric backed it out because the rest of the supporting code wasn't ready. In order to move the netlink multicast send to kauditd_thread we need to ensure the kauditd_thread is always running, so restore commit 6ff5e459 ("audit: move kaudit thread start from auditd registration to kaudit init"). Signed-off-by: Richard Guy Briggs <rbriggs@redhat.com> [PM: brought forward and merged based on Richard's old patch] Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-14Merge tag 'ext4_for_linus' of ↵Linus Torvalds41-1302/+1450
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "This merge request includes the dax-4.0-iomap-pmd branch which is needed for both ext4 and xfs dax changes to use iomap for DAX. It also includes the fscrypt branch which is needed for ubifs encryption work as well as ext4 encryption and fscrypt cleanups. Lots of cleanups and bug fixes, especially making sure ext4 is robust against maliciously corrupted file systems --- especially maliciously corrupted xattr blocks and a maliciously corrupted superblock. Also fix ext4 support for 64k block sizes so it works well on ppcle. Fixed mbcache so we don't miss some common xattr blocks that can be merged" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (86 commits) dax: Fix sleep in atomic contex in grab_mapping_entry() fscrypt: Rename FS_WRITE_PATH_FL to FS_CTX_HAS_BOUNCE_BUFFER_FL fscrypt: Delay bounce page pool allocation until needed fscrypt: Cleanup page locking requirements for fscrypt_{decrypt,encrypt}_page() fscrypt: Cleanup fscrypt_{decrypt,encrypt}_page() fscrypt: Never allocate fscrypt_ctx on in-place encryption fscrypt: Use correct index in decrypt path. fscrypt: move the policy flags and encryption mode definitions to uapi header fscrypt: move non-public structures and constants to fscrypt_private.h fscrypt: unexport fscrypt_initialize() fscrypt: rename get_crypt_info() to fscrypt_get_crypt_info() fscrypto: move ioctl processing more fully into common code fscrypto: remove unneeded Kconfig dependencies MAINTAINERS: fscrypto: recommend linux-fsdevel for fscrypto patches ext4: do not perform data journaling when data is encrypted ext4: return -ENOMEM instead of success ext4: reject inodes with negative size ext4: remove another test in ext4_alloc_file_blocks() Documentation: fix description of ext4's block_validity mount option ext4: fix checks for data=ordered and journal_async_commit options ...
2016-12-14Merge tag 'for-f2fs-4.10' of ↵Linus Torvalds22-476/+1048
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This patch series contains several performance tuning patches regarding to the IO submission flow, in addition to supporting new features such as a ZBC-base drive and multiple devices. It also includes some major bug fixes such as: - checkpoint version control - fdatasync-related roll-forward recovery routine - memory boundary or null-pointer access in corner cases - missing error cases It has various minor clean-up patches as well" * tag 'for-f2fs-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (66 commits) f2fs: fix a missing size change in f2fs_setattr f2fs: fix to access nullified flush_cmd_control pointer f2fs: free meta pages if sanity check for ckpt is failed f2fs: detect wrong layout f2fs: call sync_fs when f2fs is idle Revert "f2fs: use percpu_counter for # of dirty pages in inode" f2fs: return AOP_WRITEPAGE_ACTIVATE for writepage f2fs: do not activate auto_recovery for fallocated i_size f2fs: fix to determine start_cp_addr by sbi->cur_cp_pack f2fs: fix 32-bit build f2fs: set ->owner for debugfs status file's file_operations f2fs: fix incorrect free inode count in ->statfs f2fs: drop duplicate header timer.h f2fs: fix wrong AUTO_RECOVER condition f2fs: do not recover i_size if it's valid f2fs: fix fdatasync f2fs: fix to account total free nid correctly f2fs: fix an infinite loop when flush nodes in cp f2fs: don't wait writeback for datas during checkpoint f2fs: fix wrong written_valid_blocks counting ...
2016-12-14Merge tag 'dlm-4.10' of ↵Linus Torvalds9-20/+22
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm fixes from David Teigland: "This set fixes error reporting for dlm sockets, removes the unbound property on the dlm callback workqueue to improve performance, and includes a couple trivial changes" * tag 'dlm-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: fix error return code in sctp_accept_from_sock() dlm: don't specify WQ_UNBOUND for the ast callback workqueue dlm: remove lock_sock to avoid scheduling while atomic dlm: don't save callbacks after accept dlm: audit and remove any unnecessary uses of module.h dlm: make genl_ops const