summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2016-10-08mm, compaction: restrict full priority to non-costly ordersVlastimil Babka1-0/+1
The new ultimate compaction priority disables some heuristics, which may result in excessive cost. This is fine for non-costly orders where we want to try hard before resulting for OOM, but might be disruptive for costly orders which do not trigger OOM and should generally have some fallback. Thus, we disable the full priority for costly orders. Suggested-by: Michal Hocko <mhocko@kernel.org> Link: http://lkml.kernel.org/r/20160906135258.18335-4-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08mm: remove page_file_indexHuang Ying2-13/+1
After using the offset of the swap entry as the key of the swap cache, the page_index() becomes exactly same as page_file_index(). So the page_file_index() is removed and the callers are changed to use page_index() instead. Link: http://lkml.kernel.org/r/1473270649-27229-2-git-send-email-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08mm, swap: use offset of swap entry as key of swap cacheHuang Ying1-4/+4
This patch is to improve the performance of swap cache operations when the type of the swap device is not 0. Originally, the whole swap entry value is used as the key of the swap cache, even though there is one radix tree for each swap device. If the type of the swap device is not 0, the height of the radix tree of the swap cache will be increased unnecessary, especially on 64bit architecture. For example, for a 1GB swap device on the x86_64 architecture, the height of the radix tree of the swap cache is 11. But if the offset of the swap entry is used as the key of the swap cache, the height of the radix tree of the swap cache is 4. The increased height causes unnecessary radix tree descending and increased cache footprint. This patch reduces the height of the radix tree of the swap cache via using the offset of the swap entry instead of the whole swap entry value as the key of the swap cache. In 32 processes sequential swap out test case on a Xeon E5 v3 system with RAM disk as swap, the lock contention for the spinlock of the swap cache is reduced from 20.15% to 12.19%, when the type of the swap device is 1. Use the whole swap entry as key, perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list: 10.37, perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_node_memcg: 9.78, Use the swap offset as key, perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list: 6.25, perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_node_memcg: 5.94, Link: http://lkml.kernel.org/r/1473270649-27229-1-git-send-email-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Minchan Kim <minchan@kernel.org> Cc: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08thp: reduce usage of huge zero page's atomic counterAaron Lu2-4/+5
The global zero page is used to satisfy an anonymous read fault. If THP(Transparent HugePage) is enabled then the global huge zero page is used. The global huge zero page uses an atomic counter for reference counting and is allocated/freed dynamically according to its counter value. CPU time spent on that counter will greatly increase if there are a lot of processes doing anonymous read faults. This patch proposes a way to reduce the access to the global counter so that the CPU load can be reduced accordingly. To do this, a new flag of the mm_struct is introduced: MMF_USED_HUGE_ZERO_PAGE. With this flag, the process only need to touch the global counter in two cases: 1 The first time it uses the global huge zero page; 2 The time when mm_user of its mm_struct reaches zero. Note that right now, the huge zero page is eligible to be freed as soon as its last use goes away. With this patch, the page will not be eligible to be freed until the exit of the last process from which it was ever used. And with the use of mm_user, the kthread is not eligible to use huge zero page either. Since no kthread is using huge zero page today, there is no difference after applying this patch. But if that is not desired, I can change it to when mm_count reaches zero. Case used for test on Haswell EP: usemem -n 72 --readonly -j 0x200000 100G Which spawns 72 processes and each will mmap 100G anonymous space and then do read only access to that space sequentially with a step of 2MB. CPU cycles from perf report for base commit: 54.03% usemem [kernel.kallsyms] [k] get_huge_zero_page CPU cycles from perf report for this commit: 0.11% usemem [kernel.kallsyms] [k] mm_get_huge_zero_page Performance(throughput) of the workload for base commit: 1784430792 Performance(throughput) of the workload for this commit: 4726928591 164% increase. Runtime of the workload for base commit: 707592 us Runtime of the workload for this commit: 303970 us 50% drop. Link: http://lkml.kernel.org/r/fe51a88f-446a-4622-1363-ad1282d71385@intel.com Signed-off-by: Aaron Lu <aaron.lu@intel.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08thp, dax: add thp_get_unmapped_area for pmd mappingsToshi Kani1-0/+7
When CONFIG_FS_DAX_PMD is set, DAX supports mmap() using pmd page size. This feature relies on both mmap virtual address and FS block (i.e. physical address) to be aligned by the pmd page size. Users can use mkfs options to specify FS to align block allocations. However, aligning mmap address requires code changes to existing applications for providing a pmd-aligned address to mmap(). For instance, fio with "ioengine=mmap" performs I/Os with mmap() [1]. It calls mmap() with a NULL address, which needs to be changed to provide a pmd-aligned address for testing with DAX pmd mappings. Changing all applications that call mmap() with NULL is undesirable. Add thp_get_unmapped_area(), which can be called by filesystem's get_unmapped_area to align an mmap address by the pmd size for a DAX file. It calls the default handler, mm->get_unmapped_area(), to find a range and then aligns it for a DAX file. The patch is based on Matthew Wilcox's change that allows adding support of the pud page size easily. [1]: https://github.com/axboe/fio/blob/master/engines/mmap.c Link: http://lkml.kernel.org/r/1472497881-9323-2-git-send-email-toshi.kani@hpe.com Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08mm: don't use radix tree writeback tags for pages in swap cacheHuang Ying1-0/+12
File pages use a set of radix tree tags (DIRTY, TOWRITE, WRITEBACK, etc.) to accelerate finding the pages with a specific tag in the radix tree during inode writeback. But for anonymous pages in the swap cache, there is no inode writeback. So there is no need to find the pages with some writeback tags in the radix tree. It is not necessary to touch radix tree writeback tags for pages in the swap cache. Per Rik van Riel's suggestion, a new flag AS_NO_WRITEBACK_TAGS is introduced for address spaces which don't need to update the writeback tags. The flag is set for swap caches. It may be used for DAX file systems, etc. With this patch, the swap out bandwidth improved 22.3% (from ~1.2GB/s to ~1.48GBps) in the vm-scalability swap-w-seq test case with 8 processes. The test is done on a Xeon E5 v3 system. The swap device used is a RAM simulated PMEM (persistent memory) device. The improvement comes from the reduced contention on the swap cache radix tree lock. To test sequential swapping out, the test case uses 8 processes, which sequentially allocate and write to the anonymous pages until RAM and part of the swap device is used up. Details of comparison is as follow, base base+patch ---------------- -------------------------- %stddev %change %stddev \ | \ 2506952 ± 2% +28.1% 3212076 ± 7% vm-scalability.throughput 1207402 ± 7% +22.3% 1476578 ± 6% vmstat.swap.so 10.86 ± 12% -23.4% 8.31 ± 16% perf-profile.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list 10.82 ± 13% -33.1% 7.24 ± 14% perf-profile.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_zone_memcg 10.36 ± 11% -100.0% 0.00 ± -1% perf-profile.cycles-pp._raw_spin_lock_irqsave.__test_set_page_writeback.bdev_write_page.__swap_writepage.swap_writepage 10.52 ± 12% -100.0% 0.00 ± -1% perf-profile.cycles-pp._raw_spin_lock_irqsave.test_clear_page_writeback.end_page_writeback.page_endio.pmem_rw_page Link: http://lkml.kernel.org/r/1472578089-5560-1-git-send-email-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Shaohua Li <shli@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Tejun Heo <tj@kernel.org> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08mm/nobootmem.c: remove duplicate macro ARCH_LOW_ADDRESS_LIMIT statementszijun_hu1-4/+5
Fix the following bugs: - the same ARCH_LOW_ADDRESS_LIMIT statements are duplicated between header and relevant source - don't ensure ARCH_LOW_ADDRESS_LIMIT perhaps defined by ARCH in asm/processor.h is preferred over default in linux/bootmem.h completely since the former header isn't included by the latter Link: http://lkml.kernel.org/r/e046aeaa-e160-6d9e-dc1b-e084c2fd999f@zoho.com Signed-off-by: zijun_hu <zijun_hu@htc.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08mm/memblock.c: expose total reserved memorySrikar Dronamraju1-0/+1
The total reserved memory in a system is accounted but not available for use use outside mm/memblock.c. By exposing the total reserved memory, systems can better calculate the size of large hashes. Link: http://lkml.kernel.org/r/1472476010-4709-3-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Suggested-by: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08mm: introduce arch_reserved_kernel_pages()Srikar Dronamraju1-0/+3
Currently arch specific code can reserve memory blocks but alloc_large_system_hash() may not take it into consideration when sizing the hashes. This can lead to bigger hash than required and lead to no available memory for other purposes. This is specifically true for systems with CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled. One approach to solve this problem would be to walk through the memblock regions and calculate the available memory and base the size of hash system on the available memory. The other approach would be to depend on the architecture to provide the number of pages that are reserved. This change provides hooks to allow the architecture to provide the required info. Link: http://lkml.kernel.org/r/1472476010-4709-2-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Suggested-by: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08mm: make sure that kthreads will not refault oom reaped memoryMichal Hocko1-0/+1
There are only few use_mm() users in the kernel right now. Most of them write to the target memory but vhost driver relies on copy_from_user/get_user from a kernel thread context. This makes it impossible to reap the memory of an oom victim which shares the mm with the vhost kernel thread because it could see a zero page unexpectedly and theoretically make an incorrect decision visible outside of the killed task context. To quote Michael S. Tsirkin: : Getting an error from __get_user and friends is handled gracefully. : Getting zero instead of a real value will cause userspace : memory corruption. The vhost kernel thread is bound to an open fd of the vhost device which is not tight to the mm owner life cycle in general. The device fd can be inherited or passed over to another process which means that we really have to be careful about unexpected memory corruption because unlike for normal oom victims the result will be visible outside of the oom victim context. Make sure that no kthread context (users of use_mm) can ever see corrupted data because of the oom reaper and hook into the page fault path by checking MMF_UNSTABLE mm flag. __oom_reap_task_mm will set the flag before it starts unmapping the address space while the flag is checked after the page fault has been handled. If the flag is set then SIGBUS is triggered so any g-u-p user will get a error code. Regular tasks do not need this protection because all which share the mm are killed when the mm is reaped and so the corruption will not outlive them. This patch shouldn't have any visible effect at this moment because the OOM killer doesn't invoke oom reaper for tasks with mm shared with kthreads yet. Link: http://lkml.kernel.org/r/1472119394-11342-9-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: "Michael S. Tsirkin" <mst@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08mm, oom: enforce exit_oom_victim on current taskTetsuo Handa1-1/+1
There are no users of exit_oom_victim on !current task anymore so enforce the API to always work on the current. Link: http://lkml.kernel.org/r/1472119394-11342-8-git-send-email-mhocko@kernel.org Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08oom, suspend: fix oom_killer_disable vs. pm suspend properlyMichal Hocko1-1/+1
Commit 74070542099c ("oom, suspend: fix oom_reaper vs. oom_killer_disable race") has workaround an existing race between oom_killer_disable and oom_reaper by adding another round of try_to_freeze_tasks after the oom killer was disabled. This was the easiest thing to do for a late 4.7 fix. Let's fix it properly now. After "oom: keep mm of the killed task available" we no longer have to call exit_oom_victim from the oom reaper because we have stable mm available and hide the oom_reaped mm by MMF_OOM_SKIP flag. So let's remove exit_oom_victim and the race described in the above commit doesn't exist anymore if. Unfortunately this alone is not sufficient for the oom_killer_disable usecase because now we do not have any reliable way to reach exit_oom_victim (the victim might get stuck on a way to exit for an unbounded amount of time). OOM killer can cope with that by checking mm flags and move on to another victim but we cannot do the same for oom_killer_disable as we would lose the guarantee of no further interference of the victim with the rest of the system. What we can do instead is to cap the maximum time the oom_killer_disable waits for victims. The only current user of this function (pm suspend) already has a concept of timeout for back off so we can reuse the same value there. Let's drop set_freezable for the oom_reaper kthread because it is no longer needed as the reaper doesn't wake or thaw any processes. Link: http://lkml.kernel.org/r/1472119394-11342-7-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08mm, oom: get rid of signal_struct::oom_victimsMichal Hocko2-2/+6
After "oom: keep mm of the killed task available" we can safely detect an oom victim by checking task->signal->oom_mm so we do not need the signal_struct counter anymore so let's get rid of it. This alone wouldn't be sufficient for nommu archs because exit_oom_victim doesn't hide the process from the oom killer anymore. We can, however, mark the mm with a MMF flag in __mmput. We can reuse MMF_OOM_REAPED and rename it to a more generic MMF_OOM_SKIP. Link: http://lkml.kernel.org/r/1472119394-11342-6-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08kernel, oom: fix potential pgd_lock deadlock from __mmdropMichal Hocko2-2/+14
Lockdep complains that __mmdrop is not safe from the softirq context: ================================= [ INFO: inconsistent lock state ] 4.6.0-oomfortification2-00011-geeb3eadeab96-dirty #949 Tainted: G W --------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. swapper/1/0 [HC0[0]:SC1[1]:HE1:SE0] takes: (pgd_lock){+.?...}, at: pgd_free+0x19/0x6b {SOFTIRQ-ON-W} state was registered at: __lock_acquire+0xa06/0x196e lock_acquire+0x139/0x1e1 _raw_spin_lock+0x32/0x41 __change_page_attr_set_clr+0x2a5/0xacd change_page_attr_set_clr+0x16f/0x32c set_memory_nx+0x37/0x3a free_init_pages+0x9e/0xc7 alternative_instructions+0xa2/0xb3 check_bugs+0xe/0x2d start_kernel+0x3ce/0x3ea x86_64_start_reservations+0x2a/0x2c x86_64_start_kernel+0x17a/0x18d irq event stamp: 105916 hardirqs last enabled at (105916): free_hot_cold_page+0x37e/0x390 hardirqs last disabled at (105915): free_hot_cold_page+0x2c1/0x390 softirqs last enabled at (105878): _local_bh_enable+0x42/0x44 softirqs last disabled at (105879): irq_exit+0x6f/0xd1 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(pgd_lock); <Interrupt> lock(pgd_lock); *** DEADLOCK *** 1 lock held by swapper/1/0: #0: (rcu_callback){......}, at: rcu_process_callbacks+0x390/0x800 stack backtrace: CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 4.6.0-oomfortification2-00011-geeb3eadeab96-dirty #949 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 Call Trace: <IRQ> print_usage_bug.part.25+0x259/0x268 mark_lock+0x381/0x567 __lock_acquire+0x993/0x196e lock_acquire+0x139/0x1e1 _raw_spin_lock+0x32/0x41 pgd_free+0x19/0x6b __mmdrop+0x25/0xb9 __put_task_struct+0x103/0x11e delayed_put_task_struct+0x157/0x15e rcu_process_callbacks+0x660/0x800 __do_softirq+0x1ec/0x4d5 irq_exit+0x6f/0xd1 smp_apic_timer_interrupt+0x42/0x4d apic_timer_interrupt+0x8e/0xa0 <EOI> arch_cpu_idle+0xf/0x11 default_idle_call+0x32/0x34 cpu_startup_entry+0x20c/0x399 start_secondary+0xfe/0x101 More over commit a79e53d85683 ("x86/mm: Fix pgd_lock deadlock") was explicit about pgd_lock not to be called from the irq context. This means that __mmdrop called from free_signal_struct has to be postponed to a user context. We already have a similar mechanism for mmput_async so we can use it here as well. This is safe because mm_count is pinned by mm_users. This fixes bug introduced by "oom: keep mm of the killed task available" Link: http://lkml.kernel.org/r/1472119394-11342-5-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08oom: keep mm of the killed task availableMichal Hocko1-0/+2
oom_reap_task has to call exit_oom_victim in order to make sure that the oom vicim will not block the oom killer for ever. This is, however, opening new problems (e.g oom_killer_disable exclusion - see commit 74070542099c ("oom, suspend: fix oom_reaper vs. oom_killer_disable race")). exit_oom_victim should be only called from the victim's context ideally. One way to achieve this would be to rely on per mm_struct flags. We already have MMF_OOM_REAPED to hide a task from the oom killer since "mm, oom: hide mm which is shared with kthread or global init". The problem is that the exit path: do_exit exit_mm tsk->mm = NULL; mmput __mmput exit_oom_victim doesn't guarantee that exit_oom_victim will get called in a bounded amount of time. At least exit_aio depends on IO which might get blocked due to lack of memory and who knows what else is lurking there. This patch takes a different approach. We remember tsk->mm into the signal_struct and bind it to the signal struct life time for all oom victims. __oom_reap_task_mm as well as oom_scan_process_thread do not have to rely on find_lock_task_mm anymore and they will have a reliable reference to the mm struct. As a result all the oom specific communication inside the OOM killer can be done via tsk->signal->oom_mm. Increasing the signal_struct for something as unlikely as the oom killer is far from ideal but this approach will make the code much more reasonable and long term we even might want to move task->mm into the signal_struct anyway. In the next step we might want to make the oom killer exclusion and access to memory reserves completely independent which would be also nice. Link: http://lkml.kernel.org/r/1472119394-11342-4-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08mm,oom_reaper: do not attempt to reap a task twiceTetsuo Handa1-1/+0
"mm, oom_reaper: do not attempt to reap a task twice" tried to give the OOM reaper one more chance to retry using MMF_OOM_NOT_REAPABLE flag. But the usefulness of the flag is rather limited and actually never shown in practice. If the flag is set, it means that the holder of mm->mmap_sem cannot call up_write() due to presumably being blocked at unkillable wait waiting for other thread's memory allocation. But since one of threads sharing that mm will queue that mm immediately via task_will_free_mem() shortcut (otherwise, oom_badness() will select the same mm again due to oom_score_adj value unchanged), retrying MMF_OOM_NOT_REAPABLE mm is unlikely helpful. Let's always set MMF_OOM_REAPED. Link: http://lkml.kernel.org/r/1472119394-11342-3-git-send-email-mhocko@kernel.org Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08mm, swap: add swap_cluster_listHuang Ying1-4/+7
This is a code clean up patch without functionality changes. The swap_cluster_list data structure and its operations are introduced to provide some better encapsulation for the free cluster and discard cluster list operations. This avoid some code duplication, improved the code readability, and reduced the total line number. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/1472067356-16004-1-git-send-email-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Rik van Riel <riel@redhat.com> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Shaohua Li <shli@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08mm: pagewalk: fix the comment for test_walkJames Morse1-2/+2
Modify the comment describing struct mm_walk->test_walk()s behaviour to match the comment on walk_page_test() and the behaviour of walk_page_vma(). Fixes: fafaa4264eba4 ("pagewalk: improve vma handling") Link: http://lkml.kernel.org/r/1471622518-21980-1-git-send-email-james.morse@arm.com Signed-off-by: James Morse <james.morse@arm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08mm/page_owner: don't define fields on struct page_ext by hard-codingJoonsoo Kim1-6/+0
There is a memory waste problem if we define field on struct page_ext by hard-coding. Entry size of struct page_ext includes the size of those fields even if it is disabled at runtime. Now, extra memory request at runtime is possible so page_owner don't need to define it's own fields by hard-coding. This patch removes hard-coded define and uses extra memory for storing page_owner information in page_owner. Most of code are just mechanical changes. Link: http://lkml.kernel.org/r/1471315879-32294-7-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08mm/page_ext: support extra space allocation by page_ext userJoonsoo Kim1-0/+2
Until now, if some page_ext users want to use it's own field on page_ext, it should be defined in struct page_ext by hard-coding. It has a problem that wastes memory in following situation. struct page_ext { #ifdef CONFIG_A int a; #endif #ifdef CONFIG_B int b; #endif }; Assume that kernel is built with both CONFIG_A and CONFIG_B. Even if we enable feature A and doesn't enable feature B at runtime, each entry of struct page_ext takes two int rather than one int. It's undesirable result so this patch tries to fix it. To solve above problem, this patch implements to support extra space allocation at runtime. When need() callback returns true, it's extra memory requirement is summed to entry size of page_ext. Also, offset for each user's extra memory space is returned. With this offset, user can use this extra space and there is no need to define needed field on page_ext by hard-coding. This patch only implements an infrastructure. Following patch will use it for page_owner which is only user having it's own fields on page_ext. Link: http://lkml.kernel.org/r/1471315879-32294-6-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08mm/page_owner: move page_owner specific function to page_owner.cJoonsoo Kim1-0/+2
There is no reason that page_owner specific function resides on vmstat.c. Link: http://lkml.kernel.org/r/1471315879-32294-4-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08mm, vmscan: get rid of throttle_vm_writeoutMichal Hocko1-1/+0
throttle_vm_writeout() was introduced back in 2005 to fix OOMs caused by excessive pageout activity during the reclaim. Too many pages could be put under writeback therefore LRUs would be full of unreclaimable pages until the IO completes and in turn the OOM killer could be invoked. There have been some important changes introduced since then in the reclaim path though. Writers are throttled by balance_dirty_pages when initiating the buffered IO and later during the memory pressure, the direct reclaim is throttled by wait_iff_congested if the node is considered congested by dirty pages on LRUs and the underlying bdi is congested by the queued IO. The kswapd is throttled as well if it encounters pages marked for immediate reclaim or under writeback which signals that that there are too many pages under writeback already. Finally should_reclaim_retry does congestion_wait if the reclaim cannot make any progress and there are too many dirty/writeback pages. Another important aspect is that we do not issue any IO from the direct reclaim context anymore. In a heavy parallel load this could queue a lot of IO which would be very scattered and thus unefficient which would just make the problem worse. This three mechanisms should throttle and keep the amount of IO in a steady state even under heavy IO and memory pressure so yet another throttling point doesn't really seem helpful. Quite contrary, Mikulas Patocka has reported that swap backed by dm-crypt doesn't work properly because the swapout IO cannot make sufficient progress as the writeout path depends on dm_crypt worker which has to allocate memory to perform the encryption. In order to guarantee a forward progress it relies on the mempool allocator. mempool_alloc(), however, prefers to use the underlying (usually page) allocator before it grabs objects from the pool. Such an allocation can dive into the memory reclaim and consequently to throttle_vm_writeout. If there are too many dirty or pages under writeback it will get throttled even though it is in fact a flusher to clear pending pages. kworker/u4:0 D ffff88003df7f438 10488 6 2 0x00000000 Workqueue: kcryptd kcryptd_crypt [dm_crypt] Call Trace: schedule+0x3c/0x90 schedule_timeout+0x1d8/0x360 io_schedule_timeout+0xa4/0x110 congestion_wait+0x86/0x1f0 throttle_vm_writeout+0x44/0xd0 shrink_zone_memcg+0x613/0x720 shrink_zone+0xe0/0x300 do_try_to_free_pages+0x1ad/0x450 try_to_free_pages+0xef/0x300 __alloc_pages_nodemask+0x879/0x1210 alloc_pages_current+0xa1/0x1f0 new_slab+0x2d7/0x6a0 ___slab_alloc+0x3fb/0x5c0 __slab_alloc+0x51/0x90 kmem_cache_alloc+0x27b/0x310 mempool_alloc_slab+0x1d/0x30 mempool_alloc+0x91/0x230 bio_alloc_bioset+0xbd/0x260 kcryptd_crypt+0x114/0x3b0 [dm_crypt] Let's just drop throttle_vm_writeout altogether. It is not very much helpful anymore. I have tried to test a potential writeback IO runaway similar to the one described in the original patch which has introduced that [1]. Small virtual machine (512MB RAM, 4 CPUs, 2G of swap space and disk image on a rather slow NFS in a sync mode on the host) with 8 parallel writers each writing 1G worth of data. As soon as the pagecache fills up and the direct reclaim hits then I start anon memory consumer in a loop (allocating 300M and exiting after populating it) in the background to make the memory pressure even stronger as well as to disrupt the steady state for the IO. The direct reclaim is throttled because of the congestion as well as kswapd hitting congestion_wait due to nr_immediate but throttle_vm_writeout doesn't ever trigger the sleep throughout the test. Dirty+writeback are close to nr_dirty_threshold with some fluctuations caused by the anon consumer. [1] https://www2.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm3/broken-out/vm-pageout-throttling.patch Link: http://lkml.kernel.org/r/1471171473-21418-1-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: NeilBrown <neilb@suse.com> Cc: Ondrej Kozina <okozina@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08mm, compaction: create compact_gap wrapperVlastimil Babka1-0/+23
Compaction uses a watermark gap of (2UL << order) pages at various places and it's not immediately obvious why. Abstract it through a compact_gap() wrapper to create a single place with a thorough explanation. [vbabka@suse.cz: clarify the comment of compact_gap()] Link: http://lkml.kernel.org/r/7b6aed1f-fdf8-2063-9ff4-bbe4de712d37@suse.cz Link: http://lkml.kernel.org/r/20160810091226.6709-9-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08mm, compaction: add the ultimate direct compaction priorityVlastimil Babka1-1/+2
During reclaim/compaction loop, it's desirable to get a final answer from unsuccessful compaction so we can either fail the allocation or invoke the OOM killer. However, heuristics such as deferred compaction or pageblock skip bits can cause compaction to skip parts or whole zones and lead to premature OOM's, failures or excessive reclaim/compaction retries. To remedy this, we introduce a new direct compaction priority called COMPACT_PRIO_SYNC_FULL, which instructs direct compaction to: - ignore deferred compaction status for a zone - ignore pageblock skip hints - ignore cached scanner positions and scan the whole zone The new priority should get eventually picked up by should_compact_retry() and this should improve success rates for costly allocations using __GFP_REPEAT, such as hugetlbfs allocations, and reduce some corner-case OOM's for non-costly allocations. Link: http://lkml.kernel.org/r/20160810091226.6709-6-vbabka@suse.cz [vbabka@suse.cz: use the MIN_COMPACT_PRIORITY alias] Link: http://lkml.kernel.org/r/d443b884-87e7-1c93-8684-3a3a35759fb1@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08mm, compaction: rename COMPACT_PARTIAL to COMPACT_SUCCESSVlastimil Babka2-5/+5
COMPACT_PARTIAL has historically meant that compaction returned after doing some work without fully compacting a zone. It however didn't distinguish if compaction terminated because it succeeded in creating the requested high-order page. This has changed recently and now we only return COMPACT_PARTIAL when compaction thinks it succeeded, or the high-order watermark check in compaction_suitable() passes and no compaction needs to be done. So at this point we can make the return value clearer by renaming it to COMPACT_SUCCESS. The next patch will remove some redundant tests for success where compaction just returned COMPACT_SUCCESS. Link: http://lkml.kernel.org/r/20160810091226.6709-4-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08mm, compaction: cleanup unused functionsVlastimil Babka1-5/+0
Since kswapd compaction moved to kcompactd, compact_pgdat() is not called anymore, so we remove it. The only caller of __compact_pgdat() is compact_node(), so we merge them and remove code that was only reachable from kswapd. Link: http://lkml.kernel.org/r/20160810091226.6709-3-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08mm/vmalloc.c: fix align value calculation errorzijun_hu1-10/+26
It causes double align requirement for __get_vm_area_node() if parameter size is power of 2 and VM_IOREMAP is set in parameter flags, for example size=0x10000 -> fls_long(0x10000)=17 -> align=0x20000 get_count_order_long() is implemented and can be used instead of fls_long() for fixing the bug, for example size=0x10000 -> get_count_order_long(0x10000)=16 -> align=0x10000 [akpm@linux-foundation.org: s/get_order_long()/get_count_order_long()/] [zijun_hu@zoho.com: fixes] Link: http://lkml.kernel.org/r/57AABC8B.1040409@zoho.com [akpm@linux-foundation.org: locate get_count_order_long() next to get_count_order()] [akpm@linux-foundation.org: move get_count_order[_long] definitions to pick up fls_long()] [zijun_hu@htc.com: move out get_count_order[_long]() from __KERNEL__ scope] Link: http://lkml.kernel.org/r/57B2C4CE.80303@zoho.com Link: http://lkml.kernel.org/r/fc045ecf-20fa-0722-b3ac-9a6140488fad@zoho.com Signed-off-by: zijun_hu <zijun_hu@htc.com> Cc: Tejun Heo <tj@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: zijun_hu <zijun_hu@htc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08mm: oom: deduplicate victim selection code for memcg and global oomVladimir Davydov2-39/+19
When selecting an oom victim, we use the same heuristic for both memory cgroup and global oom. The only difference is the scope of tasks to select the victim from. So we could just export an iterator over all memcg tasks and keep all oom related logic in oom_kill.c, but instead we duplicate pieces of it in memcontrol.c reusing some initially private functions of oom_kill.c in order to not duplicate all of it. That looks ugly and error prone, because any modification of select_bad_process should also be propagated to mem_cgroup_out_of_memory. Let's rework this as follows: keep all oom heuristic related code private to oom_kill.c and make oom_kill.c use exported memcg functions when it's really necessary (like in case of iterating over memcg tasks). Link: http://lkml.kernel.org/r/1470056933-7505-1-git-send-email-vdavydov@virtuozzo.com Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08jiffies: add time comparison functions for 64 bit jiffiesJason A. Donenfeld1-0/+4
Though the time_before and time_after family of functions were nicely extended to support jiffies64, so that the interface would be consistent, it was forgotten to also extend the before/after jiffies functions to support jiffies64. This commit brings the interface to parity between jiffies and jiffies64, which is quite convenient. Link: http://lkml.kernel.org/r/20160929033319.12188-1-Jason@zx2c4.com Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08fanotify: use notification_lock instead of access_lockJan Kara1-1/+0
Fanotify code has its own lock (access_lock) to protect a list of events waiting for a response from userspace. However this is somewhat awkward as the same list_head in the event is protected by notification_lock if it is part of the notification queue and by access_lock if it is part of the fanotify private queue which makes it difficult for any reliable checks in the generic code. So make fanotify use the same lock - notification_lock - for protecting its private event list. Link: http://lkml.kernel.org/r/1473797711-14111-6-git-send-email-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08fsnotify: convert notification_mutex to a spinlockJan Kara1-1/+1
notification_mutex is used to protect the list of pending events. As such there's no reason to use a sleeping lock for it. Convert it to a spinlock. [jack@suse.cz: fixed version] Link: http://lkml.kernel.org/r/1474031567-1831-1-git-send-email-jack@suse.cz Link: http://lkml.kernel.org/r/1473797711-14111-5-git-send-email-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08Merge branch 'i2c/for-4.9' of ↵Linus Torvalds3-10/+49
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c updates from Wolfram Sang: "Here is the 4.9 pull request from I2C including: - centralized error messages when registering to the core - improved lockdep annotations to prevent false positives - DT support for muxes, gates, and arbitrators - bus speeds can now be obtained from ACPI - i2c-octeon got refactored and now supports ThunderX SoCs, too - i2c-tegra and i2c-designware got a bigger bunch of updates - a couple of standard driver fixes and improvements" * 'i2c/for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (71 commits) i2c: axxia: disable clks in case of failure in probe i2c: octeon: thunderx: Limit register access retries i2c: uniphier-f: fix misdetection of incomplete STOP condition gpio: pca953x: variable 'id' was used twice i2c: i801: Add support for Kaby Lake PCH-H gpio: pca953x: fix an incorrect lockdep warning i2c: add a warning to i2c_adapter_depth() lockdep: make MAX_LOCKDEP_SUBCLASSES unconditionally visible i2c: export i2c_adapter_depth() i2c: rk3x: Fix variable 'min_total_ns' unused warning i2c: rk3x: Fix sparse warning i2c / ACPI: Do not touch an I2C device if it belongs to another adapter i2c: octeon: Fix high-level controller status check i2c: octeon: Avoid sending STOP during recovery i2c: octeon: Fix set SCL recovery function i2c: rcar: add support for r8a7796 (R-Car M3-W) i2c: imx: make bus recovery through pinctrl optional i2c: meson: add gxbb compatible string i2c: uniphier-f: set the adapter to master mode when probing i2c: uniphier-f: avoid WARN_ON() of clk_disable() in failure path ...
2016-10-07Merge branch 'for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial updates from Jiri Kosina: "The usual rocket science from the trivial tree" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: tracing/syscalls: fix multiline in error message text lib/Kconfig.debug: fix DEBUG_SECTION_MISMATCH description doc: vfs: fix fadvise() sycall name x86/entry: spell EBX register correctly in documentation securityfs: fix securityfs_create_dir comment irq: Fix typo in tracepoint.xml
2016-10-07Merge branch 'for-linus' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatching updates from Jiri Kosina: - fix for patching modules that contain .altinstructions or .parainstructions sections, from Jessica Yu - make TAINT_LIVEPATCH a per-module flag (so that it's immediately clear which module caused the taint), from Josh Poimboeuf * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch/module: make TAINT_LIVEPATCH module-specific Documentation: livepatch: add section about arch-specific code livepatch/x86: apply alternatives and paravirt patches after relocations livepatch: use arch_klp_init_object_loaded() to finish arch-specific tasks
2016-10-07Merge branch 'for-linus' of ↵Linus Torvalds3-2/+33
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID updates from Jiri Kosina: - Integrated Sensor Hub support (Cherrytrail+) from Srinivas Pandruvada - Big cleanup of Wacom driver; namely it's now using devres, and the standardized LED API so that libinput doesn't need to have root access any more, with substantial amount of other cleanups piggy-backing on top. All this from Benjamin Tissoires - Report descriptor parsing would now ignore and out-of-range System controls in case of the application actually being System Control. This fixes quite some issues with several devices, and allows us to remove a few ->report_fixup callbacks. From Benjamin Tissoires - ... a lot of other assorted small fixes and device ID additions * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (76 commits) HID: add missing \n to end of dev_warn messages HID: alps: fix multitouch cursor issue HID: hid-logitech: Documentation updates/corrections HID: hid-logitech: Improve Wingman Formula Force GP support HID: hid-logitech: Rewrite of descriptor for all DF wheels HID: hid-logitech: Compute combined pedals value HID: hid-logitech: Add combined pedal support Logitech wheels HID: hid-logitech: Introduce control for combined pedals feature HID: sony: Update copyright and add Dualshock 4 rate control note HID: sony: Defer the initial USB Sixaxis output report HID: sony: Relax duplicate checking for USB-only devices Revert "HID: microsoft: fix invalid rdesc for 3k kbd" HID: alps: fix error return code in alps_input_configured() HID: alps: fix stick device not working after resume HID: support for keyboard - Corsair STRAFE HID: alps: Fix memory leak HID: uclogic: Add support for UC-Logic TWHA60 v3 HID: uclogic: Override constant descriptors HID: uclogic: Support UGTizer GP0610 partially HID: uclogic: Add support for several more tablets ...
2016-10-07Merge tag 'pci-v4.9-changes' of ↵Linus Torvalds3-4/+36
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Summary of PCI changes for the v4.9 merge window: Enumeration: - microblaze: Add multidomain support for procfs (Bharat Kumar Gogada) Resource management: - Ignore requested alignment for PROBE_ONLY and fixed resources (Yongji Xie) - Ignore requested alignment for VF BARs (Yongji Xie) PCI device hotplug: - Make core explicitly non-modular (Paul Gortmaker) PCIe native device hotplug: - Rename pcie_isr() locals for clarity (Bjorn Helgaas) - Return IRQ_NONE when we can't read interrupt status (Bjorn Helgaas) - Remove unnecessary guard (Bjorn Helgaas) - Clean up dmesg "Slot(%s)" messages (Bjorn Helgaas) - Remove useless pciehp_get_latch_status() calls (Bjorn Helgaas) - Clear attention LED on device add (Keith Busch) - Allow exclusive userspace control of indicators (Keith Busch) - Process all hotplug events before looking for new ones (Mayurkumar Patel) - Don't re-read Slot Status when queuing hotplug event (Mayurkumar Patel) - Don't re-read Slot Status when handling surprise event (Mayurkumar Patel) - Make explicitly non-modular (Paul Gortmaker) Power management: - Afford direct-complete to devices with non-standard PM (Lukas Wunner) - Query platform firmware for device power state (Lukas Wunner) - Recognize D3cold in pci_update_current_state() (Lukas Wunner) - Avoid unnecessary resume after direct-complete (Lukas Wunner) - Make explicitly non-modular (Paul Gortmaker) Virtualization: - Mark Atheros AR9580 to avoid bus reset (Maik Broemme) - Check for pci_setup_device() failure in pci_iov_add_virtfn() (Po Liu) MSI: - Enable PCI_MSI_IRQ_DOMAIN support for ARC (Joao Pinto) AER: - Remove aerdriver.nosourceid kernel parameter (Bjorn Helgaas) - Remove aerdriver.forceload kernel parameter (Bjorn Helgaas) - Fix aer_probe() kernel-doc comment (Cao jin) - Add bus flag to skip source ID matching (Jon Derrick) - Avoid memory allocation in interrupt handling path (Jon Derrick) - Cache capability position (Keith Busch) - Make explicitly non-modular (Paul Gortmaker) - Remove duplicate AER severity translation (Tyler Baicar) - Send correct severity to calculate AER severity (Tyler Baicar) Precision Time Measurement: - Add Precision Time Measurement (PTM) support (Jonathan Yong) - Add PTM clock granularity information (Bjorn Helgaas) - Add pci_enable_ptm() for drivers to enable PTM on endpoints (Bjorn Helgaas) Generic host bridge driver: - Fix pci_remap_iospace() failure path (Lorenzo Pieralisi) - Make explicitly non-modular (Paul Gortmaker) Altera host bridge driver: - Remove redundant platform_get_resource() return value check (Bjorn Helgaas) - Poll for link training status after retraining the link (Ley Foon Tan) - Rework config accessors for use without a struct pci_bus (Ley Foon Tan) - Move retrain from fixup to altera_pcie_host_init() (Ley Foon Tan) - Make MSI explicitly non-modular (Paul Gortmaker) - Make explicitly non-modular (Paul Gortmaker) - Relax device number checking to allow SR-IOV (Po Liu) ARM Versatile host bridge driver: - Fix pci_remap_iospace() failure path (Lorenzo Pieralisi) Axis ARTPEC-6 host bridge driver: - Drop __init from artpec6_add_pcie_port() (Niklas Cassel) Freescale i.MX6 host bridge driver: - Make explicitly non-modular (Paul Gortmaker) Intel VMD host bridge driver: - Add quirk for AER to ignore source ID (Jon Derrick) - Allocate IRQ lists with correct MSI-X count (Jon Derrick) - Convert to use pci_alloc_irq_vectors() API (Jon Derrick) - Eliminate vmd_vector member from list type (Jon Derrick) - Eliminate index member from IRQ list (Jon Derrick) - Synchronize with RCU freeing MSI IRQ descs (Keith Busch) - Request userspace control of PCIe hotplug indicators (Keith Busch) - Move VMD driver to drivers/pci/host (Keith Busch) Marvell Aardvark host bridge driver: - Fix pci_remap_iospace() failure path (Lorenzo Pieralisi) - Remove redundant dev_err call in advk_pcie_probe() (Wei Yongjun) Microsoft Hyper-V host bridge driver: - Use zero-length array in struct pci_packet (Dexuan Cui) - Use pci_function_description[0] in struct definitions (Dexuan Cui) - Remove the unused 'wrk' in struct hv_pcibus_device (Dexuan Cui) - Handle vmbus_sendpacket() failure in hv_compose_msi_msg() (Dexuan Cui) - Handle hv_pci_generic_compl() error case (Dexuan Cui) - Use list_move_tail() instead of list_del() + list_add_tail() (Wei Yongjun) NVIDIA Tegra host bridge driver: - Fix pci_remap_iospace() failure path (Lorenzo Pieralisi) - Remove redundant _data suffix (Thierry Reding) - Use of_device_get_match_data() (Thierry Reding) Qualcomm host bridge driver: - Make explicitly non-modular (Paul Gortmaker) Renesas R-Car host bridge driver: - Consolidate register space lookup and ioremap (Bjorn Helgaas) - Don't disable/unprepare clocks on prepare/enable failure (Geert Uytterhoeven) - Add multi-MSI support (Grigory Kletsko) - Fix pci_remap_iospace() failure path (Lorenzo Pieralisi) - Fix some checkpatch warnings (Sergei Shtylyov) - Try increasing PCIe link speed to 5 GT/s at boot (Sergei Shtylyov) Rockchip host bridge driver: - Add DT bindings for Rockchip PCIe controller (Shawn Lin) - Add Rockchip PCIe controller support (Shawn Lin) - Improve the deassert sequence of four reset pins (Shawn Lin) - Fix wrong transmitted FTS count (Shawn Lin) - Increase the Max Credit update interval (Rajat Jain) Samsung Exynos host bridge driver: - Make explicitly non-modular (Paul Gortmaker) ST Microelectronics SPEAr13xx host bridge driver: - Make explicitly non-modular (Paul Gortmaker) Synopsys DesignWare host bridge driver: - Return data directly from dw_pcie_readl_rc() (Bjorn Helgaas) - Exchange viewport of `MEMORYs' and `CFGs/IOs' (Dong Bo) - Check LTSSM training bit before deciding link is up (Jisheng Zhang) - Move link wait definitions to .c file (Joao Pinto) - Wait for iATU enable (Joao Pinto) - Add iATU Unroll feature (Joao Pinto) - Fix pci_remap_iospace() failure path (Lorenzo Pieralisi) - Make explicitly non-modular (Paul Gortmaker) - Relax device number checking to allow SR-IOV (Po Liu) - Keep viewport fixed for IO transaction if num_viewport > 2 (Pratyush Anand) - Remove redundant platform_get_resource() return value check (Wei Yongjun) TI DRA7xx host bridge driver: - Make explicitly non-modular (Paul Gortmaker) TI Keystone host bridge driver: - Propagate request_irq() failure (Wei Yongjun) Xilinx AXI host bridge driver: - Keep both legacy and MSI interrupt domain references (Bharat Kumar Gogada) - Clear interrupt register for invalid interrupt (Bharat Kumar Gogada) - Clear correct MSI set bit (Bharat Kumar Gogada) - Dispose of MSI virtual IRQ (Bharat Kumar Gogada) - Make explicitly non-modular (Paul Gortmaker) - Relax device number checking to allow SR-IOV (Po Liu) Xilinx NWL host bridge driver: - Expand error logging (Bharat Kumar Gogada) - Enable all MSI interrupts using MSI mask (Bharat Kumar Gogada) - Make explicitly non-modular (Paul Gortmaker) Miscellaneous: - Drop CONFIG_KEXEC_CORE ifdeffery (Lukas Wunner) - portdrv: Make explicitly non-modular (Paul Gortmaker) - Make DPC explicitly non-modular (Paul Gortmaker)" * tag 'pci-v4.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (105 commits) x86/PCI: VMD: Move VMD driver to drivers/pci/host PCI: rockchip: Fix wrong transmitted FTS count PCI: rockchip: Improve the deassert sequence of four reset pins PCI: rockchip: Increase the Max Credit update interval PCI: rcar: Try increasing PCIe link speed to 5 GT/s at boot PCI/AER: Fix aer_probe() kernel-doc comment PCI: Ignore requested alignment for VF BARs PCI: Ignore requested alignment for PROBE_ONLY and fixed resources PCI: Avoid unnecessary resume after direct-complete PCI: Recognize D3cold in pci_update_current_state() PCI: Query platform firmware for device power state PCI: Afford direct-complete to devices with non-standard PM PCI/AER: Cache capability position PCI/AER: Avoid memory allocation in interrupt handling path x86/PCI: VMD: Request userspace control of PCIe hotplug indicators PCI: pciehp: Allow exclusive userspace control of indicators ACPI / APEI: Send correct severity to calculate AER severity PCI/AER: Remove duplicate AER severity translation x86/PCI: VMD: Synchronize with RCU freeing MSI IRQ descs x86/PCI: VMD: Eliminate index member from IRQ list ...
2016-10-07Merge tag 'md/4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/mdLinus Torvalds1-0/+4
Pull MD updates from Shaohua Li: "This update includes: - new AVX512 instruction based raid6 gen/recovery algorithm - a couple of md-cluster related bug fixes - fix a potential deadlock - set nonrotational bit for raid array with SSD - set correct max_hw_sectors for raid5/6, which hopefuly can improve performance a little bit - other minor fixes" * tag 'md/4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: md: set rotational bit raid6/test/test.c: bug fix: Specify aligned(alignment) attributes to the char arrays raid5: handle register_shrinker failure raid5: fix to detect failure of register_shrinker md: fix a potential deadlock md/bitmap: fix wrong cleanup raid5: allow arbitrary max_hw_sectors lib/raid6: Add AVX512 optimized xor_syndrome functions lib/raid6/test/Makefile: Add avx512 gen_syndrome and recovery functions lib/raid6: Add AVX512 optimized recovery functions lib/raid6: Add AVX512 optimized gen_syndrome functions md-cluster: make resync lock also could be interruptted md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang md-cluster: convert the completion to wait queue md-cluster: protect md_find_rdev_nr_rcu with rcu lock md-cluster: clean related infos of cluster md: changes for MD_STILL_CLOSED flag md-cluster: remove some unnecessary dlm_unlock_sync md-cluster: use FORCEUNLOCK in lockres_free md-cluster: call md_kick_rdev_from_array once ack failed
2016-10-07Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds2-5/+34
Pull SCSI updates from James Bottomley: "This update includes the usual round of major driver updates (hpsa, be2iscsi, hisi_sas, zfcp, cxlflash). There's a new incarnation of hpsa called smartpqi for which a driver is added, there's some cleanup work of the ibm vscsi target and updates to libfc, plus a whole host of minor fixes and updates and finally the removal of several ISA drivers which seem not to have been used for years" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (173 commits) scsi: mvsas: Mark symbols static where possible scsi: pm8001: Mark symbols static where possible scsi: arcmsr: Simplify user_len checking scsi: fcoe: fix off by one in eth2fc_speed() scsi: dtc: remove from tree scsi: t128: remove from tree scsi: pas16: remove from tree scsi: u14-34f: remove from tree scsi: ultrastor: remove from tree scsi: in2000: remove from tree scsi: wd7000: remove from tree scsi: scsi_dh_alua: Fix memory leak in alua_rtpg() scsi: lpfc: Mark symbols static where possible scsi: hpsa: correct call to hpsa_do_reset scsi: ufs: Get a TM service response from the correct offset scsi: ibmvfc: Fix I/O hang when port is not mapped scsi: megaraid_sas: clean function declarations in megaraid_sas_base.c up scsi: ipr: Remove redundant messages at adapter init time scsi: ipr: Don't log unnecessary 9084 error details scsi: smartpqi: raid bypass lba calculation fix ...
2016-10-07Merge tag 'mfd-for-linus-4.9' of ↵Linus Torvalds20-40/+494
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull MFD updates from Lee Jones: "Core framework: - Add the MFD bindings doc to MAINTAINERS New drivers: - X-Powers AC100 Audio CODEC and RTC - TI LP873x PMIC - Rockchip RK808 PMIC - Samsung Exynos Low Power Audio New device support: - Add support for STMPE1600 variant to stmpe - Add support for PM8018 PMIC to pm8921-core - Add support for AXP806 PMIC in axp20x - Add support for AXP209 GPIO in axp20x New functionality: - Add support for Reset to all STMPE variants - Add support for MKBP event support to cros_ec - Add support for USB to intel_soc_pmic_bxtwc - Add support for IRQs and Power Button to tps65217 Fix-ups: - Clean-up defunct author emails (da9063, max14577) - Kconfig fixups (wm8350-i2c, as37220 - Constify (altera-a10sr, sm501) - Supply PCI IDs (intel-lpss-pci) - Improve clocking (qcom_rpm) - Fix IRQ probing (ucb1x00-core) - Ensure fault log is cleared (da9052) - Remove NO_IRQ check (ucb1x00-core) - Supply I2C properties (intel-lpss-acpi, intel-lpss-pci) - Non standard declaration (tps65217, max8997-irq) - Remove unused code (lp873x, db8500-prcmu, ab8500-debugfs, cros_ec_spi) - Make non-modular (altera-a10sr, intel_msic, smsc-ece1099, sun6i-prcm, twl-core) - OF bindings (ac100, stmpe, qcom-pm8xxx, qcom-rpm, rk808, axp20x, lp873x, exynos5433-lpass, act8945a, aspeed-scu, twl6040, arizona) Bugfixes: - Release OF pointer (qcom_rpm) - Avoid double shifting in suspend/resume (88pm80x) - Fix 'defined but not used' error (exynos-lpass) - Fix 'sleeping whilst attomic' (atmel-hlcdc)" * tag 'mfd-for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (69 commits) mfd: arizona: Handle probe deferral for reset GPIO mfd: arizona: Remove arizona_of_get_named_gpio helper function mfd: arizona: Add DT options for max_channels_clocked and PDM speaker config mfd: twl6040: Register child device for twl6040-pdmclk mfd: cros_ec_spi: Remove unused variable 'request' mfd: omap-usb-host: Return value is not 'const int' mfd: ab8500-debugfs: Remove 'weak' function suspend_test_wake_cause_interrupt_is_mine() mfd: ab8500-debugfs: Remove ab8500_dump_all_banks_to_mem() mfd: db8500-prcmu: Remove unused *prcmu_set_ddr_opp() calls mfd: ab8500-debugfs: Prevent initialised field from being over-written mfd: max8997-irq: 'inline' should be at the beginning of the declaration mfd: rk808: Fix RK818_IRQ_DISCHG_ILIM initializer mfd: tps65217: Fix nonstandard declaration mfd: lp873x: Remove unused mutex lock from struct lp873x mfd: atmel-hlcdc: Do not sleep in atomic context mfd: exynos-lpass: Mark PM functions as __maybe_unused mfd: intel-lpss: Add default I2C device properties for Apollo Lake mfd: twl-core: Make it explicitly non-modular mfd: sun6i-prcm: Make it explicitly non-modular mfd: smsc-ece1099: Make it explicitly non-modular ...
2016-10-07Merge branches 'for-4.8/upstream-fixes', 'for-4.9/alps', ↵Jiri Kosina3-2/+33
'for-4.9/hid-input', 'for-4.9/intel-ish', 'for-4.9/kye-uclogic-waltop-fixes', 'for-4.9/logitech', 'for-4.9/sony', 'for-4.9/upstream' and 'for-4.9/wacom' into for-linus
2016-10-07Merge tag 'hsi-for-4.9' of ↵Linus Torvalds1-8/+9
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi Pull HSI fix from Sebastian Reichel: "Fix hsi userspace header" * tag 'hsi-for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi: HSI: hsi_char.h: use __u32 from linux/types.h
2016-10-07Merge tag 'for-v4.9' of ↵Linus Torvalds3-10/+3
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply Pull power supply and reset updates from Sebastian Reichel: - move power supply drivers to drivers/power/supply - unify location of power supply DT documentation - tps65217-charger: IRQ support - act8945a-charger: misc. cleanups & improvements - sbs-battery cleanup - fix users of deprecated create_singlethread_workqueue() - misc fixes. * tag 'for-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (46 commits) power: supply: bq27xxx_battery: allow kernel poll_interval parameter runtime update power: supply: sbs-battery: Cleanup removal of chip->pdata power: reset: st: Remove obsolete platforms from dt doc power: reset: st-poweroff: Remove obsolete platforms. power: reset: zx-reboot: Unmap region obtained by of_iomap power: reset: xgene-reboot: Unmap region obtained by of_iomap power: supply: ab8500: cleanup with list_first_entry_or_null() power: reset: add in missing white space in error message text sbs-battery: make writes to ManufacturerAccess optional power: bq24257: Fix use of uninitialized pointer bq->charger power: supply: sbs-battery: simplify DT parsing power: supply: bq24735-charger: Request status GPIO with initial input setup power: supply: sbs-battery: Use gpio_desc and sleeping calls for battery detect power: supply: act8945a_charger: Add max current property power: supply: act8945a_charger: Add capacity level property doc: bindings: power: act8945a-charger: Update properties. power: supply: act8945a_charger: Fix the power supply type power: supply: act8945a_charger: Add status change update support power: supply: act8945a_charger: Improve state handling power: supply: act8945a_charger: Remove "battery_temperature" ...
2016-10-07Merge tag 'dmaengine-4.9-rc1' of git://git.infradead.org/users/vkoul/slave-dmaLinus Torvalds7-3/+118
Pull dmaengine updates from Vinod Koul: "This is bit large pile of code which bring in some nice additions: - Error reporting: we have added a new mechanism for users of dmaenegine to register a callback_result which tells them the result of the dma transaction. Right now only one user (ntb) is using it. - As we discussed on KS mailing list and pointed out NO_IRQ has no place in kernel, this also remove NO_IRQ from dmaengine subsystem (both arm and ppc users) - Support for IOMMU slave transfers and its implementation for arm. - To get better build coverage, enable COMPILE_TEST for bunch of driver, and fix the warning and sparse complaints on these. - Apart from above, usual updates spread across drivers" * tag 'dmaengine-4.9-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (169 commits) async_pq_val: fix DMA memory leak dmaengine: virt-dma: move function declarations dmaengine: omap-dma: Enable burst and data pack for SG DT: dmaengine: rcar-dmac: document R8A7743/5 support dmaengine: fsldma: Unmap region obtained by of_iomap dmaengine: jz4780: fix resource leaks on error exit return dma-debug: fix ia64 build, use PHYS_PFN dmaengine: coh901318: fix integer overflow when shifting more than 32 places dmaengine: edma: avoid uninitialized variable use dma-mapping: fix m32r build warning dma-mapping: fix ia64 build, use PHYS_PFN dmaengine: ti-dma-crossbar: enable COMPILE_TEST dmaengine: omap-dma: enable COMPILE_TEST dmaengine: edma: enable COMPILE_TEST dmaengine: ti-dma-crossbar: Fix of_device_id data parameter usage dmaengine: ti-dma-crossbar: Correct type for of_find_property() third parameter dmaengine/ARM: omap-dma: Fix the DMAengine compile test on non OMAP configs dmaengine: edma: Rename set_bits and remove unused clear_bits helper dmaengine: edma: Use correct type for of_find_property() third parameter dmaengine: edma: Fix of_device_id data parameter usage (legacy vs TPCC) ...
2016-10-07Merge tag 'rpmsg-v4.9' of git://github.com/andersson/remoteprocLinus Torvalds1-213/+35
Pull rpmsg updates from Bjorn Andersson: "The bulk of these patches involve splitting the rpmsg implementation into a framework/API part and a virtio specific backend part. It then adds the Qualcomm Shared Memory Device (SMD) as an additional supported wire format. Also included is a set of code style cleanups that have been lingering for a while" * tag 'rpmsg-v4.9' of git://github.com/andersson/remoteproc: (26 commits) rpmsg: smd: fix dependency on QCOM_SMD=n rpmsg: Introduce Qualcomm SMD backend rpmsg: Allow callback to return errors rpmsg: Move virtio specifics from public header rpmsg: virtio: Hide vrp pointer from the public API rpmsg: Hide rpmsg indirection tables rpmsg: Split rpmsg core and virtio backend rpmsg: Split off generic tail of create_channel() rpmsg: Move helper for finding rpmsg devices to core rpmsg: Move endpoint related interface to rpmsg core rpmsg: Indirection table for rpmsg_endpoint operations rpmsg: Move rpmsg_device API to new file rpmsg: Introduce indirection table for rpmsg_device operations rpmsg: Clean up rpmsg device vs channel naming rpmsg: Make rpmsg_create_ept() take channel_info struct rpmsg: rpmsg_send() operations takes rpmsg_endpoint rpmsg: Name rpmsg devices based on channel id rpmsg: Enable matching devices with drivers based on DT rpmsg: Drop prototypes for non-existing functions samples/rpmsg: add support for multiple instances ...
2016-10-07Merge tag 'rproc-v4.9' of git://github.com/andersson/remoteprocLinus Torvalds2-11/+11
Pull remoteproc updates from Bjorn Andersson: "In addition to a slew of minor fixes and cleanups these patches refactor how we deal with remoteprocs that will be auto-booting themselves. That does clean up the remote resource handling but makes for additional work to clarify responsibilities and life cycles of resources. We also revise how module locking of remoteproc drivers work, so that they are locked as we hand out references to them to third parties, rather than only when booted by anyone. In addition to that we also introduce the Qualcomm Wireless Subsystem remoteproc driver" * tag 'rproc-v4.9' of git://github.com/andersson/remoteproc: (26 commits) remoteproc: Refactor rproc module locking remoteproc: Split driver and consumer dereferencing remoteproc: Correct resource handling upon boot failure remoteproc: Drop unnecessary NULL check remoteproc: core: transform struct fw_rsc_vdev_vring reserved field in pa remoteproc: Modify FW_RSC_ADDR_ANY definition remoteproc: qcom: wcnss: Fix return value check in wcnss_probe() remoteproc: qcom: Introduce WCNSS peripheral image loader dt-binding: remoteproc: Introduce Qualcomm WCNSS loader binding remoteproc: Only update table_ptr if we have a loaded table remoteproc: Move handling of cached table to boot/shutdown remoteproc: Move vdev handling to boot/shutdown remoteproc: Calculate max_notifyid during load remoteproc: Introduce auto-boot flag remoteproc/omap: revise a minor error trace message remoteproc/omap: fix various code formatting issues remoteproc: print hex numbers with a leading 0x format remoteproc: align code with open parenthesis remoteproc: fix bare unsigned type usage remoteproc: use variable names for sizeof() operator ...
2016-10-07Merge tag 'for-f2fs-4.9' of ↵Linus Torvalds2-11/+8
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've investigated how f2fs deals with errors given by our fault injection facility. With this, we could fix several corner cases. And, in order to improve the performance, we set inline_dentry by default and enhance the exisiting discard issue flow. In addition, we added f2fs_migrate_page for better memory management. Enhancements: - set inline_dentry by default - improve discard issue flow - add more fault injection cases in f2fs - allow block preallocation for encrypted files - introduce migrate_page callback function - avoid truncating the next direct node block at every checkpoint Bug fixes: - set page flag correctly between write_begin and write_end - missing error handling cases detected by fault injection - preallocate blocks regarding to 4KB alignement correctly - dentry and filename handling of encryption - lost xattrs of directories" * tag 'for-f2fs-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (69 commits) f2fs: introduce update_ckpt_flags to clean up f2fs: don't submit irrelevant page f2fs: fix to commit bio cache after flushing node pages f2fs: introduce get_checkpoint_version for cleanup f2fs: remove dead variable f2fs: remove redundant io plug f2fs: support checkpoint error injection f2fs: fix to recover old fault injection config in ->remount_fs f2fs: do fault injection initialization in default_options f2fs: remove redundant value definition f2fs: support configuring fault injection per superblock f2fs: adjust display format of segment bit f2fs: remove dirty inode pages in error path f2fs: do not unnecessarily null-terminate encrypted symlink data f2fs: handle errors during recover_orphan_inodes f2fs: avoid gc in cp_error case f2fs: should put_page for summary page f2fs: assign return value in f2fs_gc f2fs: add customized migrate_page callback f2fs: introduce cp_lock to protect updating of ckpt_flags ...
2016-10-07Merge tag 'pstore-v4.9-rc1' of ↵Linus Torvalds2-6/+18
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore updates from Kees Cook: - Fix bug in module unloading - Switch to always using spinlock over cmpxchg - Explicitly define pstore backend's supported modes - Remove bounce buffer from pmsg - Switch to using memcpy_to/fromio() - Error checking improvements * tag 'pstore-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: ramoops: move spin_lock_init after kmalloc error checking pstore/ram: Use memcpy_fromio() to save old buffer pstore/ram: Use memcpy_toio instead of memcpy pstore/pmsg: drop bounce buffer pstore/ram: Set pstore flags dynamically pstore: Split pstore fragile flags pstore/core: drop cmpxchg based updates pstore/ramoops: fixup driver removal
2016-10-06Merge tag 'trace-v4.9' of ↵Linus Torvalds2-4/+29
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This release cycle is rather small. Just a few fixes to tracing. The big change is the addition of the hwlat tracer. It not only detects SMIs, but also other latency that's caused by the hardware. I have detected some latency from large boxes having bus contention" * tag 'trace-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Call traceoff trigger after event is recorded ftrace/scripts: Add helper script to bisect function tracing problem functions tracing: Have max_latency be defined for HWLAT_TRACER as well tracing: Add NMI tracing in hwlat detector tracing: Have hwlat trace migrate across tracing_cpumask CPUs tracing: Add documentation for hwlat_detector tracer tracing: Added hardware latency tracer ftrace: Access ret_stack->subtime only in the function profiler function_graph: Handle TRACE_BPUTS in print_graph_comment tracing/uprobe: Drop isdigit() check in create_trace_uprobe
2016-10-06Merge tag 'for-linus-4.9-rc0-tag' of ↵Linus Torvalds2-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from David Vrabel: "xen features and fixes for 4.9: - switch to new CPU hotplug mechanism - support driver_override in pciback - require vector callback for HVM guests (the alternate mechanism via the platform device has been broken for ages)" * tag 'for-linus-4.9-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/x86: Update topology map for PV VCPUs xen/x86: Initialize per_cpu(xen_vcpu, 0) a little earlier xen/pciback: support driver_override xen/pciback: avoid multiple entries in slot list xen/pciback: simplify pcistub device handling xen: Remove event channel notification through Xen PCI platform device xen/events: Convert to hotplug state machine xen/x86: Convert to hotplug state machine x86/xen: add missing \n at end of printk warning message xen/grant-table: Use kmalloc_array() in arch_gnttab_valloc() xen: Make VPMU init message look less scary xen: rename xen_pmu_init() in sys-hypervisor.c hotplug: Prevent alloc/free of irq descriptors during cpu up/down (again) xen/x86: Move irq allocation from Xen smp_op.cpu_up()
2016-10-06Merge tag 'kvm-4.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds3-11/+54
Pull KVM updates from Radim Krčmář: "All architectures: - move `make kvmconfig` stubs from x86 - use 64 bits for debugfs stats ARM: - Important fixes for not using an in-kernel irqchip - handle SError exceptions and present them to guests if appropriate - proxying of GICV access at EL2 if guest mappings are unsafe - GICv3 on AArch32 on ARMv8 - preparations for GICv3 save/restore, including ABI docs - cleanups and a bit of optimizations MIPS: - A couple of fixes in preparation for supporting MIPS EVA host kernels - MIPS SMP host & TLB invalidation fixes PPC: - Fix the bug which caused guests to falsely report lockups - other minor fixes - a small optimization s390: - Lazy enablement of runtime instrumentation - up to 255 CPUs for nested guests - rework of machine check deliver - cleanups and fixes x86: - IOMMU part of AMD's AVIC for vmexit-less interrupt delivery - Hyper-V TSC page - per-vcpu tsc_offset in debugfs - accelerated INS/OUTS in nVMX - cleanups and fixes" * tag 'kvm-4.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (140 commits) KVM: MIPS: Drop dubious EntryHi optimisation KVM: MIPS: Invalidate TLB by regenerating ASIDs KVM: MIPS: Split kernel/user ASID regeneration KVM: MIPS: Drop other CPU ASIDs on guest MMU changes KVM: arm/arm64: vgic: Don't flush/sync without a working vgic KVM: arm64: Require in-kernel irqchip for PMU support KVM: PPC: Book3s PR: Allow access to unprivileged MMCR2 register KVM: PPC: Book3S PR: Support 64kB page size on POWER8E and POWER8NVL KVM: PPC: Book3S: Remove duplicate setting of the B field in tlbie KVM: PPC: BookE: Fix a sanity check KVM: PPC: Book3S HV: Take out virtual core piggybacking code KVM: PPC: Book3S: Treat VTB as a per-subcore register, not per-thread ARM: gic-v3: Work around definition of gic_write_bpr1 KVM: nVMX: Fix the NMI IDT-vectoring handling KVM: VMX: Enable MSR-BASED TPR shadow even if APICv is inactive KVM: nVMX: Fix reload apic access page warning kvmconfig: add virtio-gpu to config fragment config: move x86 kvm_guest.config to a common location arm64: KVM: Remove duplicating init code for setting VMID ARM: KVM: Support vgic-v3 ...