summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2016-06-08Linux 4.6.2v4.6.2Greg Kroah-Hartman1-1/+1
2016-06-08regulator: Fix deadlock during regulator registrationJon Hunter1-5/+5
commit a2151374230820a3a6e654f2998b2a44dbfae4e1 upstream. Commit 5e3ca2b349b1 ("regulator: Try to resolve regulators supplies on registration") added a call to regulator_resolve_supply() within regulator_register() where the regulator_list_mutex is held. This causes a deadlock to occur on the Tegra114 Dalmore board when the palmas PMIC is registered because regulator_register_resolve_supply() calls regulator_dev_lookup() which may try to acquire the regulator_list_mutex again. Fix this by releasing the mutex before calling regulator_register_resolve_supply() and update the error exit path to ensure the mutex is released on an error. [Made commit message more legible -- broonie] Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08IB/hfi1: Fix hard lockup due to not using save/restore spin lockMike Marciniszyn1-2/+3
commit 7049de65c9e520886f06d6f9deceaaed5d93fb7c upstream. Commit b9b06cb6feda ("IB/hfi1: Fix missing lock/unlock in verbs drain callback") added a spin lock. Unfortunately, the new lock code can be called from a base level interrupt state, and an interrupt that can get stacked will attempt to get the same lock. Fix by using the flag save/restore spin lock variation. Cc: stable@vger.kernel.org # 4.6+ Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08drm: msm: remove unused variableArnd Bergmann1-1/+0
commit 6979cd54c0667189bb0805c0fcebfef8afc5a191 upstream. A recent cleanup removed the only user of the 'kms' variable in msm_preclose(), causing a harmless compiler warning: drivers/gpu/drm/msm/msm_drv.c: In function 'msm_preclose': drivers/gpu/drm/msm/msm_drv.c:468:18: error: unused variable 'kms' [-Werror=unused-variable] This removes the variable as well. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 4016260ba47a ("drm/msm: fix bug after preclose removal") Signed-off-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08xfs: skip stale inodes in xfs_iflush_clusterDave Chinner1-0/+1
commit 7d3aa7fe970791f1a674b14572a411accf2f4d4e upstream. We don't write back stale inodes so we should skip them in xfs_iflush_cluster, too. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08xfs: fix inode validity check in xfs_iflush_clusterDave Chinner1-4/+4
commit 51b07f30a71c27405259a0248206ed4e22adbee2 upstream. Some careless idiot(*) wrote crap code in commit 1a3e8f3 ("xfs: convert inode cache lookups to use RCU locking") back in late 2010, and so xfs_iflush_cluster checks the wrong inode for whether it is still valid under RCU protection. Fix it to lock and check the correct inode. (*) Careless-idiot: Dave Chinner <dchinner@redhat.com> Discovered-by: Brain Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08xfs: xfs_iflush_cluster fails to abort on errorDave Chinner1-4/+13
commit b1438f477934f5a4d5a44df26f3079a7575d5946 upstream. When a failure due to an inode buffer occurs, the error handling fails to abort the inode writeback correctly. This can result in the inode being reclaimed whilst still in the AIL, leading to use-after-free situations as well as filesystems that cannot be unmounted as the inode log items left in the AIL never get removed. Fix this by ensuring fatal errors from xfs_imap_to_bp() result in the inode flush being aborted correctly. Reported-by: Shyam Kaushik <shyam@zadarastorage.com> Diagnosed-by: Shyam Kaushik <shyam@zadarastorage.com> Tested-by: Shyam Kaushik <shyam@zadarastorage.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08xfs: remove xfs_fs_evict_inode()Dave Chinner1-21/+7
commit 8179c03629de67f515d3ab825b5a9428687d4b85 upstream. Joe Lawrence reported a list_add corruption with 4.6-rc1 when testing some custom md administration code that made it's own block device nodes for the md array. The simple test loop of: for i in {0..100}; do mknod --mode=0600 $tmp/tmp_node b $MAJOR $MINOR mdadm --detail --export $tmp/tmp_node > /dev/null rm -f $tmp/tmp_node done Would produce this warning in bd_acquire() when mdadm opened the device node: list_add double add: new=ffff88043831c7b8, prev=ffff8804380287d8, next=ffff88043831c7b8. And then produce this from bd_forget from kdevtmpfs evicting a block dev inode: list_del corruption. prev->next should be ffff8800bb83eb10, but was ffff88043831c7b8 This is a regression caused by commit c19b3b05 ("xfs: mode di_mode to vfs inode"). The issue is that xfs_inactive() frees the unlinked inode, and the above commit meant that this freeing zeroed the mode in the struct inode. The problem is that after evict() has called ->evict_inode, it expects the i_mode to be intact so that it can call bd_forget() or cd_forget() to drop the reference to the block device inode attached to the XFS inode. In reality, the only thing we do in xfs_fs_evict_inode() that is not generic is call xfs_inactive(). We can move the xfs_inactive() call to xfs_fs_destroy_inode() without any problems at all, and this will leave the VFS inode intact until it is completely done with it. So, remove xfs_fs_evict_inode(), and do the work it used to do in ->destroy_inode instead. Reported-by: Joe Lawrence <joe.lawrence@stratus.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08xfs: Don't wrap growfs AGFL indexesDave Chinner1-2/+2
commit ad747e3b299671e1a53db74963cc6c5f6cdb9f6d upstream. Commit 96f859d ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct") allowed the freelist to use the empty slot at the end of the freelist on 64 bit systems that was not being used due to sizeof() rounding up the structure size. This has caused versions of xfs_repair prior to 4.5.0 (which also has the fix) to report this as a corruption once the filesystem has been grown. Older kernels can also have problems (seen from a whacky container/vm management environment) mounting filesystems grown on a system with a newer kernel than the vm/container it is deployed on. To avoid this problem, change the initial free list indexes not to wrap across the end of the AGFL, hence avoiding the initialisation of agf_fllast to the last index in the AGFL. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08xfs: disallow rw remount on fs with unknown ro-compat featuresEric Sandeen1-0/+10
commit d0a58e833931234c44e515b5b8bede32bd4e6eed upstream. Today, a kernel which refuses to mount a filesystem read-write due to unknown ro-compat features can still transition to read-write via the remount path. The old kernel is most likely none the wiser, because it's unaware of the new feature, and isn't using it. However, writing to the filesystem may well corrupt metadata related to that new feature, and moving to a newer kernel which understand the feature will have problems. Right now the only ro-compat feature we have is the free inode btree, which showed up in v3.16. It would be good to push this back to all the active stable kernels, I think, so that if anyone is using newer mkfs (which enables the finobt feature) with older kernel releases, they'll be protected. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08gcov: disable tree-loop-im to reduce stack usageArnd Bergmann1-1/+1
commit c87bf431448b404a6ef5fbabd74c0e3e42157a7f upstream. Enabling CONFIG_GCOV_PROFILE_ALL produces us a lot of warnings like lib/lz4/lz4hc_compress.c: In function 'lz4_compresshcctx': lib/lz4/lz4hc_compress.c:514:1: warning: the frame size of 1504 bytes is larger than 1024 bytes [-Wframe-larger-than=] After some investigation, I found that this behavior started with gcc-4.9, and opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69702. A suggested workaround for it is to use the -fno-tree-loop-im flag that turns off one of the optimization stages in gcc, so the code runs a little slower but does not use excessive amounts of stack. We could make this conditional on the gcc version, but I could not find an easy way to do this in Kbuild and the benefit would be fairly small, given that most of the gcc version in production are affected now. I'm marking this for 'stable' backports because it addresses a bug with code generation in gcc that exists in all kernel versions with the affected gcc releases. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Michal Marek <mmarek@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap()Kirill A. Shutemov1-0/+2
commit 0798d3c022dc63eb0ec02b511e1f76ca8411ef8e upstream. If page_move_anon_rmap() is refiling a pmd-splitted THP mapped in a tail page from a pte, the "address" must be THP aligned in order for the page->index bugcheck to pass in the CONFIG_DEBUG_VM=y builds. Link: http://lkml.kernel.org/r/1464253620-106404-1-git-send-email-kirill.shutemov@linux.intel.com Fixes: 6d0a07edd17c ("mm: thp: calculate the mapcount correctly for THP pages during WP faults") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08scripts/package/Makefile: rpmbuild add support of RPMOPTSSrinivas Pandruvada1-2/+2
commit 65a9f31c5042e5bb50d30ed8ae374044be561054 upstream. After commit 21a59991ce0c ("scripts/package/Makefile: rpmbuild is needed for rpm targets"), it is no longer possible to specify RPMOPTS. For example, we can no longer able to control _topdir using the following make command. make RPMOPTS="--define '_topdir /home/xyz/workspace/'" binrpm-pkg Fixes: 21a59991ce0c ("scripts/package/Makefile: rpmbuild is needed for rpm targets") Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Michal Marek <mmarek@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08dma-debug: avoid spinlock recursion when disabling dma-debugVille Syrjälä1-1/+1
commit 3017cd63f26fc655d56875aaf497153ba60e9edf upstream. With netconsole (at least) the pr_err("... disablingn") call can recurse back into the dma-debug code, where it'll try to grab free_entries_lock again. Avoid the problem by doing the printk after dropping the lock. Link: http://lkml.kernel.org/r/1463678421-18683-1-git-send-email-ville.syrjala@linux.intel.com Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08PM / sleep: Handle failures in device_suspend_late() consistentlyRafael J. Wysocki1-2/+3
commit 3a17fb329da68cb00558721aff876a80bba2fdb9 upstream. Grygorii Strashko reports: The PM runtime will be left disabled for the device if its .suspend_late() callback fails and async suspend is not allowed for this device. In this case device will not be added in dpm_late_early_list and dpm_resume_early() will ignore this device, as result PM runtime will be disabled for it forever (side effect: after 8 subsequent failures for the same device the PM runtime will be reenabled due to disable_depth overflow). To fix this problem, add devices to dpm_late_early_list regardless of whether or not device_suspend_late() returns errors for them. That will ensure failures in there to be handled consistently for all devices regardless of their async suspend/resume status. Reported-by: Grygorii Strashko <grygorii.strashko@ti.com> Tested-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08nfs: avoid race that crashes nfs_init_commitWeston Andros Adamson2-0/+32
commit ade8febde0271513360bac44883dbebad44276c3 upstream. Since the patch "NFS: Allow multiple commit requests in flight per file" we can run multiple simultaneous commits on the same inode. This introduced a race over collecting pages to commit that made it possible to call nfs_init_commit() with an empty list - which causes crashes like the one below. The fix is to catch this race and avoid calling nfs_init_commit and initiate_commit when there is no work to do. Here is the crash: [600522.076832] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 [600522.078475] IP: [<ffffffffa0479e72>] nfs_init_commit+0x22/0x130 [nfs] [600522.078745] PGD 4272b1067 PUD 4272cb067 PMD 0 [600522.078972] Oops: 0000 [#1] SMP [600522.079204] Modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dcdbas ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vmw_vsock_vmci_transport vsock bonding ipmi_devintf ipmi_msghandler coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ppdev vmw_balloon parport_pc parport acpi_cpufreq vmw_vmci i2c_piix4 shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c vmwgfx drm_kms_helper ttm drm crc32c_intel serio_raw vmxnet3 [600522.081380] vmw_pvscsi ata_generic pata_acpi [600522.081809] CPU: 3 PID: 15667 Comm: /usr/bin/python Not tainted 4.1.9-100.pd.88.el7.x86_64 #1 [600522.082281] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014 [600522.082814] task: ffff8800bbbfa780 ti: ffff88042ae84000 task.ti: ffff88042ae84000 [600522.083378] RIP: 0010:[<ffffffffa0479e72>] [<ffffffffa0479e72>] nfs_init_commit+0x22/0x130 [nfs] [600522.083973] RSP: 0018:ffff88042ae87438 EFLAGS: 00010246 [600522.084571] RAX: 0000000000000000 RBX: ffff880003485e40 RCX: ffff88042ae87588 [600522.085188] RDX: 0000000000000000 RSI: ffff88042ae874b0 RDI: ffff880003485e40 [600522.085756] RBP: ffff88042ae87448 R08: ffff880003486010 R09: ffff88042ae874b0 [600522.086332] R10: 0000000000000000 R11: 0000000000000005 R12: ffff88042ae872d0 [600522.086905] R13: ffff88042ae874b0 R14: ffff880003485e40 R15: ffff88042704c840 [600522.087484] FS: 00007f4728ff2740(0000) GS:ffff88043fd80000(0000) knlGS:0000000000000000 [600522.088070] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [600522.088663] CR2: 0000000000000040 CR3: 000000042b6aa000 CR4: 00000000001406e0 [600522.089327] Stack: [600522.089926] 0000000000000001 ffff88042ae87588 ffff88042ae874f8 ffffffffa04f09fa [600522.090549] 0000000000017840 0000000000017840 ffff88042ae87588 ffff8803258d9930 [600522.091169] ffff88042ae87578 ffffffffa0563d80 0000000000000000 ffff88042704c840 [600522.091789] Call Trace: [600522.092420] [<ffffffffa04f09fa>] pnfs_generic_commit_pagelist+0x1da/0x320 [nfsv4] [600522.093052] [<ffffffffa0563d80>] ? ff_layout_commit_prepare_v3+0x30/0x30 [nfs_layout_flexfiles] [600522.093696] [<ffffffffa0562645>] ff_layout_commit_pagelist+0x15/0x20 [nfs_layout_flexfiles] [600522.094359] [<ffffffffa047bc78>] nfs_generic_commit_list+0xe8/0x120 [nfs] [600522.095032] [<ffffffffa047bd6a>] nfs_commit_inode+0xba/0x110 [nfs] [600522.095719] [<ffffffffa046ac54>] nfs_release_page+0x44/0xd0 [nfs] [600522.096410] [<ffffffff811a8122>] try_to_release_page+0x32/0x50 [600522.097109] [<ffffffff811bd4f1>] shrink_page_list+0x961/0xb30 [600522.097812] [<ffffffff811bdced>] shrink_inactive_list+0x1cd/0x550 [600522.098530] [<ffffffff811bea65>] shrink_lruvec+0x635/0x840 [600522.099250] [<ffffffff811bed60>] shrink_zone+0xf0/0x2f0 [600522.099974] [<ffffffff811bf312>] do_try_to_free_pages+0x192/0x470 [600522.100709] [<ffffffff811bf6ca>] try_to_free_pages+0xda/0x170 [600522.101464] [<ffffffff811b2198>] __alloc_pages_nodemask+0x588/0x970 [600522.102235] [<ffffffff811fbbd5>] alloc_pages_vma+0xb5/0x230 [600522.103000] [<ffffffff813a1589>] ? cpumask_any_but+0x39/0x50 [600522.103774] [<ffffffff811d6115>] wp_page_copy.isra.55+0x95/0x490 [600522.104558] [<ffffffff810e3438>] ? __wake_up+0x48/0x60 [600522.105357] [<ffffffff811d7d3b>] do_wp_page+0xab/0x4f0 [600522.106137] [<ffffffff810a1bbb>] ? release_task+0x36b/0x470 [600522.106902] [<ffffffff8126dbd7>] ? eventfd_ctx_read+0x67/0x1c0 [600522.107659] [<ffffffff811da2a8>] handle_mm_fault+0xc78/0x1900 [600522.108431] [<ffffffff81067ef1>] __do_page_fault+0x181/0x420 [600522.109173] [<ffffffff811446a6>] ? __audit_syscall_exit+0x1e6/0x280 [600522.109893] [<ffffffff810681c0>] do_page_fault+0x30/0x80 [600522.110594] [<ffffffff81024f36>] ? syscall_trace_leave+0xc6/0x120 [600522.111288] [<ffffffff81790a58>] page_fault+0x28/0x30 [600522.111947] Code: 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 4c 8d 87 d0 01 00 00 48 89 e5 53 48 89 fb 48 83 ec 08 4c 8b 0e 49 8b 41 18 4c 39 ce <48> 8b 40 40 4c 8b 50 30 74 24 48 8b 87 d0 01 00 00 48 8b 7e 08 [600522.113343] RIP [<ffffffffa0479e72>] nfs_init_commit+0x22/0x130 [nfs] [600522.114003] RSP <ffff88042ae87438> [600522.114636] CR2: 0000000000000040 Fixes: af7cf057 (NFS: Allow multiple commit requests in flight per file) Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08ext4: silence UBSAN in ext4_mb_init()Nicolai Stange1-2/+4
commit 935244cd54b86ca46e69bc6604d2adfb1aec2d42 upstream. Currently, in ext4_mb_init(), there's a loop like the following: do { ... offset += 1 << (sb->s_blocksize_bits - i); i++; } while (i <= sb->s_blocksize_bits + 1); Note that the updated offset is used in the loop's next iteration only. However, at the last iteration, that is at i == sb->s_blocksize_bits + 1, the shift count becomes equal to (unsigned)-1 > 31 (c.f. C99 6.5.7(3)) and UBSAN reports UBSAN: Undefined behaviour in fs/ext4/mballoc.c:2621:15 shift exponent 4294967295 is too large for 32-bit type 'int' [...] Call Trace: [<ffffffff818c4d25>] dump_stack+0xbc/0x117 [<ffffffff818c4c69>] ? _atomic_dec_and_lock+0x169/0x169 [<ffffffff819411ab>] ubsan_epilogue+0xd/0x4e [<ffffffff81941cac>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254 [<ffffffff81941ab1>] ? __ubsan_handle_load_invalid_value+0x158/0x158 [<ffffffff814b6dc1>] ? kmem_cache_alloc+0x101/0x390 [<ffffffff816fc13b>] ? ext4_mb_init+0x13b/0xfd0 [<ffffffff814293c7>] ? create_cache+0x57/0x1f0 [<ffffffff8142948a>] ? create_cache+0x11a/0x1f0 [<ffffffff821c2168>] ? mutex_lock+0x38/0x60 [<ffffffff821c23ab>] ? mutex_unlock+0x1b/0x50 [<ffffffff814c26ab>] ? put_online_mems+0x5b/0xc0 [<ffffffff81429677>] ? kmem_cache_create+0x117/0x2c0 [<ffffffff816fcc49>] ext4_mb_init+0xc49/0xfd0 [...] Observe that the mentioned shift exponent, 4294967295, equals (unsigned)-1. Unless compilers start to do some fancy transformations (which at least GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the such calculated value of offset is never used again. Silence UBSAN by introducing another variable, offset_incr, holding the next increment to apply to offset and adjust that one by right shifting it by one position per loop iteration. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161 Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08ext4: address UBSAN warning in mb_find_order_for_block()Nicolai Stange1-1/+3
commit b5cb316cdf3a3f5f6125412b0f6065185240cfdc upstream. Currently, in mb_find_order_for_block(), there's a loop like the following: while (order <= e4b->bd_blkbits + 1) { ... bb += 1 << (e4b->bd_blkbits - order); } Note that the updated bb is used in the loop's next iteration only. However, at the last iteration, that is at order == e4b->bd_blkbits + 1, the shift count becomes negative (c.f. C99 6.5.7(3)) and UBSAN reports UBSAN: Undefined behaviour in fs/ext4/mballoc.c:1281:11 shift exponent -1 is negative [...] Call Trace: [<ffffffff818c4d35>] dump_stack+0xbc/0x117 [<ffffffff818c4c79>] ? _atomic_dec_and_lock+0x169/0x169 [<ffffffff819411bb>] ubsan_epilogue+0xd/0x4e [<ffffffff81941cbc>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254 [<ffffffff81941ac1>] ? __ubsan_handle_load_invalid_value+0x158/0x158 [<ffffffff816e93a0>] ? ext4_mb_generate_from_pa+0x590/0x590 [<ffffffff816502c8>] ? ext4_read_block_bitmap_nowait+0x598/0xe80 [<ffffffff816e7b7e>] mb_find_order_for_block+0x1ce/0x240 [...] Unless compilers start to do some fancy transformations (which at least GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the such calculated value of bb is never used again. Silence UBSAN by introducing another variable, bb_incr, holding the next increment to apply to bb and adjust that one by right shifting it by one position per loop iteration. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161 Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08ext4: fix oops on corrupted filesystemJan Kara1-1/+1
commit 74177f55b70e2f2be770dd28684dd6d17106a4ba upstream. When filesystem is corrupted in the right way, it can happen ext4_mark_iloc_dirty() in ext4_orphan_add() returns error and we subsequently remove inode from the in-memory orphan list. However this deletion is done with list_del(&EXT4_I(inode)->i_orphan) and thus we leave i_orphan list_head with a stale content. Later we can look at this content causing list corruption, oops, or other issues. The reported trace looked like: WARNING: CPU: 0 PID: 46 at lib/list_debug.c:53 __list_del_entry+0x6b/0x100() list_del corruption, 0000000061c1d6e0->next is LIST_POISON1 0000000000100100) CPU: 0 PID: 46 Comm: ext4.exe Not tainted 4.1.0-rc4+ #250 Stack: 60462947 62219960 602ede24 62219960 602ede24 603ca293 622198f0 602f02eb 62219950 6002c12c 62219900 601b4d6b Call Trace: [<6005769c>] ? vprintk_emit+0x2dc/0x5c0 [<602ede24>] ? printk+0x0/0x94 [<600190bc>] show_stack+0xdc/0x1a0 [<602ede24>] ? printk+0x0/0x94 [<602ede24>] ? printk+0x0/0x94 [<602f02eb>] dump_stack+0x2a/0x2c [<6002c12c>] warn_slowpath_common+0x9c/0xf0 [<601b4d6b>] ? __list_del_entry+0x6b/0x100 [<6002c254>] warn_slowpath_fmt+0x94/0xa0 [<602f4d09>] ? __mutex_lock_slowpath+0x239/0x3a0 [<6002c1c0>] ? warn_slowpath_fmt+0x0/0xa0 [<60023ebf>] ? set_signals+0x3f/0x50 [<600a205a>] ? kmem_cache_free+0x10a/0x180 [<602f4e88>] ? mutex_lock+0x18/0x30 [<601b4d6b>] __list_del_entry+0x6b/0x100 [<601177ec>] ext4_orphan_del+0x22c/0x2f0 [<6012f27c>] ? __ext4_journal_start_sb+0x2c/0xa0 [<6010b973>] ? ext4_truncate+0x383/0x390 [<6010bc8b>] ext4_write_begin+0x30b/0x4b0 [<6001bb50>] ? copy_from_user+0x0/0xb0 [<601aa840>] ? iov_iter_fault_in_readable+0xa0/0xc0 [<60072c4f>] generic_perform_write+0xaf/0x1e0 [<600c4166>] ? file_update_time+0x46/0x110 [<60072f0f>] __generic_file_write_iter+0x18f/0x1b0 [<6010030f>] ext4_file_write_iter+0x15f/0x470 [<60094e10>] ? unlink_file_vma+0x0/0x70 [<6009b180>] ? unlink_anon_vmas+0x0/0x260 [<6008f169>] ? free_pgtables+0xb9/0x100 [<600a6030>] __vfs_write+0xb0/0x130 [<600a61d5>] vfs_write+0xa5/0x170 [<600a63d6>] SyS_write+0x56/0xe0 [<6029fcb0>] ? __libc_waitpid+0x0/0xa0 [<6001b698>] handle_syscall+0x68/0x90 [<6002633d>] userspace+0x4fd/0x600 [<6002274f>] ? save_registers+0x1f/0x40 [<60028bd7>] ? arch_prctl+0x177/0x1b0 [<60017bd5>] fork_handler+0x85/0x90 Fix the problem by using list_del_init() as we always should with i_orphan list. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08ext4: fix check of dqget() return value in ext4_ioctl_setproject()Seth Forshee1-1/+1
commit ff0bc08454917964291f72ee5b8eca66de4bc250 upstream. A failed call to dqget() returns an ERR_PTR() and not null. Fix the check in ext4_ioctl_setproject() to handle this correctly. Fixes: 9b7365fc1c82 ("ext4: add FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support") Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08ext4: clean up error handling when orphan list is corruptedTheodore Ts'o1-27/+22
commit 7827a7f6ebfcb7f388dc47fddd48567a314701ba upstream. Instead of just printing warning messages, if the orphan list is corrupted, declare the file system is corrupted. If there are any reserved inodes in the orphaned inode list, declare the file system corrupted and stop right away to avoid doing more potential damage to the file system. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08ext4: fix hang when processing corrupted orphaned inode listTheodore Ts'o1-4/+6
commit c9eb13a9105e2e418f72e46a2b6da3f49e696902 upstream. If the orphaned inode list contains inode #5, ext4_iget() returns a bad inode (since the bootloader inode should never be referenced directly). Because of the bad inode, we end up processing the inode repeatedly and this hangs the machine. This can be reproduced via: mke2fs -t ext4 /tmp/foo.img 100 debugfs -w -R "ssv last_orphan 5" /tmp/foo.img mount -o loop /tmp/foo.img /mnt (But don't do this if you are using an unpatched kernel if you care about the system staying functional. :-) This bug was found by the port of American Fuzzy Lop into the kernel to find file system problems[1]. (Since it *only* happens if inode #5 shows up on the orphan list --- 3, 7, 8, etc. won't do it, it's not surprising that AFL needed two hours before it found it.) [1] http://events.linuxfoundation.org/sites/events/files/slides/AFL%20filesystem%20fuzzing%2C%20Vault%202016_0.pdf Reported by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08ext4: fix data exposure after a crashJan Kara1-9/+15
commit 06bd3c36a733ac27962fea7d6f47168841376824 upstream. Huang has reported that in his powerfail testing he is seeing stale block contents in some of recently allocated blocks although he mounts ext4 in data=ordered mode. After some investigation I have found out that indeed when delayed allocation is used, we don't add inode to transaction's list of inodes needing flushing before commit. Originally we were doing that but commit f3b59291a69d removed the logic with a flawed argument that it is not needed. The problem is that although for delayed allocated blocks we write their contents immediately after allocating them, there is no guarantee that the IO scheduler or device doesn't reorder things and thus transaction allocating blocks and attaching them to inode can reach stable storage before actual block contents. Actually whenever we attach freshly allocated blocks to inode using a written extent, we should add inode to transaction's ordered inode list to make sure we properly wait for block contents to be written before committing the transaction. So that is what we do in this patch. This also handles other cases where stale data exposure was possible - like filling hole via mmap in data=ordered,nodelalloc mode. The only exception to the above rule are extending direct IO writes where blkdev_direct_IO() waits for IO to complete before increasing i_size and thus stale data exposure is not possible. For now we don't complicate the code with optimizing this special case since the overhead is pretty low. In case this is observed to be a performance problem we can always handle it using a special flag to ext4_map_blocks(). Fixes: f3b59291a69d0b734be1fc8be489fef2dd846d3d Reported-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com> Tested-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08drm/i915: Pass the correct crtc state to .update_plane()Ville Syrjälä1-3/+1
commit 9f6151c9039084e18c118831779a99ac8f29e39c upstream. Pass the current crtc state, not the old crtc state, to the .update_plane() hook. Noticed on BSW when PRIMSIZE was getting programmed to a stale value which produced utter garbage on screen eg. wwhen going from 1920x1080 to 1024x768. Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Fixes: a758e6845825 ("drm/i915: Do not use commit_plane for sprite planes.") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1457543247-13987-3-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08drm/imx: Match imx-ipuv3-crtc components using device node in platform dataPhilipp Zabel4-3/+14
commit 310944d148e3600dcff8b346bee7fa01d34903b1 upstream. The component master driver imx-drm-core matches component devices using their of_node. Since commit 950b410dd1ab ("gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading"), the imx-ipuv3-crtc dev->of_node is not set during probing. Before that, of_node was set and caused an of: modalias to be used instead of the platform: modalias, which broke module autoloading. On the other hand, if dev->of_node is not set yet when the imx-ipuv3-crtc probe function calls component_add, component matching in imx-drm-core fails. While dev->of_node will be set once the next component tries to bring up the component master, imx-drm-core component binding will never succeed if one of the crtc devices is probed last. Add of_node to the component platform data and match against the pdata->of_node instead of dev->of_node in imx-drm-core to work around this problem. Fixes: 950b410dd1ab ("gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading") Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Tested-by: Fabio Estevam <fabio.estevam@nxp.com> Tested-by: Lothar Waßmann <LW@KARO-electronics.de> Tested-by: Heiko Schocher <hs@denx.de> Tested-by: Chris Ruehl <chris.ruehl@gtsys.com.hk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08drm: Add helper for DP++ adaptorsVille Syrjälä4-1/+465
commit b3daa5ef52c26acd7432c787989bd92d48070c76 upstream. Add a helper which aids in the identification of DP dual mode (aka. DP++) adaptors. There are several types of adaptors specified: type 1 DVI, type 1 HDMI, type 2 DVI, type 2 HDMI Type 1 adaptors have a max TMDS clock limit of 165MHz, type 2 adaptors may go as high as 300MHz and they provide a register informing the source device what the actual limit is. Supposedly also type 1 adaptors may optionally implement this register. This TMDS clock limit is the main reason why we need to identify these adaptors. Type 1 adaptors provide access to their internal registers and the sink DDC bus through I2C. Type 2 adaptors provide this access both via I2C and I2C-over-AUX. A type 2 source device may choose to implement either of these methods. If a source device implements the I2C-over-AUX method, then the driver will obviously need specific support for such adaptors since the port is driven like an HDMI port, but DDC communication happes over the AUX channel. This helper should be enough to identify the adaptor type (some type 1 DVI adaptors may be a slight exception) and the maximum TMDS clock limit. Another feature that may be available is control over the TMDS output buffers on the adaptor, possibly allowing for some power saving when the TMDS link is down. Other user controllable features that may be available in the adaptors are downstream i2c bus speed control when using i2c-over-aux, and some control over the CEC pin. I chose not to provide any helper functions for those since I have no use for them in i915 at this time. The rest of the registers in the adaptor are mostly just information, eg. IEEE OUI, hardware and firmware revision, etc. v2: Pass adaptor type to helper functions to ease driver implementation Fix a bunch of typoes (Paulo) Add DRM_DP_DUAL_MODE_UNKNOWN for the case where we don't (yet) know the type (Paulo) Reject 0x00 and 0xff DP_DUAL_MODE_MAX_TMDS_CLOCK values (Paulo) Adjust drm_dp_dual_mode_detect() type2 vs. type1 detection to ease future LSPCON enabling Remove the unused DP_DUAL_MODE_LAST_RESERVED define v3: Fix kernel doc function argument descriptions (Jani) s/NONE/UNKNOWN/ in drm_dp_dual_mode_detect() docs Add kernel doc for enum drm_dp_dual_mode_type Actually build the docs Fix more typoes v4: Adjust code indentation of type2 adaptor detection (Shashank) Add debug messages for failurs cases (Shashank) v5: EXPORT_SYMBOL(drm_dp_dual_mode_read) (Paulo) Cc: Tore Anderson <tore@fud.no> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Shashank Sharma <shashank.sharma@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Shashank Sharma <shashank.sharma@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Shashank Sharma <shashank.sharma@intel.com> (v4) Link: http://patchwork.freedesktop.org/patch/msgid/1462542412-25533-1-git-send-email-ville.syrjala@linux.intel.com (cherry picked from commit ede53344dbfd1dd43bfd73eb6af743d37c56a7c3) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08drm/i915: Fix watermarks for VLV/CHVVille Syrjälä3-10/+20
commit caed361d83b204b7766924b80463bf7502ee7986 upstream. commit 92826fcdfc14 ("drm/i915: Calculate watermark related members in the crtc_state, v4.") broke thigns by removing the pre vs. post wm update distinction. We also lost the pre plane wm update entirely for VLV/CHV from the crtc enable path. This caused underruns on modeset and plane enable/disable on CHV, and often those can lead to a dead pipe. So let's bring back the pre vs. post thing, and let's toss in an explicit wm update to valleyview_crtc_enable() to avoid having to put it into the common code. This is more or less a partial revert of the offending commit. Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: drm-intel-fixes@lists.freedesktop.org Fixes: 92826fcdfc14 ("drm/i915: Calculate watermark related members in the crtc_state, v4.") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1457543247-13987-4-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08drm/i915: Don't leave old junk in ilk active watermarks on readoutVille Syrjälä1-0/+2
commit 7045c3689f148a0c95f42bae8ef3eb2829ac7de9 upstream. When we read out the watermark state from the hardware we're supposed to transfer that into the active watermarks, but currently we fail to any part of the active watermarks that isn't explicitly written. Let's clear it all upfront. Looks like this has been like this since the beginning, when I added the readout. No idea why I didn't clear it up. Cc: Matt Roper <matthew.d.roper@intel.com> Fixes: 243e6a44b9ca ("drm/i915: Init HSW watermark tracking in intel_modeset_setup_hw_state()") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1463151318-14719-2-git-send-email-ville.syrjala@linux.intel.com (cherry picked from commit 15606534bf0a65d8a74a90fd57b8712d147dbca6) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08drm/i915: Enable/disable TMDS output buffers in DP++ adaptor as neededVille Syrjälä3-0/+33
commit 0c2fb7c6c8fef2092051192095a7788231ba9a42 upstream. To save a bit of power, let's try to turn off the TMDS output buffers in DP++ adaptors when we're not driving the port. v2: Let's not forget DDI, toss in a debug message while at it v3: Just do the TMDS output control based on adaptor type. With the helper getting passed the type, we wouldn't actually have to check at all in the driver, but the check eliminates the debug output more honest Cc: Tore Anderson <tore@fud.no> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Shashank Sharma <shashank.sharma@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1462216105-20881-4-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Shashank Sharma <shashank.sharma@intel.com> (cherry picked from commit b2ccb822d376d1bbbe5d1f9118d1488b25e6bc6d) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08drm/i915: Respect DP++ adaptor TMDS clock limitVille Syrjälä2-8/+55
commit c578d15226c99f0566d5d022f81af6b7d69928db upstream. Try to detect the max TMDS clock limit for the DP++ adaptor (if any) and take it into account when checking the port clock. Note that as with the sink (HDMI vs. DVI) TMDS clock limit we'll ignore the adaptor TMDS clock limit in the modeset path, in case users are already "overclocking" their TMDS links. One subtle change here is that we'll have to respect the adaptor TMDS clock limit when we decide whether to do 12bpc or 8bpc, otherwise we might end up picking 12bpc and accidentally driving the TMDS link out of spec even when the user chose a mode that fits wihting the limits at 8bpc. This means you can't "overclock" your DP++ dongle at 12bpc anymore, but you can continue to do so at 8bpc. Note that for simplicity we'll use the I2C access method for all dual mode adaptors including type 2. Otherwise we'd have to start mixing DP AUX and HDMI together. In the future we may need to do that if we come across any board designs that don't hook up the DDC pins to the DP++ connectors. Such boards would obviously only work with type 2 dual mode adaptors, and not type 1. v2: Store adaptor type under indel_hdmi->dp_dual_mode Deal with DRM_DP_DUAL_MODE_UNKNOWN Pass adaptor type to drm_dp_dual_mode_max_tmds_clock(), and use it for type1 adaptors as well Reported-by: Tore Anderson <tore@fud.no> Fixes: 7a0baa623446 ("Revert "drm/i915: Disable 12bpc hdmi for now"") Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Shashank Sharma <shashank.sharma@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1462216105-20881-3-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Shashank Sharma <shashank.sharma@intel.com> (cherry picked from commit b1ba124d8e95cca48d33502a4a76b1ed09d213ce) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08drm/i915/psr: Try to program link training times correctlyDaniel Vetter1-8/+47
commit 03b7b5f983091bca17e9c163832fcde56971d7d1 upstream. The default of 0 is 500us of link training, but that's not enough for some platforms. Decoding this correctly means we're using 2.5ms of link training on these platforms, which fixes flickering issues associated with enabling PSR. v2: Unbotch the math a bit. v3: Drop debug hunk. v4: Improve commit message. Tested-by: Lyude <cpaul@redhat.com> Cc: Lyude <cpaul@redhat.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95176 Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Sonika Jindal <sonika.jindal@intel.com> Cc: Durgadoss R <durgadoss.r@intel.com> Cc: "Pandiyan, Dhinakaran" <dhinakaran.pandiyan@intel.com> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: fritsch@kodi.tv Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1463590036-17824-2-git-send-email-daniel.vetter@ffwll.ch (cherry picked from commit 50db139018f9c94376d5f4db94a3bae65fdfac14) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08Bluetooth: 6lowpan: Fix memory corruption of ipv6 destination addressGlenn Ruben Bakke1-7/+4
commit 55441070ca1cbd47ce1ad2959bbf4b47aed9b83b upstream. The memcpy of ipv6 header destination address to the skb control block (sbk->cb) in header_create() results in currupted memory when bt_xmit() is issued. The skb->cb is "released" in the return of header_create() making room for lower layer to minipulate the skb->cb. The value retrieved in bt_xmit is not persistent across header creation and sending, and the lower layer will overwrite portions of skb->cb, making the copied destination address wrong. The memory corruption will lead to non-working multicast as the first 4 bytes of the copied destination address is replaced by a value that resolves into a non-multicast prefix. This fix removes the dependency on the skb control block between header creation and send, by moving the destination address memcpy to the send function path (setup_create, which is called from bt_xmit). Signed-off-by: Glenn Ruben Bakke <glenn.ruben.bakke@nordicsemi.no> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08drm/atomic: Verify connector->funcs != NULL when clearing statesLyude1-1/+1
Unfortunately since we don't have Dave's connector refcounting patch here yet, it's very possible that drm_atomic_state_default_clear() could get called by intel_display_resume() when intel_dp_mst_destroy_connector() isn't completely finished destroying an mst connector, but has already finished setting connector->funcs to NULL. As such, we need to treat the connector like it's already been destroyed and just skip it, otherwise we'll end up dereferencing a NULL pointer. This fix is only required for 4.6 and below. David Airlie's patchseries for 4.7 to add connector reference counting provides a more proper fix for this. Changes since v1: - Fix leftover whitespace Upstream fix: 0552f7651bc2 ("drm/i915/mst: use reference counted connectors. (v3)") Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Lyude <cpaul@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08drm/i915: Discard previous atomic state on resume if connectors changeLyude1-0/+12
If an MST device is disconnected while the machine is suspended, the number of connectors will change as well after we call intel_dp_mst_resume(). This means that any previous atomic state we had before suspending is no longer valid, since it'll still be pointing to missing connectors. We need to check for this before committing the state, otherwise we'll kernel panic on resume whenever if any MST display was disconnected before we started resuming: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffffa01588ef>] drm_atomic_helper_check_modeset+0x29f/0xb40 [drm_kms_helper] Call Trace: [<ffffffffa02354f4>] intel_atomic_check+0x34/0x1180 [i915] [<ffffffff810e6c3f>] ? mark_held_locks+0x6f/0xa0 [<ffffffff810e6d99>] ? trace_hardirqs_on_caller+0x129/0x1b0 [<ffffffffa00ff1d2>] drm_atomic_check_only+0x192/0x620 [drm] [<ffffffff813ee001>] ? pci_pm_thaw+0x21/0x90 [<ffffffffa00ff677>] drm_atomic_commit+0x17/0x60 [drm] [<ffffffffa023e0ad>] intel_display_resume+0xbd/0x160 [i915] [<ffffffff813ee070>] ? pci_pm_thaw+0x90/0x90 [<ffffffffa01b60d8>] i915_drm_resume+0xd8/0x160 [i915] [<ffffffffa01b6185>] i915_pm_resume+0x25/0x30 [i915] [<ffffffff813ee0d4>] pci_pm_resume+0x64/0xa0 [<ffffffff814d9ea0>] dpm_run_callback+0x90/0x190 [<ffffffff814da455>] device_resume+0xd5/0x1f0 [<ffffffff814da58d>] async_resume+0x1d/0x50 [<ffffffff810b6718>] async_run_entry_fn+0x48/0x150 [<ffffffff810acc19>] process_one_work+0x1e9/0x5c0 [<ffffffff810acb96>] ? process_one_work+0x166/0x5c0 [<ffffffff810ad038>] worker_thread+0x48/0x4e0 [<ffffffff810acff0>] ? process_one_work+0x5c0/0x5c0 [<ffffffff810b3794>] kthread+0xe4/0x100 [<ffffffff81742672>] ret_from_fork+0x22/0x50 [<ffffffff810b36b0>] ? kthread_create_on_node+0x200/0x200 Changes since v1: - Move drm_atomic_state_free() call down so we're holding the appropriate locks when destroying the atomic state Changes since v2: - Check that state != NULL before we start accessing it's members This fix is only required for 4.6 and below. David Airlie's patchseries for 4.7 to add connector reference counting provides a more proper fix for this. Upstream fix: 0552f7651bc2 ("drm/i915/mst: use reference counted connectors. (v3)") Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Lyude <cpaul@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08drm/fb_helper: Fix references to dev->mode_config.num_connectorLyude1-3/+2
commit 255f0e7c418ad95a4baeda017ae6182ba9b3c423 upstream. During boot, MST hotplugs are generally expected (even if no physical hotplugging occurs) and result in DRM's connector topology changing. This means that using num_connector from the current mode configuration can lead to the number of connectors changing under us. This can lead to some nasty scenarios in fbcon: - We allocate an array to the size of dev->mode_config.num_connectors. - MST hotplug occurs, dev->mode_config.num_connectors gets incremented. - We try to loop through each element in the array using the new value of dev->mode_config.num_connectors, and end up going out of bounds since dev->mode_config.num_connectors is now larger then the array we allocated. fb_helper->connector_count however, will always remain consistent while we do a modeset in fb_helper. Note: This is just polish for 4.7, Dave Airlie's drm_connector refcounting fixed these bugs for real. But it's good enough duct-tape for stable kernel backporting, since backporting the refcounting changes is way too invasive. Signed-off-by: Lyude <cpaul@redhat.com> [danvet: Clarify why we need this. Also remove the now unused "dev" local variable to appease gcc.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1463065021-18280-3-git-send-email-cpaul@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08drm/i915/fbdev: Fix num_connector references in intel_fb_initial_config()Lyude1-3/+3
commit 14a3842a1d5945067d1dd0788f314e14d5b18e5b upstream. During boot time, MST devices usually send a ton of hotplug events irregardless of whether or not any physical hotplugs actually occurred. Hotplugs mean connectors being created/destroyed, and the number of DRM connectors changing under us. This isn't a problem if we use fb_helper->connector_count since we only set it once in the code, however if we use num_connector from struct drm_mode_config we risk it's value changing under us. On top of that, there's even a chance that dev->mode_config.num_connector != fb_helper->connector_count. If the number of connectors happens to increase under us, we'll end up using the wrong array size for memcpy and start writing beyond the actual length of the array, occasionally resulting in kernel panics. Note: This is just polish for 4.7, Dave Airlie's drm_connector refcounting fixed these bugs for real. But it's good enough duct-tape for stable kernel backporting, since backporting the refcounting changes is way too invasive. Signed-off-by: Lyude <cpaul@redhat.com> [danvet: Clarify why we need this.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1463065021-18280-2-git-send-email-cpaul@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08drm/amdgpu: Fix hdmi deep color support.Mario Kleiner1-5/+5
commit 9d746ab68163d642dae13756b2b3145b2e38cb65 upstream. When porting the hdmi deep color detection code from radeon-kms to amdgpu-kms apparently some kind of copy and paste error happened, attaching an else branch to the wrong if statement. The result is that hdmi deep color mode is always disabled, regardless of gpu and display capabilities and user wishes, as the code mistakenly thinks that the display doesn't provide the required max_tmds_clock limit and falls back to 8 bpc. This patch fixes deep color support, as tested on a R9 380 Tonga Pro + suitable display, and should be backported to all kernels with amdgpu-kms support. Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08drm/amdgpu: use drm_mode_vrefresh() rather than mode->vrefreshAlex Deucher1-1/+1
commit 6b8812eb004ee2b24aac8b1a711a0e8e797df3ce upstream. This is a port of radeon commit: 3d2d98ee1af0cf6eebfbd6bff4c17d3601ac1284 drm/radeon: use drm_mode_vrefresh() rather than mode->vrefresh to amdgpu. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08drm/vmwgfx: Kill some lockdep warningsThomas Hellstrom5-14/+21
commit 93cd16817ae5ddcfc548784b51c76bf6d7923442 upstream. Some global KMS state that is elsewhere protected by the mode_config mutex here needs to be protected with a local mutex. Remove corresponding lockdep checks and introduce a new driver-private global_kms_state_mutex, and make sure its locking order is *after* the crtc locks in order to avoid having to release those when the new mutex is taken. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08drm/gma500: Fix possible out of bounds readItai Handler1-1/+1
commit 7ccca1d5bf69fdd1d3c5fcf84faf1659a6e0ad11 upstream. Fix possible out of bounds read, by adding missing comma. The code may read pass the end of the dsi_errors array when the most significant bit (bit #31) in the intr_stat register is set. This bug has been detected using CppCheck (static analysis tool). Signed-off-by: Itai Handler <itai_handler@hotmail.com> Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08sunrpc: fix stripping of padded MIC tokensTomáš Trnka1-2/+2
commit c0cb8bf3a8e4bd82e640862cdd8891400405cb89 upstream. The length of the GSS MIC token need not be a multiple of four bytes. It is then padded by XDR to a multiple of 4 B, but unwrap_integ_data() would previously only trim mic.len + 4 B. The remaining up to three bytes would then trigger a check in nfs4svc_decode_compoundargs(), leading to a "garbage args" error and mount failure: nfs4svc_decode_compoundargs: compound not properly padded! nfsd: failed to decode arguments! This would prevent older clients using the pre-RFC 4121 MIC format (37-byte MIC including a 9-byte OID) from mounting exports from v3.9+ servers using krb5i. The trimming was introduced by commit 4c190e2f913f ("sunrpc: trim off trailing checksum before returning decrypted or integrity authenticated buffer"). Fixes: 4c190e2f913f "unrpc: trim off trailing checksum..." Signed-off-by: Tomáš Trnka <ttrnka@mail.muni.cz> Acked-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08xen: use same main loop for counting and remapping pagesJuergen Gross1-39/+26
commit dd14be92fbf5bc1ef7343f34968440e44e21b46a upstream. Instead of having two functions for cycling through the E820 map in order to count to be remapped pages and remap them later, just use one function with a caller supplied sub-function called for each region to be processed. This eliminates the possibility of a mismatch between both loops which showed up in certain configurations. Suggested-by: Ed Swierk <eswierk@skyportsystems.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08xen/events: Don't move disabled irqsRoss Lagerwall1-2/+4
commit f0f393877c71ad227d36705d61d1e4062bc29cf5 upstream. Commit ff1e22e7a638 ("xen/events: Mask a moving irq") open-coded irq_move_irq() but left out checking if the IRQ is disabled. This broke resuming from suspend since it tries to move a (disabled) irq without holding the IRQ's desc->lock. Fix it by adding in a check for disabled IRQs. The resulting stacktrace was: kernel BUG at /build/linux-UbQGH5/linux-4.4.0/kernel/irq/migration.c:31! invalid opcode: 0000 [#1] SMP Modules linked in: xenfs xen_privcmd ... CPU: 0 PID: 9 Comm: migration/0 Not tainted 4.4.0-22-generic #39-Ubuntu Hardware name: Xen HVM domU, BIOS 4.6.1-xs125180 05/04/2016 task: ffff88003d75ee00 ti: ffff88003d7bc000 task.ti: ffff88003d7bc000 RIP: 0010:[<ffffffff810e26e2>] [<ffffffff810e26e2>] irq_move_masked_irq+0xd2/0xe0 RSP: 0018:ffff88003d7bfc50 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffff88003d40ba00 RCX: 0000000000000001 RDX: 0000000000000001 RSI: 0000000000000100 RDI: ffff88003d40bad8 RBP: ffff88003d7bfc68 R08: 0000000000000000 R09: ffff88003d000000 R10: 0000000000000000 R11: 000000000000023c R12: ffff88003d40bad0 R13: ffffffff81f3a4a0 R14: 0000000000000010 R15: 00000000ffffffff FS: 0000000000000000(0000) GS:ffff88003da00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fd4264de624 CR3: 0000000037922000 CR4: 00000000003406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Stack: ffff88003d40ba38 0000000000000024 0000000000000000 ffff88003d7bfca0 ffffffff814c8d92 00000010813ef89d 00000000805ea732 0000000000000009 0000000000000024 ffff88003cc39b80 ffff88003d7bfce0 ffffffff814c8f66 Call Trace: [<ffffffff814c8d92>] eoi_pirq+0xb2/0xf0 [<ffffffff814c8f66>] __startup_pirq+0xe6/0x150 [<ffffffff814ca659>] xen_irq_resume+0x319/0x360 [<ffffffff814c7e75>] xen_suspend+0xb5/0x180 [<ffffffff81120155>] multi_cpu_stop+0xb5/0xe0 [<ffffffff811200a0>] ? cpu_stop_queue_work+0x80/0x80 [<ffffffff811203d0>] cpu_stopper_thread+0xb0/0x140 [<ffffffff810a94e6>] ? finish_task_switch+0x76/0x220 [<ffffffff810ca731>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20 [<ffffffff810a3935>] smpboot_thread_fn+0x105/0x160 [<ffffffff810a3830>] ? sort_range+0x30/0x30 [<ffffffff810a0588>] kthread+0xd8/0xf0 [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0 [<ffffffff8182568f>] ret_from_fork+0x3f/0x70 [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0 Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08powerpc/eeh: Restore initial state in eeh_pe_reset_and_recover()Gavin Shan1-0/+23
commit 5a0cdbfd17b90a89c64a71d8aec9773ecdb20d0d upstream. The function eeh_pe_reset_and_recover() is used to recover EEH error when the passthrou device are transferred to guest and backwards. The content in the device's config space will be lost on PE reset issued in the middle of the recovery. The function saves/restores it before/after the reset. However, config access to some adapters like Broadcom BCM5719 at this point will causes fenced PHB. The config space is always blocked and we save 0xFF's that are restored at late point. The memory BARs are totally corrupted, causing another EEH error upon access to one of the memory BARs. This restores the config space on those adapters like BCM5719 from the content saved to the EEH device when it's populated, to resolve above issue. Fixes: 5cfb20b9 ("powerpc/eeh: Emulate EEH recovery for VFIO devices") Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell"Guilherme G. Piccoli1-1/+1
commit c2078d9ef600bdbe568c89e5ddc2c6f15b7982c8 upstream. This reverts commit 89a51df5ab1d38b257300b8ac940bbac3bb0eb9b. The function eeh_add_device_early() is used to perform EEH initialization in devices added later on the system, like in hotplug/DLPAR scenarios. Since the commit 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell") a new check was introduced in this function - Cell has no EEH capabilities which led to kernel oops if hotplug was performed, so checking for eeh_enabled() was introduced to avoid the issue. However, in architectures that EEH is present like pSeries or PowerNV, we might reach a case in which no PCI devices are present on boot time and so EEH is not initialized. Then, if a device is added via DLPAR for example, eeh_add_device_early() fails because eeh_enabled() is false, and EEH end up not being enabled at all. This reverts the aforementioned patch since a new verification was introduced by the commit d91dafc02f42 ("powerpc/eeh: Delay probing EEH device during hotplug") and so the original Cell issue does not happen anymore. Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08powerpc/eeh: Don't report error in eeh_pe_reset_and_recover()Gavin Shan1-3/+0
commit affeb0f2d3a9af419ad7ef4ac782e1540b2f7b28 upstream. The function eeh_pe_reset_and_recover() is used to recover EEH error when the passthrough device are transferred to guest and backwards, meaning the device's driver is vfio-pci or none. When the driver is vfio-pci that provides error_detected() error handler only, the handler simply stops the guest and it's not expected behaviour. On the other hand, no error handlers will be called if we don't have a bound driver. This ignores the error handler in eeh_pe_reset_and_recover() that reports the error to device driver to avoid the exceptional behaviour. Fixes: 5cfb20b9 ("powerpc/eeh: Emulate EEH recovery for VFIO devices") Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08powerpc/book3s64: Fix branching to OOL handlers in relocatable kernelHari Bathini1-5/+11
commit 8ed8ab40047a570fdd8043a40c104a57248dd3fd upstream. Some of the interrupt vectors on 64-bit POWER server processors are only 32 bytes long (8 instructions), which is not enough for the full first-level interrupt handler. For these we need to branch to an out-of-line (OOL) handler. But when we are running a relocatable kernel, interrupt vectors till __end_interrupts marker are copied down to real address 0x100. So, branching to labels (ie. OOL handlers) outside this section must be handled differently (see LOAD_HANDLER()), considering relocatable kernel, which would need at least 4 instructions. However, branching from interrupt vector means that we corrupt the CFAR (come-from address register) on POWER7 and later processors as mentioned in commit 1707dd16. So, EXCEPTION_PROLOG_0 (6 instructions) that contains the part up to the point where the CFAR is saved in the PACA should be part of the short interrupt vectors before we branch out to OOL handlers. But as mentioned already, there are interrupt vectors on 64-bit POWER server processors that are only 32 bytes long (like vectors 0x4f00, 0x4f20, etc.), which cannot accomodate the above two cases at the same time owing to space constraint. Currently, in these interrupt vectors, we simply branch out to OOL handlers, without using LOAD_HANDLER(), which leaves us vulnerable when running a relocatable kernel (eg. kdump case). While this has been the case for sometime now and kdump is used widely, we were fortunate not to see any problems so far, for three reasons: 1. In almost all cases, production kernel (relocatable) is used for kdump as well, which would mean that crashed kernel's OOL handler would be at the same place where we end up branching to, from short interrupt vector of kdump kernel. 2. Also, OOL handler was unlikely the reason for crash in almost all the kdump scenarios, which meant we had a sane OOL handler from crashed kernel that we branched to. 3. On most 64-bit POWER server processors, page size is large enough that marking interrupt vector code as executable (see commit 429d2e83) leads to marking OOL handler code from crashed kernel, that sits right below interrupt vector code from kdump kernel, as executable as well. Let us fix this by moving the __end_interrupts marker down past OOL handlers to make sure that we also copy OOL handlers to real address 0x100 when running a relocatable kernel. This fix has been tested successfully in kdump scenario, on an LPAR with 4K page size by using different default/production kernel and kdump kernel. Also tested by manually corrupting the OOL handlers in the first kernel and then kdump'ing, and then causing the OOL handlers to fire - mpe. Fixes: c1fb6816fb1b ("powerpc: Add relocation on exception vector handlers") Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08batman-adv: Fix double neigh_node_put in batadv_v_ogm_route_updateSven Eckelmann1-1/+3
The router is put down twice when it was non-NULL and either orig_ifinfo is NULL afterwards or batman-adv receives a packet with the same sequence number. This will end up in a use-after-free when the batadv_neigh_node is removed because the reference counter ended up too early at 0. This patch is skipping netdev and is being sent directly to stable in accordance with David S. Miller[1]. The reason is that this patch applies only on linux-4.6 and not on linux-4.7/net because it was "accidentally" fixed by a refactoring commit (more details in [2]). It addresses a reference imbalance which systematically leads to a use-after-free and then a kernel crash. [1] https://www.mail-archive.com/b.a.t.m.a.n@lists.open-mesh.org/msg15258.html [2] https://www.mail-archive.com/b.a.t.m.a.n@lists.open-mesh.org/msg15252.html Fixes: 9323158ef9f4 ("batman-adv: OGMv2 - implement originators logic") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08QE-UART: add "fsl,t1040-ucc-uart" to of_device_idZhao Qiang1-0/+3
commit 11ca2b7ab432eb90906168c327733575e68d388f upstream. New bindings use "fsl,t1040-ucc-uart" as the compatible for qe-uart. So add it. Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08wait/ptrace: assume __WALL if the child is tracedOleg Nesterov1-9/+20
commit bf959931ddb88c4e4366e96dd22e68fa0db9527c upstream. The following program (simplified version of generated by syzkaller) #include <pthread.h> #include <unistd.h> #include <sys/ptrace.h> #include <stdio.h> #include <signal.h> void *thread_func(void *arg) { ptrace(PTRACE_TRACEME, 0,0,0); return 0; } int main(void) { pthread_t thread; if (fork()) return 0; while (getppid() != 1) ; pthread_create(&thread, NULL, thread_func, NULL); pthread_join(thread, NULL); return 0; } creates an unreapable zombie if /sbin/init doesn't use __WALL. This is not a kernel bug, at least in a sense that everything works as expected: debugger should reap a traced sub-thread before it can reap the leader, but without __WALL/__WCLONE do_wait() ignores sub-threads. Unfortunately, it seems that /sbin/init in most (all?) distributions doesn't use it and we have to change the kernel to avoid the problem. Note also that most init's use sys_waitid() which doesn't allow __WALL, so the necessary user-space fix is not that trivial. This patch just adds the "ptrace" check into eligible_child(). To some degree this matches the "tsk->ptrace" in exit_notify(), ->exit_signal is mostly ignored when the tracee reports to debugger. Or WSTOPPED, the tracer doesn't need to set this flag to wait for the stopped tracee. This obviously means the user-visible change: __WCLONE and __WALL no longer have any meaning for debugger. And I can only hope that this won't break something, but at least strace/gdb won't suffer. We could make a more conservative change. Say, we can take __WCLONE into account, or !thread_group_leader(). But it would be nice to not complicate these historical/confusing checks. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: Pedro Alves <palves@redhat.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: <syzkaller@googlegroups.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>