summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-08-23Merge tag 'trace-v4.19-1' of ↵Linus Torvalds3-1/+25
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Masami found an off by one bug in the code that keeps "notrace" functions from being traced by kprobes. During my testing, I found that there's places that we may want to add kprobes to notrace, thus we may end up changing this code before 4.19 is released. The history behind this change is that we found that adding kprobes to various notrace functions caused the kernel to crashed. We took the safe route and decided not to allow kprobes to trace any notrace function. But because notrace is added to functions that just cause weird side effects to the function tracer, but are still safe, preventing kprobes for all notrace functios may be too much of a big hammer. One such place is __schedule() is marked notrace, to keep function tracer from doing strange recursive loops when it gets traced with NEED_RESCHED set. With this change, one can not add kprobes to the scheduler. Masami also added code to use gcov on ftrace" * tag 'trace-v4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/kprobes: Fix to check notrace function with correct range tracing: Allow gcov profiling on only ftrace subsystem
2018-08-23x86/mm: Only use tlb_remove_table() for paravirtPeter Zijlstra9-6/+26
If we don't use paravirt; don't play unnecessary and complicated games to free page-tables. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rik van Riel <riel@surriel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-23mm: mmu_notifier fix for tlb_end_vmaNicholas Piggin2-14/+13
The generic tlb_end_vma does not call invalidate_range mmu notifier, and it resets resets the mmu_gather range, which means the notifier won't be called on part of the range in case of an unmap that spans multiple vmas. ARM64 seems to be the only arch I could see that has notifiers and uses the generic tlb_end_vma. I have not actually tested it. [ Catalin and Will point out that ARM64 currently only uses the notifiers for KVM, which doesn't use the ->invalidate_range() callback right now, so it's a bug, but one that happens to not affect them. So not necessary for stable. - Linus ] Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-23mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREEPeter Zijlstra3-0/+22
Jann reported that x86 was missing required TLB invalidates when he hit the !*batch slow path in tlb_remove_table(). This is indeed the case; RCU_TABLE_FREE does not provide TLB (cache) invalidates, the PowerPC-hash where this code originated and the Sparc-hash where this was subsequently used did not need that. ARM which later used this put an explicit TLB invalidate in their __p*_free_tlb() functions, and PowerPC-radix followed that example. But when we hooked up x86 we failed to consider this. Fix this by (optionally) hooking tlb_remove_table() into the TLB invalidate code. NOTE: s390 was also needing something like this and might now be able to use the generic code again. [ Modified to be on top of Nick's cleanups, which simplified this patch now that tlb_flush_mmu_tlbonly() really only flushes the TLB - Linus ] Fixes: 9e52fc2b50de ("x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rik van Riel <riel@surriel.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Will Deacon <will.deacon@arm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-23mm/tlb: Remove tlb_remove_table() non-concurrent conditionPeter Zijlstra1-9/+0
Will noted that only checking mm_users is incorrect; we should also check mm_count in order to cover CPUs that have a lazy reference to this mm (and could do speculative TLB operations). If removing this turns out to be a performance issue, we can re-instate a more complete check, but in tlb_table_flush() eliding the call_rcu_sched(). Fixes: 267239116987 ("mm, powerpc: move the RCU page-table freeing into generic code") Reported-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rik van Riel <riel@surriel.com> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-23mm: move tlb_table_flush to tlb_flush_mmu_freeNicholas Piggin1-3/+3
There is no need to call this from tlb_flush_mmu_tlbonly, it logically belongs with tlb_flush_mmu_free. This makes future fixes simpler. [ This was originally done to allow code consolidation for the mmu_notifier fix, but it also ends up helping simplify the HAVE_RCU_TABLE_INVALIDATE fix. - Linus ] Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-23getxattr: use correct xattr lengthChristian Brauner1-1/+1
When running in a container with a user namespace, if you call getxattr with name = "system.posix_acl_access" and size % 8 != 4, then getxattr silently skips the user namespace fixup that it normally does resulting in un-fixed-up data being returned. This is caused by posix_acl_fix_xattr_to_user() being passed the total buffer size and not the actual size of the xattr as returned by vfs_getxattr(). This commit passes the actual length of the xattr as returned by vfs_getxattr() down. A reproducer for the issue is: touch acl_posix setfacl -m user:0:rwx acl_posix and the compile: #define _GNU_SOURCE #include <errno.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/types.h> #include <unistd.h> #include <attr/xattr.h> /* Run in user namespace with nsuid 0 mapped to uid != 0 on the host. */ int main(int argc, void **argv) { ssize_t ret1, ret2; char buf1[128], buf2[132]; int fret = EXIT_SUCCESS; char *file; if (argc < 2) { fprintf(stderr, "Please specify a file with " "\"system.posix_acl_access\" permissions set\n"); _exit(EXIT_FAILURE); } file = argv[1]; ret1 = getxattr(file, "system.posix_acl_access", buf1, sizeof(buf1)); if (ret1 < 0) { fprintf(stderr, "%s - Failed to retrieve " "\"system.posix_acl_access\" " "from \"%s\"\n", strerror(errno), file); _exit(EXIT_FAILURE); } ret2 = getxattr(file, "system.posix_acl_access", buf2, sizeof(buf2)); if (ret2 < 0) { fprintf(stderr, "%s - Failed to retrieve " "\"system.posix_acl_access\" " "from \"%s\"\n", strerror(errno), file); _exit(EXIT_FAILURE); } if (ret1 != ret2) { fprintf(stderr, "The value of \"system.posix_acl_" "access\" for file \"%s\" changed " "between two successive calls\n", file); _exit(EXIT_FAILURE); } for (ssize_t i = 0; i < ret2; i++) { if (buf1[i] == buf2[i]) continue; fprintf(stderr, "Unexpected different in byte %zd: " "%02x != %02x\n", i, buf1[i], buf2[i]); fret = EXIT_FAILURE; } if (fret == EXIT_SUCCESS) fprintf(stderr, "Test passed\n"); else fprintf(stderr, "Test failed\n"); _exit(fret); } and run: ./tester acl_posix On a non-fixed up kernel this should return something like: root@c1:/# ./t Unexpected different in byte 16: ffffffa0 != 00 Unexpected different in byte 17: ffffff86 != 00 Unexpected different in byte 18: 01 != 00 and on a fixed kernel: root@c1:~# ./t Test passed Cc: stable@vger.kernel.org Fixes: 2f6f0654ab61 ("userns: Convert vfs posix_acl support to use kuids and kgids") Link: https://bugzilla.kernel.org/show_bug.cgi?id=199945 Reported-by: Colin Watson <cjwatson@ubuntu.com> Signed-off-by: Christian Brauner <christian@brauner.io> Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-08-23gcc-plugins: Disable when building under ClangKees Cook1-1/+1
Prior to doing compiler feature detection in Kconfig, attempts to build GCC plugins with Clang would fail the build, much in the same way missing GCC plugin headers would fail the build. However, now that this logic has been lifted into Kconfig, add an explicit test for GCC (instead of duplicating it in the feature-test script). Reported-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-08-23blk-wbt: don't maintain inflight counts if disabledJens Axboe2-1/+21
A previous commit removed the ability to have per-rq flags. We used those flags to maintain inflight counts. Since we don't have those anymore, we have to always maintain inflight counts, even if wbt is disabled. This is clearly suboptimal. Add a queue quiesce around changing the wbt latency settings from sysfs to work around this. With that, we can reliably put the enabled check in our bio_to_wbt_flags(), since we know the WBT_TRACKED flag will be consistent for the lifetime of the request. Fixes: c1c80384c8f ("block: remove external dependency on wbt_flags") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-23powerpc/mce: Fix SLB rebolting during MCE recovery path.Mahesh Salgaonkar1-1/+1
The commit e7e81847478 ("powerpc/64s: move machine check SLB flushing to mm/slb.c") introduced a bug in reloading bolted SLB entries. Unused bolted entries are stored with .esid=0 in the slb_shadow area, and that value is now used directly as the RB input to slbmte, which means the RB[52:63] index field is set to 0, which causes SLB entry 0 to be cleared. Fix this by storing the index bits in the unused bolted entries, which directs the slbmte to the right place. The SLB shadow area is also used by the hypervisor, but PAPR is okay with that, from LoPAPR v1.1, 14.11.1.3 SLB Shadow Buffer: Note: SLB is filled sequentially starting at index 0 from the shadow buffer ignoring the contents of RB field bits 52-63 Fixes: e7e81847478b ("powerpc/64s: move machine check SLB flushing to mm/slb.c") Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-23KVM: PPC: Book3S: Fix guest DMA when guest partially backed by THP pagesPaul Mackerras1-7/+10
Commit 76fa4975f3ed ("KVM: PPC: Check if IOMMU page is contained in the pinned physical page", 2018-07-17) added some checks to ensure that guest DMA mappings don't attempt to map more than the guest is entitled to access. However, errors in the logic mean that legitimate guest requests to map pages for DMA are being denied in some situations. Specifically, if the first page of the range passed to mm_iommu_get() is mapped with a normal page, and subsequent pages are mapped with transparent huge pages, we end up with mem->pageshift == 0. That means that the page size checks in mm_iommu_ua_to_hpa() and mm_iommu_up_to_hpa_rm() will always fail for every page in that region, and thus the guest can never map any memory in that region for DMA, typically leading to a flood of error messages like this: qemu-system-ppc64: VFIO_MAP_DMA: -22 qemu-system-ppc64: vfio_dma_map(0x10005f47780, 0x800000000000000, 0x10000, 0x7fff63ff0000) = -22 (Invalid argument) The logic errors in mm_iommu_get() are: (a) use of 'ua' not 'ua + (i << PAGE_SHIFT)' in the find_linux_pte() call (meaning that find_linux_pte() returns the pte for the first address in the range, not the address we are currently up to); (b) use of 'pageshift' as the variable to receive the hugepage shift returned by find_linux_pte() - for a normal page this gets set to 0, leading to us setting mem->pageshift to 0 when we conclude that the pte returned by find_linux_pte() didn't match the page we were looking at; (c) comparing 'compshift', which is a page order, i.e. log base 2 of the number of pages, with 'pageshift', which is a log base 2 of the number of bytes. To fix these problems, this patch introduces 'cur_ua' to hold the current user address and uses that in the find_linux_pte() call; introduces 'pteshift' to hold the hugepage shift found by find_linux_pte(); and compares 'pteshift' with 'compshift + PAGE_SHIFT' rather than 'compshift'. The patch also moves the local_irq_restore to the point after the PTE pointer returned by find_linux_pte() has been dereferenced because otherwise the PTE could change underneath us, and adds a check to avoid doing the find_linux_pte() call once mem->pageshift has been reduced to PAGE_SHIFT, as an optimization. Fixes: 76fa4975f3ed ("KVM: PPC: Check if IOMMU page is contained in the pinned physical page") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-23powerpc/mm/radix: Only need the Nest MMU workaround for R -> RW transitionAneesh Kumar K.V1-3/+5
The Nest MMU workaround is only needed for RW upgrades. Avoid doing that for other PTE updates. We also avoid clearing the PTE while marking it invalid. This is because other page table walkers will find this PTE none and can result in unexpected behaviour due to that. Instead we clear _PAGE_PRESENT and set the software PTE bit _PAGE_INVALID. pte_present() is already updated to check for both bits. This makes sure page table walkers will find the PTE present and things like pte_pfn(pte) returns the right value. Based on an original patch from Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-23Merge tag 'perf-core-for-mingo-4.19-20180820' of ↵Ingo Molnar26-266/+256
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: LLVM/clang/eBPF: (Arnaldo Carvalho de Melo) - Allow passing options to llc in addition to to clang. Hardware tracing: (Jack Henschel) - Improve error message for PMU address filters, clarifying availability of that feature in hardware having hardware tracing such as Intel PT. Python interface: (Jiri Olsa) - Fix read_on_cpu() interface. ELF/DWARF libraries: (Jiri Olsa) - Fix handling of the combo compressed module file + decompressed associated debuginfo file. Build (Rasmus Villemoes) - Disable parallelism for 'make clean', avoiding multiple submakes deleting the same files and causing the build to fail on systems such as Yocto. Kernel ABI copies: (Arnaldo Carvalho de Melo) - Update tools's copy of x86's cpufeatures.h. - Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy'. Miscellaneous: (Steven Rostedt) - Change libtraceevent to SPDX License format. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-08-23drm/edid: Add 6 bpc quirk for SDC panel in Lenovo B50-80Kai-Heng Feng1-0/+3
Another panel that reports "DFP 1.x compliant TMDS" but it supports 6bpc instead of 8 bpc. Apply 6 bpc quirk for the panel to fix it. BugLink: https://bugs.launchpad.net/bugs/1788308 Cc: <stable@vger.kernel.org> # v4.8+ Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20180823055332.7723-1-kai.heng.feng@canonical.com
2018-08-23ACPI: fix menuconfig presentation of ACPI submenuArnd Bergmann1-3/+3
My fix for a recursive Kconfig dependency caused another issue where the ACPI specific options end up in the top-level menu in 'menuconfig'. This was an unintended side-effect of having a silent option between 'menuconfig ACPI' and 'if ACPI'. Moving the ARCH_SUPPORTS_ACPI symbol ahead of the ACPI menu solves that problem and restores the previous presentation. Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Fixes: 2c870e61132c (arm64: fix ACPI dependencies) Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-08-23powerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid.Aneesh Kumar K.V1-1/+17
When splitting a huge pmd pte, we need to mark the pmd entry invalid. We can do that by clearing _PAGE_PRESENT bit. But then that will be taken as a swap pte. In order to differentiate between the two use a software pte bit when invalidating. For regular pte, due to bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang") we need to mark the pte entry invalid when relaxing access permission. Instead of marking pte_none which can result in different page table walk routines possibly skipping this pte entry, invalidate it but still keep it marked present. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-23powerpc/nohash: fix pte_access_permitted()Christophe Leroy1-6/+3
Commit 5769beaf180a8 ("powerpc/mm: Add proper pte access check helper for other platforms") replaced generic pte_access_permitted() by an arch specific one. The generic one is defined as (pte_present(pte) && (!(write) || pte_write(pte))) The arch specific one is open coded checking that _PAGE_USER and _PAGE_WRITE (_PAGE_RW) flags are set, but lacking to check that _PAGE_RO and _PAGE_PRIVILEGED are unset, leading to a useless test on targets like the 8xx which defines _PAGE_RW and _PAGE_USER as 0. Commit 5fa5b16be5b31 ("powerpc/mm/hugetlb: Use pte_access_permitted for hugetlb access check") replaced some tests performed with pte helpers by a call to pte_access_permitted(), leading to the same issue. This patch rewrites powerpc/nohash pte_access_permitted() using pte helpers. Fixes: 5769beaf180a8 ("powerpc/mm: Add proper pte access check helper for other platforms") Fixes: 5fa5b16be5b31 ("powerpc/mm/hugetlb: Use pte_access_permitted for hugetlb access check") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-23apparmor: remove no-op permission check in policy_unpackJohn Johansen1-32/+0
The patch 736ec752d95e: "AppArmor: policy routines for loading and unpacking policy" from Jul 29, 2010, leads to the following static checker warning: security/apparmor/policy_unpack.c:410 verify_accept() warn: bitwise AND condition is false here security/apparmor/policy_unpack.c:413 verify_accept() warn: bitwise AND condition is false here security/apparmor/policy_unpack.c 392 #define DFA_VALID_PERM_MASK 0xffffffff 393 #define DFA_VALID_PERM2_MASK 0xffffffff 394 395 /** 396 * verify_accept - verify the accept tables of a dfa 397 * @dfa: dfa to verify accept tables of (NOT NULL) 398 * @flags: flags governing dfa 399 * 400 * Returns: 1 if valid accept tables else 0 if error 401 */ 402 static bool verify_accept(struct aa_dfa *dfa, int flags) 403 { 404 int i; 405 406 /* verify accept permissions */ 407 for (i = 0; i < dfa->tables[YYTD_ID_ACCEPT]->td_lolen; i++) { 408 int mode = ACCEPT_TABLE(dfa)[i]; 409 410 if (mode & ~DFA_VALID_PERM_MASK) 411 return 0; 412 413 if (ACCEPT_TABLE2(dfa)[i] & ~DFA_VALID_PERM2_MASK) 414 return 0; fixes: 736ec752d95e ("AppArmor: policy routines for loading and unpacking policy") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: John Johansen <john.johansen@canonical.com>
2018-08-23Merge branch 'drm-next-4.19' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie32-141/+154
into drm-next Fixes for 4.19: - Fix build when KCOV is enabled - Misc display fixes - A couple of SR-IOV fixes - Fence fixes for eviction handling for KFD - Misc other fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180822203813.2733-1-alexander.deucher@amd.com
2018-08-23Merge tag 'drm-misc-next-fixes-2018-08-22' of ↵Dave Airlie2-1/+4
git://anongit.freedesktop.org/drm/drm-misc into drm-next - Add an unprepare delay to the tv123wam panel (Sean) - Update seanpaul's email in MAINTAINERS (Sean) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Sean Paul <sean@poorly.run> Link: https://patchwork.freedesktop.org/patch/msgid/20180822193850.GA214158@art_vandelay
2018-08-23x86/mm/tlb: Revert the recent lazy TLB patchesPeter Zijlstra4-181/+77
Revert commits: 95b0e6357d3e x86/mm/tlb: Always use lazy TLB mode 64482aafe55f x86/mm/tlb: Only send page table free TLB flush to lazy TLB CPUs ac0315896970 x86/mm/tlb: Make lazy TLB mode lazier 61d0beb5796a x86/mm/tlb: Restructure switch_mm_irqs_off() 2ff6ddf19c0e x86/mm/tlb: Leave lazy TLB mode at page table free time In order to simplify the TLB invalidate fixes for x86 and unify the parts that need backporting. We'll try again later. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rik van Riel <riel@surriel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-23include/linux/compiler*.h: make compiler-*.h mutually exclusiveNick Desaulniers9-253/+133
Commit cafa0010cd51 ("Raise the minimum required gcc version to 4.6") recently exposed a brittle part of the build for supporting non-gcc compilers. Both Clang and ICC define __GNUC__, __GNUC_MINOR__, and __GNUC_PATCHLEVEL__ for quick compatibility with code bases that haven't added compiler specific checks for __clang__ or __INTEL_COMPILER. This is brittle, as they happened to get compatibility by posing as a certain version of GCC. This broke when upgrading the minimal version of GCC required to build the kernel, to a version above what ICC and Clang claim to be. Rather than always including compiler-gcc.h then undefining or redefining macros in compiler-intel.h or compiler-clang.h, let's separate out the compiler specific macro definitions into mutually exclusive headers, do more proper compiler detection, and keep shared definitions in compiler_types.h. Fixes: cafa0010cd51 ("Raise the minimum required gcc version to 4.6") Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Suggested-by: Eli Friedman <efriedma@codeaurora.org> Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-23sunrpc: Add comment defining gssd upcall API keywordsChuck Lever1-0/+17
During review, it was found that the target, service, and srchost keywords are easily conflated. Add an explainer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-23nfsd: Remove callback_credChuck Lever3-30/+2
Clean up: The global callback_cred is no longer used, so it can be removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-23nfsd: Use correct credential for NFSv4.0 callback with GSSChuck Lever1-1/+8
I've had trouble when operating a multi-homed Linux NFS server with Kerberos using NFSv4.0. Lately, I've seen my clients reporting this (and then hanging): May 9 11:43:26 manet kernel: NFS: NFSv4 callback contains invalid cred The client-side commit f11b2a1cfbf5 ("nfs4: copy acceptor name from context to nfs_client") appears to be related, but I suspect this problem has been going on for some time before that. RFC 7530 Section 3.3.3 says: > For Kerberos V5, nfs/hostname would be a server principal in the > Kerberos Key Distribution Center database. This is the same > principal the client acquired a GSS-API context for when it issued > the SETCLIENTID operation ... In other words, an NFSv4.0 client expects that the server will use the same GSS principal for callback that the client used to establish its lease. For example, if the client used the service principal "nfs@server.domain" to establish its lease, the server is required to use "nfs@server.domain" when performing NFSv4.0 callback operations. The Linux NFS server currently does not. It uses a common service principal for all callback connections. Sometimes this works as expected, and other times -- for example, when the server is accessible via multiple hostnames -- it won't work at all. This patch scrapes the target name from the client credential, and uses that for the NFSv4.0 callback credential. That should be correct much more often. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-23sunrpc: Extract target name into svc_credChuck Lever3-27/+53
NFSv4.0 callback needs to know the GSS target name the client used when it established its lease. That information is available from the GSS context created by gssproxy. Make it available in each svc_cred. Note this will also give us access to the real target service principal name (which is typically "nfs", but spec does not require that). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-23sunrpc: Enable the kernel to specify the hostname part of service principalsChuck Lever1-3/+17
A multi-homed NFS server may have more than one "nfs" key in its keytab. Enable the kernel to pick the key it wants as a machine credential when establishing a GSS context. This is useful for GSS-protected NFSv4.0 callbacks, which are required by RFC 7530 S3.3.3 to use the same principal as the service principal the client used when establishing its lease. A complementary modification to rpc.gssd is required to fully enable this feature. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-23sunrpc: Don't use stack buffer with scatterlistLaura Abbott1-3/+9
Fedora got a bug report from NFS: kernel BUG at include/linux/scatterlist.h:143! ... RIP: 0010:sg_init_one+0x7d/0x90 .. make_checksum+0x4e7/0x760 [rpcsec_gss_krb5] gss_get_mic_kerberos+0x26e/0x310 [rpcsec_gss_krb5] gss_marshal+0x126/0x1a0 [auth_rpcgss] ? __local_bh_enable_ip+0x80/0xe0 ? call_transmit_status+0x1d0/0x1d0 [sunrpc] call_transmit+0x137/0x230 [sunrpc] __rpc_execute+0x9b/0x490 [sunrpc] rpc_run_task+0x119/0x150 [sunrpc] nfs4_run_exchange_id+0x1bd/0x250 [nfsv4] _nfs4_proc_exchange_id+0x2d/0x490 [nfsv4] nfs41_discover_server_trunking+0x1c/0xa0 [nfsv4] nfs4_discover_server_trunking+0x80/0x270 [nfsv4] nfs4_init_client+0x16e/0x240 [nfsv4] ? nfs_get_client+0x4c9/0x5d0 [nfs] ? _raw_spin_unlock+0x24/0x30 ? nfs_get_client+0x4c9/0x5d0 [nfs] nfs4_set_client+0xb2/0x100 [nfsv4] nfs4_create_server+0xff/0x290 [nfsv4] nfs4_remote_mount+0x28/0x50 [nfsv4] mount_fs+0x3b/0x16a vfs_kern_mount.part.35+0x54/0x160 nfs_do_root_mount+0x7f/0xc0 [nfsv4] nfs4_try_mount+0x43/0x70 [nfsv4] ? get_nfs_version+0x21/0x80 [nfs] nfs_fs_mount+0x789/0xbf0 [nfs] ? pcpu_alloc+0x6ca/0x7e0 ? nfs_clone_super+0x70/0x70 [nfs] ? nfs_parse_mount_options+0xb40/0xb40 [nfs] mount_fs+0x3b/0x16a vfs_kern_mount.part.35+0x54/0x160 do_mount+0x1fd/0xd50 ksys_mount+0xba/0xd0 __x64_sys_mount+0x21/0x30 do_syscall_64+0x60/0x1f0 entry_SYSCALL_64_after_hwframe+0x49/0xbe This is BUG_ON(!virt_addr_valid(buf)) triggered by using a stack allocated buffer with a scatterlist. Convert the buffer for rc4salt to be dynamically allocated instead. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1615258 Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-23Merge tag 'platform-drivers-x86-v4.19-1' of ↵Linus Torvalds27-419/+1738
git://git.infradead.org/linux-platform-drivers-x86 Pull x86 platform driver updates from Andy Shevchenko: - The driver for Silead touchscreen configurations has been renamed from silead_dmi to touchscreen_dmi since it starts supporting other touchscreens which require some DMI quirks It also gets expanded to cover cases for Chuwi Vi10, ONDA V891W, Connect Tablet 9, Onda V820w, and Cube KNote i1101 tablets. - Another bunch of changes is related to Mellanox platform code to allow user space to communicate with Mellanox for system control and monitoring purposes. The driver notifies user on hotplug device signal receiving. - ASUS WMI drivers recognize lid flip action on UX360, and correctly toggles airplane mode LED. In addition the keyboard backlight toggle gets support. - ThinkPad ACPI driver enables support for calculator key (on at least P52). It also has been fixed to support three characters model designators, which are used for modern laptops. Earlier the battery, marked as BAT1, on ThinkPad laptops has not been configured properly, which is fixed. On the opposite the multi-battery configurations now probed correctly. - Dell SMBIOS driver starts working on some Dell servers which do not support token interface. The regression with backlight detection has also been fixed. In order to support dock mode on some laptops, Intel virtual button driver has been fixed. The last but not least is the fix to Intel HID driver due to changes in Dell systems that prevented to use power button. * tag 'platform-drivers-x86-v4.19-1' of git://git.infradead.org/linux-platform-drivers-x86: (47 commits) platform/x86: acer-wmi: Silence "unsupported" message a bit platform/x86: intel_punit_ipc: fix build errors platform/x86: ideapad: Add Y520-15IKBM and Y720-15IKBM to no_hw_rfkill platform/x86: asus-nb-wmi: Add keymap entry for lid flip action on UX360 platform/x86: acer-wmi: refactor function has_cap platform/x86: thinkpad_acpi: Fix multi-battery bug platform/x86: thinkpad_acpi: extend battery quirk coverage platform/x86: touchscreen_dmi: Add info for the Cube KNote i1101 tablet platform/x86: mlx-platform: Fix copy-paste error in mlxplat_init() platform/x86: mlx-platform: Remove unused define platform/x86: mlx-platform: Change mlxreg-io configuration for MSN274x systems Documentation/ABI: Add new attribute for mlxreg-io sysfs interfaces platform/x86: mlx-platform: Allow mlxreg-io driver activation for more systems platform/x86: mlx-platform: Add ASIC hotplug device configuration platform/mellanox: mlxreg-hotplug: Add hotplug hwmon uevent notification platform/mellanox: mlxreg-hotplug: Improve mechanism of ASIC health discovery platform/x86: mlx-platform: Add mlxreg-fan platform driver activation platform/x86: dell-laptop: Fix backlight detection platform/x86: toshiba_acpi: Fix defined but not used build warnings platform/x86: thinkpad_acpi: Support battery quirk ...
2018-08-23ia64: Fix allnoconfig section mismatch for ioc_init/ioc_iommu_infoTony Luck1-2/+2
This has been broken for an embarassingly long time (since v4.4). Just needs a couple of __init tags on functions to make the sections match up. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-23blk-wbt: fix has-sleeper queueing checkJens Axboe1-3/+5
We need to do this inside the loop as well, or we can allow new IO to supersede previous IO. Tested-by: Anchal Agarwal <anchalag@amazon.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-23blk-wbt: use wq_has_sleeper() for wq active checkJens Axboe1-4/+4
We need the memory barrier before checking the list head, use the appropriate helper for this. The matching queue side memory barrier is provided by set_current_state(). Tested-by: Anchal Agarwal <anchalag@amazon.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-23blk-wbt: move disable check into get_limit()Jens Axboe1-15/+7
Check it in one place, instead of in multiple places. Tested-by: Anchal Agarwal <anchalag@amazon.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-23Merge branch 'parisc-4.19-2' of ↵Linus Torvalds12-137/+113
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull more parisc updates from Helge Deller: - fix boot failure of 64-bit kernel. It got broken by the unwind optimization commit in merge window. - fix 64-bit userspace support (static 64-bit applications only, e.g. we don't yet have 64-bit userspace support in glibc). - consolidate unwind initialization code. - add machine model description to stack trace. * 'parisc-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Add hardware description to stack traces parisc: Fix boot failure of 64-bit kernel parisc: Consolidate unwind initialization calls parisc: Update comments in syscall.S regarding wide userland parisc: Fix ptraced 64-bit applications to call 64-bit syscalls parisc: Restore possibility to execute 64-bit applications
2018-08-23bcache: release dc->writeback_lock properly in bch_writeback_thread()Shan Hai1-1/+3
The writeback thread would exit with a lock held when the cache device is detached via sysfs interface, fix it by releasing the held lock before exiting the while-loop. Fixes: fadd94e05c02 (bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set) Signed-off-by: Shan Hai <shan.hai@oracle.com> Signed-off-by: Coly Li <colyli@suse.de> Tested-by: Shenghui Wang <shhuiw@foxmail.com> Cc: stable@vger.kernel.org #4.17+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-23Merge tag 'xtensa-20180820' of git://github.com/jcmvbkbc/linux-xtensaLinus Torvalds27-314/+1300
Pull Xtensa updates from Max Filippov: - switch xtensa arch to the generic noncoherent direct mapping operations - add support for DMA_ATTR_NO_KERNEL_MAPPING attribute - clean up users of platform/hardware.h in generic Xtensa code - fix assembly cache maintenance code for long cache lines - rework noMMU cache attributes initialization - add big-endian HiFi2 test_kc705_be CPU variant * tag 'xtensa-20180820' of git://github.com/jcmvbkbc/linux-xtensa: xtensa: add test_kc705_be variant xtensa: clean up boot-elf/bootstrap.S xtensa: make bootparam parsing optional xtensa: drop variant IRQ support xtensa: drop unneeded platform/hardware.h headers xtensa: move PLATFORM_NR_IRQS to Kconfig xtensa: rework {CONFIG,PLATFORM}_DEFAULT_MEM_START xtensa: drop unused {CONFIG,PLATFORM}_DEFAULT_MEM_SIZE xtensa: rework noMMU cache attributes initialization xtensa: increase ranges in ___invalidate_{i,d}cache_all xtensa: limit offsets in __loop_cache_{all,page} xtensa: platform-specific handling of coherent memory xtensa: support DMA_ATTR_NO_KERNEL_MAPPING attribute xtensa: use generic dma_noncoherent_ops
2018-08-22Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds59-305/+1372
Pull second set of KVM updates from Paolo Bonzini: "ARM: - Support for Group0 interrupts in guests - Cache management optimizations for ARMv8.4 systems - Userspace interface for RAS - Fault path optimization - Emulated physical timer fixes - Random cleanups x86: - fixes for L1TF - a new test case - non-support for SGX (inject the right exception in the guest) - fix lockdep false positive" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (49 commits) KVM: VMX: fixes for vmentry_l1d_flush module parameter kvm: selftest: add dirty logging test kvm: selftest: pass in extra memory when create vm kvm: selftest: include the tools headers kvm: selftest: unify the guest port macros tools: introduce test_and_clear_bit KVM: x86: SVM: Call x86_spec_ctrl_set_guest/host() with interrupts disabled KVM: vmx: Inject #UD for SGX ENCLS instruction in guest KVM: vmx: Add defines for SGX ENCLS exiting x86/kvm/vmx: Fix coding style in vmx_setup_l1d_flush() x86: kvm: avoid unused variable warning KVM: Documentation: rename the capability of KVM_CAP_ARM_SET_SERROR_ESR KVM: arm/arm64: Skip updating PTE entry if no change KVM: arm/arm64: Skip updating PMD entry if no change KVM: arm: Use true and false for boolean values KVM: arm/arm64: vgic: Do not use spin_lock_irqsave/restore with irq disabled KVM: arm/arm64: vgic: Move DEBUG_SPINLOCK_BUG_ON to vgic.h KVM: arm: vgic-v3: Add support for ICC_SGI0R and ICC_ASGI1R accesses KVM: arm64: vgic-v3: Add support for ICC_SGI0R_EL1 and ICC_ASGI1R_EL1 accesses KVM: arm/arm64: vgic-v3: Add core support for Group0 SGIs ...
2018-08-22Merge tag 'for-4.19/post-20180822' of git://git.kernel.dk/linux-blockLinus Torvalds43-656/+879
Pull more block updates from Jens Axboe: - Set of bcache fixes and changes (Coly) - The flush warn fix (me) - Small series of BFQ fixes (Paolo) - wbt hang fix (Ming) - blktrace fix (Steven) - blk-mq hardware queue count update fix (Jianchao) - Various little fixes * tag 'for-4.19/post-20180822' of git://git.kernel.dk/linux-block: (31 commits) block/DAC960.c: make some arrays static const, shrinks object size blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter blk-mq: init hctx sched after update ctx and hctx mapping block: remove duplicate initialization tracing/blktrace: Fix to allow setting same value pktcdvd: fix setting of 'ret' error return for a few cases block: change return type to bool block, bfq: return nbytes and not zero from struct cftype .write() method block, bfq: improve code of bfq_bfqq_charge_time block, bfq: reduce write overcharge block, bfq: always update the budget of an entity when needed block, bfq: readd missing reset of parent-entity service blk-wbt: fix IO hang in wbt_wait() block: don't warn for flush on read-only device bcache: add the missing comments for smp_mb()/smp_wmb() bcache: remove unnecessary space before ioctl function pointer arguments bcache: add missing SPDX header bcache: move open brace at end of function definitions to next line bcache: add static const prefix to char * array declarations bcache: fix code comments style ...
2018-08-22Merge tag 'f2fs-for-4.19' of ↵Linus Torvalds21-532/+1671
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've tuned f2fs to improve general performance by serializing block allocation and enhancing discard flows like fstrim which avoids user IO contention. And we've added fsync_mode=nobarrier which gives an option to user where it skips issuing cache_flush commands to underlying flash storage. And there are many bug fixes related to fuzzed images, revoked atomic writes, quota ops, and minor direct IO. Enhancements: - add fsync_mode=nobarrier which bypasses cache_flush command - enhance the discarding flow which avoids user IOs and issues in LBA order - readahead some encrypted blocks during GC - enable in-memory inode checksum to verify the blocks if F2FS_CHECK_FS is set - enhance nat_bits behavior - set -o discard by default - set REQ_RAHEAD to bio in ->readpages Bug fixes: - fix a corner case to corrupt atomic_writes revoking flow - revisit i_gc_rwsem to fix race conditions - fix some dio behaviors captured by xfstests - correct handling errors given by quota-related failures - add many sanity check flows to avoid fuzz test failures - add more error number propagation to their callers - fix several corner cases to continue fault injection w/ shutdown loop" * tag 'f2fs-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (89 commits) f2fs: readahead encrypted block during GC f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gc f2fs: fix performance issue observed with multi-thread sequential read f2fs: fix to skip verifying block address for non-regular inode f2fs: rework fault injection handling to avoid a warning f2fs: support fault_type mount option f2fs: fix to return success when trimming meta area f2fs: fix use-after-free of dicard command entry f2fs: support discard submission error injection f2fs: split discard command in prior to block layer f2fs: wake up gc thread immediately when gc_urgent is set f2fs: fix incorrect range->len in f2fs_trim_fs() f2fs: refresh recent accessed nat entry in lru list f2fs: fix avoid race between truncate and background GC f2fs: avoid race between zero_range and background GC f2fs: fix to do sanity check with block address in main area v2 f2fs: fix to do sanity check with inline flags f2fs: fix to reset i_gc_failures correctly f2fs: fix invalid memory access f2fs: fix to avoid broken of dnode block list ...
2018-08-22ovl: set I_CREATING on inode being createdMiklos Szeredi1-0/+4
...otherwise there will be list corruption due to inode_sb_list_add() being called for inode already on the sb list. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: e950564b97fd ("vfs: don't evict uninitialized inode") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds182-2270/+3192
Merge more updates from Andrew Morton: - the rest of MM - procfs updates - various misc things - more y2038 fixes - get_maintainer updates - lib/ updates - checkpatch updates - various epoll updates - autofs updates - hfsplus - some reiserfs work - fatfs updates - signal.c cleanups - ipc/ updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (166 commits) ipc/util.c: update return value of ipc_getref from int to bool ipc/util.c: further variable name cleanups ipc: simplify ipc initialization ipc: get rid of ids->tables_initialized hack lib/rhashtable: guarantee initial hashtable allocation lib/rhashtable: simplify bucket_table_alloc() ipc: drop ipc_lock() ipc/util.c: correct comment in ipc_obtain_object_check ipc: rename ipcctl_pre_down_nolock() ipc/util.c: use ipc_rcu_putref() for failues in ipc_addid() ipc: reorganize initialization of kern_ipc_perm.seq ipc: compute kern_ipc_perm.id under the ipc lock init/Kconfig: remove EXPERT from CHECKPOINT_RESTORE fs/sysv/inode.c: use ktime_get_real_seconds() for superblock stamp adfs: use timespec64 for time conversion kernel/sysctl.c: fix typos in comments drivers/rapidio/devices/rio_mport_cdev.c: remove redundant pointer md fork: don't copy inconsistent signal handler state to child signal: make get_signal() return bool signal: make sigkill_pending() return bool ...
2018-08-22ipc/util.c: update return value of ipc_getref from int to boolManfred Spraul2-2/+2
ipc_getref has still a return value of type "int", matching the atomic_t interface of atomic_inc_not_zero()/atomic_add_unless(). ipc_getref now uses refcount_inc_not_zero, which has a return value of type "bool". Therefore, update the return code to avoid implicit conversions. Link: http://lkml.kernel.org/r/20180712185241.4017-13-manfred@colorfullife.com Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22ipc/util.c: further variable name cleanupsManfred Spraul6-27/+27
The varable names got a mess, thus standardize them again: id: user space id. Called semid, shmid, msgid if the type is known. Most functions use "id" already. idx: "index" for the idr lookup Right now, some functions use lid, ipc_addid() already uses idx as the variable name. seq: sequence number, to avoid quick collisions of the user space id key: user space key, used for the rhash tree Link: http://lkml.kernel.org/r/20180712185241.4017-12-manfred@colorfullife.com Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22ipc: simplify ipc initializationDavidlohr Bueso6-54/+30
Now that we know that rhashtable_init() will not fail, we can get rid of a lot of the unnecessary cleanup paths when the call errored out. [manfred@colorfullife.com: variable name added to util.h to resolve checkpatch warning] Link: http://lkml.kernel.org/r/20180712185241.4017-11-manfred@colorfullife.com Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22ipc: get rid of ids->tables_initialized hackDavidlohr Bueso2-16/+8
In sysvipc we have an ids->tables_initialized regarding the rhashtable, introduced in 0cfb6aee70bd ("ipc: optimize semget/shmget/msgget for lots of keys") It's there, specifically, to prevent nil pointer dereferences, from using an uninitialized api. Considering how rhashtable_init() can fail (probably due to ENOMEM, if anything), this made the overall ipc initialization capable of failure as well. That alone is ugly, but fine, however I've spotted a few issues regarding the semantics of tables_initialized (however unlikely they may be): - There is inconsistency in what we return to userspace: ipc_addid() returns ENOSPC which is certainly _wrong_, while ipc_obtain_object_idr() returns EINVAL. - After we started using rhashtables, ipc_findkey() can return nil upon !tables_initialized, but the caller expects nil for when the ipc structure isn't found, and can therefore call into ipcget() callbacks. Now that rhashtable initialization cannot fail, we can properly get rid of the hack altogether. [manfred@colorfullife.com: commit id extended to 12 digits] Link: http://lkml.kernel.org/r/20180712185241.4017-10-manfred@colorfullife.com Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22lib/rhashtable: guarantee initial hashtable allocationDavidlohr Bueso1-3/+11
rhashtable_init() may fail due to -ENOMEM, thus making the entire api unusable. This patch removes this scenario, however unlikely. In order to guarantee memory allocation, this patch always ends up doing GFP_KERNEL|__GFP_NOFAIL for both the tbl as well as alloc_bucket_spinlocks(). Upon the first table allocation failure, we shrink the size to the smallest value that makes sense and retry with __GFP_NOFAIL semantics. With the defaults, this means that from 64 buckets, we retry with only 4. Any later issues regarding performance due to collisions or larger table resizing (when more memory becomes available) is the least of our problems. Link: http://lkml.kernel.org/r/20180712185241.4017-9-manfred@colorfullife.com Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22lib/rhashtable: simplify bucket_table_alloc()Davidlohr Bueso1-5/+2
As of ce91f6ee5b3b ("mm: kvmalloc does not fallback to vmalloc for incompatible gfp flags") we can simplify the caller and trust kvzalloc() to just do the right thing. For the case of the GFP_ATOMIC context, we can drop the __GFP_NORETRY flag for obvious reasons, and for the __GFP_NOWARN case, however, it is changed such that the caller passes the flag instead of making bucket_table_alloc() handle it. This slightly changes the gfp flags passed on to nested_table_alloc() as it will now also use GFP_ATOMIC | __GFP_NOWARN. However, I consider this a positive consequence as for the same reasons we want nowarn semantics in bucket_table_alloc(). [manfred@colorfullife.com: commit id extended to 12 digits, line wraps updated] Link: http://lkml.kernel.org/r/20180712185241.4017-8-manfred@colorfullife.com Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22ipc: drop ipc_lock()Davidlohr Bueso3-43/+23
ipc/util.c contains multiple functions to get the ipc object pointer given an id number. There are two sets of function: One set verifies the sequence counter part of the id number, other functions do not check the sequence counter. The standard for function names in ipc/util.c is - ..._check() functions verify the sequence counter - ..._idr() functions do not verify the sequence counter ipc_lock() is an exception: It does not verify the sequence counter value, but this is not obvious from the function name. Furthermore, shm.c is the only user of this helper. Thus, we can simply move the logic into shm_lock() and get rid of the function altogether. [manfred@colorfullife.com: most of changelog] Link: http://lkml.kernel.org/r/20180712185241.4017-7-manfred@colorfullife.com Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22ipc/util.c: correct comment in ipc_obtain_object_checkManfred Spraul1-2/+2
The comment that explains ipc_obtain_object_check is wrong: The function checks the sequence number, not the reference counter. Note that checking the reference counter would be meaningless: The reference counter is decreased without holding any locks, thus an object with kern_ipc_perm.deleted=true may disappear at the end of the next rcu grace period. Link: http://lkml.kernel.org/r/20180712185241.4017-6-manfred@colorfullife.com Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22ipc: rename ipcctl_pre_down_nolock()Manfred Spraul5-8/+8
Both the comment and the name of ipcctl_pre_down_nolock() are misleading: The function must be called while holdling the rw semaphore. Therefore the patch renames the function to ipcctl_obtain_check(): This name matches the other names used in util.c: - "obtain" function look up a pointer in the idr, without acquiring the object lock. - The caller is responsible for locking. - _check means that the sequence number is checked. Link: http://lkml.kernel.org/r/20180712185241.4017-5-manfred@colorfullife.com Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>