summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2019-05-22mfd: da9063: Fix OTP control register names to match datasheets for DA9063/63LSteve Twiss1-3/+3
commit 6b4814a9451add06d457e198be418bf6a3e6a990 upstream. Mismatch between what is found in the Datasheets for DA9063 and DA9063L provided by Dialog Semiconductor, and the register names provided in the MFD registers file. The changes are for the OTP (one-time-programming) control registers. The two naming errors are OPT instead of OTP, and COUNT instead of CONT (i.e. control). Cc: Stable <stable@vger.kernel.org> Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22hugetlb: use same fault hash key for shared and private mappingsMike Kravetz1-3/+1
commit 1b426bac66e6cc83c9f2d92b96e4e72acf43419a upstream. hugetlb uses a fault mutex hash table to prevent page faults of the same pages concurrently. The key for shared and private mappings is different. Shared keys off address_space and file index. Private keys off mm and virtual address. Consider a private mappings of a populated hugetlbfs file. A fault will map the page from the file and if needed do a COW to map a writable page. Hugetlbfs hole punch uses the fault mutex to prevent mappings of file pages. It uses the address_space file index key. However, private mappings will use a different key and could race with this code to map the file page. This causes problems (BUG) for the page cache remove code as it expects the page to be unmapped. A sample stack is: page dumped because: VM_BUG_ON_PAGE(page_mapped(page)) kernel BUG at mm/filemap.c:169! ... RIP: 0010:unaccount_page_cache_page+0x1b8/0x200 ... Call Trace: __delete_from_page_cache+0x39/0x220 delete_from_page_cache+0x45/0x70 remove_inode_hugepages+0x13c/0x380 ? __add_to_page_cache_locked+0x162/0x380 hugetlbfs_fallocate+0x403/0x540 ? _cond_resched+0x15/0x30 ? __inode_security_revalidate+0x5d/0x70 ? selinux_file_permission+0x100/0x130 vfs_fallocate+0x13f/0x270 ksys_fallocate+0x3c/0x80 __x64_sys_fallocate+0x1a/0x20 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 There seems to be another potential COW issue/race with this approach of different private and shared keys as noted in commit 8382d914ebf7 ("mm, hugetlb: improve page-fault scalability"). Since every hugetlb mapping (even anon and private) is actually a file mapping, just use the address_space index key for all mappings. This results in potentially more hash collisions. However, this should not be the common case. Link: http://lkml.kernel.org/r/20190328234704.27083-3-mike.kravetz@oracle.com Link: http://lkml.kernel.org/r/20190412165235.t4sscoujczfhuiyt@linux-r8p5 Fixes: b5cec28d36f5 ("hugetlbfs: truncate_hugepages() takes a range of pages") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22mm/huge_memory: fix vmf_insert_pfn_{pmd, pud}() crash, handle unaligned ↵Dan Williams1-4/+2
addresses commit fce86ff5802bac3a7b19db171aa1949ef9caac31 upstream. Starting with c6f3c5ee40c1 ("mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()") vmf_insert_pfn_pmd() internally calls pmdp_set_access_flags(). That helper enforces a pmd aligned @address argument via VM_BUG_ON() assertion. Update the implementation to take a 'struct vm_fault' argument directly and apply the address alignment fixup internally to fix crash signatures like: kernel BUG at arch/x86/mm/pgtable.c:515! invalid opcode: 0000 [#1] SMP NOPTI CPU: 51 PID: 43713 Comm: java Tainted: G OE 4.19.35 #1 [..] RIP: 0010:pmdp_set_access_flags+0x48/0x50 [..] Call Trace: vmf_insert_pfn_pmd+0x198/0x350 dax_iomap_fault+0xe82/0x1190 ext4_dax_huge_fault+0x103/0x1f0 ? __switch_to_asm+0x40/0x70 __handle_mm_fault+0x3f6/0x1370 ? __switch_to_asm+0x34/0x70 ? __switch_to_asm+0x40/0x70 handle_mm_fault+0xda/0x200 __do_page_fault+0x249/0x4f0 do_page_fault+0x32/0x110 ? page_fault+0x8/0x30 page_fault+0x1e/0x30 Link: http://lkml.kernel.org/r/155741946350.372037.11148198430068238140.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: c6f3c5ee40c1 ("mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Piotr Balcer <piotr.balcer@intel.com> Tested-by: Yan Ma <yan.ma@intel.com> Tested-by: Pankaj Gupta <pagupta@redhat.com> Reviewed-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Chandan Rajendra <chandan@linux.ibm.com> Cc: Souptick Joarder <jrdr.linux@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-20Merge tag 'v5.1.3' into dev-5.1Joel Stanley2-1/+28
This is the 5.1.3 stable release Signed-off-by: Joel Stanley <joel@jms.id.au>
2019-05-16i2c: core: ratelimit 'transfer when suspended' errorsWolfram Sang1-1/+2
commit 4db61c2a16fce2ef85d82751de4ba43a39347cfb upstream. There are two problems with WARN_ON() here. One: It is not ratelimited. Two: We don't see which adapter was used when trying to transfer something when already suspended. Implement a custom ratelimit once per adapter and use dev_WARN there. This fixes both issues. Drawback is that we don't see if multiple drivers are trying to transfer with the same adapter while suspended. They need to be discovered one after the other now. This is better than a high CPU load because a really broken driver might try to resend endlessly. Fixes: 9ac6cb5fbb17 ("i2c: add suspended flag and accessors for i2c adapters") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@vger.kernel.org # v5.1+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-14cpu/speculation: Add 'mitigations=' cmdline optionJosh Poimboeuf1-0/+24
commit 98af8452945c55652de68536afdde3b520fec429 upstream Keeping track of the number of mitigations for all the CPU speculation bugs has become overwhelming for many users. It's getting more and more complicated to decide which mitigations are needed for a given architecture. Complicating matters is the fact that each arch tends to have its own custom way to mitigate the same vulnerability. Most users fall into a few basic categories: a) they want all mitigations off; b) they want all reasonable mitigations on, with SMT enabled even if it's vulnerable; or c) they want all reasonable mitigations on, with SMT disabled if vulnerable. Define a set of curated, arch-independent options, each of which is an aggregation of existing options: - mitigations=off: Disable all mitigations. - mitigations=auto: [default] Enable all the default mitigations, but leave SMT enabled, even if it's vulnerable. - mitigations=auto,nosmt: Enable all the default mitigations, disabling SMT if needed by a mitigation. Currently, these options are placeholders which don't actually do anything. They will be fleshed out in upcoming patches. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86) Reviewed-by: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@alien8.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Jon Masters <jcm@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux-s390@vger.kernel.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-arch@vger.kernel.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steven Price <steven.price@arm.com> Cc: Phil Auld <pauld@redhat.com> Link: https://lkml.kernel.org/r/b07a8ef9b7c5055c3a4637c87d07c296d5016fe0.1555085500.git.jpoimboe@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-14x86/speculation/mds: Add sysfs reporting for MDSThomas Gleixner1-0/+2
commit 8a4b06d391b0a42a373808979b5028f5c84d9c6a upstream Add the sysfs reporting file for MDS. It exposes the vulnerability and mitigation state similar to the existing files for the other speculative hardware vulnerabilities. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jon Masters <jcm@redhat.com> Tested-by: Jon Masters <jcm@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-06clk: nuvoton: add npcm750 clock function prototype initializationTomer Maimon1-0/+9
OpenBMC-Staging-Count: 3 Signed-off-by: Tomer Maimon <tmaimon77@gmail.com> Signed-off-by: Joel Stanley <joel@jms.id.au>
2019-05-06mfd: intel-peci-client: Add PECI client MFD driverJae Hyun Yoo1-0/+110
This commit adds PECI client MFD driver. OpenBMC-Staging-Count: 3 Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com> Signed-off-by: Joel Stanley <joel@jms.id.au>
2019-05-06peci: Add support for PECI bus driver coreJae Hyun Yoo1-0/+142
This commit adds driver implementation for PECI bus core into linux driver framework. PECI (Platform Environment Control Interface) is a one-wire bus interface that provides a communication channel from Intel processors and chipset components to external monitoring or control devices. PECI is designed to support the following sideband functions: * Processor and DRAM thermal management - Processor fan speed control is managed by comparing Digital Thermal Sensor (DTS) thermal readings acquired via PECI against the processor-specific fan speed control reference point, or TCONTROL. Both TCONTROL and DTS thermal readings are accessible via the processor PECI client. These variables are referenced to a common temperature, the TCC activation point, and are both defined as negative offsets from that reference. - PECI based access to the processor package configuration space provides a means for Baseboard Management Controllers (BMC) or other platform management devices to actively manage the processor and memory power and thermal features. * Platform Manageability - Platform manageability functions including thermal, power, and error monitoring. Note that platform 'power' management includes monitoring and control for both the processor and DRAM subsystem to assist with data center power limiting. - PECI allows read access to certain error registers in the processor MSR space and status monitoring registers in the PCI configuration space within the processor and downstream devices. - PECI permits writes to certain registers in the processor PCI configuration space. * Processor Interface Tuning and Diagnostics - Processor interface tuning and diagnostics capabilities (Intel Interconnect BIST). The processors Intel Interconnect Built In Self Test (Intel IBIST) allows for infield diagnostic capabilities in the Intel UPI and memory controller interfaces. PECI provides a port to execute these diagnostics via its PCI Configuration read and write capabilities. * Failure Analysis - Output the state of the processor after a failure for analysis via Crashdump. PECI uses a single wire for self-clocking and data transfer. The bus requires no additional control lines. The physical layer is a self-clocked one-wire bus that begins each bit with a driven, rising edge from an idle level near zero volts. The duration of the signal driven high depends on whether the bit value is a logic '0' or logic '1'. PECI also includes variable data transfer rate established with every message. In this way, it is highly flexible even though underlying logic is simple. The interface design was optimized for interfacing between an Intel processor and chipset components in both single processor and multiple processor environments. The single wire interface provides low board routing overhead for the multiple load connections in the congested routing area near the processor and chipset components. Bus speed, error checking, and low protocol overhead provides adequate link bandwidth and reliability to transfer critical device operating conditions and configuration information. This implementation provides the basic framework to add PECI extensions to the Linux bus and device models. A hardware specific 'Adapter' driver can be attached to the PECI bus to provide sideband functions described above. It is also possible to access all devices on an adapter from userspace through the /dev interface. A device specific 'Client' driver also can be attached to the PECI bus so each processor client's features can be supported by the 'Client' driver through an adapter connection in the bus. OpenBMC-Staging-Count: 3 Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Reviewed-by: Haiyue Wang <haiyue.wang@linux.intel.com> Reviewed-by: James Feist <james.feist@linux.intel.com> Reviewed-by: Vernon Mauery <vernon.mauery@linux.intel.com> [joel: Fix access_ok usage for 5.0] Signed-off-by: Joel Stanley <joel@jms.id.au>
2019-05-06Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "I'd like to apologize for this very late pull request: I was dithering through the week whether to send the fixes, and then yesterday Jiri's crash fix for a regression introduced in this cycle clearly marked perf/urgent as 'must merge now'. Most of the commits are tooling fixes, plus there's three kernel fixes via four commits: - race fix in the Intel PEBS code - fix an AUX bug and roll back a previous attempt - fix AMD family 17h generic HW cache-event perf counters The largest diffstat contribution comes from the AMD fix - a new event table is introduced, which is a fairly low risk change but has a large linecount" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Fix race in intel_pmu_disable_event() perf/x86/intel/pt: Remove software double buffering PMU capability perf/ring_buffer: Fix AUX software double buffering perf tools: Remove needless asm/unistd.h include fixing build in some places tools arch uapi: Copy missing unistd.h headers for arc, hexagon and riscv tools build: Add -ldl to the disassembler-four-args feature test perf cs-etm: Always allocate memory for cs_etm_queue::prev_packet perf cs-etm: Don't check cs_etm_queue::prev_packet validity perf report: Report OOM in status line in the GTK UI perf bench numa: Add define for RUSAGE_THREAD if not present tools lib traceevent: Change tag string for error perf annotate: Fix build on 32 bit for BPF annotation tools uapi x86: Sync vmx.h with the kernel perf bpf: Return value with unlocking in perf_env__find_btf() MAINTAINERS: Include vendor specific files under arch/*/events/* perf/x86/amd: Update generic hardware cache events for Family 17h
2019-05-03Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds1-0/+16
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "Two fixes for the NKMP clks on Allwinner SoCs, a locking fix for clkdev where we forgot to hold a lock while iterating a list that can change, and finally a build fix that adds some stubs for clk APIs that are used by devfreq drivers on platforms without the clk APIs" * tag 'clk-fixes-for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: Add missing stubs for a few functions clkdev: Hold clocks_mutex while iterating clocks list clk: sunxi-ng: nkmp: Explain why zero width check is needed clk: sunxi-ng: nkmp: Avoid GENMASK(-1, 0)
2019-05-03perf/x86/intel/pt: Remove software double buffering PMU capabilityAlexander Shishkin1-1/+0
Now that all AUX allocations are high-order by default, the software double buffering PMU capability doesn't make sense any more, get rid of it. In case some PMUs choose to opt out, we can re-introduce it. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: adrian.hunter@intel.com Link: http://lkml.kernel.org/r/20190503085536.24119-3-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-1/+1
Pull networking fixes from David Miller: 1) Out of bounds access in xfrm IPSEC policy unlink, from Yue Haibing. 2) Missing length check for esp4 UDP encap, from Sabrina Dubroca. 3) Fix byte order of RX STBC access in mac80211, from Johannes Berg. 4) Inifnite loop in bpftool map create, from Alban Crequy. 5) Register mark fix in ebpf verifier after pkt/null checks, from Paul Chaignon. 6) Properly use rcu_dereference_sk_user_data in L2TP code, from Eric Dumazet. 7) Buffer overrun in marvell phy driver, from Andrew Lunn. 8) Several crash and statistics handling fixes to bnxt_en driver, from Michael Chan and Vasundhara Volam. 9) Several fixes to the TLS layer from Jakub Kicinski (copying negative amounts of data in reencrypt, reencrypt frag copying, blind nskb->sk NULL deref, etc). 10) Several UDP GRO fixes, from Paolo Abeni and Eric Dumazet. 11) PID/UID checks on ipv6 flow labels are inverted, from Willem de Bruijn. 12) Use after free in l2tp, from Eric Dumazet. 13) IPV6 route destroy races, also from Eric Dumazet. 14) SCTP state machine can erroneously run recursively, fix from Xin Long. 15) Adjust AF_PACKET msg_name length checks, add padding bytes if necessary. From Willem de Bruijn. 16) Preserve skb_iif, so that forwarded packets have consistent values even if fragmentation is involved. From Shmulik Ladkani. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits) udp: fix GRO packet of death ipv6: A few fixes on dereferencing rt->from rds: ib: force endiannes annotation selftests: fib_rule_tests: print the result and return 1 if any tests failed ipv4: ip_do_fragment: Preserve skb_iif during fragmentation net/tls: avoid NULL pointer deref on nskb->sk in fallback selftests: fib_rule_tests: Fix icmp proto with ipv6 packet: validate msg_namelen in send directly packet: in recvmsg msg_name return at least sizeof sockaddr_ll sctp: avoid running the sctp state machine recursively stmmac: pci: Fix typo in IOT2000 comment Documentation: fix netdev-FAQ.rst markup warning ipv6: fix races in ip6_dst_destroy() l2ip: fix possible use-after-free appletalk: Set error code if register_snap_client failed net: dsa: bcm_sf2: fix buffer overflow doing set_rxnfc rxrpc: Fix net namespace cleanup ipv6/flowlabel: wait rcu grace period before put_pid() vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach tcp: add sanity tests in tcp_add_backlog() ...
2019-05-02Merge tag 'for-linus-20190502' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+1
Pull io_uring fixes from Jens Axboe: "This is mostly io_uring fixes/tweaks. Most of these were actually done in time for the last -rc, but I wanted to ensure that everything tested out great before including them. The code delta looks larger than it really is, as it's mostly just comment additions/changes. Outside of the comment additions/changes, this is mostly removal of unnecessary barriers. In all, this pull request contains: - Tweak to how we handle errors at submission time. We now post a completion event if the error occurs on behalf of an sqe, instead of returning it through the system call. If the error happens outside of a specific sqe, we return the error through the system call. This makes it nicer to use and makes the "normal" use case behave the same as the offload cases. (me) - Fix for a missing req reference drop from async context (me) - If an sqe is submitted with RWF_NOWAIT, don't punt it to async context. Return -EAGAIN directly, instead of using it as a hint to do async punt. (Stefan) - Fix notes on barriers (Stefan) - Remove unnecessary barriers (Stefan) - Fix potential double free of memory in setup error (Mark) - Further improve sq poll CPU validation (Mark) - Fix page allocation warning and leak on buffer registration error (Mark) - Fix iov_iter_type() for new no-ref flag (Ming) - Fix a case where dio doesn't honor bio no-page-ref (Ming)" * tag 'for-linus-20190502' of git://git.kernel.dk/linux-block: io_uring: avoid page allocation warnings iov_iter: fix iov_iter_type block: fix handling for BIO_NO_PAGE_REF io_uring: drop req submit reference always in async punt io_uring: free allocated io_memory once io_uring: fix SQPOLL cpu validation io_uring: have submission side sqe errors post a cqe io_uring: remove unnecessary barrier after unsetting IORING_SQ_NEED_WAKEUP io_uring: remove unnecessary barrier after incrementing dropped counter io_uring: remove unnecessary barrier before reading SQ tail io_uring: remove unnecessary barrier after updating SQ head io_uring: remove unnecessary barrier before reading cq head io_uring: remove unnecessary barrier before wq_has_sleeper io_uring: fix notes on barriers io_uring: fix handling SQEs requesting NOWAIT
2019-05-01iov_iter: fix iov_iter_typeMing Lei1-1/+1
Commit 875f1d0769cd ("iov_iter: add ITER_BVEC_FLAG_NO_REF flag") introduces one extra flag of ITER_BVEC_FLAG_NO_REF, and this flag is stored into iter->type. However, iov_iter_type() doesn't consider the new added flag, fix it by masking this flag in iov_iter_type(). Fixes: 875f1d0769cd ("iov_iter: add ITER_BVEC_FLAG_NO_REF flag") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-30Merge tag 'usb-5.1-rc8' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some small USB fixes for a bunch of warnings/errors that the syzbot has been finding with it's new-found ability to stress-test the USB layer. All of these are tiny, but fix real issues, and are marked for stable as well. All of these have had lots of testing in linux-next as well" * tag 'usb-5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: USB: w1 ds2490: Fix bug caused by improper use of altsetting array USB: yurex: Fix protection fault after device removal usb: usbip: fix isoc packet num validation in get_pipe USB: core: Fix bug caused by duplicate interface PM usage counter USB: dummy-hcd: Fix failure to give back unlinked URBs USB: core: Fix unterminated string returned by usb_string()
2019-04-26Merge tag 'trace-v5.1-rc6' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Three tracing fixes: - Use "nosteal" for ring buffer splice pages - Memory leak fix in error path of trace_pid_write() - Fix preempt_enable_no_resched() (use preempt_enable()) in ring buffer code" * tag 'trace-v5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: trace: Fix preempt_enable_no_resched() abuse tracing: Fix a memory leak by early error exit in trace_pid_write() tracing: Fix buffer_ref pipe ops
2019-04-26tracing: Fix buffer_ref pipe opsJann Horn1-0/+1
This fixes multiple issues in buffer_pipe_buf_ops: - The ->steal() handler must not return zero unless the pipe buffer has the only reference to the page. But generic_pipe_buf_steal() assumes that every reference to the pipe is tracked by the page's refcount, which isn't true for these buffers - buffer_pipe_buf_get(), which duplicates a buffer, doesn't touch the page's refcount. Fix it by using generic_pipe_buf_nosteal(), which refuses every attempted theft. It should be easy to actually support ->steal, but the only current users of pipe_buf_steal() are the virtio console and FUSE, and they also only use it as an optimization. So it's probably not worth the effort. - The ->get() and ->release() handlers can be invoked concurrently on pipe buffers backed by the same struct buffer_ref. Make them safe against concurrency by using refcount_t. - The pointers stored in ->private were only zeroed out when the last reference to the buffer_ref was dropped. As far as I know, this shouldn't be necessary anyway, but if we do it, let's always do it. Link: http://lkml.kernel.org/r/20190404215925.253531-1-jannh@google.com Cc: Ingo Molnar <mingo@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org Fixes: 73a757e63114d ("ring-buffer: Return reader page back into existing ring buffer") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-1/+1
Alexei Starovoitov says: ==================== pull-request: bpf 2019-04-25 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) the bpf verifier fix to properly mark registers in all stack frames, from Paul. 2) preempt_enable_no_resched->preempt_enable fix, from Peter. 3) other misc fixes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-26bpf: Fix preempt_enable_no_resched() abusePeter Zijlstra1-1/+1
Unless the very next line is schedule(), or implies it, one must not use preempt_enable_no_resched(). It can cause a preemption to go missing and thereby cause arbitrary delays, breaking the PREEMPT=y invariant. Cc: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-25clk: Add missing stubs for a few functionsDmitry Osipenko1-0/+16
Compilation fails if any of undeclared clk_set_*() functions are in use and CONFIG_HAVE_CLK=n. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2019-04-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-0/+12
Pull networking fixes from David Miller: "Just the usual assortment of small'ish fixes: 1) Conntrack timeout is sometimes not initialized properly, from Alexander Potapenko. 2) Add a reasonable range limit to tcp_min_rtt_wlen to avoid undefined behavior. From ZhangXiaoxu. 3) des1 field of descriptor in stmmac driver is initialized with the wrong variable. From Yue Haibing. 4) Increase mlxsw pci sw reset timeout a little bit more, from Ido Schimmel. 5) Match IOT2000 stmmac devices more accurately, from Su Bao Cheng. 6) Fallback refcount fix in TLS code, from Jakub Kicinski. 7) Fix max MTU check when using XDP in mlx5, from Maxim Mikityanskiy. 8) Fix recursive locking in team driver, from Hangbin Liu. 9) Fix tls_set_device_offload_Rx() deadlock, from Jakub Kicinski. 10) Don't use napi_alloc_frag() outside of softiq context of socionext driver, from Ilias Apalodimas. 11) MAC address increment overflow in ncsi, from Tao Ren. 12) Fix a regression in 8K/1M pool switching of RDS, from Zhu Yanjun. 13) ipv4_link_failure has to validate the headers that are actually there because RAW sockets can pass in arbitrary garbage, from Eric Dumazet" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits) ipv4: add sanity checks in ipv4_link_failure() net/rose: fix unbound loop in rose_loopback_timer() rxrpc: fix race condition in rxrpc_input_packet() net: rds: exchange of 8K and 1M pool net: vrf: Fix operation not supported when set vrf mac net/ncsi: handle overflow when incrementing mac address net: socionext: replace napi_alloc_frag with the netdev variant on init net: atheros: fix spelling mistake "underun" -> "underrun" spi: ST ST95HF NFC: declare missing of table spi: Micrel eth switch: declare missing of table net: stmmac: move stmmac_check_ether_addr() to driver probe netfilter: fix nf_l4proto_log_invalid to log invalid packets netfilter: never get/set skb->tstamp netfilter: ebtables: CONFIG_COMPAT: drop a bogus WARN_ON Documentation: decnet: remove reference to CONFIG_DECNET_ROUTE_FWMARK dt-bindings: add an explanation for internal phy-mode net/tls: don't leak IV and record seq when offload fails net/tls: avoid potential deadlock in tls_set_device_offload_rx() selftests/net: correct the return value for run_afpackettests team: fix possible recursive locking when add slaves ...
2019-04-24net/ncsi: handle overflow when incrementing mac addressTao Ren1-0/+12
Previously BMC's MAC address is calculated by simply adding 1 to the last byte of network controller's MAC address, and it produces incorrect result when network controller's MAC address ends with 0xFF. The problem can be fixed by calling eth_addr_inc() function to increment MAC address; besides, the MAC address is also validated before assigning to BMC. Fixes: cb10c7c0dfd9 ("net/ncsi: Add NCSI Broadcom OEM command") Signed-off-by: Tao Ren <taoren@fb.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-20Merge tag 'for-linus-20190420' of git://git.kernel.dk/linux-blockLinus Torvalds3-3/+4
Pull block fixes from Jens Axboe: "A set of small fixes that should go into this series. This contains: - Removal of unused queue member (Hou) - Overflow bvec fix (Ming) - Various little io_uring tweaks (me) - kthread parking - Only call cpu_possible() for verified CPU - Drop unused 'file' argument to io_file_put() - io_uring_enter vs io_uring_register deadlock fix - CQ overflow fix - BFQ internal depth update fix (me)" * tag 'for-linus-20190420' of git://git.kernel.dk/linux-block: block: make sure that bvec length can't be overflow block: kill all_q_node in request_queue io_uring: fix CQ overflow condition io_uring: fix possible deadlock between io_uring_{enter,register} io_uring: drop io_file_put() 'file' argument bfq: update internal depth state when queue depth changes io_uring: only test SQPOLL cpu after we've verified it io_uring: park SQPOLL thread if it's percpu
2019-04-20Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes: - various tooling fixes - kretprobe fixes - kprobes annotation fixes - kprobes error checking fix - fix the default events for AMD Family 17h CPUs - PEBS fix - AUX record fix - address filtering fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kprobes: Avoid kretprobe recursion bug kprobes: Mark ftrace mcount handler functions nokprobe x86/kprobes: Verify stack frame on kretprobe perf/x86/amd: Add event map for AMD Family 17h perf bpf: Return NULL when RB tree lookup fails in perf_env__find_btf() perf tools: Fix map reference counting perf evlist: Fix side band thread draining perf tools: Check maps for bpf programs perf bpf: Return NULL when RB tree lookup fails in perf_env__find_bpf_prog_info() tools include uapi: Sync sound/asound.h copy perf top: Always sample time to satisfy needs of use of ordered queuing perf evsel: Use hweight64() instead of hweight_long(attr.sample_regs_user) tools lib traceevent: Fix missing equality check for strcmp perf stat: Disable DIR_FORMAT feature for 'perf stat record' perf scripts python: export-to-sqlite.py: Fix use of parent_id in calls_view perf header: Fix lock/unlock imbalances when processing BPF/BTF info perf/x86: Fix incorrect PEBS_REGS perf/ring_buffer: Fix AUX record suppression perf/core: Fix the address filtering fix kprobes: Fix error check when reusing optimized probes
2019-04-20Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds1-1/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes all over the place: a console spam fix, section attributes fixes, a KASLR fix, a TLB stack-variable alignment fix, a reboot quirk, boot options related warnings fix, an LTO fix, a deadlock fix and an RDT fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/intel: Lower the "ENERGY_PERF_BIAS: Set to normal" message's log priority x86/cpu/bugs: Use __initconst for 'const' init data x86/mm/KASLR: Fix the size of the direct mapping section x86/Kconfig: Fix spelling mistake "effectivness" -> "effectiveness" x86/mm/tlb: Revert "x86/mm: Align TLB invalidation info" x86/reboot, efi: Use EFI reboot for Acer TravelMate X514-51T x86/mm: Prevent bogus warnings with "noexec=off" x86/build/lto: Fix truncated .bss with -fdata-sections x86/speculation: Prevent deadlock on ssb_state::lock x86/resctrl: Do not repeat rdtgroup mode initialization
2019-04-19USB: core: Fix bug caused by duplicate interface PM usage counterAlan Stern1-2/+0
The syzkaller fuzzer reported a bug in the USB hub driver which turned out to be caused by a negative runtime-PM usage counter. This allowed a hub to be runtime suspended at a time when the driver did not expect it. The symptom is a WARNING issued because the hub's status URB is submitted while it is already active: URB 0000000031fb463e submitted while active WARNING: CPU: 0 PID: 2917 at drivers/usb/core/urb.c:363 The negative runtime-PM usage count was caused by an unfortunate design decision made when runtime PM was first implemented for USB. At that time, USB class drivers were allowed to unbind from their interfaces without balancing the usage counter (i.e., leaving it with a positive count). The core code would take care of setting the counter back to 0 before allowing another driver to bind to the interface. Later on when runtime PM was implemented for the entire kernel, the opposite decision was made: Drivers were required to balance their runtime-PM get and put calls. In order to maintain backward compatibility, however, the USB subsystem adapted to the new implementation by keeping an independent usage counter for each interface and using it to automatically adjust the normal usage counter back to 0 whenever a driver was unbound. This approach involves duplicating information, but what is worse, it doesn't work properly in cases where a USB class driver delays decrementing the usage counter until after the driver's disconnect() routine has returned and the counter has been adjusted back to 0. Doing so would cause the usage counter to become negative. There's even a warning about this in the USB power management documentation! As it happens, this is exactly what the hub driver does. The kick_hub_wq() routine increments the runtime-PM usage counter, and the corresponding decrement is carried out by hub_event() in the context of the hub_wq work-queue thread. This work routine may sometimes run after the driver has been unbound from its interface, and when it does it causes the usage counter to go negative. It is not possible for hub_disconnect() to wait for a pending hub_event() call to finish, because hub_disconnect() is called with the device lock held and hub_event() acquires that lock. The only feasible fix is to reverse the original design decision: remove the duplicate interface-specific usage counter and require USB drivers to balance their runtime PM gets and puts. As far as I know, all existing drivers currently do this. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-and-tested-by: syzbot+7634edaea4d0b341c625@syzkaller.appspotmail.com CC: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-19block: make sure that bvec length can't be overflowMing Lei1-2/+3
bvec->bv_offset may be bigger than PAGE_SIZE sometimes, such as, when one bio is splitted in the middle of one bvec via bio_split(), and bi_iter.bi_bvec_done is used to build offset of the 1st bvec of remained bio. And the remained bio's bvec may be re-submitted to fs layer via ITER_IBVEC, such as loop and nvme-loop. So we have to make sure that every bvec's offset is less than PAGE_SIZE from bio_for_each_segment_all() because some drivers(loop, nvme-loop) passes the splitted bvec to fs layer via ITER_BVEC. This patch fixes this issue reported by Zhang Yi When running nvme/011. Cc: Christoph Hellwig <hch@lst.de> Cc: Yi Zhang <yi.zhang@redhat.com> Reported-by: Yi Zhang <yi.zhang@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Fixes: 6dc4f100c175 ("block: allow bio_for_each_segment_all() to iterate over multi-page bvec") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-19block: kill all_q_node in request_queueHou Tao1-1/+0
all_q_node has not been used since commit 4b855ad37194 ("blk-mq: Create hctx for each present CPU"), so remove it. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-19coredump: fix race condition between mmget_not_zero()/get_task_mm() and core ↵Andrea Arcangeli1-0/+21
dumping The core dumping code has always run without holding the mmap_sem for writing, despite that is the only way to ensure that the entire vma layout will not change from under it. Only using some signal serialization on the processes belonging to the mm is not nearly enough. This was pointed out earlier. For example in Hugh's post from Jul 2017: https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils "Not strictly relevant here, but a related note: I was very surprised to discover, only quite recently, how handle_mm_fault() may be called without down_read(mmap_sem) - when core dumping. That seems a misguided optimization to me, which would also be nice to correct" In particular because the growsdown and growsup can move the vm_start/vm_end the various loops the core dump does around the vma will not be consistent if page faults can happen concurrently. Pretty much all users calling mmget_not_zero()/get_task_mm() and then taking the mmap_sem had the potential to introduce unexpected side effects in the core dumping code. Adding mmap_sem for writing around the ->core_dump invocation is a viable long term fix, but it requires removing all copy user and page faults and to replace them with get_dump_page() for all binary formats which is not suitable as a short term fix. For the time being this solution manually covers the places that can confuse the core dump either by altering the vma layout or the vma flags while it runs. Once ->core_dump runs under mmap_sem for writing the function mmget_still_valid() can be dropped. Allowing mmap_sem protected sections to run in parallel with the coredump provides some minor parallelism advantage to the swapoff code (which seems to be safe enough by never mangling any vma field and can keep doing swapins in parallel to the core dumping) and to some other corner case. In order to facilitate the backporting I added "Fixes: 86039bd3b4e6" however the side effect of this same race condition in /proc/pid/mem should be reproducible since before 2.6.12-rc2 so I couldn't add any other "Fixes:" because there's no hash beyond the git genesis commit. Because find_extend_vma() is the only location outside of the process context that could modify the "mm" structures under mmap_sem for reading, by adding the mmget_still_valid() check to it, all other cases that take the mmap_sem for reading don't need the new check after mmget_not_zero()/get_task_mm(). The expand_stack() in page fault context also doesn't need the new check, because all tasks under core dumping are frozen. Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Jann Horn <jannh@google.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jann Horn <jannh@google.com> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-19mm: swapoff: shmem_unuse() stop eviction without igrab()Hugh Dickins1-0/+1
The igrab() in shmem_unuse() looks good, but we forgot that it gives no protection against concurrent unmounting: a point made by Konstantin Khlebnikov eight years ago, and then fixed in 2.6.39 by 778dd893ae78 ("tmpfs: fix race between umount and swapoff"). The current 5.1-rc swapoff is liable to hit "VFS: Busy inodes after unmount of tmpfs. Self-destruct in 5 seconds. Have a nice day..." followed by GPF. Once again, give up on using igrab(); but don't go back to making such heavy-handed use of shmem_swaplist_mutex as last time: that would spoil the new design, and I expect could deadlock inside shmem_swapin_page(). Instead, shmem_unuse() just raise a "stop_eviction" count in the shmem- specific inode, and shmem_evict_inode() wait for that to go down to 0. Call it "stop_eviction" rather than "swapoff_busy" because it can be put to use for others later (huge tmpfs patches expect to use it). That simplifies shmem_unuse(), protecting it from both unlink and unmount; and in practice lets it locate all the swap in its first try. But do not rely on that: there's still a theoretical case, when shmem_writepage() might have been preempted after its get_swap_page(), before making the swap entry visible to swapoff. [hughd@google.com: remove incorrect list_del()] Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1904091133570.1898@eggly.anvils Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1904081259400.1523@eggly.anvils Fixes: b56a2d8af914 ("mm: rid swapoff of quadratic complexity") Signed-off-by: Hugh Dickins <hughd@google.com> Cc: "Alex Xu (Hello71)" <alex_y_xu@yahoo.ca> Cc: Huang Ying <ying.huang@intel.com> Cc: Kelley Nielsen <kelleynnn@gmail.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Rik van Riel <riel@surriel.com> Cc: Vineeth Pillai <vpillai@digitalocean.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-19x86/kprobes: Verify stack frame on kretprobeMasami Hiramatsu1-0/+1
Verify the stack frame pointer on kretprobe trampoline handler, If the stack frame pointer does not match, it skips the wrong entry and tries to find correct one. This can happen if user puts the kretprobe on the function which can be used in the path of ftrace user-function call. Such functions should not be probed, so this adds a warning message that reports which function should be blacklisted. Tested-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/155094059185.6137.15527904013362842072.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-0/+3
Pull networking fixes from David Miller: 1) Handle init flow failures properly in iwlwifi driver, from Shahar S Matityahu. 2) mac80211 TXQs need to be unscheduled on powersave start, from Felix Fietkau. 3) SKB memory accounting fix in A-MDSU aggregation, from Felix Fietkau. 4) Increase RCU lock hold time in mlx5 FPGA code, from Saeed Mahameed. 5) Avoid checksum complete with XDP in mlx5, also from Saeed. 6) Fix netdev feature clobbering in ibmvnic driver, from Thomas Falcon. 7) Partial sent TLS record leak fix from Jakub Kicinski. 8) Reject zero size iova range in vhost, from Jason Wang. 9) Allow pending work to complete before clcsock release from Karsten Graul. 10) Fix XDP handling max MTU in thunderx, from Matteo Croce. 11) A lot of protocols look at the sa_family field of a sockaddr before validating it's length is large enough, from Tetsuo Handa. 12) Don't write to free'd pointer in qede ptp error path, from Colin Ian King. 13) Have to recompile IP options in ipv4_link_failure because it can be invoked from ARP, from Stephen Suryaputra. 14) Doorbell handling fixes in qed from Denis Bolotin. 15) Revert net-sysfs kobject register leak fix, it causes new problems. From Wang Hai. 16) Spectre v1 fix in ATM code, from Gustavo A. R. Silva. 17) Fix put of BROPT_VLAN_STATS_PER_PORT in bridging code, from Nikolay Aleksandrov. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (111 commits) socket: fix compat SO_RCVTIMEO_NEW/SO_SNDTIMEO_NEW tcp: tcp_grow_window() needs to respect tcp_space() ocelot: Clean up stats update deferred work ocelot: Don't sleep in atomic context (irqs_disabled()) net: bridge: fix netlink export of vlan_stats_per_port option qed: fix spelling mistake "faspath" -> "fastpath" tipc: set sysctl_tipc_rmem and named_timeout right range tipc: fix link established but not in session net: Fix missing meta data in skb with vlan packet net: atm: Fix potential Spectre v1 vulnerabilities net/core: work around section mismatch warning for ptp_classifier net: bridge: fix per-port af_packet sockets bnx2x: fix spelling mistake "dicline" -> "decline" route: Avoid crash from dereferencing NULL rt->from MAINTAINERS: normalize Woojung Huh's email address bonding: fix event handling for stacked bonds Revert "net-sysfs: Fix memory leak in netdev_register_kobject" rtnetlink: fix rtnl_valid_stats_req() nlmsg_len check qed: Fix the DORQ's attentions handling qed: Fix missing DORQ attentions ...
2019-04-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-4/+6
Pull KVM fixes from Paolo Bonzini: "5.1 keeps its reputation as a big bugfix release for KVM x86. - Fix for a memory leak introduced during the merge window - Fixes for nested VMX with ept=0 - Fixes for AMD (APIC virtualization, NMI injection) - Fixes for Hyper-V under KVM and KVM under Hyper-V - Fixes for 32-bit SMM and tests for SMM virtualization - More array_index_nospec peppering" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits) KVM: x86: avoid misreporting level-triggered irqs as edge-triggered in tracing KVM: fix spectrev1 gadgets KVM: x86: fix warning Using plain integer as NULL pointer selftests: kvm: add a selftest for SMM selftests: kvm: fix for compilers that do not support -no-pie selftests: kvm/evmcs_test: complete I/O before migrating guest state KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU KVM: x86: clear SMM flags before loading state while leaving SMM KVM: x86: Open code kvm_set_hflags KVM: x86: Load SMRAM in a single shot when leaving SMM KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU KVM: x86: Raise #GP when guest vCPU do not support PMU x86/kvm: move kvm_load/put_guest_xcr0 into atomic context KVM: x86: svm: make sure NMI is injected after nmi_singlestep svm/avic: Fix invalidate logical APIC id entry Revert "svm: Fix AVIC incomplete IPI emulation" kvm: mmu: Fix overflow on kvm mmu page limit calculation KVM: nVMX: always use early vmcs check when EPT is disabled KVM: nVMX: allow tests to use bad virtual-APIC page address ...
2019-04-16KVM: fix spectrev1 gadgetsPaolo Bonzini1-4/+6
These were found with smatch, and then generalized when applicable. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16x86/reboot, efi: Use EFI reboot for Acer TravelMate X514-51TJian-Hong Pan1-1/+6
Upon reboot, the Acer TravelMate X514-51T laptop appears to complete the shutdown process, but then it hangs in BIOS POST with a black screen. The problem is intermittent - at some points it has appeared related to Secure Boot settings or different kernel builds, but ultimately we have not been able to identify the exact conditions that trigger the issue to come and go. Besides, the EFI mode cannot be disabled in the BIOS of this model. However, after extensive testing, we observe that using the EFI reboot method reliably avoids the issue in all cases. So add a boot time quirk to use EFI reboot on such systems. Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=203119 Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> Signed-off-by: Daniel Drake <drake@endlessm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Cc: linux@endlessm.com Link: http://lkml.kernel.org/r/20190412080152.3718-1-jian-hong@endlessm.com [ Fix !CONFIG_EFI build failure, clarify the code and the changelog a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-15Merge branch 'page-refs' (page ref overflow)Linus Torvalds2-5/+20
Merge page ref overflow branch. Jann Horn reported that he can overflow the page ref count with sufficient memory (and a filesystem that is intentionally extremely slow). Admittedly it's not exactly easy. To have more than four billion references to a page requires a minimum of 32GB of kernel memory just for the pointers to the pages, much less any metadata to keep track of those pointers. Jann needed a total of 140GB of memory and a specially crafted filesystem that leaves all reads pending (in order to not ever free the page references and just keep adding more). Still, we have a fairly straightforward way to limit the two obvious user-controllable sources of page references: direct-IO like page references gotten through get_user_pages(), and the splice pipe page duplication. So let's just do that. * branch page-refs: fs: prevent page refcount overflow in pipe_buf_get mm: prevent get_user_pages() from overflowing page refcount mm: add 'try_get_page()' helper function mm: make page ref count overflow check tighter and more explicit
2019-04-14fs: prevent page refcount overflow in pipe_buf_getMatthew Wilcox1-4/+6
Change pipe_buf_get() to return a bool indicating whether it succeeded in raising the refcount of the page (if the thing in the pipe is a page). This removes another mechanism for overflowing the page refcount. All callers converted to handle a failure. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Matthew Wilcox <willy@infradead.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-14mm: add 'try_get_page()' helper functionLinus Torvalds1-0/+9
This is the same as the traditional 'get_page()' function, but instead of unconditionally incrementing the reference count of the page, it only does so if the count was "safe". It returns whether the reference count was incremented (and is marked __must_check, since the caller obviously has to be aware of it). Also like 'get_page()', you can't use this function unless you already had a reference to the page. The intent is that you can use this exactly like get_page(), but in situations where you want to limit the maximum reference count. The code currently does an unconditional WARN_ON_ONCE() if we ever hit the reference count issues (either zero or negative), as a notification that the conditional non-increment actually happened. NOTE! The count access for the "safety" check is inherently racy, but that doesn't matter since the buffer we use is basically half the range of the reference count (ie we look at the sign of the count). Acked-by: Matthew Wilcox <willy@infradead.org> Cc: Jann Horn <jannh@google.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-14mm: make page ref count overflow check tighter and more explicitLinus Torvalds1-1/+5
We have a VM_BUG_ON() to check that the page reference count doesn't underflow (or get close to overflow) by checking the sign of the count. That's all fine, but we actually want to allow people to use a "get page ref unless it's already very high" helper function, and we want that one to use the sign of the page ref (without triggering this VM_BUG_ON). Change the VM_BUG_ON to only check for small underflows (or _very_ close to overflowing), and ignore overflows which have strayed into negative territory. Acked-by: Matthew Wilcox <willy@infradead.org> Cc: Jann Horn <jannh@google.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-14bfq: update internal depth state when queue depth changesJens Axboe1-0/+1
A previous commit moved the shallow depth and BFQ depth map calculations to be done at init time, moving it outside of the hotter IO path. This potentially causes hangs if the users changes the depth of the scheduler map, by writing to the 'nr_requests' sysfs file for that device. Add a blk-mq-sched hook that allows blk-mq to inform the scheduler if the depth changes, so that the scheduler can update its internal state. Tested-by: Kai Krakow <kai@kaishome.de> Reported-by: Paolo Valente <paolo.valente@linaro.org> Fixes: f0635b8a416e ("bfq: calculate shallow depths at init time") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-14Merge tag 'for-linus-20190412' of git://git.kernel.dk/linux-blockLinus Torvalds4-14/+30
Pull block fixes from Jens Axboe: "Set of fixes that should go into this round. This pull is larger than I'd like at this time, but there's really no specific reason for that. Some are fixes for issues that went into this merge window, others are not. Anyway, this contains: - Hardware queue limiting for virtio-blk/scsi (Dongli) - Multi-page bvec fixes for lightnvm pblk - Multi-bio dio error fix (Jason) - Remove the cache hint from the io_uring tool side, since we didn't move forward with that (me) - Make io_uring SETUP_SQPOLL root restricted (me) - Fix leak of page in error handling for pc requests (Jérôme) - Fix BFQ regression introduced in this merge window (Paolo) - Fix break logic for bio segment iteration (Ming) - Fix NVMe cancel request error handling (Ming) - NVMe pull request with two fixes (Christoph): - fix the initial CSN for nvme-fc (James) - handle log page offsets properly in the target (Keith)" * tag 'for-linus-20190412' of git://git.kernel.dk/linux-block: block: fix the return errno for direct IO nvmet: fix discover log page when offsets are used nvme-fc: correct csn initialization and increments on error block: do not leak memory in bio_copy_user_iov() lightnvm: pblk: fix crash in pblk_end_partial_read due to multipage bvecs nvme: cancel request synchronously blk-mq: introduce blk_mq_complete_request_sync() scsi: virtio_scsi: limit number of hw queues by nr_cpu_ids virtio-blk: limit number of hw queues by nr_cpu_ids block, bfq: fix use after free in bfq_bfqq_expire io_uring: restrict IORING_SETUP_SQPOLL to root tools/io_uring: remove IOCQE_FLAG_CACHEHIT block: don't use for-inside-for in bio_for_each_segment_all
2019-04-14Merge tag 'nfs-for-5.1-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds1-8/+0
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Stable fix: - Fix a deadlock in close() due to incorrect draining of RDMA queues Bugfixes: - Revert "SUNRPC: Micro-optimise when the task is known not to be sleeping" as it is causing stack overflows - Fix a regression where NFSv4 getacl and fs_locations stopped working - Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family. - Fix xfstests failures due to incorrect copy_file_range() return values" * tag 'nfs-for-5.1-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: Revert "SUNRPC: Micro-optimise when the task is known not to be sleeping" NFSv4.1 fix incorrect return value in copy_file_range xprtrdma: Fix helper that drains the transport NFS: Fix handling of reply page vector NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
2019-04-14Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "Here's more than a handful of clk driver fixes for changes that came in during the merge window: - Fix the AT91 sama5d2 programmable clk prescaler formula - A bunch of Amlogic meson clk driver fixes for the VPU clks - A DMI quirk for Intel's Bay Trail SoC's driver to properly mark pmc clks as critical only when really needed - Stop overwriting CLK_SET_RATE_PARENT flag in mediatek's clk gate implementation - Use the right structure to test for a frequency table in i.MX's PLL_1416x driver" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: imx: Fix PLL_1416X not rounding rates clk: mediatek: fix clk-gate flag setting platform/x86: pmc_atom: Drop __initconst on dmi table clk: x86: Add system specific quirk to mark clocks as critical clk: meson: vid-pll-div: remove warning and return 0 on invalid config clk: meson: pll: fix rounding and setting a rate that matches precisely clk: meson-g12a: fix VPU clock parents clk: meson: g12a: fix VPU clock muxes mask clk: meson-gxbb: round the vdec dividers to closest clk: at91: fix programmable clock for sama5d2
2019-04-13Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Ingo Molnar: "Fix an objtool warning plus fix a u64_to_user_ptr() macro expansion bug" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Add rewind_stack_do_exit() to the noreturn list linux/kernel.h: Use parentheses around argument in u64_to_user_ptr()
2019-04-11Revert "SUNRPC: Micro-optimise when the task is known not to be sleeping"Trond Myklebust1-8/+0
This reverts commit 009a82f6437490c262584d65a14094a818bcb747. The ability to optimise here relies on compiler being able to optimise away tail calls to avoid stack overflows. Unfortunately, we are seeing reports of problems, so let's just revert. Reported-by: Daniel Mack <daniel@zonque.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-04-11nvmet: fix discover log page when offsets are usedKeith Busch1-2/+7
The nvme target hadn't been taking the Get Log Page offset parameter into consideration, and so has been returning corrupted log pages when offsets are used. Since many tools, including nvme-cli, split the log request to 4k, we've been breaking discovery log responses when more than 3 subsystems exist. Fix the returned data by internally generating the entire discovery log page and copying only the requested bytes into the user buffer. The command log page offset type has been modified to a native __le64 to make it easier to extract the value from a command. Signed-off-by: Keith Busch <keith.busch@intel.com> Tested-by: Minwoo Im <minwoo.im@samsung.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-04-11failover: allow name change on IFF_UP slave interfacesSi-Wei Liu1-0/+3
When a netdev appears through hot plug then gets enslaved by a failover master that is already up and running, the slave will be opened right away after getting enslaved. Today there's a race that userspace (udev) may fail to rename the slave if the kernel (net_failover) opens the slave earlier than when the userspace rename happens. Unlike bond or team, the primary slave of failover can't be renamed by userspace ahead of time, since the kernel initiated auto-enslavement is unable to, or rather, is never meant to be synchronized with the rename request from userspace. As the failover slave interfaces are not designed to be operated directly by userspace apps: IP configuration, filter rules with regard to network traffic passing and etc., should all be done on master interface. In general, userspace apps only care about the name of master interface, while slave names are less important as long as admin users can see reliable names that may carry other information describing the netdev. For e.g., they can infer that "ens3nsby" is a standby slave of "ens3", while for a name like "eth0" they can't tell which master it belongs to. Historically the name of IFF_UP interface can't be changed because there might be admin script or management software that is already relying on such behavior and assumes that the slave name can't be changed once UP. But failover is special: with the in-kernel auto-enslavement mechanism, the userspace expectation for device enumeration and bring-up order is already broken. Previously initramfs and various userspace config tools were modified to bypass failover slaves because of auto-enslavement and duplicate MAC address. Similarly, in case that users care about seeing reliable slave name, the new type of failover slaves needs to be taken care of specifically in userspace anyway. It's less risky to lift up the rename restriction on failover slave which is already UP. Although it's possible this change may potentially break userspace component (most likely configuration scripts or management software) that assumes slave name can't be changed while UP, it's relatively a limited and controllable set among all userspace components, which can be fixed specifically to listen for the rename events on failover slaves. Userspace component interacting with slaves is expected to be changed to operate on failover master interface instead, as the failover slave is dynamic in nature which may come and go at any point. The goal is to make the role of failover slaves less relevant, and userspace components should only deal with failover master in the long run. Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module") Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11clk: x86: Add system specific quirk to mark clocks as criticalDavid Müller1-0/+3
Since commit 648e921888ad ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL"), the pmc_plt_clocks of the Bay Trail SoC are unconditionally gated off. Unfortunately this will break systems where these clocks are used for external purposes beyond the kernel's knowledge. Fix it by implementing a system specific quirk to mark the necessary pmc_plt_clks as critical. Fixes: 648e921888ad ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL") Signed-off-by: David Müller <dave.mueller@gmx.ch> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>