summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-02-04s390/qeth: release cmd buffer in error pathsJulian Wiedmann1-0/+6
Whenever we fail before/while starting an IO, make sure to release the IO buffer. Usually qeth_irq() would do this for us, but if the IO doesn't even start we obviously won't get an interrupt for it either. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04net: cls_flower: Remove filter from mask before freeing itPetr Machata1-1/+5
In fl_change(), when adding a new rule (i.e. fold == NULL), a driver may reject the new rule, for example due to resource exhaustion. By that point, the new rule was already assigned a mask, and it was added to that mask's hash table. The clean-up path that's invoked as a result of the rejection however neglects to undo the hash table addition, and proceeds to free the new rule, thus leaving a dangling pointer in the hash table. Fix by removing fnew from the mask's hash table before it is freed. Fixes: 35cc3cefc4de ("net/sched: cls_flower: Reject duplicated rules also under skip_sw") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04Merge tag 'wireless-drivers-for-davem-2019-02-04' of ↵David S. Miller7-37/+42
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for 5.0 First set of small, but importnat, fixes for 5.0. iwlwifi * fix a build regression introduced in 5.0-rc1 wlcore * fix a firmware regression from v4.18-rc1 mt76x0 * fix for configuring tx power from user space ath10k * fix wcn3990 regression from v4.20-rc1 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04Merge branch 'smc-fixes'David S. Miller9-46/+124
Ursula Braun says: ==================== net/smc: fixes 2019-02-04 here are more fixes in the smc code for the net tree: Patch 1 fixes an IB-related problem with SMCR. Patch 2 fixes a cursor problem for one-way traffic. Patch 3 fixes a problem with RMB-reusage. Patch 4 fixes a closing issue. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04net/smc: correct state change for peer closingUrsula Braun1-8/+1
If some kind of closing is received from the peer while still in state SMC_INIT, it means the peer has had an active connection and closed the socket quickly before listen_work finished. This should not result in a shortcut from state SMC_INIT to state SMC_CLOSED. This patch adds the socket to the accept queue in state SMC_APPCLOSEWAIT1. The socket reaches state SMC_CLOSED once being accepted and closed with smc_release(). Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04net/smc: delete rkey first before switching to unusedUrsula Braun1-1/+1
Once RMBs are flagged as unused they are candidates for reuse. Thus the LLC DELETE RKEY operaton should be made before flagging the RMB as unused. Fixes: c7674c001b11 ("net/smc: unregister rkeys of unused buffer") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04net/smc: fix sender_free computationUrsula Braun3-4/+30
In some scenarios a separate consumer cursor update is necessary. The decision is made in smc_tx_consumer_cursor_update(). The sender_free computation could be wrong: The rx confirmed cursor is always smaller than or equal to the rx producer cursor. The parameters in the smc_curs_diff() call have to be exchanged, otherwise sender_free might even be negative. And if more data arrives local_rx_ctrl.prod might be updated, enabling a cursor difference between local_rx_ctrl.prod and rx confirmed cursor larger than the RMB size. This case is not covered by smc_curs_diff(). Thus function smc_curs_diff_large() is introduced here. If a recvmsg() is processed in parallel, local_tx_ctrl.cons might change during smc_cdc_msg_send. Make sure rx_curs_confirmed is updated with the actually sent local_tx_ctrl.cons value. Fixes: e82f2e31f559 ("net/smc: optimize consumer cursor updates") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04net/smc: preallocated memory for rdma work requestsUrsula Braun7-33/+92
The work requests for rdma writes are built in local variables within function smc_tx_rdma_write(). This violates the rule that the work request storage has to stay till the work request is confirmed by a completion queue response. This patch introduces preallocated memory for these work requests. The storage is allocated, once a link (and thus a queue pair) is established. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04net: dp83640: expire old TX-skbSebastian Andrzej Siewior1-3/+10
During sendmsg() a cloned skb is saved via dp83640_txtstamp() in ->tx_queue. After the NIC sends this packet, the PHY will reply with a timestamp for that TX packet. If the cable is pulled at the right time I don't see that packet. It might gets flushed as part of queue shutdown on NIC's side. Once the link is up again then after the next sendmsg() we enqueue another skb in dp83640_txtstamp() and have two on the list. Then the PHY will send a reply and decode_txts() attaches it to the first skb on the list. No crash occurs since refcounting works but we are one packet behind. linuxptp/ptp4l usually closes the socket and opens a new one (in such a timeout case) so those "stale" replies never get there. However it does not resume normal operation anymore. Purge old skbs in decode_txts(). Fixes: cb646e2b02b2 ("ptp: Added a clock driver for the National Semiconductor PHYTER.") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04netfilter: nf_tables: unbind set in rule from commit pathPablo Neira Ayuso7-83/+85
Anonymous sets that are bound to rules from the same transaction trigger a kernel splat from the abort path due to double set list removal and double free. This patch updates the logic to search for the transaction that is responsible for creating the set and disable the set list removal and release, given the rule is now responsible for this. Lookup is reverse since the transaction that adds the set is likely to be at the tail of the list. Moreover, this patch adds the unbind step to deliver the event from the commit path. This should not be done from the worker thread, since we have no guarantees of in-order delivery to the listener. This patch removes the assumption that both activate and deactivate callbacks need to be provided. Fixes: cd5125d8f518 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase") Reported-by: Mikhail Morfikov <mmorfikov@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-04arm64: ptdump: Don't iterate kernel page tables using PTRS_PER_PXXWill Deacon1-30/+29
When 52-bit virtual addressing is enabled for userspace (CONFIG_ARM64_USER_VA_BITS_52=y), the kernel continues to utilise 48-bit virtual addressing in TTBR1. Consequently, PTRS_PER_PGD reflects the larger page table size for userspace and the pgd pointer for kernel page tables is offset before being written to TTBR1. This means that we can't use PTRS_PER_PGD to iterate over kernel page tables unless we apply the same offset, which is fiddly to get right and leads to some non-idiomatic walking code. Instead, just follow the usual pattern when walking page tables by using a while loop driven by pXd_offset() and pXd_addr_end(). Reported-by: Qian Cai <cai@lca.pw> Tested-by: Qian Cai <cai@lca.pw> Acked-by: Steve Capper <steve.capper@arm.com> Tested-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-02-04tools headers uapi: Sync linux/in.h copy from the kernel sourcesArnaldo Carvalho de Melo1-1/+1
To get the changes in this cset: f275ee0fa3a0 ("IN_BADCLASS: fix macro to actually work") The macros changed in this cset are not used in tools/, so this is just to silence this perf tools build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/in.h' differs from latest version at 'include/uapi/linux/in.h' diff -u tools/include/uapi/linux/in.h include/uapi/linux/in.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-xbk34kwamn8bw8ywpuxetct9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-04perf clang: Do not use 'return std::move(something)'Arnaldo Carvalho de Melo1-1/+1
It prevents copy elision, generating this warning when building with fedora:rawhide's clang: clang version 7.0.1 (Fedora 7.0.1-2.fc30) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /usr/bin Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-redhat-linux/9 Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/9 Selected GCC installation: /usr/bin/../lib/gcc/x86_64-redhat-linux/9 Candidate multilib: .;@m64 Candidate multilib: 32;@m32 Selected multilib: .;@m64 $ make -C tools/perf CC=clang LIBCLANGLLVM=1 <SNIP> util/c++/clang.cpp: In function 'std::unique_ptr<llvm::SmallVectorImpl<char> > perf::getBPFObjectFromModule(llvm::Module*)': util/c++/clang.cpp:163:18: error: moving a local object in a return statement prevents copy elision [-Werror=pessimizing-move] 163 | return std::move(Buffer); | ~~~~~~~~~^~~~~~~~ util/c++/clang.cpp:163:18: note: remove 'std::move' call cc1plus: all warnings being treated as errors <SNIP> References: http://www.cplusplus.com/forum/general/186411/#msg908572 https://en.cppreference.com/w/cpp/language/return#Notes https://en.cppreference.com/w/cpp/language/copy_elision Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-lehqf5x5q96l0o8myhb6blz6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-04perf mem/c2c: Fix perf_mem_events to support powerpcRavi Bangoria5-6/+26
PowerPC hardware does not have a builtin latency filter (--ldlat) for the "mem-load" event and perf_mem_events by default includes "/ldlat=30/" which is causing a failure on PowerPC. Refactor the code to support "perf mem/c2c" on PowerPC. This patch depends on kernel side changes done my Madhavan: https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-December/182596.html Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Dick Fowles <fowles@inreach.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20190129132412.771-1-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-04perf tests evsel-tp-sched: Fix bitwise operatorGustavo A. R. Silva1-1/+1
Notice that the use of the bitwise OR operator '|' always leads to true in this particular case, which seems a bit suspicious due to the context in which this expression is being used. Fix this by using bitwise AND operator '&' instead. This bug was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Fixes: 6a6cd11d4e57 ("perf test: Add test for the sched tracepoint format fields") Link: http://lkml.kernel.org/r/20190122233439.GA5868@embeddedor Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-04netfilter: nf_nat: skip nat clash resolution for same-origin entriesMartynas Pumputis1-0/+16
It is possible that two concurrent packets originating from the same socket of a connection-less protocol (e.g. UDP) can end up having different IP_CT_DIR_REPLY tuples which results in one of the packets being dropped. To illustrate this, consider the following simplified scenario: 1. Packet A and B are sent at the same time from two different threads by same UDP socket. No matching conntrack entry exists yet. Both packets cause allocation of a new conntrack entry. 2. get_unique_tuple gets called for A. No clashing entry found. conntrack entry for A is added to main conntrack table. 3. get_unique_tuple is called for B and will find that the reply tuple of B is already taken by A. It will allocate a new UDP source port for B to resolve the clash. 4. conntrack entry for B cannot be added to main conntrack table because its ORIGINAL direction is clashing with A and the REPLY directions of A and B are not the same anymore due to UDP source port reallocation done in step 3. This patch modifies nf_conntrack_tuple_taken so it doesn't consider colliding reply tuples if the IP_CT_DIR_ORIGINAL tuples are equal. [ Florian: simplify patch to not use .allow_clash setting and always ignore identical flows ] Signed-off-by: Martynas Pumputis <martynas@weave.works> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-04selftests: netfilter: add simple masq/redirect test casesFlorian Westphal2-1/+763
Check basic nat/redirect/masquerade for ipv4 and ipv6. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-04selftests: netfilter: fix config fragment CONFIG_NF_TABLES_INETNaresh Kamboju1-1/+1
In selftests the config fragment for netfilter was added as NF_TABLES_INET=y and this patch correct it as CONFIG_NF_TABLES_INET=y Signed-off-by: Naresh Kamboju <naresh.kamboju@linaro.org> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-04dmaengine: dmatest: Abort test in case of mapping errorAndy Shevchenko1-18/+14
In case of mapping error the DMA addresses are invalid and continuing will screw system memory or potentially something else. [ 222.480310] dmatest: dma0chan7-copy0: summary 1 tests, 3 failures 6 iops 349 KB/s (0) ... [ 240.912725] check: Corrupted low memory at 00000000c7c75ac9 (2940 phys) = 5656000000000000 [ 240.921998] check: Corrupted low memory at 000000005715a1cd (2948 phys) = 279f2aca5595ab2b [ 240.931280] check: Corrupted low memory at 000000002f4024c0 (2950 phys) = 5e5624f349e793cf ... Abort any test if mapping failed. Fixes: 4076e755dbec ("dmatest: convert to dmaengine_unmap_data") Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-02-04perf/core: Don't WARN() for impossible ring-buffer sizesMark Rutland1-0/+3
The perf tool uses /proc/sys/kernel/perf_event_mlock_kb to determine how large its ringbuffer mmap should be. This can be configured to arbitrary values, which can be larger than the maximum possible allocation from kmalloc. When this is configured to a suitably large value (e.g. thanks to the perf fuzzer), attempting to use perf record triggers a WARN_ON_ONCE() in __alloc_pages_nodemask(): WARNING: CPU: 2 PID: 5666 at mm/page_alloc.c:4511 __alloc_pages_nodemask+0x3f8/0xbc8 Let's avoid this by checking that the requested allocation is possible before calling kzalloc. Reported-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20190110142745.25495-1-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04perf/x86/intel: Delay memory deallocation until x86_pmu_dead_cpu()Peter Zijlstra1-5/+11
intel_pmu_cpu_prepare() allocated memory for ->shared_regs among other members of struct cpu_hw_events. This memory is released in intel_pmu_cpu_dying() which is wrong. The counterpart of the intel_pmu_cpu_prepare() callback is x86_pmu_dead_cpu(). Otherwise if the CPU fails on the UP path between CPUHP_PERF_X86_PREPARE and CPUHP_AP_PERF_X86_STARTING then it won't release the memory but allocate new memory on the next attempt to online the CPU (leaking the old memory). Also, if the CPU down path fails between CPUHP_AP_PERF_X86_STARTING and CPUHP_PERF_X86_PREPARE then the CPU will go back online but never allocate the memory that was released in x86_pmu_dying_cpu(). Make the memory allocation/free symmetrical in regard to the CPU hotplug notifier by moving the deallocation to intel_pmu_cpu_dead(). This started in commit: a7e3ed1e47011 ("perf: Add support for supplementary event registers"). In principle the bug was introduced in v2.6.39 (!), but it will almost certainly not backport cleanly across the big CPU hotplug rewrite between v4.7-v4.15... [ bigeasy: Added patch description. ] [ mingo: Added backporting guidance. ] Reported-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> # With developer hat on Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> # With maintainer hat on Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: bp@alien8.de Cc: hpa@zytor.com Cc: jolsa@kernel.org Cc: kan.liang@linux.intel.com Cc: namhyung@kernel.org Cc: <stable@vger.kernel.org> Fixes: a7e3ed1e47011 ("perf: Add support for supplementary event registers"). Link: https://lkml.kernel.org/r/20181219165350.6s3jvyxbibpvlhtq@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04perf/x86/intel/uncore: Add Node ID maskKan Liang1-1/+3
Some PCI uncore PMUs cannot be registered on an 8-socket system (HPE Superdome Flex). To understand which Socket the PCI uncore PMUs belongs to, perf retrieves the local Node ID of the uncore device from CPUNODEID(0xC0) of the PCI configuration space, and the mapping between Socket ID and Node ID from GIDNIDMAP(0xD4). The Socket ID can be calculated accordingly. The local Node ID is only available at bit 2:0, but current code doesn't mask it. If a BIOS doesn't clear the rest of the bits, an incorrect Node ID will be fetched. Filter the Node ID by adding a mask. Reported-by: Song Liu <songliubraving@fb.com> Tested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> # v3.7+ Fixes: 7c94ee2e0917 ("perf/x86: Add Intel Nehalem and Sandy Bridge-EP uncore support") Link: https://lkml.kernel.org/r/1548600794-33162-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04Merge branch 'fix/brcm' into fixesVinod Koul1-45/+25
2019-02-04dmaengine: bcm2835: Fix abort of transactionsLukas Wunner1-32/+9
There are multiple issues with bcm2835_dma_abort() (which is called on termination of a transaction): * The algorithm to abort the transaction first pauses the channel by clearing the ACTIVE flag in the CS register, then waits for the PAUSED flag to clear. Page 49 of the spec documents the latter as follows: "Indicates if the DMA is currently paused and not transferring data. This will occur if the active bit has been cleared [...]" https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf So the function is entering an infinite loop because it is waiting for PAUSED to clear which is always set due to the function having cleared the ACTIVE flag. The only thing that's saving it from itself is the upper bound of 10000 loop iterations. The code comment says that the intention is to "wait for any current AXI transfer to complete", so the author probably wanted to check the WAITING_FOR_OUTSTANDING_WRITES flag instead. Amend the function accordingly. * The CS register is only read at the beginning of the function. It needs to be read again after pausing the channel and before checking for outstanding writes, otherwise writes which were issued between the register read at the beginning of the function and pausing the channel may not be waited for. * The function seeks to abort the transfer by writing 0 to the NEXTCONBK register and setting the ABORT and ACTIVE flags. Thereby, the 0 in NEXTCONBK is sought to be loaded into the CONBLK_AD register. However experimentation has shown this approach to not work: The CONBLK_AD register remains the same as before and the CS register contains 0x00000030 (PAUSED | DREQ_STOPS_DMA). In other words, the control block is not aborted but merely paused and it will be resumed once the next DMA transaction is started. That is absolutely not the desired behavior. A simpler approach is to set the channel's RESET flag instead. This reliably zeroes the NEXTCONBK as well as the CS register. It requires less code and only a single MMIO write. This is also what popular user space DMA drivers do, e.g.: https://github.com/metachris/RPIO/blob/master/source/c_pwm/pwm.c Note that the spec is contradictory whether the NEXTCONBK register is writeable at all. On the one hand, page 41 claims: "The value loaded into the NEXTCONBK register can be overwritten so that the linked list of Control Block data structures can be dynamically altered. However it is only safe to do this when the DMA is paused." On the other hand, page 40 specifies: "Only three registers in each channel's register set are directly writeable (CS, CONBLK_AD and DEBUG). The other registers (TI, SOURCE_AD, DEST_AD, TXFR_LEN, STRIDE & NEXTCONBK), are automatically loaded from a Control Block data structure held in external memory." Fixes: 96286b576690 ("dmaengine: Add support for BCM2835") Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v3.14+ Cc: Frank Pavlic <f.pavlic@kunbus.de> Cc: Martin Sperl <kernel@martin.sperl.org> Cc: Florian Meier <florian.meier@koalo.de> Cc: Clive Messer <clive.m.messer@gmail.com> Cc: Matthias Reichl <hias@horus.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Acked-by: Florian Kauer <florian.kauer@koalo.de> Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-02-04dmaengine: bcm2835: Fix interrupt race on RTLukas Wunner1-15/+18
If IRQ handlers are threaded (either because CONFIG_PREEMPT_RT_BASE is enabled or "threadirqs" was passed on the command line) and if system load is sufficiently high that wakeup latency of IRQ threads degrades, SPI DMA transactions on the BCM2835 occasionally break like this: ks8851 spi0.0: SPI transfer timed out bcm2835-dma 3f007000.dma: DMA transfer could not be terminated ks8851 spi0.0 eth2: ks8851_rdfifo: spi_sync() failed The root cause is an assumption made by the DMA driver which is documented in a code comment in bcm2835_dma_terminate_all(): /* * Stop DMA activity: we assume the callback will not be called * after bcm_dma_abort() returns (even if it does, it will see * c->desc is NULL and exit.) */ That assumption falls apart if the IRQ handler bcm2835_dma_callback() is threaded: A client may terminate a descriptor and issue a new one before the IRQ handler had a chance to run. In fact the IRQ handler may miss an *arbitrary* number of descriptors. The result is the following race condition: 1. A descriptor finishes, its interrupt is deferred to the IRQ thread. 2. A client calls dma_terminate_async() which sets channel->desc = NULL. 3. The client issues a new descriptor. Because channel->desc is NULL, bcm2835_dma_issue_pending() immediately starts the descriptor. 4. Finally the IRQ thread runs and writes BCM2835_DMA_INT to the CS register to acknowledge the interrupt. This clears the ACTIVE flag, so the newly issued descriptor is paused in the middle of the transaction. Because channel->desc is not NULL, the IRQ thread finalizes the descriptor and tries to start the next one. I see two possible solutions: The first is to call synchronize_irq() in bcm2835_dma_issue_pending() to wait until the IRQ thread has finished before issuing a new descriptor. The downside of this approach is unnecessary latency if clients desire rapidly terminating and re-issuing descriptors and don't have any use for an IRQ callback. (The SPI TX DMA channel is a case in point.) A better alternative is to make the IRQ thread recognize that it has missed descriptors and avoid finalizing the newly issued descriptor. So first of all, set the ACTIVE flag when acknowledging the interrupt. This keeps a newly issued descriptor running. If the descriptor was finished, the channel remains idle despite the ACTIVE flag being set. However the ACTIVE flag can then no longer be used to check whether the channel is idle, so instead check whether the register containing the current control block address is zero and finalize the current descriptor only if so. That way, there is no impact on latency and throughput if the client doesn't care for the interrupt: Only minimal additional overhead is introduced for non-cyclic descriptors as one further MMIO read is necessary per interrupt to check for idleness of the channel. Cyclic descriptors are sped up slightly by removing one MMIO write per interrupt. Fixes: 96286b576690 ("dmaengine: Add support for BCM2835") Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v3.14+ Cc: Frank Pavlic <f.pavlic@kunbus.de> Cc: Martin Sperl <kernel@martin.sperl.org> Cc: Florian Meier <florian.meier@koalo.de> Cc: Clive Messer <clive.m.messer@gmail.com> Cc: Matthias Reichl <hias@horus.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Acked-by: Florian Kauer <florian.kauer@koalo.de> Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-02-04dmaengine: imx-dma: fix wrong callback invokeLeonid Iziumtsev1-4/+4
Once the "ld_queue" list is not empty, next descriptor will migrate into "ld_active" list. The "desc" variable will be overwritten during that transition. And later the dmaengine_desc_get_callback_invoke() will use it as an argument. As result we invoke wrong callback. That behaviour was in place since: commit fcaaba6c7136 ("dmaengine: imx-dma: fix callback path in tasklet"). But after commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job") things got worse, since possible delay between tasklet_schedule() from DMA irq handler and actual tasklet function execution got bigger. And that gave more time for new DMA request to be submitted and to be put into "ld_queue" list. It has been noticed that DMA issue is causing problems for "mxc-mmc" driver. While stressing the system with heavy network traffic and writing/reading to/from sd card simultaneously the timeout may happen: 10013000.sdhci: mxcmci_watchdog: read time out (status = 0x30004900) That often lead to file system corruption. Signed-off-by: Leonid Iziumtsev <leonid.iziumtsev@gmail.com> Signed-off-by: Vinod Koul <vkoul@kernel.org> Cc: stable@vger.kernel.org
2019-02-04virtio_net: Account for tx bytes and packets on sending xdp_framesToshiaki Makita1-4/+16
Previously virtnet_xdp_xmit() did not account for device tx counters, which caused confusions. To be consistent with SKBs, account them on freeing xdp_frames. Reported-by: David Ahern <dsahern@gmail.com> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04Merge branch 'drm-fixes-5.0' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie5-13/+60
into drm-fixes A few fixes for 5.0: - Fix radeon crash on SI with VM passthrough - Fencing fix for shared buffers - Fix power hwmon reporting on APUs - Powerplay fix for APUs Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190201043455.5988-1-alexander.deucher@amd.com
2019-02-04sctp: check and update stream->out_curr when allocating stream_outXin Long1-0/+20
Now when using stream reconfig to add out streams, stream->out will get re-allocated, and all old streams' information will be copied to the new ones and the old ones will be freed. So without stream->out_curr updated, next time when trying to send from stream->out_curr stream, a panic would be caused. This patch is to check and update stream->out_curr when allocating stream_out. v1->v2: - define fa_index() to get elem index from stream->out_curr. v2->v3: - repost with no change. Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Reported-by: Ying Xu <yinxu@redhat.com> Reported-by: syzbot+e33a3a138267ca119c7d@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04xfs: set buffer ops when repair probes for btree typeDarrick J. Wong2-3/+24
In xrep_findroot_block, we work out the btree type and correctness of a given block by calling different btree verifiers on root block candidates. However, we leave the NULL b_ops while ->verify_read validates the block, which means that if the verifier calls xfs_buf_verifier_error it'll crash on the null b_ops. Fix it to set b_ops before calling the verifier and unsetting it if the verifier fails. Furthermore, improve the documentation around xfs_buf_ensure_ops, which is the function that is responsible for cleaning up the b_ops state of buffers that go through xrep_findroot_block but don't match anything. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-02-04xfs: end sync buffer I/O properly on shutdown errorBrian Foster1-2/+1
As of commit e339dd8d8b ("xfs: use sync buffer I/O for sync delwri queue submission"), the delwri submission code uses sync buffer I/O for sync delwri I/O. Instead of waiting on async I/O to unlock the buffer, it uses the underlying sync I/O completion mechanism. If delwri buffer submission fails due to a shutdown scenario, an error is set on the buffer and buffer completion never occurs. This can cause xfs_buf_delwri_submit() to deadlock waiting on a completion event. We could check the error state before waiting on such buffers, but that doesn't serialize against the case of an error set via a racing I/O completion. Instead, invoke I/O completion in the shutdown case regardless of buffer I/O type. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-04xfs: eof trim writeback mapping as soon as it is cachedBrian Foster1-0/+2
The cached writeback mapping is EOF trimmed to try and avoid races between post-eof block management and writeback that result in sending cached data to a stale location. The cached mapping is currently trimmed on the validation check, which leaves a race window between the time the mapping is cached and when it is trimmed against the current inode size. For example, if a new mapping is cached by delalloc conversion on a blocksize == page size fs, we could cycle various locks, perform memory allocations, etc. in the writeback codepath before the associated mapping is eventually trimmed to i_size. This leaves enough time for a post-eof truncate and file append before the cached mapping is trimmed. The former event essentially invalidates a range of the cached mapping and the latter bumps the inode size such the trim on the next writepage event won't trim all of the invalid blocks. fstest generic/464 reproduces this scenario occasionally and causes a lost writeback and stale delalloc blocks warning on inode inactivation. To work around this problem, trim the cached writeback mapping as soon as it is cached in addition to on subsequent validation checks. This is a minor tweak to tighten the race window as much as possible until a proper invalidation mechanism is available. Fixes: 40214d128e07 ("xfs: trim writepage mapping to within eof") Cc: <stable@vger.kernel.org> # v4.14+ Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-04Linux 5.0-rc5v5.0-rc5Linus Torvalds1-1/+1
2019-02-03MAINTAINERS: add entry for redpine wireless driverSiva Rebbagondla1-0/+7
Create an entry for Redpine wireless driver and add Amit and myself as maintainers. Signed-off-by: Siva Rebbagondla <siva.rebbagondla@redpinesignals.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-02-03net: systemport: Fix WoL with password after deep sleepFlorian Fainelli2-15/+12
Broadcom STB chips support a deep sleep mode where all register contents are lost. Because we were stashing the MagicPacket password into some of these registers a suspend into that deep sleep then a resumption would not lead to being able to wake-up from MagicPacket with password again. Fix this by keeping a software copy of the password and program it during suspend. Fixes: 83e82f4c706b ("net: systemport: add Wake-on-LAN support") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03Merge branch 'vsock-virtio-hot-unplug'David S. Miller1-8/+21
Stefano Garzarella says: ==================== vsock/virtio: fix issues on device hot-unplug These patches try to handle the hot-unplug of vsock virtio transport device in a proper way. Maybe move the vsock_core_init()/vsock_core_exit() functions in the module_init and module_exit of vsock_virtio_transport module can't be the best way, but the architecture of vsock_core forces us to this approach for now. The vsock_core proto_ops expect a valid pointer to the transport device, so we can't call vsock_core_exit() until there are open sockets. v2 -> v3: - Rebased on master v1 -> v2: - Fixed commit message of patch 1. - Added Reviewed-by, Acked-by tags by Stefan ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03vsock/virtio: reset connected sockets on device removalStefano Garzarella1-0/+3
When the virtio transport device disappear, we should reset all connected sockets in order to inform the users. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03vsock/virtio: fix kernel panic after device hot-unplugStefano Garzarella1-8/+18
virtio_vsock_remove() invokes the vsock_core_exit() also if there are opened sockets for the AF_VSOCK protocol family. In this way the vsock "transport" pointer is set to NULL, triggering the kernel panic at the first socket activity. This patch move the vsock_core_init()/vsock_core_exit() in the virtio_vsock respectively in module_init and module_exit functions, that cannot be invoked until there are open sockets. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1609699 Reported-by: Yan Fu <yafu@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds13-14/+35
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A few updates for x86: - Fix an unintended sign extension issue in the fault handling code - Rename the new resource control config switch so it's less confusing - Avoid setting up EFI info in kexec when the EFI runtime is disabled. - Fix the microcode version check in the AMD microcode loader so it only loads higher version numbers and never downgrades - Set EFER.LME in the 32bit trampoline before returning to long mode to handle older AMD/KVM behaviour properly. - Add Darren and Andy as x86/platform reviewers" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Avoid confusion over the new X86_RESCTRL config x86/kexec: Don't setup EFI info if EFI runtime is not enabled x86/microcode/amd: Don't falsely trick the late loading mechanism MAINTAINERS: Add Andy and Darren as arch/x86/platform/ reviewers x86/fault: Fix sign-extend unintended sign extension x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to long mode x86/cpu: Add Atom Tremont (Jacobsville)
2019-02-03Merge branch 'smp-urgent-for-linus' of ↵Linus Torvalds6-39/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull cpu hotplug fixes from Thomas Gleixner: "Two fixes for the cpu hotplug machinery: - Replace the overly clever 'SMT disabled by BIOS' detection logic as it breaks KVM scenarios and prevents speculation control updates when the Hyperthreads are brought online late after boot. - Remove a redundant invocation of the speculation control update function" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM x86/speculation: Remove redundant arch_smt_update() invocation
2019-02-03Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds6-23/+35
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "A pile of perf updates: - Fix broken sanity check in the /proc/sys/kernel/perf_cpu_time_max_percent write handler - Cure a perf script crash which caused by an unitinialized data structure - Highlight the hottest instruction in perf top and not a random one - Cure yet another clang issue when building perf python - Handle topology entries with no CPU correctly in the tools - Handle perf data which contains both tracepoints and performance counter entries correctly. - Add a missing NULL pointer check in perf ordered_events_free()" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf script: Fix crash when processing recorded stat data perf top: Fix wrong hottest instruction highlighted perf tools: Handle TOPOLOGY headers with no CPU perf python: Remove -fstack-clash-protection when building with some clang versions perf core: Fix perf_proc_update_handler() bug perf script: Fix crash with printing mixed trace point and other events perf ordered_events: Fix crash in ordered_events__free
2019-02-03Merge branch 'efi-urgent-for-linus' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fix from Thomas Gleixner: "The dump info for the efi page table debugging lacks a terminator which causes the kernel to crash when the debugfile is read" * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/arm64: Fix debugfs crash by adding a terminator for ptdump marker
2019-02-03Merge tag 'for-5.0-rc4-tag' of ↵Linus Torvalds4-38/+71
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - regression fix: transaction commit can run away due to delayed ref waiting heuristic, this is not necessary now because of the proper reservation mechanism introduced in 5.0 - regression fix: potential crash due to use-before-check of an ERR_PTR return value - fix for transaction abort during transaction commit that needs to properly clean up pending block groups - fix deadlock during b-tree node/leaf splitting, when this happens on some of the fundamental trees, we must prevent new tree block allocation to re-enter indirectly via the block group flushing path - potential memory leak after errors during mount * tag 'for-5.0-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: On error always free subvol_name in btrfs_mount btrfs: clean up pending block groups when transaction commit aborts btrfs: fix potential oops in device_list_add btrfs: don't end the transaction for delayed refs in throttle Btrfs: fix deadlock when allocating tree block during leaf/node split
2019-02-03x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out()Tony Luck1-0/+1
Internal injection testing crashed with a console log that said: mce: [Hardware Error]: CPU 7: Machine Check Exception: f Bank 0: bd80000000100134 This caused a lot of head scratching because the MCACOD (bits 15:0) of that status is a signature from an L1 data cache error. But Linux says that it found it in "Bank 0", which on this model CPU only reports L1 instruction cache errors. The answer was that Linux doesn't initialize "m->bank" in the case that it finds a fatal error in the mce_no_way_out() pre-scan of banks. If this was a local machine check, then this partially initialized struct mce is being passed to mce_panic(). Fix is simple: just initialize m->bank in the case of a fatal error. Fixes: 40c36e2741d7 ("x86/mce: Fix incorrect "Machine check from unknown source" message") Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: x86-ml <x86@kernel.org> Cc: stable@vger.kernel.org # v4.18 Note pre-v5.0 arch/x86/kernel/cpu/mce/core.c was called arch/x86/kernel/cpu/mcheck/mce.c Link: https://lkml.kernel.org/r/20190201003341.10638-1-tony.luck@intel.com
2019-02-03Merge tag 'iio-fixes-5.0a' of ↵Greg Kroah-Hartman4-22/+66
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus Jonathan writes: First set of IIO fixes for the 5.0 cycle. Been a busy month, so these are rather later than they should have been. * atlas-ph-sensor: - Temperature scale didn't correspond to the ABI. * axp288: - A few different fixes around the TS-pin handling. * ti-ads8688 - Not enough space in the buffer used to build the scan to allow for the timestamp. * tools - iio_generic_buffer - Make num_loops signed so that we really are running for ever rather than just a long time when we specify -1. * tag 'iio-fixes-5.0a' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: iio: ti-ads8688: Update buffer allocation for timestamps tools: iio: iio_generic_buffer: make num_loops signed iio: adc: axp288: Fix TS-pin handling iio: chemical: atlas-ph-sensor: correct IIO_TEMP values to millicelsius
2019-02-03Revert "net: phy: marvell: avoid pause mode on SGMII-to-Copper for 88e151x"Russell King1-16/+0
This reverts commit 6623c0fba10ef45b64ca213ad5dec926f37fa9a0. The original diagnosis was incorrect: it appears that the NIC had PHY polling mode enabled, which meant that it overwrote the PHYs advertisement register during negotiation. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Tested-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-02Merge tag 'devicetree-fixes-for-5.0-3' of ↵Linus Torvalds1-1/+5
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull Devicetree fix from Rob Herring: "A single fix for building DT bindings in-tree" * tag 'devicetree-fixes-for-5.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: dt-bindings: Fix dt_binding_check target for in tree builds
2019-02-02Merge tag 'riscv-for-linus-5.0-rc5' of ↵Linus Torvalds10-19/+38
git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux Pull RISC-V fixes from Palmer Dabbelt: "This contains a handful of mostly-independent patches: - make our port respect TIF_NEED_RESCHED, which fixes CONFIG_PREEMPT=y kernels - fix double-put of OF nodes - fix a misspelling of target in our Kconfig - generic PCIe is enabled in our defconfig - fix our SBI early console to properly handle line endings - fix max_low_pfn being counted in PFNs - a change to TASK_UNMAPPED_BASE to match what other arches do This has passed my standard 'boot Fedora' flow" * tag 'riscv-for-linus-5.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux: riscv: Adjust mmap base address at a third of task size riscv: fixup max_low_pfn with PFN_DOWN. tty/serial: use uart_console_write in the RISC-V SBL early console RISC-V: defconfig: Add CRYPTO_DEV_VIRTIO=y RISC-V: defconfig: Enable Generic PCIE by default RISC-V: defconfig: Move CONFIG_PCI{,E_XILINX} RISC-V: Kconfig: fix spelling mistake "traget" -> "target" RISC-V: asm/page.h: fix spelling mistake "CONFIG_64BITS" -> "CONFIG_64BIT" RISC-V: fix bad use of of_node_put RISC-V: Add _TIF_NEED_RESCHED check for kernel thread when CONFIG_PREEMPT=y
2019-02-02Merge tag 'for-linus-20190202' of git://git.kernel.dk/linux-blockLinus Torvalds9-53/+93
Pull block fixes from Jens Axboe: "A few fixes that should go into this release. This contains: - MD pull request from Song, fixing a recovery OOM issue (Alexei) - Fix for a sync related stall (Jianchao) - Dummy callback for timeouts (Tetsuo) - IDE atapi sense ordering fix (me)" * tag 'for-linus-20190202' of git://git.kernel.dk/linux-block: ide: ensure atapi sense request aren't preempted blk-mq: fix a hung issue when fsync block: pass no-op callback to INIT_WORK(). md/raid5: fix 'out of memory' during raid cache recovery
2019-02-02Merge tag 'scsi-fixes' of ↵Linus Torvalds6-27/+29
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Five minor bug fixes. The libfc one is a tiny memory leak, the zfcp one is an incorrect user visible parameter and the rest are on error legs or obscure features" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: 53c700: pass correct "dev" to dma_alloc_attrs() scsi: bnx2fc: Fix error handling in probe() scsi: scsi_debug: fix write_same with virtual_gb problem scsi: libfc: free skb when receiving invalid flogi resp scsi: zfcp: fix sysfs block queue limit output for max_segment_size