summaryrefslogtreecommitdiff
path: root/lib
AgeCommit message (Collapse)AuthorFilesLines
2018-06-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds3-250/+574
Pull networking updates from David Miller: 1) Add Maglev hashing scheduler to IPVS, from Inju Song. 2) Lots of new TC subsystem tests from Roman Mashak. 3) Add TCP zero copy receive and fix delayed acks and autotuning with SO_RCVLOWAT, from Eric Dumazet. 4) Add XDP_REDIRECT support to mlx5 driver, from Jesper Dangaard Brouer. 5) Add ttl inherit support to vxlan, from Hangbin Liu. 6) Properly separate ipv6 routes into their logically independant components. fib6_info for the routing table, and fib6_nh for sets of nexthops, which thus can be shared. From David Ahern. 7) Add bpf_xdp_adjust_tail helper, which can be used to generate ICMP messages from XDP programs. From Nikita V. Shirokov. 8) Lots of long overdue cleanups to the r8169 driver, from Heiner Kallweit. 9) Add BTF ("BPF Type Format"), from Martin KaFai Lau. 10) Add traffic condition monitoring to iwlwifi, from Luca Coelho. 11) Plumb extack down into fib_rules, from Roopa Prabhu. 12) Add Flower classifier offload support to igb, from Vinicius Costa Gomes. 13) Add UDP GSO support, from Willem de Bruijn. 14) Add documentation for eBPF helpers, from Quentin Monnet. 15) Add TLS tx offload to mlx5, from Ilya Lesokhin. 16) Allow applications to be given the number of bytes available to read on a socket via a control message returned from recvmsg(), from Soheil Hassas Yeganeh. 17) Add x86_32 eBPF JIT compiler, from Wang YanQing. 18) Add AF_XDP sockets, with zerocopy support infrastructure as well. From Björn Töpel. 19) Remove indirect load support from all of the BPF JITs and handle these operations in the verifier by translating them into native BPF instead. From Daniel Borkmann. 20) Add GRO support to ipv6 gre tunnels, from Eran Ben Elisha. 21) Allow XDP programs to do lookups in the main kernel routing tables for forwarding. From David Ahern. 22) Allow drivers to store hardware state into an ELF section of kernel dump vmcore files, and use it in cxgb4. From Rahul Lakkireddy. 23) Various RACK and loss detection improvements in TCP, from Yuchung Cheng. 24) Add TCP SACK compression, from Eric Dumazet. 25) Add User Mode Helper support and basic bpfilter infrastructure, from Alexei Starovoitov. 26) Support ports and protocol values in RTM_GETROUTE, from Roopa Prabhu. 27) Support bulking in ->ndo_xdp_xmit() API, from Jesper Dangaard Brouer. 28) Add lots of forwarding selftests, from Petr Machata. 29) Add generic network device failover driver, from Sridhar Samudrala. * ra.kernel.org:/pub/scm/linux/kernel/git/davem/net-next: (1959 commits) strparser: Add __strp_unpause and use it in ktls. rxrpc: Fix terminal retransmission connection ID to include the channel net: hns3: Optimize PF CMDQ interrupt switching process net: hns3: Fix for VF mailbox receiving unknown message net: hns3: Fix for VF mailbox cannot receiving PF response bnx2x: use the right constant Revert "net: sched: cls: Fix offloading when ingress dev is vxlan" net: dsa: b53: Fix for brcm tag issue in Cygnus SoC enic: fix UDP rss bits netdev-FAQ: clarify DaveM's position for stable backports rtnetlink: validate attributes in do_setlink() mlxsw: Add extack messages for port_{un, }split failures netdevsim: Add extack error message for devlink reload devlink: Add extack to reload and port_{un, }split operations net: metrics: add proper netlink validation ipmr: fix error path when ipmr_new_table fails ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds net: hns3: remove unused hclgevf_cfg_func_mta_filter netfilter: provide udp*_lib_lookup for nf_tproxy qed*: Utilize FW 8.37.2.0 ...
2018-06-07Merge tag 'overflow-v4.18-rc1' of ↵Linus Torvalds3-0/+421
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull overflow updates from Kees Cook: "This adds the new overflow checking helpers and adds them to the 2-factor argument allocators. And this adds the saturating size helpers and does a treewide replacement for the struct_size() usage. Additionally this adds the overflow testing modules to make sure everything works. I'm still working on the treewide replacements for allocators with "simple" multiplied arguments: *alloc(a * b, ...) -> *alloc_array(a, b, ...) and *zalloc(a * b, ...) -> *calloc(a, b, ...) as well as the more complex cases, but that's separable from this portion of the series. I expect to have the rest sent before -rc1 closes; there are a lot of messy cases to clean up. Summary: - Introduce arithmetic overflow test helper functions (Rasmus) - Use overflow helpers in 2-factor allocators (Kees, Rasmus) - Introduce overflow test module (Rasmus, Kees) - Introduce saturating size helper functions (Matthew, Kees) - Treewide use of struct_size() for allocators (Kees)" * tag 'overflow-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: treewide: Use struct_size() for devm_kmalloc() and friends treewide: Use struct_size() for vmalloc()-family treewide: Use struct_size() for kmalloc()-family device: Use overflow helpers for devm_kmalloc() mm: Use overflow helpers in kvmalloc() mm: Use overflow helpers in kmalloc_array*() test_overflow: Add memory allocation overflow tests overflow.h: Add allocation size calculation helpers test_overflow: Report test failures test_overflow: macrofy some more, do more tests for free lib: add runtime test of check_*_overflow functions compiler.h: enable builtin overflow checkers and add fallback code
2018-06-07Merge tag 'printk-for-4.18' of ↵Linus Torvalds2-81/+54
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Help userspace log daemons to catch up with a flood of messages. They will get woken after each message even if the console is far behind and handled by another process. - Flush printk safe buffers safely even when panic() happens in the normal context. - Fix possible va_list reuse when race happened in printk_safe(). - Remove %pCr printf format to prevent sleeping in the atomic context. - Misc vsprintf code cleanup. * tag 'printk-for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: printk: drop in_nmi check from printk_safe_flush_on_panic() lib/vsprintf: Remove atomic-unsafe support for %pCr serial: sh-sci: Stop using printk format %pCr thermal: bcm2835: Stop using printk format %pCr clk: renesas: cpg-mssr: Stop using printk format %pCr printk: fix possible reuse of va_list variable printk: wake up klogd in vprintk_emit vsprintf: Tweak pF/pf comment lib/vsprintf: Mark expected switch fall-through lib/vsprintf: Replace space with '_' before crng is ready lib/vsprintf: Deduplicate pointer_string() lib/vsprintf: Move pointer_string() upper lib/vsprintf: Make flag_spec global lib/vsprintf: Make strspec global lib/vsprintf: Make dec_spec global lib/test_printf: Mark big constant with UL
2018-06-05test_overflow: Add memory allocation overflow testsKees Cook1-0/+110
Make sure that the memory allocators are behaving as expected in the face of overflows of multiplied arguments or when using the array_size()-family helpers. Example output of new tests (with the expected __alloc_pages_slowpath and vmalloc warnings about refusing giant allocations removed): [ 93.062076] test_overflow: kmalloc detected saturation [ 93.062988] test_overflow: kmalloc_node detected saturation [ 93.063818] test_overflow: kzalloc detected saturation [ 93.064539] test_overflow: kzalloc_node detected saturation [ 93.120386] test_overflow: kvmalloc detected saturation [ 93.143458] test_overflow: kvmalloc_node detected saturation [ 93.166861] test_overflow: kvzalloc detected saturation [ 93.189924] test_overflow: kvzalloc_node detected saturation [ 93.221671] test_overflow: vmalloc detected saturation [ 93.246326] test_overflow: vmalloc_node detected saturation [ 93.270260] test_overflow: vzalloc detected saturation [ 93.293824] test_overflow: vzalloc_node detected saturation [ 93.294597] test_overflow: devm_kmalloc detected saturation [ 93.295383] test_overflow: devm_kzalloc detected saturation [ 93.296217] test_overflow: all tests passed Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-05test_overflow: Report test failuresKees Cook1-15/+39
This adjusts the overflow test to report failures, and prepares to add allocation tests. Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-05test_overflow: macrofy some more, do more tests for freeRasmus Villemoes1-26/+22
Obviously a+b==b+a and a*b==b*a, but the implementation of the fallback checks are not entirely symmetric in how they treat a and b. So we might as well check the (b,a,r,of) tuple as well as the (a,b,r,of) one for + and *. Rather than more copy-paste, factor out the common part to check_one_op. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-05lib: add runtime test of check_*_overflow functionsRasmus Villemoes3-0/+291
This adds a small module for testing that the check_*_overflow functions work as expected, whether implemented in C or using gcc builtins. Example output: test_overflow: u8 : 18 tests test_overflow: s8 : 19 tests test_overflow: u16: 17 tests test_overflow: s16: 17 tests test_overflow: u32: 17 tests test_overflow: s32: 17 tests test_overflow: u64: 17 tests test_overflow: s64: 21 tests Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> [kees: add output to commit log, drop u64 tests on 32-bit] Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-05Merge tag 'rslib-v4.18-rc1' of ↵Linus Torvalds3-130/+159
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull reed-salomon library updates from Kees Cook: "Refactors rslib and callers to provide a per-instance allocation area instead of performing VLAs on the stack" * tag 'rslib-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: rslib: Allocate decoder buffers to avoid VLAs mtd: rawnand: diskonchip: Allocate rs control per instance rslib: Split rs control struct rslib: Simplify error path rslib: Remove GPL boilerplate rslib: Add SPDX identifiers rslib: Cleanup top level comments rslib: Cleanup whitespace damage dm/verity_fec: Use GFP aware reed solomon init rslib: Add GFP aware init function
2018-06-05Merge branch 'for-4.18-vsprintf-pcr-removal' into for-4.18Petr Mladek1-3/+0
2018-06-05lib/vsprintf: Remove atomic-unsafe support for %pCrGeert Uytterhoeven1-3/+0
"%pCr" formats the current rate of a clock, and calls clk_get_rate(). The latter obtains a mutex, hence it must not be called from atomic context. Remove support for this rarely-used format, as vsprintf() (and e.g. printk()) must be callable from any context. Any remaining out-of-tree users will start seeing the clock's name printed instead of its rate. Reported-by: Jia-Ju Bai <baijiaju1990@gmail.com> Fixes: 900cca2944254edd ("lib/vsprintf: add %pC{,n,r} format specifiers for clocks") Link: http://lkml.kernel.org/r/1527845302-12159-5-git-send-email-geert+renesas@glider.be To: Jia-Ju Bai <baijiaju1990@gmail.com> To: Jonathan Corbet <corbet@lwn.net> To: Michael Turquette <mturquette@baylibre.com> To: Stephen Boyd <sboyd@kernel.org> To: Zhang Rui <rui.zhang@intel.com> To: Eduardo Valentin <edubezval@gmail.com> To: Eric Anholt <eric@anholt.net> To: Stefan Wahren <stefan.wahren@i2se.com> To: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: linux-doc@vger.kernel.org Cc: linux-clk@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-serial@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-renesas-soc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: stable@vger.kernel.org # 4.1+ Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-06-05Merge branch 'x86-dax-for-linus' of ↵Linus Torvalds1-0/+61
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 dax updates from Ingo Molnar: "This contains x86 memcpy_mcsafe() fault handling improvements the nvdimm tree would like to make more use of" * 'x86-dax-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe() x86/asm/memcpy_mcsafe: Add write-protection-fault handling x86/asm/memcpy_mcsafe: Return bytes remaining x86/asm/memcpy_mcsafe: Add labels for __memcpy_mcsafe() write fault handling x86/asm/memcpy_mcsafe: Remove loop unrolling
2018-06-04Merge tag 'dma-mapping-4.18' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds9-321/+232
Pull dma-mapping updates from Christoph Hellwig: - replace the force_dma flag with a dma_configure bus method. (Nipun Gupta, although one patch is іncorrectly attributed to me due to a git rebase bug) - use GFP_DMA32 more agressively in dma-direct. (Takashi Iwai) - remove PCI_DMA_BUS_IS_PHYS and rely on the dma-mapping API to do the right thing for bounce buffering. - move dma-debug initialization to common code, and apply a few cleanups to the dma-debug code. - cleanup the Kconfig mess around swiotlb selection - swiotlb comment fixup (Yisheng Xie) - a trivial swiotlb fix. (Dan Carpenter) - support swiotlb on RISC-V. (based on a patch from Palmer Dabbelt) - add a new generic dma-noncoherent dma_map_ops implementation and use it for arc, c6x and nds32. - improve scatterlist validity checking in dma-debug. (Robin Murphy) - add a struct device quirk to limit the dma-mask to 32-bit due to bridge/system issues, and switch x86 to use it instead of a local hack for VIA bridges. - handle devices without a dma_mask more gracefully in the dma-direct code. * tag 'dma-mapping-4.18' of git://git.infradead.org/users/hch/dma-mapping: (48 commits) dma-direct: don't crash on device without dma_mask nds32: use generic dma_noncoherent_ops nds32: implement the unmap_sg DMA operation nds32: consolidate DMA cache maintainance routines x86/pci-dma: switch the VIA 32-bit DMA quirk to use the struct device flag x86/pci-dma: remove the explicit nodac and allowdac option x86/pci-dma: remove the experimental forcesac boot option Documentation/x86: remove a stray reference to pci-nommu.c core, dma-direct: add a flag 32-bit dma limits dma-mapping: remove unused gfp_t parameter to arch_dma_alloc_attrs dma-debug: check scatterlist segments c6x: use generic dma_noncoherent_ops arc: use generic dma_noncoherent_ops arc: fix arc_dma_{map,unmap}_page arc: fix arc_dma_sync_sg_for_{cpu,device} arc: simplify arc_dma_sync_single_for_{cpu,device} dma-mapping: provide a generic dma-noncoherent implementation dma-mapping: simplify Kconfig dependencies riscv: add swiotlb support riscv: only enable ZONE_DMA32 for 64-bit ...
2018-06-04Merge tag 'for-4.18/block-20180603' of git://git.kernel.dk/linux-blockLinus Torvalds1-32/+81
Pull block updates from Jens Axboe: - clean up how we pass around gfp_t and blk_mq_req_flags_t (Christoph) - prepare us to defer scheduler attach (Christoph) - clean up drivers handling of bounce buffers (Christoph) - fix timeout handling corner cases (Christoph/Bart/Keith) - bcache fixes (Coly) - prep work for bcachefs and some block layer optimizations (Kent). - convert users of bio_sets to using embedded structs (Kent). - fixes for the BFQ io scheduler (Paolo/Davide/Filippo) - lightnvm fixes and improvements (Matias, with contributions from Hans and Javier) - adding discard throttling to blk-wbt (me) - sbitmap blk-mq-tag handling (me/Omar/Ming). - remove the sparc jsflash block driver, acked by DaveM. - Kyber scheduler improvement from Jianchao, making it more friendly wrt merging. - conversion of symbolic proc permissions to octal, from Joe Perches. Previously the block parts were a mix of both. - nbd fixes (Josef and Kevin Vigor) - unify how we handle the various kinds of timestamps that the block core and utility code uses (Omar) - three NVMe pull requests from Keith and Christoph, bringing AEN to feature completeness, file backed namespaces, cq/sq lock split, and various fixes - various little fixes and improvements all over the map * tag 'for-4.18/block-20180603' of git://git.kernel.dk/linux-block: (196 commits) blk-mq: update nr_requests when switching to 'none' scheduler block: don't use blocking queue entered for recursive bio submits dm-crypt: fix warning in shutdown path lightnvm: pblk: take bitmap alloc. out of critical section lightnvm: pblk: kick writer on new flush points lightnvm: pblk: only try to recover lines with written smeta lightnvm: pblk: remove unnecessary bio_get/put lightnvm: pblk: add possibility to set write buffer size manually lightnvm: fix partial read error path lightnvm: proper error handling for pblk_bio_add_pages lightnvm: pblk: fix smeta write error path lightnvm: pblk: garbage collect lines with failed writes lightnvm: pblk: rework write error recovery path lightnvm: pblk: remove dead function lightnvm: pass flag on graceful teardown to targets lightnvm: pblk: check for chunk size before allocating it lightnvm: pblk: remove unnecessary argument lightnvm: pblk: remove unnecessary indirection lightnvm: pblk: return NVM_ error on failed submission lightnvm: pblk: warn in case of corrupted write buffer ...
2018-06-03bpf: add also cbpf long jump test cases with heavy expansionDaniel Borkmann1-0/+63
We have one triggering on eBPF but lets also add a cBPF example to make sure we keep tracking them. Also add anther cBPF test running max number of MSH ops. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-31dma-direct: don't crash on device without dma_maskChristoph Hellwig1-0/+7
Print a useful warning instead. Reported-by: Finn Thain <fthain@telegraphics.com.au> Tested-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-28core, dma-direct: add a flag 32-bit dma limitsChristoph Hellwig1-0/+6
Various PCI bridges (VIA PCI, Xilinx PCIe) limit DMA to only 32-bits even if the device itself supports more. Add a single bit flag to struct device (to be moved into the dma extension once we get to it) to flag such devices and reject larger DMA to them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2-3/+5
Lots of easy overlapping changes in the confict resolutions here. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-26idr: fix invalid ptr dereference on item deleteMatthew Wilcox1-1/+3
If the radix tree underlying the IDR happens to be full and we attempt to remove an id which is larger than any id in the IDR, we will call __radix_tree_delete() with an uninitialised 'slot' pointer, at which point anything could happen. This was easiest to hit with a single entry at id 0 and attempting to remove a non-0 id, but it could have happened with 64 entries and attempting to remove an id >= 64. Roman said: The syzcaller test boils down to opening /dev/kvm, creating an eventfd, and calling a couple of KVM ioctls. None of this requires superuser. And the result is dereferencing an uninitialized pointer which is likely a crash. The specific path caught by syzbot is via KVM_HYPERV_EVENTD ioctl which is new in 4.17. But I guess there are other user-triggerable paths, so cc:stable is probably justified. Matthew added: We have around 250 calls to idr_remove() in the kernel today. Many of them pass an ID which is embedded in the object they're removing, so they're safe. Picking a few likely candidates: drivers/firewire/core-cdev.c looks unsafe; the ID comes from an ioctl. drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c is similar drivers/atm/nicstar.c could be taken down by a handcrafted packet Link: http://lkml.kernel.org/r/20180518175025.GD6361@bombadil.infradead.org Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree") Reported-by: <syzbot+35666cba7f0a337e2e79@syzkaller.appspotmail.com> Debugged-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-24blk-mq: avoid starving tag allocation after allocating process migratesMing Lei1-14/+15
When the allocation process is scheduled back and the mapped hw queue is changed, fake one extra wake up on previous queue for compensating wake up miss, so other allocations on the previous queue won't be starved. This patch fixes one request allocation hang issue, which can be triggered easily in case of very low nr_request. The race is as follows: 1) 2 hw queues, nr_requests are 2, and wake_batch is one 2) there are 3 waiters on hw queue 0 3) two in-flight requests in hw queue 0 are completed, and only two waiters of 3 are waken up because of wake_batch, but both the two waiters can be scheduled to another CPU and cause to switch to hw queue 1 4) then the 3rd waiter will wait for ever, since no in-flight request is in hw queue 0 any more. 5) this patch fixes it by the fake wakeup when waiter is scheduled to another hw queue Cc: <stable@vger.kernel.org> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Modified commit message to make it clearer, and make it apply on top of the 4.18 branch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-24dma-debug: check scatterlist segmentsRobin Murphy2-0/+45
Drivers/subsystems creating scatterlists for DMA should be taking care to respect the scatter-gather limitations of the appropriate device, as described by dma_parms. A DMA API implementation cannot feasibly split a scatterlist into *more* entries than originally passed, so it is not well defined what they should do when given a segment larger than the limit they are also required to respect. Conversely, devices which are less limited than the rather conservative defaults, or indeed have no limitations at all (e.g. GPUs with their own internal MMU), should be encouraged to set appropriate dma_parms, as they may get more efficient DMA mapping performance out of it. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller5-23/+39
S390 bpf_jit.S is removed in net-next and had changes in 'net', since that code isn't used any more take the removal. TLS data structures split the TX and RX components in 'net-next', put the new struct members from the bug fix in 'net' into the RX part. The 'net-next' tree had some reworking of how the ERSPAN code works in the GRE tunneling code, overlapping with a one-line headroom calculation fix in 'net'. Overlapping changes in __sock_map_ctx_update_elem(), keep the bits that read the prog members via READ_ONCE() into local variables before using them. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-21Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-2/+2
Pull vfs fixes from Al Viro: "Assorted fixes all over the place" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: aio: fix io_destroy(2) vs. lookup_ioctx() race ext2: fix a block leak nfsd: vfs_mkdir() might succeed leaving dentry negative unhashed cachefiles: vfs_mkdir() might succeed leaving dentry negative unhashed unfuck sysfs_mount() kernfs: deal with kernfs_fill_super() failures cramfs: Fix IS_ENABLED typo befs_lookup(): use d_splice_alias() affs_lookup: switch to d_splice_alias() affs_lookup(): close a race with affs_remove_link() fix breakage caused by d_find_alias() semantics change fs: don't scan the inode cache before SB_BORN is set do d_instantiate/unlock_new_inode combinations safely iov_iter: fix memory leak in pipe_get_pages_alloc() iov_iter: fix return type of __pipe_get_pages()
2018-05-19dma-mapping: provide a generic dma-noncoherent implementationChristoph Hellwig4-4/+127
Add a new dma_map_ops implementation that uses dma-direct for the address mapping of streaming mappings, and which requires arch-specific implemenations of coherent allocate/free. Architectures have to provide flushing helpers to ownership trasnfers to the device and/or CPU, and can provide optional implementations of the coherent mmap functionality, and the cache_flush routines for non-coherent long term allocations. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Alexey Brodkin <abrodkin@synopsys.com> Acked-by: Vineet Gupta <vgupta@synopsys.com>
2018-05-19dma-mapping: simplify Kconfig dependenciesChristoph Hellwig1-4/+2
ARCH_DMA_ADDR_T_64BIT is always true for 64-bit architectures now, so we can skip the clause requiring it. 'n' is the default default, so no need to explicitly state it. Tested-by: Alexey Brodkin <abrodkin@synopsys.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-19radix tree: fix multi-order iteration raceRoss Zwisler1-4/+2
Fix a race in the multi-order iteration code which causes the kernel to hit a GP fault. This was first seen with a production v4.15 based kernel (4.15.6-300.fc27.x86_64) utilizing a DAX workload which used order 9 PMD DAX entries. The race has to do with how we tear down multi-order sibling entries when we are removing an item from the tree. Remember for example that an order 2 entry looks like this: struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling] where 'entry' is in some slot in the struct radix_tree_node, and the three slots following 'entry' contain sibling pointers which point back to 'entry.' When we delete 'entry' from the tree, we call : radix_tree_delete() radix_tree_delete_item() __radix_tree_delete() replace_slot() replace_slot() first removes the siblings in order from the first to the last, then at then replaces 'entry' with NULL. This means that for a brief period of time we end up with one or more of the siblings removed, so: struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling] This causes an issue if you have a reader iterating over the slots in the tree via radix_tree_for_each_slot() while only under rcu_read_lock()/rcu_read_unlock() protection. This is a common case in mm/filemap.c. The issue is that when __radix_tree_next_slot() => skip_siblings() tries to skip over the sibling entries in the slots, it currently does so with an exact match on the slot directly preceding our current slot. Normally this works: V preceding slot struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling] ^ current slot This lets you find the first sibling, and you skip them all in order. But in the case where one of the siblings is NULL, that slot is skipped and then our sibling detection is interrupted: V preceding slot struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling] ^ current slot This means that the sibling pointers aren't recognized since they point all the way back to 'entry', so we think that they are normal internal radix tree pointers. This causes us to think we need to walk down to a struct radix_tree_node starting at the address of 'entry'. In a real running kernel this will crash the thread with a GP fault when you try and dereference the slots in your broken node starting at 'entry'. We fix this race by fixing the way that skip_siblings() detects sibling nodes. Instead of testing against the preceding slot we instead look for siblings via is_sibling_entry() which compares against the position of the struct radix_tree_node.slots[] array. This ensures that sibling entries are properly identified, even if they are no longer contiguous with the 'entry' they point to. Link: http://lkml.kernel.org/r/20180503192430.7582-6-ross.zwisler@linux.intel.com Fixes: 148deab223b2 ("radix-tree: improve multiorder iterators") Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reported-by: CR, Sapthagirish <sapthagirish.cr@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-19lib/test_bitmap.c: fix bitmap optimisation tests to report errors correctlyMatthew Wilcox1-6/+15
I had neglected to increment the error counter when the tests failed, which made the tests noisy when they fail, but not actually return an error code. Link: http://lkml.kernel.org/r/20180509114328.9887-1-mpe@ellerman.id.au Fixes: 3cc78125a081 ("lib/test_bitmap.c: add optimisation tests") Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Yury Norov <ynorov@caviumnetworks.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: <stable@vger.kernel.org> [4.13+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-16vsprintf: Replace memory barrier with static_key for random_ptr_key updateSteven Rostedt (VMware)1-11/+15
Reviewing Tobin's patches for getting pointers out early before entropy has been established, I noticed that there's a lone smp_mb() in the code. As with most lone memory barriers, this one appears to be incorrectly used. We currently basically have this: get_random_bytes(&ptr_key, sizeof(ptr_key)); /* * have_filled_random_ptr_key==true is dependent on get_random_bytes(). * ptr_to_id() needs to see have_filled_random_ptr_key==true * after get_random_bytes() returns. */ smp_mb(); WRITE_ONCE(have_filled_random_ptr_key, true); And later we have: if (unlikely(!have_filled_random_ptr_key)) return string(buf, end, "(ptrval)", spec); /* Missing memory barrier here. */ hashval = (unsigned long)siphash_1u64((u64)ptr, &ptr_key); As the CPU can perform speculative loads, we could have a situation with the following: CPU0 CPU1 ---- ---- load ptr_key = 0 store ptr_key = random smp_mb() store have_filled_random_ptr_key load have_filled_random_ptr_key = true BAD BAD BAD! (you're so bad!) Because nothing prevents CPU1 from loading ptr_key before loading have_filled_random_ptr_key. But this race is very unlikely, but we can't keep an incorrect smp_mb() in place. Instead, replace the have_filled_random_ptr_key with a static_branch not_filled_random_ptr_key, that is initialized to true and changed to false when we get enough entropy. If the update happens in early boot, the static_key is updated immediately, otherwise it will have to wait till entropy is filled and this happens in an interrupt handler which can't enable a static_key, as that requires a preemptible context. In that case, a work_queue is used to enable it, as entropy already took too long to establish in the first place waiting a little more shouldn't hurt anything. The benefit of using the static key is that the unlikely branch in vsprintf() now becomes a nop. Link: http://lkml.kernel.org/r/20180515100558.21df515e@gandalf.local.home Cc: stable@vger.kernel.org Fixes: ad67b74d2469d ("printk: hash addresses printed with %p") Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-05-15x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()Dan Williams1-0/+61
Use the updated memcpy_mcsafe() implementation to define copy_user_mcsafe() and copy_to_iter_mcsafe(). The most significant difference from typical copy_to_iter() is that the ITER_KVEC and ITER_BVEC iterator types can fail to complete a full transfer. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: hch@lst.de Cc: linux-fsdevel@vger.kernel.org Cc: linux-nvdimm@lists.01.org Link: http://lkml.kernel.org/r/152539239150.31796.9189779163576449784.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14sbitmap: fix race in wait batch accountingJens Axboe1-10/+25
If we have multiple callers of sbq_wake_up(), we can end up in a situation where the wait_cnt will continually go more and more negative. Consider the case where our wake batch is 1, hence wait_cnt will start out as 1. wait_cnt == 1 CPU0 CPU1 atomic_dec_return(), cnt == 0 atomic_dec_return(), cnt == -1 cmpxchg(-1, 0) (succeeds) [wait_cnt now 0] cmpxchg(0, 1) (fails) This ends up with wait_cnt being 0, we'll wakeup immediately next time. Going through the same loop as above again, and we'll have wait_cnt -1. For the case where we have a larger wake batch, the only difference is that the starting point will be higher. We'll still end up with continually smaller batch wakeups, which defeats the purpose of the rolling wakeups. Always reset the wait_cnt to the batch value. Then it doesn't matter who wins the race. But ensure that whomever does win the race is the one that increments the ws index and wakes up our batch count, loser gets to call __sbq_wake_up() again to account his wakeups towards the next active wait state index. Fixes: 6c0ca7ae292a ("sbitmap: fix wakeup hang after sbq resize") Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-13Merge tag 'dma-mapping-4.17-5' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds1-1/+1
Pull dma-mapping fix from Christoph Hellwig: "Just one little fix from Jean to avoid a harmless but very annoying warning, especially for the drm code" * tag 'dma-mapping-4.17-5' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: silent unwanted warning "buffer is full"
2018-05-12swiotlb: silent unwanted warning "buffer is full"Jean Delvare1-1/+1
If DMA_ATTR_NO_WARN is passed to swiotlb_alloc_buffer(), it should be passed further down to swiotlb_tbl_map_single(). Otherwise we escape half of the warnings but still log the other half. This is one of the multiple causes of spurious warnings reported at: https://bugs.freedesktop.org/show_bug.cgi?id=104082 Signed-off-by: Jean Delvare <jdelvare@suse.de> Fixes: 0176adb00406 ("swiotlb: refactor coherent buffer allocation") Cc: Christoph Hellwig <hch@lst.de> Cc: Christian König <christian.koenig@amd.com> Cc: Michel Dänzer <michel@daenzer.net> Cc: Takashi Iwai <tiwai@suse.de> Cc: stable@vger.kernel.org # v4.16
2018-05-12lib/find_bit_benchmark.c: avoid soft lockup in test_find_first_bit()Yury Norov1-1/+6
test_find_first_bit() is intentionally sub-optimal, and may cause soft lockup due to long time of run on some systems. So decrease length of bitmap to traverse to avoid lockup. With the change below, time of test execution doesn't exceed 0.2 seconds on my testing system. Link: http://lkml.kernel.org/r/20180420171949.15710-1-ynorov@caviumnetworks.com Fixes: 4441fca0a27f5 ("lib: test module for find_*_bit() functions") Signed-off-by: Yury Norov <ynorov@caviumnetworks.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-10sbitmap: warn if using smaller shallow depth than was setupOmar Sandoval1-0/+2
Make sure the user passed the right value to sbitmap_queue_min_shallow_depth(). Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-10sbitmap: fix missed wakeups caused by sbitmap_queue_get_shallow()Omar Sandoval1-9/+40
The sbitmap queue wake batch is calculated such that once allocations start blocking, all of the bits which are already allocated must be enough to fulfill the batch counters of all of the waitqueues. However, the shallow allocation depth can break this invariant, since we block before our full depth is being utilized. Add sbitmap_queue_min_shallow_depth(), which saves the minimum shallow depth the sbq will use, and update sbq_calc_wake_batch() to take it into account. Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-09swiotlb: update comments to refer to physical instead of virtual addressesYisheng Xie1-3/+2
swiotlb use physical address of bounce buffer when do map and unmap, therefore, related comment should be updated. Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-09swiotlb: remove the CONFIG_DMA_DIRECT_OPS ifdefsChristoph Hellwig1-4/+0
swiotlb now selects the DMA_DIRECT_OPS config symbol, so this will always be true. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-09swiotlb: move the SWIOTLB config symbol to lib/KconfigChristoph Hellwig1-0/+5
This way we have one central definition of it, and user can select it as needed. The new option is not user visible, which is the behavior it had in most architectures, with a few notable exceptions: - On x86_64 and mips/loongson3 it used to be user selectable, but defaulted to y. It now is unconditional, which seems like the right thing for 64-bit architectures without guaranteed availablity of IOMMUs. - on powerpc the symbol is user selectable and defaults to n, but many boards select it. This change assumes no working setup required a manual selection, but if that turned out to be wrong we'll have to add another select statement or two for the respective boards. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-09arch: define the ARCH_DMA_ADDR_T_64BIT config symbol in lib/KconfigChristoph Hellwig1-0/+3
Define this symbol if the architecture either uses 64-bit pointers or the PHYS_ADDR_T_64BIT is set. This covers 95% of the old arch magic. We only need an additional select for Xen on ARM (why anyway?), and we now always set ARCH_DMA_ADDR_T_64BIT on mips boards with 64-bit physical addressing instead of only doing it when highmem is set. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: James Hogan <jhogan@kernel.org>
2018-05-09dma-mapping: move the NEED_DMA_MAP_STATE config symbol to lib/KconfigChristoph Hellwig2-0/+4
This way we have one central definition of it, and user can select it as needed. Note that we now also always select it when CONFIG_DMA_API_DEBUG is select, which fixes some incorrect checks in a few network drivers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
2018-05-09scatterlist: move the NEED_SG_DMA_LENGTH config symbol to lib/KconfigChristoph Hellwig1-0/+3
This way we have one central definition of it, and user can select it as needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
2018-05-09iommu-helper: move the IOMMU_HELPER config symbol to lib/Christoph Hellwig1-0/+3
This way we have one central definition of it, and user can select it as needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
2018-05-09iommu-helper: mark iommu_is_span_boundary as inlineChristoph Hellwig1-11/+1
This avoids selecting IOMMU_HELPER just for this function. And we only use it once or twice in normal builds so this often even is a size reduction. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-09iommu-helper: unexport iommu_area_allocChristoph Hellwig1-2/+0
This function is only used by built-in code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
2018-05-09iommu-common: move to arch/sparcChristoph Hellwig2-268/+1
This code is only used by sparc, and all new iommu drivers should use the drivers/iommu/ framework. Also remove the unused exports. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: David S. Miller <davem@davemloft.net> Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
2018-05-08dma-debug: remove CONFIG_HAVE_DMA_API_DEBUGChristoph Hellwig1-1/+0
There is no arch specific code required for dma-debug, so there is no need to opt into the support either. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2018-05-08dma-debug: unexport dma_debug_resize_entries and debug_dma_dump_mappingsChristoph Hellwig1-2/+0
Only used by the AMD GART and Intel VT-D drivers, which must be built in. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2018-05-08dma-debug: simplify counting of preallocated requestsChristoph Hellwig1-16/+4
Just keep a single variable with a descriptive name instead of two with confusing names. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2018-05-08dma-debug: move initialization to common codeChristoph Hellwig1-7/+14
Most mainstream architectures are using 65536 entries, so lets stick to that. If someone is really desperate to override it that can still be done through <asm/dma-mapping.h>, but I'd rather see a really good rationale for that. dma_debug_init is now called as a core_initcall, which for many architectures means much earlier, and provides dma-debug functionality earlier in the boot process. This should be safe as it only relies on the memory allocator already being available. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2018-05-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-212/+358
Minor conflict, a CHECK was placed into an if() statement in net-next, whilst a newline was added to that CHECK call in 'net'. Thanks to Daniel for the merge resolution. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07PCI: remove PCI_DMA_BUS_IS_PHYSChristoph Hellwig1-1/+0
This was used by the ide, scsi and networking code in the past to determine if they should bounce payloads. Now that the dma mapping always have to support dma to all physical memory (thanks to swiotlb for non-iommu systems) there is no need to this crude hack any more. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Palmer Dabbelt <palmer@sifive.com> (for riscv) Reviewed-by: Jens Axboe <axboe@kernel.dk>