summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-02-26selftests: nft_concat_range: Add test for reported add/flush/add issueStefano Brivio1-4/+39
Add a specific test for the crash reported by Phil Sutter and addressed in the previous patch. The test cases that, in my intention, should have covered these cases, that is, the ones from the 'concurrency' section, don't run these sequences tightly enough and spectacularly failed to catch this. While at it, define a convenient way to add these kind of tests, by adding a "reported issues" test section. It's more convenient, for this particular test, to execute the set setup in its own function. However, future test cases like this one might need to call setup functions, and will typically need no tools other than nft, so allow for this in check_tools(). The original form of the reproducer used here was provided by Phil. Reported-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-26nft_set_pipapo: Actually fetch key data in nft_pipapo_remove()Stefano Brivio1-2/+4
Phil reports that adding elements, flushing and re-adding them right away: nft add table t '{ set s { type ipv4_addr . inet_service; flags interval; }; }' nft add element t s '{ 10.0.0.1 . 22-25, 10.0.0.1 . 10-20 }' nft flush set t s nft add element t s '{ 10.0.0.1 . 10-20, 10.0.0.1 . 22-25 }' triggers, almost reliably, a crash like this one: [ 71.319848] general protection fault, probably for non-canonical address 0x6f6b6e696c2e756e: 0000 [#1] PREEMPT SMP PTI [ 71.321540] CPU: 3 PID: 1201 Comm: kworker/3:2 Not tainted 5.6.0-rc1-00377-g2bb07f4e1d861 #192 [ 71.322746] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014 [ 71.324430] Workqueue: events nf_tables_trans_destroy_work [nf_tables] [ 71.325387] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables] [ 71.326164] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 <48> 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b [ 71.328423] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282 [ 71.329225] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0 [ 71.330365] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a [ 71.331473] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000 [ 71.332627] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2 [ 71.333615] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0 [ 71.334596] FS: 0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000 [ 71.335780] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 71.336577] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0 [ 71.337533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 71.338557] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 71.339718] Call Trace: [ 71.340093] nft_pipapo_destroy+0x7a/0x170 [nf_tables_set] [ 71.340973] nft_set_destroy+0x20/0x50 [nf_tables] [ 71.341879] nf_tables_trans_destroy_work+0x246/0x260 [nf_tables] [ 71.342916] process_one_work+0x1d5/0x3c0 [ 71.343601] worker_thread+0x4a/0x3c0 [ 71.344229] kthread+0xfb/0x130 [ 71.344780] ? process_one_work+0x3c0/0x3c0 [ 71.345477] ? kthread_park+0x90/0x90 [ 71.346129] ret_from_fork+0x35/0x40 [ 71.346748] Modules linked in: nf_tables_set nf_tables nfnetlink 8021q [last unloaded: nfnetlink] [ 71.348153] ---[ end trace 2eaa8149ca759bcc ]--- [ 71.349066] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables] [ 71.350016] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 <48> 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b [ 71.350017] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282 [ 71.350019] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0 [ 71.350019] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a [ 71.350020] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000 [ 71.350021] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2 [ 71.350022] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0 [ 71.350025] FS: 0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000 [ 71.350026] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 71.350027] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0 [ 71.350028] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 71.350028] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 71.350030] Kernel panic - not syncing: Fatal exception [ 71.350412] Kernel Offset: disabled [ 71.365922] ---[ end Kernel panic - not syncing: Fatal exception ]--- which is caused by dangling elements that have been deactivated, but never removed. On a flush operation, nft_pipapo_walk() walks through all the elements in the mapping table, which are then deactivated by nft_flush_set(), one by one, and added to the commit list for removal. Element data is then freed. On transaction commit, nft_pipapo_remove() is called, and failed to remove these elements, leading to the stale references in the mapping. The first symptom of this, revealed by KASan, is a one-byte use-after-free in subsequent calls to nft_pipapo_walk(), which is usually not enough to trigger a panic. When stale elements are used more heavily, though, such as double-free via nft_pipapo_destroy() as in Phil's case, the problem becomes more noticeable. The issue comes from that fact that, on a flush operation, nft_pipapo_remove() won't get the actual key data via elem->key, elements to be deleted upon commit won't be found by the lookup via pipapo_get(), and removal will be skipped. Key data should be fetched via nft_set_ext_key(), instead. Reported-by: Phil Sutter <phil@nwl.cc> Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-26Merge branch 'master' of git://blackhole.kfki.hu/nfPablo Neira Ayuso3-206/+474
Jozsef Kadlecsik says: ==================== ipset patches for nf The first one is larger than usual, but the issue could not be solved simpler. Also, it's a resend of the patch I submitted a few days ago, with a one line fix on top of that: the size of the comment extensions was not taken into account at reporting the full size of the set. - Fix "INFO: rcu detected stall in hash_xxx" reports of syzbot by introducing region locking and using workqueue instead of timer based gc of timed out entries in hash types of sets in ipset. - Fix the forceadd evaluation path - the bug was also uncovered by the syzbot. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-26bootconfig: Fix CONFIG_BOOTTIME_TRACING dependency issueMasami Hiramatsu2-2/+1
Since commit d8a953ddde5e ("bootconfig: Set CONFIG_BOOT_CONFIG=n by default") also changed the CONFIG_BOOTTIME_TRACING to select CONFIG_BOOT_CONFIG to show the boot-time tracing on the menu, it introduced wrong dependencies with BLK_DEV_INITRD as below. WARNING: unmet direct dependencies detected for BOOT_CONFIG Depends on [n]: BLK_DEV_INITRD [=n] Selected by [y]: - BOOTTIME_TRACING [=y] && TRACING_SUPPORT [=y] && FTRACE [=y] && TRACING [=y] This makes the CONFIG_BOOT_CONFIG selects CONFIG_BLK_DEV_INITRD to fix this error and make CONFIG_BOOTTIME_TRACING=n by default, so that both boot-time tracing and boot configuration off but those appear on the menu list. Link: http://lkml.kernel.org/r/158264140162.23842.11237423518607465535.stgit@devnote2 Fixes: d8a953ddde5e ("bootconfig: Set CONFIG_BOOT_CONFIG=n by default") Reported-by: Randy Dunlap <rdunlap@infradead.org> Compiled-tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-25icmp: allow icmpv6_ndo_send to work with CONFIG_IPV6=nJason A. Donenfeld1-6/+10
The icmpv6_send function has long had a static inline implementation with an empty body for CONFIG_IPV6=n, so that code calling it doesn't need to be ifdef'd. The new icmpv6_ndo_send function, which is intended for drivers as a drop-in replacement with an identical function signature, should follow the same pattern. Without this patch, drivers that used to work with CONFIG_IPV6=n now result in a linker error. Cc: Chen Zhou <chenzhou10@huawei.com> Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: 0b41713b6066 ("icmp: introduce helper for nat'd source address in network device context") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25Merge tag 'riscv-for-linux-5.6-rc4' of ↵Linus Torvalds5-24/+53
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: "This contains a handful of RISC-V related fixes that I've collected and would like to target for 5.6-rc4: - A fix to set up the PMPs on boot, which allows the kernel to access memory on systems that don't set up permissive PMPs before getting to Linux. This only effects machine-mode kernels, which currently means only NOMMU kernels. - A fix to avoid enabling supervisor-mode interrupts when running in machine-mode, also only for NOMMU kernels. - A pair of fixes to our KASAN support to avoid corrupting memory. - A gitignore fix. This boots on QEMU's virt board for me" * tag 'riscv-for-linux-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: adjust the indent riscv: allocate a complete page size for each page table riscv: Fix gitignore RISC-V: Don't enable all interrupts in trap_init() riscv: set pmp configuration if kernel is running in M-mode
2020-02-25Merge branch 'mips-fixes' of ↵Linus Torvalds8-29/+56
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Paul Burton: "Here are a few MIPS fixes, and a MAINTAINERS update to hand over MIPS maintenance to Thomas Bogendoerfer - this will be my final pull request as MIPS maintainer. Thanks for your helpful comments, useful corrections & responsiveness during the time I've fulfilled the role, and I'm sure I'll pop up elsewhere in the tree somewhere down the line" * 'mips-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MAINTAINERS: Hand MIPS over to Thomas MIPS: ingenic: DTS: Fix watchdog nodes MIPS: X1000: Fix clock of watchdog node. MIPS: vdso: Wrap -mexplicit-relocs in cc-option MIPS: VPE: Fix a double free and a memory leak in 'release_vpe()' MIPS: cavium_octeon: Fix syncw generation. mips: vdso: add build time check that no 'jalr t9' calls left MIPS: Disable VDSO time functionality on microMIPS mips: vdso: fix 'jalr t9' crash in vdso code
2020-02-25selftests: nft_concat_range: Move option for 'list ruleset' before commandStefano Brivio1-6/+6
Before nftables commit fb9cea50e8b3 ("main: enforce options before commands"), 'nft list ruleset -a' happened to work, but it's wrong and won't work anymore. Replace it by 'nft -a list ruleset'. Reported-by: Chen Yi <yiche@redhat.com> Fixes: 611973c1e06f ("selftests: netfilter: Introduce tests for sets with range concatenation") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-25docs: Fix empty parallelism argumentKees Cook1-1/+1
When there was no parallelism (no top-level -j arg and a pre-1.7 sphinx-build), the argument passed would be empty ("") instead of just being missing, which would (understandably) badly confuse sphinx-build. Fix this by removing the quotes. Reported-by: Rafael J. Wysocki <rafael@kernel.org> Fixes: 51e46c7a4007 ("docs, parallelism: Rearrange how jobserver reservations are made") Cc: stable@vger.kernel.org # v5.5 only Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-02-25docs: remove MPX from the x86 tocStephen Kitt1-1/+0
MPX was removed in commit 45fc24e89b7c ("x86/mpx: remove MPX from arch/x86"), this removes the corresponding entry in the x86 toc. This was suggested by a Sphinx warning. Signed-off-by: Stephen Kitt <steve@sk2.org> Fixes: 45fc24e89b7cc ("x86/mpx: remove MPX from arch/x86") Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-02-25MAINTAINERS: Hand MIPS over to ThomasPaul Burton2-4/+7
My time with MIPS the company has reached its end, and so at best I'll have little time spend on maintaining arch/mips/. Ralf last authored a patch over 2 years ago, the last time he committed one is even further back & activity was sporadic for a while before that. The reality is that he isn't active. Having a new maintainer with time to do things properly will be beneficial all round. Thomas Bogendoerfer has been involved in MIPS development for a long time & has offered to step up as maintainer, so add Thomas and remove myself & Ralf from the MIPS entry. Ralf already has an entry in CREDITS to honor his contributions, so this just adds one for me. Signed-off-by: Paul Burton <paulburton@kernel.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-kernel@vger.kernel.org Cc: linux-mips@vger.kernel.org
2020-02-25Merge tag 'mac80211-for-net-2020-02-24' of ↵David S. Miller4-9/+6
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg ==================== A few fixes: * remove a double mutex-unlock * fix a leak in an error path * NULL pointer check * include if_vlan.h where needed * avoid RCU list traversal when not under RCU ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25audit: always check the netlink payload length in audit_receive_msg()Paul Moore1-19/+21
This patch ensures that we always check the netlink payload length in audit_receive_msg() before we take any action on the payload itself. Cc: stable@vger.kernel.org Reported-by: syzbot+399c44bf1f43b8747403@syzkaller.appspotmail.com Reported-by: syzbot+e4b12d8d202701f08b6d@syzkaller.appspotmail.com Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-25riscv: adjust the indentZong Li1-11/+15
Adjust the indent to match Linux coding style. Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-02-25riscv: allocate a complete page size for each page tableZong Li1-11/+16
Each page table should be created by allocating a complete page size for it. Otherwise, the content of the page table would be corrupted somewhere through memory allocation which allocates the memory at the middle of the page table for other use. Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-02-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds19-121/+284
Pull kvm fixes from Paolo Bonzini: "Bugfixes, including the fix for CVE-2020-2732 and a few issues found by 'make W=1'" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: s390: rstify new ioctls in api.rst KVM: nVMX: Check IO instruction VM-exit conditions KVM: nVMX: Refactor IO bitmap checks into helper function KVM: nVMX: Don't emulate instructions in guest mode KVM: nVMX: Emulate MTF when performing instruction emulation KVM: fix error handling in svm_hardware_setup KVM: SVM: Fix potential memory leak in svm_cpu_init() KVM: apic: avoid calculating pending eoi from an uninitialized val KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only when apicv is globally disabled KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1 kvm: x86: svm: Fix NULL pointer dereference when AVIC not enabled KVM: VMX: Add VMX_FEATURE_USR_WAIT_PAUSE KVM: nVMX: Hold KVM's srcu lock when syncing vmcs12->shadow KVM: x86: don't notify userspace IOAPIC on edge-triggered interrupt EOI kvm/emulate: fix a -Werror=cast-function-type KVM: x86: fix incorrect comparison in trace event KVM: nVMX: Fix some obsolete comments and grammar error KVM: x86: fix missing prototypes KVM: x86: enable -Werror
2020-02-24Merge branch 'linus' of ↵Linus Torvalds2-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes a Kconfig-related build error and an integer overflow in chacha20poly1305" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: chacha20poly1305 - prevent integer overflow on large input tee: amdtee: amdtee depends on CRYPTO_DEV_CCP_DD
2020-02-24Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-2/+0
Pull tmpfs fix from Al Viro: "Regression from fs_parse series this cycle..." * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: tmpfs: deny and force are not huge mount options
2020-02-24floppy: check FDC index for errors before assigning itLinus Torvalds1-2/+5
Jordy Zomer reported a KASAN out-of-bounds read in the floppy driver in wait_til_ready(). Which on the face of it can't happen, since as Willy Tarreau points out, the function does no particular memory access. Except through the FDCS macro, which just indexes a static allocation through teh current fdc, which is always checked against N_FDC. Except the checking happens after we've already assigned the value. The floppy driver is a disgrace (a lot of it going back to my original horrd "design"), and has no real maintainer. Nobody has the hardware, and nobody really cares. But it still gets used in virtual environment because it's one of those things that everybody supports. The whole thing should be re-written, or at least parts of it should be seriously cleaned up. The 'current fdc' index, which is used by the FDCS macro, and which is often shadowed by a local 'fdc' variable, is a prime example of how not to write code. But because nobody has the hardware or the motivation, let's just fix up the immediate problem with a nasty band-aid: test the fdc index before actually assigning it to the static 'fdc' variable. Reported-by: Jordy Zomer <jordy@simplyhacker.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-24net: bridge: fix stale eth hdr pointer in br_dev_xmitNikolay Aleksandrov1-4/+2
In br_dev_xmit() we perform vlan filtering in br_allowed_ingress() but if the packet has the vlan header inside (e.g. bridge with disabled tx-vlan-offload) then the vlan filtering code will use skb_vlan_untag() to extract the vid before filtering which in turn calls pskb_may_pull() and we may end up with a stale eth pointer. Moreover the cached eth header pointer will generally be wrong after that operation. Remove the eth header caching and just use eth_hdr() directly, the compiler does the right thing and calculates it only once so we don't lose anything. Fixes: 057658cb33fb ("bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24Merge branch 'net-ll_temac-Bugfixes'David S. Miller2-38/+175
Esben Haabendal says: ==================== net: ll_temac: Bugfixes Fix a number of bugs which have been present since the first commit. The bugs fixed in patch 1,2 and 4 have all been observed in real systems, and was relatively easy to reproduce given an appropriate stress setup. Changes since v1: - Changed error handling of of dma_map_single() in temac_start_xmit() to drop packet instead of returning NETDEV_TX_BUSY. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net: ll_temac: Handle DMA halt condition caused by buffer underrunEsben Haabendal2-5/+56
The SDMA engine used by TEMAC halts operation when it has finished processing of the last buffer descriptor in the buffer ring. Unfortunately, no interrupt event is generated when this happens, so we need to setup another mechanism to make sure DMA operation is restarted when enough buffers have been added to the ring. Fixes: 92744989533c ("net: add Xilinx ll_temac device driver") Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net: ll_temac: Fix RX buffer descriptor handling on GFP_ATOMIC pressureEsben Haabendal2-31/+82
Failures caused by GFP_ATOMIC memory pressure have been observed, and due to the missing error handling, results in kernel crash such as [1876998.350133] kernel BUG at mm/slub.c:3952! [1876998.350141] invalid opcode: 0000 [#1] PREEMPT SMP PTI [1876998.350147] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.3.0-scnxt #1 [1876998.350150] Hardware name: N/A N/A/COMe-bIP2, BIOS CCR2R920 03/01/2017 [1876998.350160] RIP: 0010:kfree+0x1ca/0x220 [1876998.350164] Code: 85 db 74 49 48 8b 95 68 01 00 00 48 31 c2 48 89 10 e9 d7 fe ff ff 49 8b 04 24 a9 00 00 01 00 75 0b 49 8b 44 24 08 a8 01 75 02 <0f> 0b 49 8b 04 24 31 f6 a9 00 00 01 00 74 06 41 0f b6 74 24 5b [1876998.350172] RSP: 0018:ffffc900000f0df0 EFLAGS: 00010246 [1876998.350177] RAX: ffffea00027f0708 RBX: ffff888008d78000 RCX: 0000000000391372 [1876998.350181] RDX: 0000000000000000 RSI: ffffe8ffffd01400 RDI: ffff888008d78000 [1876998.350185] RBP: ffff8881185a5d00 R08: ffffc90000087dd8 R09: 000000000000280a [1876998.350189] R10: 0000000000000002 R11: 0000000000000000 R12: ffffea0000235e00 [1876998.350193] R13: ffff8881185438a0 R14: 0000000000000000 R15: ffff888118543870 [1876998.350198] FS: 0000000000000000(0000) GS:ffff88811f300000(0000) knlGS:0000000000000000 [1876998.350203] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 s#1 Part1 [1876998.350206] CR2: 00007f8dac7b09f0 CR3: 000000011e20a006 CR4: 00000000001606e0 [1876998.350210] Call Trace: [1876998.350215] <IRQ> [1876998.350224] ? __netif_receive_skb_core+0x70a/0x920 [1876998.350229] kfree_skb+0x32/0xb0 [1876998.350234] __netif_receive_skb_core+0x70a/0x920 [1876998.350240] __netif_receive_skb_one_core+0x36/0x80 [1876998.350245] process_backlog+0x8b/0x150 [1876998.350250] net_rx_action+0xf7/0x340 [1876998.350255] __do_softirq+0x10f/0x353 [1876998.350262] irq_exit+0xb2/0xc0 [1876998.350265] do_IRQ+0x77/0xd0 [1876998.350271] common_interrupt+0xf/0xf [1876998.350274] </IRQ> In order to handle such failures more graceful, this change splits the receive loop into one for consuming the received buffers, and one for allocating new buffers. When GFP_ATOMIC allocations fail, the receive will continue with the buffers that is still there, and with the expectation that the allocations will succeed in a later call to receive. Fixes: 92744989533c ("net: add Xilinx ll_temac device driver") Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net: ll_temac: Add more error handling of dma_map_single() callsEsben Haabendal1-2/+24
This adds error handling to the remaining dma_map_single() calls, so that behavior is well defined if/when we run out of DMA memory. Fixes: 92744989533c ("net: add Xilinx ll_temac device driver") Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net: ll_temac: Fix race condition causing TX hangEsben Haabendal1-3/+16
It is possible that the interrupt handler fires and frees up space in the TX ring in between checking for sufficient TX ring space and stopping the TX queue in temac_start_xmit. If this happens, the queue wake from the interrupt handler will occur before the queue is stopped, causing a lost wakeup and the adapter's transmit hanging. To avoid this, after stopping the queue, check again whether there is sufficient space in the TX ring. If so, wake up the queue again. This is a port of the similar fix in axienet driver, commit 7de44285c1f6 ("net: axienet: Fix race condition causing TX hang"). Fixes: 23ecc4bde21f ("net: ll_temac: fix checksum offload logic") Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24KVM: s390: rstify new ioctls in api.rstChristian Borntraeger1-15/+18
We also need to rstify the new ioctls that we added in parallel to the rstification of the kvm docs. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-24mac80211: rx: avoid RCU list traversal under mutexMadhuparna Bhowmik1-1/+1
local->sta_mtx is held in __ieee80211_check_fast_rx_iface(). No need to use list_for_each_entry_rcu() as it also requires a cond argument to avoid false lockdep warnings when not used in RCU read-side section (with CONFIG_PROVE_RCU_LIST). Therefore use list_for_each_entry(); Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Link: https://lore.kernel.org/r/20200223143302.15390-1-madhuparnabhowmik10@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24nl80211: explicitly include if_vlan.hJohannes Berg1-0/+1
We use that here, and do seem to get it through some recursive include, but better include it explicitly. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200224093814.1b9c258fec67.I45ac150d4e11c72eb263abec9f1f0c7add9bef2b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24net: core: devlink.c: Hold devlink->lock from the beginning of ↵Madhuparna Bhowmik1-7/+14
devlink_dpipe_table_register() devlink_dpipe_table_find() should be called under either rcu_read_lock() or devlink->lock. devlink_dpipe_table_register() calls devlink_dpipe_table_find() without holding the lock and acquires it later. Therefore hold the devlink->lock from the beginning of devlink_dpipe_table_register(). Suggested-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net: phy: Avoid multiple suspendsFlorian Fainelli1-2/+3
It is currently possible for a PHY device to be suspended as part of a network device driver's suspend call while it is still being attached to that net_device, either via phy_suspend() or implicitly via phy_stop(). Later on, when the MDIO bus controller get suspended, we would attempt to suspend again the PHY because it is still attached to a network device. This is both a waste of time and creates an opportunity for improper clock/power management bugs to creep in. Fixes: 803dd9c77ac3 ("net: phy: avoid suspending twice a PHY") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net: ks8851-ml: Fix IRQ handling and lockingMarek Vasut1-6/+8
The KS8851 requires that packet RX and TX are mutually exclusive. Currently, the driver hopes to achieve this by disabling interrupt from the card by writing the card registers and by disabling the interrupt on the interrupt controller. This however is racy on SMP. Replace this approach by expanding the spinlock used around the ks_start_xmit() TX path to ks_irq() RX path to assure true mutual exclusion and remove the interrupt enabling/disabling, which is now not needed anymore. Furthermore, disable interrupts also in ks_net_stop(), which was missing before. Note that a massive improvement here would be to re-use the KS8851 driver approach, which is to move the TX path into a worker thread, interrupt handling to threaded interrupt, and synchronize everything with mutexes, but that would be a much bigger rework, for a separate patch. Signed-off-by: Marek Vasut <marex@denx.de> Cc: David S. Miller <davem@davemloft.net> Cc: Lukas Wunner <lukas@wunner.de> Cc: Petr Stetiar <ynezz@true.cz> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24docs: networking: phy: Rephrase paragraph for clarityJonathan Neuschäfer1-2/+3
Let's make it a little easier to read. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24tcp: fix TFO SYNACK undo to avoid double-timestamp-undoNeal Cardwell1-1/+5
In a rare corner case the new logic for undo of SYNACK RTO could result in triggering the warning in tcp_fastretrans_alert() that says: WARN_ON(tp->retrans_out != 0); The warning looked like: WARNING: CPU: 1 PID: 1 at net/ipv4/tcp_input.c:2818 tcp_ack+0x13e0/0x3270 The sequence that tickles this bug is: - Fast Open server receives TFO SYN with data, sends SYNACK - (client receives SYNACK and sends ACK, but ACK is lost) - server app sends some data packets - (N of the first data packets are lost) - server receives client ACK that has a TS ECR matching first SYNACK, and also SACKs suggesting the first N data packets were lost - server performs TS undo of SYNACK RTO, then immediately enters recovery - buggy behavior then performed a *second* undo that caused the connection to be in CA_Open with retrans_out != 0 Basically, the incoming ACK packet with SACK blocks causes us to first undo the cwnd reduction from the SYNACK RTO, but then immediately enters fast recovery, which then makes us eligible for undo again. And then tcp_rcv_synrecv_state_fastopen() accidentally performs an undo using a "mash-up" of state from two different loss recovery phases: it uses the timestamp info from the ACK of the original SYNACK, and the undo_marker from the fast recovery. This fix refines the logic to only invoke the tcp_try_undo_loss() inside tcp_rcv_synrecv_state_fastopen() if the connection is still in CA_Loss. If peer SACKs triggered fast recovery, then tcp_rcv_synrecv_state_fastopen() can't safely undo. Fixes: 794200d66273 ("tcp: undo cwnd on Fast Open spurious SYNACK retransmit") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24hv_netvsc: Fix unwanted wakeup in netvsc_attach()Haiyang Zhang2-1/+4
When netvsc_attach() is called by operations like changing MTU, etc., an extra wakeup may happen while netvsc_attach() calling rndis_filter_device_add() which sends rndis messages when queue is stopped in netvsc_detach(). The completion message will wake up queue 0. We can reproduce the issue by changing MTU etc., then the wake_queue counter from "ethtool -S" will increase beyond stop_queue counter: stop_queue: 0 wake_queue: 1 The issue causes queue wake up, and counter increment, no other ill effects in current code. So we didn't see any network problem for now. To fix this, initialize tx_disable to true, and set it to false when the NIC is ready to be attached or registered. Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24Linux 5.6-rc3v5.6-rc3Linus Torvalds1-1/+1
2020-02-24net: usb: qmi_wwan: restore mtu min/max values after raw_ip switchDaniele Palmas1-0/+3
usbnet creates network interfaces with min_mtu = 0 and max_mtu = ETH_MAX_MTU. These values are not modified by qmi_wwan when the network interface is created initially, allowing, for example, to set mtu greater than 1500. When a raw_ip switch is done (raw_ip set to 'Y', then set to 'N') the mtu values for the network interface are set through ether_setup, with min_mtu = ETH_MIN_MTU and max_mtu = ETH_DATA_LEN, not allowing anymore to set mtu greater than 1500 (error: mtu greater than device maximum). The patch restores the original min/max mtu values set by usbnet after a raw_ip switch. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-23Merge tag 'for-5.6-rc2-tag' of ↵Linus Torvalds7-10/+44
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "These are fixes that were found during testing with help of error injection, plus some other stable material. There's a fixup to patch added to rc1 causing locking in wrong context warnings, tests found one more deadlock scenario. The patches are tagged for stable, two of them now in the queue but we'd like all three released at the same time. I'm not happy about fixes to fixes in such a fast succession during rcs, but I hope we found all the fallouts of commit 28553fa992cb ('Btrfs: fix race between shrinking truncate and fiemap')" * tag 'for-5.6-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix deadlock during fast fsync when logging prealloc extents beyond eof Btrfs: fix btrfs_wait_ordered_range() so that it waits for all ordered extents btrfs: fix bytes_may_use underflow in prealloc error condtition btrfs: handle logged extent failure properly btrfs: do not check delayed items are empty for single transaction cleanup btrfs: reset fs_root to NULL on error in open_ctree btrfs: destroy qgroup extent records on transaction abort
2020-02-23Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds10-108/+256
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "More miscellaneous ext4 bug fixes (all stable fodder)" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix mount failure with quota configured as module jbd2: fix ocfs2 corrupt when clearing block group bits ext4: fix race between writepages and enabling EXT4_EXTENTS_FL ext4: rename s_journal_flag_rwsem to s_writepages_rwsem ext4: fix potential race between s_flex_groups online resizing and access ext4: fix potential race between s_group_info online resizing and access ext4: fix potential race between online resizing and write operations ext4: add cond_resched() to __ext4_find_entry() ext4: fix a data race in EXT4_I(inode)->i_disksize
2020-02-23Merge tag 'csky-for-linus-5.6-rc3' of git://github.com/c-sky/csky-linuxLinus Torvalds34-164/+663
Pull csky updates from Guo Ren: "Sorry, I missed 5.6-rc1 merge window, but in this pull request the most are the fixes and the rests are between fixes and features. The only outside modification is the MAINTAINERS file update with our mailing list. - cache flush implementation fixes - ftrace modify panic fix - CONFIG_SMP boot problem fix - fix pt_regs saving for atomic.S - fix fixaddr_init without highmem. - fix stack protector support - fix fake Tightly-Coupled Memory code compile and use - fix some typos and coding convention" * tag 'csky-for-linus-5.6-rc3' of git://github.com/c-sky/csky-linux: (23 commits) csky: Replace <linux/clk-provider.h> by <linux/of_clk.h> csky: Implement copy_thread_tls csky: Add PCI support csky: Minimize defconfig to support buildroot config.fragment csky: Add setup_initrd check code csky: Cleanup old Kconfig options arch/csky: fix some Kconfig typos csky: Fixup compile warning for three unimplemented syscalls csky: Remove unused cache implementation csky: Fixup ftrace modify panic csky: Add flush_icache_mm to defer flush icache all csky: Optimize abiv2 copy_to_user_page with VM_EXEC csky: Enable defer flush_dcache_page for abiv2 cpus (807/810/860) csky: Remove unnecessary flush_icache_* implementation csky: Support icache flush without specific instructions csky/Kconfig: Add Kconfig.platforms to support some drivers csky/smp: Fixup boot failed when CONFIG_SMP csky: Set regs->usp to kernel sp, when the exception is from kernel csky/mm: Fixup export invalid_pte_table symbol csky: Separate fixaddr_init from highmem ...
2020-02-23KVM: nVMX: Check IO instruction VM-exit conditionsOliver Upton2-7/+52
Consult the 'unconditional IO exiting' and 'use IO bitmaps' VM-execution controls when checking instruction interception. If the 'use IO bitmaps' VM-execution control is 1, check the instruction access against the IO bitmaps to determine if the instruction causes a VM-exit. Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23KVM: nVMX: Refactor IO bitmap checks into helper functionOliver Upton2-14/+27
Checks against the IO bitmap are useful for both instruction emulation and VM-exit reflection. Refactor the IO bitmap checks into a helper function. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23KVM: nVMX: Don't emulate instructions in guest modePaolo Bonzini1-1/+1
vmx_check_intercept is not yet fully implemented. To avoid emulating instructions disallowed by the L1 hypervisor, refuse to emulate instructions by default. Cc: stable@vger.kernel.org [Made commit, added commit msg - Oliver] Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23KVM: nVMX: Emulate MTF when performing instruction emulationOliver Upton8-2/+83
Since commit 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG"), KVM has allowed an L1 guest to use the monitor trap flag processor-based execution control for its L2 guest. KVM simply forwards any MTF VM-exits to the L1 guest, which works for normal instruction execution. However, when KVM needs to emulate an instruction on the behalf of an L2 guest, the monitor trap flag is not emulated. Add the necessary logic to kvm_skip_emulated_instruction() to synthesize an MTF VM-exit to L1 upon instruction emulation for L2. Fixes: 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG") Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23KVM: fix error handling in svm_hardware_setupLi RongQing1-21/+20
rename svm_hardware_unsetup as svm_hardware_teardown, move it before svm_hardware_setup, and call it to free all memory if fail to setup in svm_hardware_setup, otherwise memory will be leaked remove __exit attribute for it since it is called in __init function Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23net: genetlink: return the error code when attribute parsing fails.Paolo Abeni1-2/+3
Currently if attribute parsing fails and the genl family does not support parallel operation, the error code returned by __nlmsg_parse() is discarded by genl_family_rcv_msg_attrs_parse(). Be sure to report the error for all genl families. Fixes: c10e6cf85e7d ("net: genetlink: push attrbuf allocation and parsing to a separate function") Fixes: ab5b526da048 ("net: genetlink: always allocate separate attrs for dumpit ops") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-23ipv4: ensure rcu_read_lock() in cipso_v4_error()Matteo Croce1-1/+6
Similarly to commit c543cb4a5f07 ("ipv4: ensure rcu_read_lock() in ipv4_link_failure()"), __ip_options_compile() must be called under rcu protection. Fixes: 3da1ed7ac398 ("net: avoid use IPCB in cipso_v4_error") Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Matteo Croce <mcroce@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-23vhost: Check docket sk_family instead of call getnameEugenio Pérez1-9/+1
Doing so, we save one call to get data we already have in the struct. Also, since there is no guarantee that getname use sockaddr_ll parameter beyond its size, we add a little bit of security here. It should do not do beyond MAX_ADDR_LEN, but syzbot found that ax25_getname writes more (72 bytes, the size of full_sockaddr_ax25, versus 20 + 32 bytes of sockaddr_ll + MAX_ADDR_LEN in syzbot repro). Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server") Reported-by: syzbot+f2a62d07a5198c819c7b@syzkaller.appspotmail.com Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-23csky: Replace <linux/clk-provider.h> by <linux/of_clk.h>Geert Uytterhoeven1-1/+1
The C-Sky platform code is not a clock provider, and just needs to call of_clk_init(). Hence it can include <linux/of_clk.h> instead of <linux/clk-provider.h>. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-02-23Merge tag 'ras-urgent-2020-02-22' of ↵Linus Torvalds1-23/+27
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fixes from Thomas Gleixner: "Two fixes for the AMD MCE driver: - Populate the per CPU MCA bank descriptor pointer only after it has been completely set up to prevent a use-after-free in case that one of the subsequent initialization step fails - Implement a proper release function for the sysfs entries of MCA threshold controls instead of freeing the memory right in the CPU teardown code, which leads to another use-after-free when the associated sysfs file is opened and accessed" * tag 'ras-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce/amd: Fix kobject lifetime x86/mce/amd: Publish the bank pointer only after setup has succeeded
2020-02-23audit: fix error handling in audit_data_to_entry()Paul Moore1-32/+39
Commit 219ca39427bf ("audit: use union for audit_field values since they are mutually exclusive") combined a number of separate fields in the audit_field struct into a single union. Generally this worked just fine because they are generally mutually exclusive. Unfortunately in audit_data_to_entry() the overlap can be a problem when a specific error case is triggered that causes the error path code to attempt to cleanup an audit_field struct and the cleanup involves attempting to free a stored LSM string (the lsm_str field). Currently the code always has a non-NULL value in the audit_field.lsm_str field as the top of the for-loop transfers a value into audit_field.val (both .lsm_str and .val are part of the same union); if audit_data_to_entry() fails and the audit_field struct is specified to contain a LSM string, but the audit_field.lsm_str has not yet been properly set, the error handling code will attempt to free the bogus audit_field.lsm_str value that was set with audit_field.val at the top of the for-loop. This patch corrects this by ensuring that the audit_field.val is only set when needed (it is cleared when the audit_field struct is allocated with kcalloc()). It also corrects a few other issues to ensure that in case of error the proper error code is returned. Cc: stable@vger.kernel.org Fixes: 219ca39427bf ("audit: use union for audit_field values since they are mutually exclusive") Reported-by: syzbot+1f4d90ead370d72e450b@syzkaller.appspotmail.com Signed-off-by: Paul Moore <paul@paul-moore.com>