summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-04-26svcrdma: Add helper to save pages under I/OChuck Lever1-13/+18
Clean up: extract the logic to save pages under I/O into a helper to add a big documenting comment without adding clutter in the send path. This is a refactoring change only. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-26svcrdma: Eliminate RPCRDMA_SQ_DEPTH_MULTChuck Lever3-4/+1
The Send Queue depth is temporarily reduced to 1 SQE per credit. The new rdma_rw API does an internal computation, during QP creation, to increase the depth of the Send Queue to handle RDMA Read and Write operations. This change has to come before the NFSD code paths are updated to use the rdma_rw API. Without this patch, rdma_rw_init_qp() increases the size of the SQ too much, resulting in memory allocation failures during QP creation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-26svcrdma: Add svc_rdma_map_reply_hdr()Chuck Lever3-38/+62
Introduce a helper to DMA-map a reply's transport header before sending it. This will in part replace the map vector cache. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-26svcrdma: Move send_wr to svc_rdma_op_ctxtChuck Lever3-35/+44
Clean up: Move the ib_send_wr off the stack, and move common code to post a Send Work Request into a helper. This is a refactoring change only. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-26NFS: don't try to cross a mountpount when there isn't one there.NeilBrown1-4/+20
consider the sequence of commands: mkdir -p /import/nfs /import/bind /import/etc mount --bind / /import/bind mount --make-private /import/bind mount --bind /import/etc /import/bind/etc exportfs -o rw,no_root_squash,crossmnt,async,no_subtree_check localhost:/ mount -o vers=4 localhost:/ /import/nfs ls -l /import/nfs/etc You would not expect this to report a stale file handle. Yet it does. The manipulations under /import/bind cause the dentry for /etc to get the DCACHE_MOUNTED flag set, even though nothing is mounted on /etc. This causes nfsd to call nfsd_cross_mnt() even though there is no mountpoint. So an upcall to mountd for "/etc" is performed. The 'crossmnt' flag on the export of / causes mountd to report that /etc is exported as it is a descendant of /. It assumes the kernel wouldn't ask about something that wasn't a mountpoint. The filehandle returned identifies the filesystem and the inode number of /etc. When this filehandle is presented to rpc.mountd, via "nfsd.fh", the inode cannot be found associated with any name in /etc/exports, or with any mountpoint listed by getmntent(). So rpc.mountd says the filehandle doesn't exist. Hence ESTALE. This is fixed by teaching nfsd not to trust DCACHE_MOUNTED too much. It is just a hint, not a guarantee. Change nfsd_mountpoint() to return '1' for a certain mountpoint, '2' for a possible mountpoint, and 0 otherwise. Then change nfsd_crossmnt() to check if follow_down() actually found a mountpount and, if not, to avoid performing a lookup if the location is not known to certainly require an export-point. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-26nfsd4: remove pointless strdup_if_nonnullNeilBrown1-19/+6
kstrdup() already checks for NULL. (Brought to our attention by Jason Yann noticing (from sparse output) that it should have been declared static.) Signed-off-by: NeilBrown <neilb@suse.com> Reported-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-26uapi: fix linux/nfsd/cld.h userspace compilation errorsDmitry V. Levin1-6/+8
Include <linux/types.h> and consistently use types it provides to fix the following linux/nfsd/cld.h userspace compilation errors: /usr/include/linux/nfsd/cld.h:40:2: error: unknown type name 'uint16_t' uint16_t cn_len; /* length of cm_id */ /usr/include/linux/nfsd/cld.h:46:2: error: unknown type name 'uint8_t' uint8_t cm_vers; /* upcall version */ /usr/include/linux/nfsd/cld.h:47:2: error: unknown type name 'uint8_t' uint8_t cm_cmd; /* upcall command */ /usr/include/linux/nfsd/cld.h:48:2: error: unknown type name 'int16_t' int16_t cm_status; /* return code */ /usr/include/linux/nfsd/cld.h:49:2: error: unknown type name 'uint32_t' uint32_t cm_xid; /* transaction id */ /usr/include/linux/nfsd/cld.h:51:3: error: unknown type name 'int64_t' int64_t cm_gracetime; /* grace period start time */ Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-26nfsd: check for oversized NFSv2/v3 argumentsJ. Bruce Fields3-11/+28
A client can append random data to the end of an NFSv2 or NFSv3 RPC call without our complaining; we'll just stop parsing at the end of the expected data and ignore the rest. Encoded arguments and replies are stored together in an array of pages, and if a call is too large it could leave inadequate space for the reply. This is normally OK because NFS RPC's typically have either short arguments and long replies (like READ) or long arguments and short replies (like WRITE). But a client that sends an incorrectly long reply can violate those assumptions. This was observed to cause crashes. So, insist that the argument not be any longer than we expect. Also, several operations increment rq_next_page in the decode routine before checking the argument size, which can leave rq_next_page pointing well past the end of the page array, causing trouble later in svc_free_pages. As followup we may also want to rewrite the encoding routines to check more carefully that they aren't running off the end of the page array. Reported-by: Tuomas Haanpää <thaan@synopsys.com> Reported-by: Ari Kauppi <ari@synopsys.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-25nfsd: stricter decoding of write-like NFSv2/v3 opsJ. Bruce Fields2-0/+6
The NFSv2/v3 code does not systematically check whether we decode past the end of the buffer. This generally appears to be harmless, but there are a few places where we do arithmetic on the pointers involved and don't account for the possibility that a length could be negative. Add checks to catch these. Reported-by: Tuomas Haanpää <thaan@synopsys.com> Reported-by: Ari Kauppi <ari@synopsys.com> Reviewed-by: NeilBrown <neilb@suse.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-25nfsd4: minor NFSv2/v3 write decoding cleanupJ. Bruce Fields2-8/+9
Use a couple shortcuts that will simplify a following bugfix. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-25nfsd: check for oversized NFSv2/v3 argumentsJ. Bruce Fields1-0/+36
A client can append random data to the end of an NFSv2 or NFSv3 RPC call without our complaining; we'll just stop parsing at the end of the expected data and ignore the rest. Encoded arguments and replies are stored together in an array of pages, and if a call is too large it could leave inadequate space for the reply. This is normally OK because NFS RPC's typically have either short arguments and long replies (like READ) or long arguments and short replies (like WRITE). But a client that sends an incorrectly long reply can violate those assumptions. This was observed to cause crashes. Also, several operations increment rq_next_page in the decode routine before checking the argument size, which can leave rq_next_page pointing well past the end of the page array, causing trouble later in svc_free_pages. So, following a suggestion from Neil Brown, add a central check to enforce our expectation that no NFSv2/v3 call has both a large call and a large reply. As followup we may also want to rewrite the encoding routines to check more carefully that they aren't running off the end of the page array. We may also consider rejecting calls that have any extra garbage appended. That would be safer, and within our rights by spec, but given the age of our server and the NFS protocol, and the fact that we've never enforced this before, we may need to balance that against the possibility of breaking some oddball client. Reported-by: Tuomas Haanpää <thaan@synopsys.com> Reported-by: Ari Kauppi <ari@synopsys.com> Cc: stable@vger.kernel.org Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-24Linux 4.11-rc8v4.11-rc8Linus Torvalds1-1/+1
2017-04-24Merge tag 'upstream-4.11-rc7' of git://git.infradead.org/linux-ubifsLinus Torvalds3-13/+23
Pull UBI/UBIFS fixes from Richard Weinberger: "This contains fixes for issues in both UBI and UBIFS: - more O_TMPFILE fallout - RENAME_WHITEOUT regression due to a mis-merge - memory leak in ubifs_mknod() - power-cut problem in UBI's update volume feature" * tag 'upstream-4.11-rc7' of git://git.infradead.org/linux-ubifs: ubifs: Fix O_TMPFILE corner case in ubifs_link() ubifs: Fix RENAME_WHITEOUT support ubifs: Fix debug messages for an invalid filename in ubifs_dump_inode ubifs: Fix debug messages for an invalid filename in ubifs_dump_node ubifs: Remove filename from debug messages in ubifs_readdir ubifs: Fix memory leak in error path in ubifs_mknod ubi/upd: Always flush after prepared for an update
2017-04-23Merge branch 'ras-urgent-for-linus' of ↵Linus Torvalds3-16/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fix from Thomas Gleixner: "The MCE atomic notifier callchain invokes callbacks which might sleep. Convert it to a blocking notifier and prevent calls from atomic context" * 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Make the MCE notifier a blocking one
2017-04-23Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Thomas Gleixner: "The (hopefully) final fix for the irq affinity spreading logic" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/affinity: Fix calculating vectors to assign
2017-04-22Merge tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linuxLinus Torvalds1-1/+1
Pull nfsd bugfix from Bruce Fields: "Fix a 4.11 regression that triggers a BUG() on an attempt to use an unsupported NFSv4 compound op" * tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux: nfsd: fix oops on unsupported operation
2017-04-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds25-210/+345
Pull networking fixes from David Miller: 1) Don't race in IPSEC dumps, from Yuejie Shi. 2) Verify lengths properly in IPSEC reqeusts, from Herbert Xu. 3) Fix out of bounds access in ipv6 segment routing code, from David Lebrun. 4) Don't write into the header of cloned SKBs in smsc95xx driver, from James Hughes. 5) Several other drivers have this bug too, fix them. From Eric Dumazet. 6) Fix access to uninitialized data in TC action cookie code, from Wolfgang Bumiller. 7) Fix double free in IPV6 segment routing, again from David Lebrun. 8) Don't let userspace set the RTF_PCPU flag, oops. From David Ahern. 9) Fix use after free in qrtr code, from Dan Carpenter. 10) Don't double-destroy devices in ip6mr code, from Nikolay Aleksandrov. 11) Don't pass out-of-range TX queue indices into drivers, from Tushar Dave. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits) netpoll: Check for skb->queue_mapping ip6mr: fix notification device destruction bpf, doc: update bpf maintainers entry net: qrtr: potential use after free in qrtr_sendmsg() bpf: Fix values type used in test_maps net: ipv6: RTF_PCPU should not be settable from userspace gso: Validate assumption of frag_list segementation kaweth: use skb_cow_head() to deal with cloned skbs ch9200: use skb_cow_head() to deal with cloned skbs lan78xx: use skb_cow_head() to deal with cloned skbs sr9700: use skb_cow_head() to deal with cloned skbs cx82310_eth: use skb_cow_head() to deal with cloned skbs smsc75xx: use skb_cow_head() to deal with cloned skbs ipv6: sr: fix double free of skb after handling invalid SRH MAINTAINERS: Add "B:" field for networking. net sched actions: allocate act cookie early qed: Fix issue in populating the PFC config paramters. qed: Fix possible system hang in the dcbnl-getdcbx() path. qed: Fix sending an invalid PFC error mask to MFW. qed: Fix possible error in populating max_tc field. ...
2017-04-21netpoll: Check for skb->queue_mappingTushar Dave1-2/+8
Reducing real_num_tx_queues needs to be in sync with skb queue_mapping otherwise skbs with queue_mapping greater than real_num_tx_queues can be sent to the underlying driver and can result in kernel panic. One such event is running netconsole and enabling VF on the same device. Or running netconsole and changing number of tx queues via ethtool on same device. e.g. Unable to handle kernel NULL pointer dereference tsk->{mm,active_mm}->context = 0000000000001525 tsk->{mm,active_mm}->pgd = fff800130ff9a000 \|/ ____ \|/ "@'/ .. \`@" /_| \__/ |_\ \__U_/ kworker/48:1(475): Oops [#1] CPU: 48 PID: 475 Comm: kworker/48:1 Tainted: G OE 4.11.0-rc3-davem-net+ #7 Workqueue: events queue_process task: fff80013113299c0 task.stack: fff800131132c000 TSTATE: 0000004480e01600 TPC: 00000000103f9e3c TNPC: 00000000103f9e40 Y: 00000000 Tainted: G OE TPC: <ixgbe_xmit_frame_ring+0x7c/0x6c0 [ixgbe]> g0: 0000000000000000 g1: 0000000000003fff g2: 0000000000000000 g3: 0000000000000001 g4: fff80013113299c0 g5: fff8001fa6808000 g6: fff800131132c000 g7: 00000000000000c0 o0: fff8001fa760c460 o1: fff8001311329a50 o2: fff8001fa7607504 o3: 0000000000000003 o4: fff8001f96e63a40 o5: fff8001311d77ec0 sp: fff800131132f0e1 ret_pc: 000000000049ed94 RPC: <set_next_entity+0x34/0xb80> l0: 0000000000000000 l1: 0000000000000800 l2: 0000000000000000 l3: 0000000000000000 l4: 000b2aa30e34b10d l5: 0000000000000000 l6: 0000000000000000 l7: fff8001fa7605028 i0: fff80013111a8a00 i1: fff80013155a0780 i2: 0000000000000000 i3: 0000000000000000 i4: 0000000000000000 i5: 0000000000100000 i6: fff800131132f1a1 i7: 00000000103fa4b0 I7: <ixgbe_xmit_frame+0x30/0xa0 [ixgbe]> Call Trace: [00000000103fa4b0] ixgbe_xmit_frame+0x30/0xa0 [ixgbe] [0000000000998c74] netpoll_start_xmit+0xf4/0x200 [0000000000998e10] queue_process+0x90/0x160 [0000000000485fa8] process_one_work+0x188/0x480 [0000000000486410] worker_thread+0x170/0x4c0 [000000000048c6b8] kthread+0xd8/0x120 [0000000000406064] ret_from_fork+0x1c/0x2c [0000000000000000] (null) Disabling lock debugging due to kernel taint Caller[00000000103fa4b0]: ixgbe_xmit_frame+0x30/0xa0 [ixgbe] Caller[0000000000998c74]: netpoll_start_xmit+0xf4/0x200 Caller[0000000000998e10]: queue_process+0x90/0x160 Caller[0000000000485fa8]: process_one_work+0x188/0x480 Caller[0000000000486410]: worker_thread+0x170/0x4c0 Caller[000000000048c6b8]: kthread+0xd8/0x120 Caller[0000000000406064]: ret_from_fork+0x1c/0x2c Caller[0000000000000000]: (null) Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21ip6mr: fix notification device destructionNikolay Aleksandrov1-7/+6
Andrey Konovalov reported a BUG caused by the ip6mr code which is caused because we call unregister_netdevice_many for a device that is already being destroyed. In IPv4's ipmr that has been resolved by two commits long time ago by introducing the "notify" parameter to the delete function and avoiding the unregister when called from a notifier, so let's do the same for ip6mr. The trace from Andrey: ------------[ cut here ]------------ kernel BUG at net/core/dev.c:6813! invalid opcode: 0000 [#1] SMP KASAN Modules linked in: CPU: 1 PID: 1165 Comm: kworker/u4:3 Not tainted 4.11.0-rc7+ #251 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: netns cleanup_net task: ffff880069208000 task.stack: ffff8800692d8000 RIP: 0010:rollback_registered_many+0x348/0xeb0 net/core/dev.c:6813 RSP: 0018:ffff8800692de7f0 EFLAGS: 00010297 RAX: ffff880069208000 RBX: 0000000000000002 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006af90569 RBP: ffff8800692de9f0 R08: ffff8800692dec60 R09: 0000000000000000 R10: 0000000000000006 R11: 0000000000000000 R12: ffff88006af90070 R13: ffff8800692debf0 R14: dffffc0000000000 R15: ffff88006af90000 FS: 0000000000000000(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe7e897d870 CR3: 00000000657e7000 CR4: 00000000000006e0 Call Trace: unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881 unregister_netdevice_many+0xc8/0x120 net/core/dev.c:7880 ip6mr_device_event+0x362/0x3f0 net/ipv6/ip6mr.c:1346 notifier_call_chain+0x145/0x2f0 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1647 call_netdevice_notifiers net/core/dev.c:1663 rollback_registered_many+0x919/0xeb0 net/core/dev.c:6841 unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881 unregister_netdevice_many net/core/dev.c:7880 default_device_exit_batch+0x4fa/0x640 net/core/dev.c:8333 ops_exit_list.isra.4+0x100/0x150 net/core/net_namespace.c:144 cleanup_net+0x5a8/0xb40 net/core/net_namespace.c:463 process_one_work+0xc04/0x1c10 kernel/workqueue.c:2097 worker_thread+0x223/0x19c0 kernel/workqueue.c:2231 kthread+0x35e/0x430 kernel/kthread.c:231 ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430 Code: 3c 32 00 0f 85 70 0b 00 00 48 b8 00 02 00 00 00 00 ad de 49 89 47 78 e9 93 fe ff ff 49 8d 57 70 49 8d 5f 78 eb 9e e8 88 7a 14 fe <0f> 0b 48 8b 9d 28 fe ff ff e8 7a 7a 14 fe 48 b8 00 00 00 00 00 RIP: rollback_registered_many+0x348/0xeb0 RSP: ffff8800692de7f0 ---[ end trace e0b29c57e9b3292c ]--- Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21bpf, doc: update bpf maintainers entryDaniel Borkmann1-1/+15
Add various related files that have been missing under BPF entry covering essential parts of its infrastructure and also add myself as co-maintainer. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21net: qrtr: potential use after free in qrtr_sendmsg()Dan Carpenter1-1/+3
If skb_pad() fails then it frees the skb so we should check for errors. Fixes: bdabad3e363d ("net: Add Qualcomm IPC router") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21bpf: Fix values type used in test_mapsDavid Miller1-2/+2
Maps of per-cpu type have their value element size adjusted to 8 if it is specified smaller during various map operations. This makes test_maps as a 32-bit binary fail, in fact the kernel writes past the end of the value's array on the user's stack. To be quite honest, I think the kernel should reject creation of a per-cpu map that doesn't have a value size of at least 8 if that's what the kernel is going to silently adjust to later. If the user passed something smaller, it is a sizeof() calcualtion based upon the type they will actually use (just like in this testcase code) in later calls to the map operations. Fixes: df570f577231 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY") Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
2017-04-21net: ipv6: RTF_PCPU should not be settable from userspaceDavid Ahern2-1/+5
Andrey reported a fault in the IPv6 route code: kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Modules linked in: CPU: 1 PID: 4035 Comm: a.out Not tainted 4.11.0-rc7+ #250 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff880069809600 task.stack: ffff880062dc8000 RIP: 0010:ip6_rt_cache_alloc+0xa6/0x560 net/ipv6/route.c:975 RSP: 0018:ffff880062dced30 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: ffff8800670561c0 RCX: 0000000000000006 RDX: 0000000000000003 RSI: ffff880062dcfb28 RDI: 0000000000000018 RBP: ffff880062dced68 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff880062dcfb28 R14: dffffc0000000000 R15: 0000000000000000 FS: 00007feebe37e7c0(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000205a0fe4 CR3: 000000006b5c9000 CR4: 00000000000006e0 Call Trace: ip6_pol_route+0x1512/0x1f20 net/ipv6/route.c:1128 ip6_pol_route_output+0x4c/0x60 net/ipv6/route.c:1212 ... Andrey's syzkaller program passes rtmsg.rtmsg_flags with the RTF_PCPU bit set. Flags passed to the kernel are blindly copied to the allocated rt6_info by ip6_route_info_create making a newly inserted route appear as though it is a per-cpu route. ip6_rt_cache_alloc sees the flag set and expects rt->dst.from to be set - which it is not since it is not really a per-cpu copy. The subsequent call to __ip6_dst_alloc then generates the fault. Fix by checking for the flag and failing with EINVAL. Fixes: d52d3997f843f ("ipv6: Create percpu rt6_info") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21gso: Validate assumption of frag_list segementationIlan Tayari1-4/+14
Commit 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer") assumes that all SKBs in a frag_list (except maybe the last one) contain the same amount of GSO payload. This assumption is not always correct, resulting in the following warning message in the log: skb_segment: too many frags For example, mlx5 driver in Striding RQ mode creates some RX SKBs with one frag, and some with 2 frags. After GRO, the frag_list SKBs end up having different amounts of payload. If this frag_list SKB is then forwarded, the aforementioned assumption is violated. Validate the assumption, and fall back to software GSO if it not true. Change-Id: Ia03983f4a47b6534dd987d7a2aad96d54d46d212 Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer") Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21Merge branch 'skb_cow_head'David S. Miller6-44/+16
Eric Dumazet says: ==================== net: use skb_cow_head() to deal with cloned skbs James Hughes found an issue with smsc95xx driver. Same problematic code is found in other drivers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21kaweth: use skb_cow_head() to deal with cloned skbsEric Dumazet1-12/+6
We can use skb_cow_head() to properly deal with clones, especially the ones coming from TCP stack that allow their head being modified. This avoids a copy. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21ch9200: use skb_cow_head() to deal with cloned skbsEric Dumazet1-7/+2
We need to ensure there is enough headroom to push extra header, but we also need to check if we are allowed to change headers. skb_cow_head() is the proper helper to deal with this. Fixes: 4a476bd6d1d9 ("usbnet: New driver for QinHeng CH9200 devices") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21lan78xx: use skb_cow_head() to deal with cloned skbsEric Dumazet1-7/+2
We need to ensure there is enough headroom to push extra header, but we also need to check if we are allowed to change headers. skb_cow_head() is the proper helper to deal with this. Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Cc: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21sr9700: use skb_cow_head() to deal with cloned skbsEric Dumazet1-7/+2
We need to ensure there is enough headroom to push extra header, but we also need to check if we are allowed to change headers. skb_cow_head() is the proper helper to deal with this. Fixes: c9b37458e956 ("USB2NET : SR9700 : One chip USB 1.1 USB2NET SR9700Device Driver Support") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21cx82310_eth: use skb_cow_head() to deal with cloned skbsEric Dumazet1-5/+2
We need to ensure there is enough headroom to push extra header, but we also need to check if we are allowed to change headers. skb_cow_head() is the proper helper to deal with this. Fixes: cc28a20e77b2 ("introduce cx82310_eth: Conexant CX82310-based ADSL router USB ethernet driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21smsc75xx: use skb_cow_head() to deal with cloned skbsEric Dumazet1-6/+2
We need to ensure there is enough headroom to push extra header, but we also need to check if we are allowed to change headers. skb_cow_head() is the proper helper to deal with this. Fixes: d0cad871703b ("smsc75xx: SMSC LAN75xx USB gigabit ethernet adapter driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21ipv6: sr: fix double free of skb after handling invalid SRHDavid Lebrun1-1/+0
The icmpv6_param_prob() function already does a kfree_skb(), this patch removes the duplicate one. Fixes: 1ababeba4a21f3dba3da3523c670b207fb2feb62 ("ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21Merge tag 'powerpc-4.11-8' of ↵Linus Torvalds3-8/+8
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Just two fixes. The first fixes kprobing a stdu, and is marked for stable as it's been broken for ~ever. In hindsight this could have gone in next. The other is a fix for a change we merged this cycle, where if we take a certain exception when the kernel is running relocated (currently only used for kdump), we checkstop the box. Thanks to Ravi Bangoria" * tag 'powerpc-4.11-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64: Fix HMI exception on LE with CONFIG_RELOCATABLE=y powerpc/kprobe: Fix oops when kprobed on 'stdu' instruction
2017-04-21Merge tag 'pci-v4.11-fixes-5' of ↵Linus Torvalds2-3/+13
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fix from Bjorn Helgaas: "Sorry this is so late. It's been in -next for over a week, but I forgot to send it on until now. A single fix to the DT binding of the HiSilicon PCIe host support" * tag 'pci-v4.11-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: hisi: Fix DT binding (hisi-pcie-almost-ecam)
2017-04-21Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds6-13/+66
Pull block layer fixes from Jens Axboe: "A couple of last minute fixes for regressions in this cycle. More specifically: - Two patches from Andy, adjusting the NVMe APST quirks to avoid some issues specific to one Toshiba drive, and some variant of Samsung on two specific Dell laptops. - A fix for mtip32xx, turning off mq scheduling on that device. We have a real fix for this, but it's too late in the cycle. Thankfully we already have a NO_SCHED flag we can apply here. A prep patch for this is ensuring that we honor the NO_SCHED flag when attempting to online switch schedulers, previsouly we only did so for drive load time. From Ming. - Fixing an oops in blk-mq polling with scheduling attached. This one is easily reproducible, it would be a shame to release 4.11 with that issue. From me. I'd prefer not having to send in patches at this point in time, but the above are all things that have regressed in this cycle and the fixes are relatively straight forward" * 'for-linus' of git://git.kernel.dk/linux-block: blk-mq: fix potential oops with polling and blk-mq scheduler nvme: Quirk APST off on "THNSF5256GPUK TOSHIBA" nvme: Adjust the Samsung APST quirk mtip32xx: pass BLK_MQ_F_NO_SCHED block: respect BLK_MQ_F_NO_SCHED
2017-04-21Merge tag 'acpi-4.11-final' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI build fix from Rafael Wysocki: "This avoids a false-positive build warning from the compiler. Specifics: - Avoid a false-positive warning regarding a variable that may not be initialized that started to trigger after a previous general build fix (Arnd Bergmann)" * tag 'acpi-4.11-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / power: Avoid maybe-uninitialized warning
2017-04-21Merge tag 'mmc-v4.11-rc7' of ↵Linus Torvalds4-4/+22
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC core: - kmalloc sdio scratch buffer to make it DMA-friendly MMC host: - dw_mmc: Fix behaviour for SDIO IRQs when runtime PM is used - sdhci-esdhc-imx: Correct pad I/O drive strength for UHS-DDR50 cards" * tag 'mmc-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci-esdhc-imx: increase the pad I/O drive strength for DDR50 card mmc: dw_mmc: Don't allow Runtime PM for SDIO cards mmc: sdio: fix alignment issue in struct sdio_func
2017-04-21Merge branch 'for-linus' of ↵Linus Torvalds1-0/+8
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixlet from Dmitry Torokhov: "An update to Elan PS/2 driver to allow working on yet another Lifebook" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: elantech - add Fujitsu Lifebook E547 to force crc_enabled
2017-04-21MAINTAINERS: Add "B:" field for networking.David S. Miller1-0/+1
We want people to report bugs to the netdev list. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-24/+21
Merge two mm fixes from Andrew Morton. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: prevent NR_ISOLATE_* stats from going negative Revert "mm, page_alloc: only use per-cpu allocator for irq-safe requests"
2017-04-21mm: prevent NR_ISOLATE_* stats from going negativeRabin Vincent1-1/+1
Commit 6afcf8ef0ca0 ("mm, compaction: fix NR_ISOLATED_* stats for pfn based migration") moved the dec_node_page_state() call (along with the page_is_file_cache() call) to after putback_lru_page(). But page_is_file_cache() can change after putback_lru_page() is called, so it should be called before putback_lru_page(), as it was before that patch, to prevent NR_ISOLATE_* stats from going negative. Without this fix, non-CONFIG_SMP kernels end up hanging in the while(too_many_isolated()) { congestion_wait() } loop in shrink_active_list() due to the negative stats. Mem-Info: active_anon:32567 inactive_anon:121 isolated_anon:1 active_file:6066 inactive_file:6639 isolated_file:4294967295 ^^^^^^^^^^ unevictable:0 dirty:115 writeback:0 unstable:0 slab_reclaimable:2086 slab_unreclaimable:3167 mapped:3398 shmem:18366 pagetables:1145 bounce:0 free:1798 free_pcp:13 free_cma:0 Fixes: 6afcf8ef0ca0 ("mm, compaction: fix NR_ISOLATED_* stats for pfn based migration") Link: http://lkml.kernel.org/r/1492683865-27549-1-git-send-email-rabin.vincent@axis.com Signed-off-by: Rabin Vincent <rabinv@axis.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Ming Ling <ming.ling@spreadtrum.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-21Revert "mm, page_alloc: only use per-cpu allocator for irq-safe requests"Mel Gorman1-23/+20
This reverts commit 374ad05ab64. While the patch worked great for userspace allocations, the fact that softirq loses the per-cpu allocator caused problems. It needs to be redone taking into account that a separate list is needed for hard/soft IRQs or alternatively find a cheap way of detecting reentry due to an interrupt. Both are possible but sufficiently tricky that it shouldn't be rushed. Jesper had one method for allowing softirqs but reported that the cost was high enough that it performed similarly to a plain revert. His figures for netperf TCP_STREAM were as follows Baseline v4.10.0 : 60316 Mbit/s Current 4.11.0-rc6: 47491 Mbit/s Jesper's patch : 60662 Mbit/s This patch : 60106 Mbit/s As this is a regression, I wish to revert to noirq allocator for now and go back to the drawing board. Link: http://lkml.kernel.org/r/20170415145350.ixy7vtrzdzve57mh@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Tariq Toukan <ttoukan.linux@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-20blk-mq: fix potential oops with polling and blk-mq schedulerJens Axboe1-1/+10
If we have a scheduler attached, blk_mq_tag_to_rq() on the scheduled tags will return NULL if a request is no longer in flight. This is different than using the normal tags, where it will always return the fixed request. Check for this condition for polling, in case we happen to enter polling for a completed request. The request address remains valid, so this check and return should be perfectly safe. Fixes: bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers") Tested-by: Stephen Bates <sbates@raithlin.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20nvme: Quirk APST off on "THNSF5256GPUK TOSHIBA"Andy Lutomirski1-0/+9
There's a report that it malfunctions with APST on. See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184 Cc: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20nvme: Adjust the Samsung APST quirkAndy Lutomirski3-11/+38
I got a couple more reports: the Samsung APST issues appears to affect multiple 950-series devices in Dell XPS 15 9550 and Precision 5510 laptops. Change the quirk: rather than blacklisting the firmware on the first problematic SSD that was reported, disable APST on all 144d:a802 devices if they're installed in the two affected Dell models. While we're at it, disable only the deepest sleep state instead of all of them -- the reporters say that this is sufficient to fix the problem. (I have a device that appears to be entirely identical to one of the affected devices, but I have a different Dell laptop, so it's not the case that all Samsung devices with firmware BXW75D0Q are broken under all circumstances.) Samsung engineers have an affected system, and hopefully they'll give us a better workaround some time soon. In the mean time, this should minimize regressions. See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184 Cc: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20net sched actions: allocate act cookie earlyWolfgang Bumiller1-23/+32
Policing filters do not use the TCA_ACT_* enum and the tb[] nlattr array in tcf_action_init_1() doesn't get filled for them so we should not try to look for a TCA_ACT_COOKIE attribute in the then uninitialized array. The error handling in cookie allocation then calls tcf_hash_release() leading to invalid memory access later on. Additionally, if cookie allocation fails after an already existing non-policing filter has successfully been changed, tcf_action_release() should not be called, also we would have to roll back the changes in the error handling, so instead we now allocate the cookie early and assign it on success at the end. CVE-2017-7979 Fixes: 1045ba77a596 ("net sched actions: Add support for user cookies") Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20Merge branch 'qed-dcbx-fixes'David S. Miller1-1/+12
Sudarsana Reddy Kalluru says: ==================== qed: Dcbx bug fixes The series has set of bug fixes for dcbx implementation of qed driver. Please consider applying this to 'net' branch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20qed: Fix issue in populating the PFC config paramters.sudarsana.kalluru@cavium.com1-0/+2
Change ieee_setpfc() callback implementation to populate traffic class count with the user provided value. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20qed: Fix possible system hang in the dcbnl-getdcbx() path.sudarsana.kalluru@cavium.com1-1/+1
qed_dcbnl_get_dcbx() API uses kmalloc in GFT_KERNEL mode. The API gets invoked in the interrupt context by qed_dcbnl_getdcbx callback. Need to invoke this kmalloc in atomic mode. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20qed: Fix sending an invalid PFC error mask to MFW.sudarsana.kalluru@cavium.com1-0/+2
PFC error-mask value is not supported by MFW, but this bit could be set in the pfc bit-map of the operational parameters if remote device supports it. These operational parameters are used as basis for populating the dcbx config parameters. User provided configs will be applied on top of these parameters and then send them to MFW when requested. Driver need to clear the error-mask bit before sending the config parameters to MFW. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>