summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-08-02Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller239-915/+2004
The BTF conflicts were simple overlapping changes. The virtio_net conflict was an overlap of a fix of statistics counter, happening alongisde a move over to a bonafide statistics structure rather than counting value on the stack. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-02be2net: fix spelling mistake "seqence" -> "sequence"Colin Ian King1-1/+1
Trivial fix to spelling mistake in dev_info message. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-02net: Fix coding style in skb_push()Ganesh Goudar1-1/+1
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-02net: fec: check DMA addressing limitationsStefan Agner1-0/+8
Check DMA addressing limitations as suggested by the DMA API how-to. This does not fix a particular issue seen but is considered good style. Signed-off-by: Stefan Agner <stefan@agner.ch> Acked-by: Fugang Duan <fugang.duan@nxp.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-02rxrpc: Remove set but not used variable 'nowj'Wei Yongjun1-2/+1
Fixes gcc '-Wunused-but-set-variable' warning: net/rxrpc/proc.c: In function 'rxrpc_call_seq_show': net/rxrpc/proc.c:66:29: warning: variable 'nowj' set but not used [-Wunused-but-set-variable] unsigned long timeout = 0, nowj; ^ Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds19-30/+184
Pull networking fixes from David Miller: "Fixes keep trickling in: 1) Various IP fragmentation memory limit hardening changes from Eric Dumazet. 2) Revert ipv6 metrics leak change, it causes more problems than it fixes for now. 3) Fix WoL regression in stmmac driver, from Jose Abreu. 4) Netlink socket spectre v1 gadget fix, from Jeremy Cline" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: Revert "net/ipv6: fix metrics leak" rxrpc: Fix user call ID check in rxrpc_service_prealloc_one net: dsa: Do not suspend/resume closed slave_dev netlink: Fix spectre v1 gadget in netlink_create() Documentation: dpaa2: Use correct heading adornment net: stmmac: Fix WoL for PCI-based setups bonding: avoid lockdep confusion in bond_get_stats() enic: do not call enic_change_mtu in enic_probe ipv4: frags: handle possible skb truesize change inet: frag: enforce memory limits earlier net/mlx5e: IPoIB, Set the netdevice sw mtu in ipoib enhanced flow net/mlx5e: Fix null pointer access when setting MTU of vport representor net/mlx5e: Set port trust mode to PCP as default net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager net: dsa: mv88e6xxx: Fix SERDES support on 88E6141/6341 brcmfmac: fix regression in parsing NVRAM for multiple devices iwlwifi: add more card IDs for 9000 series
2018-08-02Squashfs: Compute expected length from inode size rather than block lengthPhillip Lougher4-23/+24
Previously in squashfs_readpage() when copying data into the page cache, it used the length of the datablock read from the filesystem (after decompression). However, if the filesystem has been corrupted this data block may be short, which will leave pages unfilled. The fix for this is to compute the expected number of bytes to copy from the inode size, and use this to detect if the block is short. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Tested-by: Willy Tarreau <w@1wt.eu> Cc: Анатолий Тросиненко <anatoly.trosinenko@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-02squashfs: more metadata hardeningLinus Torvalds3-6/+13
The squashfs fragment reading code doesn't actually verify that the fragment is inside the fragment table. The end result _is_ verified to be inside the image when actually reading the fragment data, but before that is done, we may end up taking a page fault because the fragment table itself might not even exist. Another report from Anatoly and his endless squashfs image fuzzing. Reported-by: Анатолий Тросиненко <anatoly.trosinenko@gmail.com> Acked-by:: Phillip Lougher <phillip.lougher@gmail.com>, Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-02Revert "net/ipv6: fix metrics leak"David S. Miller1-14/+4
This reverts commit df18b50448fab1dff093731dfd0e25e77e1afcd1. This change causes other problems and use-after-free situations as found by syzbot. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-02Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds1-1/+3
Pull ARM fix from Russell King: "Just a single fix this time around for recent binutils causing build problems when generating Thumb-2 code" * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 8781/1: Fix Thumb-2 syscall return for binutils 2.29+
2018-08-01net: don't declare IPv6 non-local bind helper if CONFIG_IPV6 undefinedVincent Bernat1-7/+7
Fixes: 83ba4645152d ("net: add helpers checking if socket can be bound to nonlocal address") Signed-off-by: Vincent Bernat <vincent@bernat.im> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01mm: do not initialize TLB stack vma's with vma_init()Linus Torvalds5-17/+12
Commit 2c4541e24c55 ("mm: use vma_init() to initialize VMAs on stack and data segments") tried to initialize various left-over ad-hoc vma's "properly", but actually made things worse for the temporary vma's used for TLB flushing. vma_init() doesn't actually initialize all of the vma, just a few fields, so doing something like - struct vm_area_struct vma = { .vm_mm = tlb->mm, }; + struct vm_area_struct vma; + + vma_init(&vma, tlb->mm); was actually very bad: instead of having a nicely initialized vma with every field but "vm_mm" zeroed, you'd have an entirely uninitialized vma with only a couple of fields initialized. And they weren't even fields that the code in question mostly cared about. The flush_tlb_range() function takes a "struct vma" rather than a "struct mm_struct", because a few architectures actually care about what kind of range it is - being able to only do an ITLB flush if it's a range that doesn't have data accesses enabled, for example. And all the normal users already have the vma for doing the range invalidation. But a few people want to call flush_tlb_range() with a range they just made up, so they also end up using a made-up vma. x86 just has a special "flush_tlb_mm_range()" function for this, but other architectures (arm and ia64) do the "use fake vma" thing instead, and thus got caught up in the vma_init() changes. At the same time, the TLB flushing code really doesn't care about most other fields in the vma, so vma_init() is just unnecessary and pointless. This fixes things by having an explicit "this is just an initializer for the TLB flush" initializer macro, which is used by the arm/arm64/ia64 people who mis-use this interface with just a dummy vma. Fixes: 2c4541e24c55 ("mm: use vma_init() to initialize VMAs on stack and data segments") Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-01mm: delete historical BUG from zap_pmd_range()Hugh Dickins1-4/+2
Delete the old VM_BUG_ON_VMA() from zap_pmd_range(), which asserted that mmap_sem must be held when splitting an "anonymous" vma there. Whether that's still strictly true nowadays is not entirely clear, but the danger of sometimes crashing on the BUG is now fairly clear. Even with the new stricter rules for anonymous vma marking, the condition it checks for can possible trigger. Commit 44960f2a7b63 ("staging: ashmem: Fix SIGBUS crash when traversing mmaped ashmem pages") is good, and originally I thought it was safe from that VM_BUG_ON_VMA(), because the /dev/ashmem fd exposed to the user is disconnected from the vm_file in the vma, and madvise(,,MADV_REMOVE) insists on VM_SHARED. But after I read John's earlier mail, drawing attention to the vfs_fallocate() in there: I may be wrong, and I don't know if Android has THP in the config anyway, but it looks to me like an unmap_mapping_range() from ashmem's vfs_fallocate() could hit precisely the VM_BUG_ON_VMA(), once it's vma_is_anonymous(). Signed-off-by: Hugh Dickins <hughd@google.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-01Merge tag 'rxrpc-next-20180801' of ↵David S. Miller11-67/+193
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Development Here are some patches that add some more tracepoints to AF_RXRPC and fix some issues therein. The most significant points are: (1) Display the call timeout information in /proc/net/rxrpc/calls. (2) Save the call's debug_id in the rxrpc_channel struct so that it can be used in traces after the rxrpc_call struct has been destroyed. (3) Increase the size of the kAFS Rx window from 32 to 63 to be about the same as the Auristor server. (4) Propose the terminal ACK for a client call after it has received all its data to be transmitted after a short interval so that it will get transmitted if not first superseded by a new call on the same channel. (5) Flush ACKs during the data reception if we detect that we've run out of data.[*] (6) Trace successful packet transmission and softirq to process context socket notification. [*] Note that on a uncontended gigabit network, rxrpc runs in to trouble with ACK packets getting batched together (up to ~32 at a time) somewhere between the IP transmit queue on the client and the ethernet receive queue on the server. I can see the kernel afs filesystem client and Auristor userspace server stalling occasionally on a 512MB single read. Sticking tracepoints in the network driver at either end seems to show that, although the ACK transmissions made by the client are reasonably spaced timewise, the received ACKs come in batches from the network card on the server. I'm not sure what, if anything, can be done about this. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01rxrpc: Fix user call ID check in rxrpc_service_prealloc_oneYueHaibing1-2/+2
There just check the user call ID isn't already in use, hence should compare user_call_ID with xcall->user_call_ID, which is current node's user_call_ID. Fixes: 540b1c48c37a ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg") Suggested-by: David Howells <dhowells@redhat.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01Merge tag 'mmc-v4.18-rc5' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fix from Ulf Hansson: "MMC host: mxcmmc: Fix build error for powerpc" * tag 'mmc-v4.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: mxcmmc: Fix missing parentheses and brace
2018-08-01Merge tag 'pm-urgent-4.18' of ↵Linus Torvalds3-67/+74
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix the scope of a recent intel_pstate driver optimization used incorrectly on some systems due to processor identification ambiguity and fix a few issues in the turbostat utility, including three recent regressions. Specifics: - Use ACPI FADT preferred PM Profile to distinguish Skylake desktop processors from some server ones with the same model number in order to limit the scope of the recent IO-wait boost optimization to servers, as intended (Srinivas Pandruvada). - Fix several issues in the turbostat utility: * Fix the -S option on 1-CPU systems (Len Brown). * Fix computations using incorrect processor core counts (Artem Bityutskiy). * Fix the x2apic debug message (Len Brown). * Fix logical node enumeration to allow for non-sequential physical nodes (Prarit Bhargava). * Fix reported family on modern AMD processors (Calvin Walton). * Clarify the RAPL column information in the man page (Len Brown)" * tag 'pm-urgent-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: intel_pstate: Limit the scope of HWP dynamic boost platforms tools/power turbostat: version 18.07.27 tools/power turbostat: Read extended processor family from CPUID tools/power turbostat: Fix logical node enumeration to allow for non-sequential physical nodes tools/power turbostat: fix x2apic debug message output file tools/power turbostat: fix bogus summary values tools/power turbostat: fix -S on UP systems tools/power turbostat: Update turbostat(8) RAPL throttling column description
2018-08-01squashfs metadata 2: electric boogalooLinus Torvalds3-14/+20
Anatoly continues to find issues with fuzzed squashfs images. This time, corrupt, missing, or undersized data for the page filling wasn't checked for, because the squashfs_{copy,read}_cache() functions did the squashfs_copy_data() call without checking the resulting data size. Which could result in the page cache pages being incompletely filled in, and no error indication to the user space reading garbage data. So make a helper function for the "fill in pages" case, because the exact same incomplete sequence existed in two places. [ I should have made a squashfs branch for these things, but I didn't intend to start doing them in the first place. My historical connection through cramfs is why I got into looking at these issues at all, and every time I (continue to) think it's a one-off. Because _this_ time is always the last time. Right? - Linus ] Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Tested-by: Willy Tarreau <w@1wt.eu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-01staging: ashmem: Fix SIGBUS crash when traversing mmaped ashmem pagesJohn Stultz1-0/+2
Amit Pundir and Youling in parallel reported crashes with recent mainline kernels running Android: F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** F DEBUG : Build fingerprint: 'Android/db410c32_only/db410c32_only:Q/OC-MR1/102:userdebug/test-key F DEBUG : Revision: '0' F DEBUG : ABI: 'arm' F DEBUG : pid: 2261, tid: 2261, name: zygote >>> zygote <<< F DEBUG : signal 7 (SIGBUS), code 2 (BUS_ADRERR), fault addr 0xec00008 ... <snip> ... F DEBUG : backtrace: F DEBUG : #00 pc 00001c04 /system/lib/libc.so (memset+48) F DEBUG : #01 pc 0010c513 /system/lib/libart.so (create_mspace_with_base+82) F DEBUG : #02 pc 0015c601 /system/lib/libart.so (art::gc::space::DlMallocSpace::CreateMspace(void*, unsigned int, unsigned int)+40) F DEBUG : #03 pc 0015c3ed /system/lib/libart.so (art::gc::space::DlMallocSpace::CreateFromMemMap(art::MemMap*, std::__1::basic_string<char, std::__ 1::char_traits<char>, std::__1::allocator<char>> const&, unsigned int, unsigned int, unsigned int, unsigned int, bool)+36) ... This was bisected back to commit bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives"). create_mspace_with_base() in the trace above, utilizes ashmem, and with ashmem, for shared mappings we use shmem_zero_setup(), which sets the vma->vm_ops to &shmem_vm_ops. But for private ashmem mappings nothing sets the vma->vm_ops. Looking at the problematic patch, it seems to add a requirement that one call vma_set_anonymous() on a vma, otherwise the dummy_vm_ops will be used. Using the dummy_vm_ops seem to triggger SIGBUS when traversing unmapped pages. Thus, this patch adds a call to vma_set_anonymous() for ashmem private mappings and seems to avoid the reported problem. Fixes: bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives") Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Colin Cross <ccross@google.com> Cc: Matthew Wilcox <willy@infradead.org> Reported-by: Amit Pundir <amit.pundir@linaro.org> Reported-by: Youling 257 <youling257@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-01cxgb4: fix endian to test F_FW_PORT_CMD_DCBXDIS32Ganesh Goudar1-4/+3
For FW_PORT_ACTION_GET_PORT_INFO32 messages, the u.info32.lstatus32_to_cbllen32 is 32-bit Big Endian. We need to translate that to CPU Endian in order to test F_FW_PORT_CMD_DCBXDIS32. Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01Merge branch 'net-sched-cleanups'David S. Miller2-54/+62
Jiri Pirko says: ==================== net: sched: couple of adjustments/fixes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: sched: make tcf_chain_{get,put}() staticJiri Pirko2-21/+16
These are no longer used outside of cls_api.c so make them static. Move tcf_chain_flush() to avoid fwd declaration of tcf_chain_put(). Signed-off-by: Jiri Pirko <jiri@mellanox.com> v1->v2: - new patch Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: sched: fix notifications for action-held chainsJiri Pirko1-28/+43
Chains that only have action references serve as placeholders. Until a non-action reference is created, user should not be aware of the chain. Also he should not receive any notifications about it. So send notifications for the new chain only in case the chain gets the first non-action reference. Symmetrically to that, when the last non-action reference is dropped, send the notification about deleted chain. Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> v1->v2: - made __tcf_chain_{get,put}() static as suggested by Cong Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: sched: change name of zombie chain to "held_by_acts_only"Jiri Pirko1-8/+6
As mentioned by Cong and Jakub during the review process, it is a bit odd to sometimes (act flow) create a new chain which would be immediately a "zombie". So just rename it to "held_by_acts_only". Signed-off-by: Jiri Pirko <jiri@mellanox.com> Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: hns3: fix return value error while hclge_cmd_csq_clean failedHuazhong Tan1-3/+9
While cleaning the command queue, the value of the HEAD register is not in the range of next_to_clean and next_to_use, meaning that this value is invalid. This also means that there is a hardware error and the hardware will trigger a reset soon. At this time we should return an error code instead of 0, and HCLGE_STATE_CMD_DISABLE needs to be set to prevent sending command again. Fixes: 3ff504908f95 ("net: hns3: fix a dead loop in hclge_cmd_csq_clean") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01rds: remove redundant variable 'rds_ibdev'YueHaibing1-3/+0
Variable 'rds_ibdev' is being assigned but never used, so can be removed. fix this clang warning: net/rds/ib_send.c:762:24: warning: variable ‘rds_ibdev’ set but not used [-Wunused-but-set-variable] Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01strparser: remove redundant variable 'rd_desc'YueHaibing1-4/+0
Variable 'rd_desc' is being assigned but never used, so can be removed. fix this clang warning: net/strparser/strparser.c:411:20: warning: variable ‘rd_desc’ set but not used [-Wunused-but-set-variable] Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01ip_gre: remove redundant variables t_hlenYueHaibing1-5/+0
After commit ffc2b6ee4174 ("ip_gre: fix IFLA_MTU ignored on NEWLINK") variable t_hlen is assigned values that are never read, hence they are redundant and can be removed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01ia64: mark special ia64 memory areas anonymousLinus Torvalds1-0/+2
Commit bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives") made newly allocated vma's have a dummy vm_ops field so that they wouldn't be mistaken for anonymous mappings, and if you wanted an anonymous vma you had to explicitly say so by calling "vma_set_anonymous()" on it. However, it missed the two special vmas that ia64 processes have: the register backing store and the NaT page. So they wouldn't actually act like anonymous ranges, and page faults on them caused a SIGBUS rather than the creation of a new anon page in them. That obviously will make any ia64 binary very unhappy indeed, and the boot fails early. Fixes: bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives") Reported-by: Tony Luck <tony.luck@intel.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-01tcp: remove set but not used variable 'skb_size'Wei Yongjun1-2/+1
Fixes gcc '-Wunused-but-set-variable' warning: net/ipv4/tcp_output.c: In function 'tcp_collapse_retrans': net/ipv4/tcp_output.c:2700:6: warning: variable 'skb_size' set but not used [-Wunused-but-set-variable] int skb_size, next_skb_size; ^ Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01Merge branch 'tcp-add-4-new-stats'David S. Miller6-8/+69
Wei Wang says: ==================== tcp: add 4 new stats This patch series adds 3 RFC4898 stats: 1. tcpEStatsPerfHCDataOctetsOut 2. tcpEStatsPerfOctetsRetrans 3. tcpEStatsStackDSACKDups and an addtional stat to record the number of data packet reordering events seen: 4. tcp_reord_seen Together with the existing stats, application can use them to measure the retransmission rate in bytes, exclude spurious retransmissions reflected by DSACK, and keep track of the reordering events on live connections. In particular the networks with different MTUs make bytes-based loss stats more useful. Google servers have been using these stats for many years to instrument transport and network performance. Note: The first patch is a refactor to add a helper to calculate opt_stats size in order to make later changes cleaner. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01tcp: add stat of data packet reordering eventsWei Wang5-4/+11
Introduce a new TCP stats to record the number of reordering events seen and expose it in both tcp_info (TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS). Application can use this stats to track the frequency of the reordering events in addition to the existing reordering stats which tracks the magnitude of the latest reordering event. Note: this new stats tracks reordering events triggered by ACKs, which could often be fewer than the actual number of packets being delivered out-of-order. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01tcp: add dsack blocks received statsWei Wang4-0/+10
Introduce a new TCP stat to record the number of DSACK blocks received (RFC4989 tcpEStatsStackDSACKDups) and expose it in both tcp_info (TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS). Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01tcp: add data bytes retransmitted statsWei Wang4-0/+11
Introduce a new TCP stat to record the number of bytes retransmitted (RFC4898 tcpEStatsPerfOctetsRetrans) and expose it in both tcp_info (TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS). Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01tcp: add data bytes sent statsWei Wang4-1/+13
Introduce a new TCP stat to record the number of bytes sent (RFC4898 tcpEStatsPerfHCDataOctetsOut) and expose it in both tcp_info (TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS). Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01tcp: add a helper to calculate size of opt_statsWei Wang1-3/+24
This is to refactor the calculation of the size of opt_stats to a helper function to make the code cleaner and easier for later changes. Suggested-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: dsa: Do not suspend/resume closed slave_devFlorian Fainelli1-0/+6
If a DSA slave network device was previously disabled, there is no need to suspend or resume it. Fixes: 2446254915a7 ("net: dsa: allow switch drivers to implement suspend/resume hooks") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01Merge branch 'ipv4-Control-SKB-reprioritization-after-forwarding'David S. Miller10-79/+379
Petr Machata says: ==================== ipv4: Control SKB reprioritization after forwarding After IPv4 packets are forwarded, the priority of the corresponding SKB is updated according to the TOS field of IPv4 header. This overrides any prioritization done earlier by e.g. an skbedit action or ingress-qos-map defined at a vlan device. Such overriding may not always be desirable. Even if the packet ends up being routed, which implies this is an L3 network node, an administrator may wish to preserve whatever prioritization was done earlier on in the pipeline. Therefore this patch set introduces a sysctl that controls this behavior, net.ipv4.ip_forward_update_priority. It's value is 1 by default to preserve the current behavior. All of the above is implemented in patch #1. Value changes prompt a new NETEVENT_IPV4_FWD_UPDATE_PRIORITY_UPDATE notification, so that the drivers can hook up whatever logic may depend on this value. That is implemented in patch #2. In patches #3 and #4, mlxsw is adapted to recognize the sysctl. On initialization, the RGCR register that handles router configuration is set in accordance with the sysctl. The new notification is listened to and RGCR is reconfigured as necessary. In patches #5 to #7, a selftest is added to verify that mlxsw reflects the sysctl value as necessary. The test is expressed in terms of the recently-introduced ieee_setapp support, and works by observing how DSCP value gets rewritten depending on packet priority. For this reason, the test is added to the subdirectory drivers/net/mlxsw. Even though it's not particularly specific to mlxsw, it's not suitable for running on soft devices (which don't support the ieee_setapp et.al.). Changes from v1 to v2: - In patch #1, init sysctl_ip_fwd_update_priority to 1 instead of true. Changes from RFC to v1: - Fix wrong sysctl name in ip-sysctl.txt - Add notifications - Add mlxsw support - Add self test ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01selftests: mlxsw: Add test for ip_forward_update_priorityPetr Machata1-0/+233
Verify that with that sysctl turned off, DSCP prioritization and rewrite works the same way as in qos_dscp_bridge test. However when the sysctl is charged, there should be a reprioritization after routing stage, which will be observed by a different DSCP rewrite on egress. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01selftests: forwarding: Move DSCP capture to lib.shPetr Machata2-42/+42
dscp_capture_install() and dscp_capture_uninstall() are going to be useful for a test added by a following patch, move them therefore to lib.sh together with related helpers. While doing so, change the rule preference from mere DSCP value to DSCP+100 in order to support adding captures of packets with DSCP of 0. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01selftests: forwarding: Move lldpad waiting to lib.shPetr Machata2-20/+24
The function lldpad_wait() will be useful for a test added by a following patch. Likewise would the "sleep 5" with its extensive comment. Therefore move lldpad_wait() to lib.sh in order to allow reuse. Rename it to lldpad_app_wait_set() to recognize that what this is intended to wait on are the pending APP sets. For the sleeping, add a function lldpad_app_wait_del(). That will serve to hold the related explanatory comment (which edit for clarity), and as a token in the caller to identify the sites where this sort of waiting takes place. That will serve when/if a better way to handle this business is found. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01mlxsw: spectrum_router: Handle sysctl_ip_fwd_update_priorityPetr Machata1-1/+17
This sysctl setting controls whether packet priority should be updated after forwarding. Configure RGCR.usp accordingly so that the device is in sync with the kernel handling. Note that RGCR doesn't allow changing arbitrary parameters mid-operation, however "usp" is exempt and can be reconfigured. Also react to NETEVENT_IPV4_FWD_UPDATE_PRIORITY_UPDATE notifications that signify change in this configuration. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01mlxsw: spectrum: Extract work-scheduling into a new functionPetr Machata1-15/+23
The boilerplate to schedule NETEVENT_IPV4_MPATH_HASH_UPDATE and NETEVENT_IPV6_MPATH_HASH_UPDATE handling is almost equivalent to that of NETEVENT_IPV4_FWD_UPDATE_PRIORITY_UPDATE that's coming in the next patch. The only difference is which actual worker function should be called. Extract this boilerplate into a named function in order to allow reuse. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: ipv4: Notify about changes to ip_forward_update_priorityPetr Machata2-1/+19
Drivers may make offloading decision based on whether ip_forward_update_priority is enabled or not. Therefore distribute netevent notifications to give them a chance to react to a change. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: ipv4: Control SKB reprioritization after forwardingPetr Machata5-1/+22
After IPv4 packets are forwarded, the priority of the corresponding SKB is updated according to the TOS field of IPv4 header. This overrides any prioritization done earlier by e.g. an skbedit action or ingress-qos-map defined at a vlan device. Such overriding may not always be desirable. Even if the packet ends up being routed, which implies this is an L3 network node, an administrator may wish to preserve whatever prioritization was done earlier on in the pipeline. Therefore introduce a sysctl that controls this behavior. Keep the default value at 1 to maintain backward-compatible behavior. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01netlink: Fix spectre v1 gadget in netlink_create()Jeremy Cline1-0/+2
'protocol' is a user-controlled value, so sanitize it after the bounds check to avoid using it for speculative out-of-bounds access to arrays indexed by it. This addresses the following accesses detected with the help of smatch: * net/netlink/af_netlink.c:654 __netlink_create() warn: potential spectre issue 'nlk_cb_mutex_keys' [w] * net/netlink/af_netlink.c:654 __netlink_create() warn: potential spectre issue 'nlk_cb_mutex_key_strings' [w] * net/netlink/af_netlink.c:685 netlink_create() warn: potential spectre issue 'nl_table' [w] (local cap) Cc: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jeremy Cline <jcline@redhat.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: add helpers checking if socket can be bound to nonlocal addressVincent Bernat6-12/+21
The construction "net->ipv4.sysctl_ip_nonlocal_bind || inet->freebind || inet->transparent" is present three times and its IPv6 counterpart is also present three times. We introduce two small helpers to characterize these tests uniformly. Signed-off-by: Vincent Bernat <vincent@bernat.im> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: change Exar/Neterion menu items to be alphabeticalJon Mason3-11/+16
Neterion was standalone for several years, then acquired by Exar and shutdown in 11 months without ever making any new Exar branded adapters. Users would probably think of them as Neterion and not Exar (as there have been no follow-on adapters and the vast majority ever sold were under the Neterion name). 6c541b4595a2 ("net: ethernet: Sort Kconfig sourcing alphabetically") sorted Kconfig sourcing based on directory names, but in a couple cases, the menu item text is quite different from the directory name and is not sorted correctly: drivers/net/ethernet/neterion/Kconfig => "Exar devices" To address that and clear up any confusion about the name, "Exar" was changed to "Neterion (Exar)" and the relevant entries in the Makefile and Kconfig were reordered to match the alphabetical organization. Inspired-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Jon Mason <jdmason@kudzu.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net/tls: Use kmemdup to simplify the codezhong jiang2-4/+2
Kmemdup is better than kmalloc+memcpy. So replace them. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net/tipc: remove redundant variables 'tn' and 'oport'Colin Ian King1-4/+1
Variables 'tn' and 'oport' are being assigned but are never used hence they are redundant and can be removed. Cleans up clang warnings: warning: variable 'oport' set but not used [-Wunused-but-set-variable] warning: variable 'tn' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>