summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-05-04net: ena: allow setting the hash function without changing the keySameeh Jubran3-14/+32
Current code does not allow setting the hash function without changing the key. This commit enables it. To achieve this we separate ena_com_get_hash_function() to 2 functions: ena_com_get_hash_function() - which gets only the hash function, and ena_com_get_hash_key() - which gets only the hash key. Also return 0 instead of rc at the end of ena_get_rxfh() since all previous operations succeeded. Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net: ena: fix error returning in ena_com_get_hash_function()Arthur Kiyanovski1-2/+4
In case the "func" parameter is NULL we now return "-EINVAL". This shouldn't happen in general, but when it does happen, this is the proper way to handle it. We also check func for NULL in the beginning of the function, as there is no reason to do all the work and realize in the end of the function it was useless. Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net: ena: avoid unnecessary admin command when RSS function set failsArthur Kiyanovski1-1/+3
Currently when ena_set_hash_function() fails the hash function is restored to the previous value by calling an admin command to get the hash function from the device. In this commit we avoid the admin command, by saving the previous hash function before calling ena_set_hash_function() and using this previous value to restore the hash function in case of failure of ena_set_hash_function(). Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net: usb: qmi_wwan: add support for DW5816eMatt Jolly1-0/+1
Add support for Dell Wireless 5816e to drivers/net/usb/qmi_wwan.c Signed-off-by: Matt Jolly <Kangie@footclan.ninja> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net_sched: sch_skbprio: add message validation to skbprio_change()Eric Dumazet1-0/+3
Do not assume the attribute has the right size. Fixes: aea5f654e6b7 ("net/sched: add skbprio scheduler") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04Merge branch 'sch_fq-optimizations'David S. Miller1-36/+48
Eric Dumazet says: ==================== net_sched: sch_fq: round of optimizations This series is focused on better layout of struct fq_flow to reduce number of cache line misses in fq_enqueue() and dequeue operations. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net_sched: sch_fq: perform a prefetch() earlierEric Dumazet1-1/+1
The prefetch() done in fq_dequeue() can be done a bit earlier after the refactoring of the code done in the prior patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net_sched: sch_fq: do not call fq_peek() twice per packetEric Dumazet1-18/+16
This refactors the code to not call fq_peek() from fq_dequeue_head() since the caller can provide the skb. Also rename fq_dequeue_head() to fq_dequeue_skb() because 'head' is a bit vague, given the skb could come from t_root rb-tree. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net_sched: sch_fq: use bulk freeing in fq_gc()Eric Dumazet1-7/+11
fq_gc() already builds a small array of pointers, so using kmem_cache_free_bulk() needs very little change. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net_sched: sch_fq: change fq_flow size/layoutEric Dumazet1-2/+7
sizeof(struct fq_flow) is 112 bytes on 64bit arches. This means that half of them use two cache lines, but 50% use three cache lines. This patch adds cache line alignment, and makes sure that only the first cache line is touched by fq_enqueue(), which is more expensive that fq_dequeue() in general. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net_sched: sch_fq: avoid touching f->next from fq_gc()Eric Dumazet1-8/+13
A significant amount of cpu cycles is spent in fq_gc() When fq_gc() does its lookup in the rb-tree, it needs the following fields from struct fq_flow : f->sk (lookup key in the rb-tree) f->fq_node (anchor in the rb-tree) f->next (used to determine if the flow is detached) f->age (used to determine if the flow is candidate for gc) This unfortunately spans two cache lines (assuming 64 bytes cache lines) We can avoid using f->next, if we use the low order bit of f->{age|tail} This low order bit is 0, if f->tail points to an sk_buff. We set the low order bit to 1, if the union contains a jiffies value. Combined with the following patch, this makes sure we only need to bring into cpu caches one cache line per flow. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04Linux 5.7-rc4Linus Torvalds1-1/+1
2020-05-03Merge tag 'for-5.7-rc3-tag' of ↵Linus Torvalds4-7/+53
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull more btrfs fixes from David Sterba: "A few more stability fixes, minor build warning fixes and git url fixup: - fix partial loss of prealloc extent past i_size after fsync - fix potential deadlock due to wrong transaction handle passing via journal_info - fix gcc 4.8 struct intialization warning - update git URL in MAINTAINERS entry" * tag 'for-5.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: MAINTAINERS: btrfs: fix git repo URL btrfs: fix gcc-4.8 build warning for struct initializer btrfs: transaction: Avoid deadlock due to bad initialization timing of fs_info::journal_info btrfs: fix partial loss of prealloc extent past i_size after fsync
2020-05-03Merge tag 'iommu-fixes-v5.7-rc3' of ↵Linus Torvalds5-7/+11
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU fixes from Joerg Roedel: - Fix a memory leak when dev_iommu gets freed and a sub-pointer does not - Build dependency fixes for Mediatek, spapr_tce, and Intel IOMMU driver - Export iommu_group_get_for_dev() only for GPLed modules - Fix AMD IOMMU interrupt remapping when x2apic is enabled - Fix error path in the QCOM IOMMU driver probe function * tag 'iommu-fixes-v5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/qcom: Fix local_base status check iommu: Properly export iommu_group_get_for_dev() iommu/vt-d: Use right Kconfig option name iommu/amd: Fix legacy interrupt remapping for x2APIC-enabled system iommu: spapr_tce: Disable compile testing to fix build on book3s_32 config iommu/mediatek: Fix MTK_IOMMU dependencies iommu: Fix the memory leak in dev_iommu_free()
2020-05-03MAINTAINERS: btrfs: fix git repo URLEric Biggers1-1/+1
The git repo listed for btrfs hasn't been updated in over a year. List the current one instead. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-03inet_diag: bc: read cgroup id only for full socketsDmitry Yakunin1-1/+2
Fix bug introduced by commit b1f3e43dbfac ("inet_diag: add support for cgroup filter"). Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru> Reported-by: syzbot+ee80f840d9bf6893223b@syzkaller.appspotmail.com Reported-by: syzbot+13bef047dbfffa5cd1af@syzkaller.appspotmail.com Fixes: b1f3e43dbfac ("inet_diag: add support for cgroup filter") Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net: ethernet: fec: Replace interrupt driven MDIO with polled IOAndrew Lunn2-36/+45
Measurements of the MDIO bus have shown that driving the MDIO bus using interrupts is slow. Back to back MDIO transactions take about 90us, with 25us spent performing the transaction, and the remainder of the time the bus is idle. Replacing the completion interrupt with polled IO results in back to back transactions of 40us. The polling loop waiting for the hardware to complete the transaction takes around 28us. Which suggests interrupt handling has an overhead of 50us, and polled IO nearly halves this overhead, and doubles the MDIO performance. Care has to be taken when setting the MII_SPEED register, or it can trigger an MII event> That then upsets the polling, due to an unexpected pending event. Suggested-by: Chris Heally <cphealy@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03smc: Remove unused function.David S. Miller1-24/+0
net/smc/smc_llc.c:544:12: warning: ‘smc_llc_alloc_alt_link’ defined but not used [-Wunused-function] static int smc_llc_alloc_alt_link(struct smc_link_group *lgr, ^~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03Merge branch 'ptp-Add-adjust-phase-to-support-phase-offset'David S. Miller7-6/+114
Vincent Cheng says: ==================== ptp: Add adjust phase to support phase offset. This series adds adjust phase to the PTP Hardware Clock device interface. Some PTP hardware clocks have a write phase mode that has a built-in hardware filtering capability. The write phase mode utilizes a phase offset control word instead of a frequency offset control word. Add adjust phase function to take advantage of this capability. Changes since v1: - As suggested by Richard Cochran: 1. ops->adjphase is new so need to check for non-null function pointer. 2. Kernel coding style uses lower_case_underscores. 3. Use existing PTP clock API for delayed worker. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03ptp: ptp_clockmatrix: Add adjphase() to support PHC write phase mode.Vincent Cheng2-2/+98
Add idtcm_adjphase() to support PHC write phase mode. Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03ptp: Add adjust_phase to ptp_clock_caps capability.Vincent Cheng3-3/+8
Add adjust_phase to ptp_clock_caps capability to allow user to query if a PHC driver supports adjust phase with ioctl PTP_CLOCK_GETCAPS command. Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com> Reviewed-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03ptp: Add adjphase function to support phase offset control.Vincent Cheng2-1/+8
Adds adjust phase function to take advantage of a PHC clock's hardware filtering capability that uses phase offset control word instead of frequency offset control word. Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com> Reviewed-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-02Merge tag 'pm-5.7-rc4' of ↵Linus Torvalds3-3/+10
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: - prevent the intel_pstate driver from printing excessive diagnostic messages in some cases (Chris Wilson) - make the hibernation restore kernel freeze kernel threads as well as user space tasks (Dexuan Cui) - fix the ACPI device PM disagnostic messages to include the correct power state name (Kai-Heng Feng). * tag 'pm-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: ACPI: Output correct message on target power state PM: hibernate: Freeze kernel threads in software_resume() cpufreq: intel_pstate: Only mention the BIOS disabling turbo mode once
2020-05-02Merge branches 'pm-cpufreq' and 'pm-sleep'Rafael J. Wysocki2-1/+8
* pm-cpufreq: cpufreq: intel_pstate: Only mention the BIOS disabling turbo mode once * pm-sleep: PM: hibernate: Freeze kernel threads in software_resume()
2020-05-02Merge tag 'iomap-5.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds2-4/+9
Pull iomap fix from Darrick Wong: "Hoist the check for an unrepresentable FIBMAP return value into ioctl_fibmap. The internal kernel function can handle 64-bit values (and is needed to fix a regression on ext4 + jbd2). It is only the userspace ioctl that is so old that it cannot deal" * tag 'iomap-5.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: fibmap: Warn and return an error in case of block > INT_MAX
2020-05-02Merge tag 'nfs-for-5.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds10-36/+79
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Stable fixes: - fix handling of backchannel binding in BIND_CONN_TO_SESSION Bugfixes: - Fix a credential use-after-free issue in pnfs_roc() - Fix potential posix_acl refcnt leak in nfs3_set_acl - defer slow parts of rpc_free_client() to a workqueue - Fix an Oopsable race in __nfs_list_for_each_server() - Fix trace point use-after-free race - Regression: the RDMA client no longer responds to server disconnect requests - Fix return values of xdr_stream_encode_item_{present, absent} - _pnfs_return_layout() must always wait for layoutreturn completion Cleanups: - Remove unreachable error conditions" * tag 'nfs-for-5.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: Fix a race in __nfs_list_for_each_server() NFSv4.1: fix handling of backchannel binding in BIND_CONN_TO_SESSION SUNRPC: defer slow parts of rpc_free_client() to a workqueue. NFSv4: Remove unreachable error condition due to rpc_run_task() SUNRPC: Remove unreachable error condition xprtrdma: Fix use of xdr_stream_encode_item_{present, absent} xprtrdma: Fix trace point use-after-free race xprtrdma: Restore wake-up-all to rpcrdma_cm_event_handler() nfs: Fix potential posix_acl refcnt leak in nfs3_set_acl NFS/pnfs: Fix a credential use-after-free issue in pnfs_roc() NFS/pnfs: Ensure that _pnfs_return_layout() waits for layoutreturn completion
2020-05-02Merge tag 'dmaengine-fix-5.7-rc4' of ↵Linus Torvalds10-60/+65
git://git.infradead.org/users/vkoul/slave-dma Pull dmaengine fixes from Vinod Koul: "Core: - Documentation typo fixes - fix the channel indexes - dmatest: fixes for process hang and iterations Drivers: - hisilicon: build error fix without PCI_MSI - ti-k3: deadlock fix - uniphier-xdmac: fix for reg region - pch: fix data race - tegra: fix clock state" * tag 'dmaengine-fix-5.7-rc4' of git://git.infradead.org/users/vkoul/slave-dma: dmaengine: dmatest: Fix process hang when reading 'wait' parameter dmaengine: dmatest: Fix iteration non-stop logic dmaengine: tegra-apb: Ensure that clock is enabled during of DMA synchronization dmaengine: fix channel index enumeration dmaengine: mmp_tdma: Reset channel error on release dmaengine: mmp_tdma: Do not ignore slave config validation errors dmaengine: pch_dma.c: Avoid data race between probe and irq handler dt-bindings: dma: uniphier-xdmac: switch to single reg region include/linux/dmaengine: Typos fixes in API documentation dmaengine: xilinx_dma: Add missing check for empty list dmaengine: ti: k3-psil: fix deadlock on error path dmaengine: hisilicon: Fix build error without PCI_MSI
2020-05-02vhost: vsock: kick send_pkt worker once device is startedJia He1-0/+5
Ning Bo reported an abnormal 2-second gap when booting Kata container [1]. The unconditional timeout was caused by VSOCK_DEFAULT_CONNECT_TIMEOUT of connecting from the client side. The vhost vsock client tries to connect an initializing virtio vsock server. The abnormal flow looks like: host-userspace vhost vsock guest vsock ============== =========== ============ connect() --------> vhost_transport_send_pkt_work() initializing | vq->private_data==NULL | will not be queued V schedule_timeout(2s) vhost_vsock_start() <--------- device ready set vq->private_data wait for 2s and failed connect() again vq->private_data!=NULL recv connecting pkt Details: 1. Host userspace sends a connect pkt, at that time, guest vsock is under initializing, hence the vhost_vsock_start has not been called. So vq->private_data==NULL, and the pkt is not been queued to send to guest 2. Then it sleeps for 2s 3. After guest vsock finishes initializing, vq->private_data is set 4. When host userspace wakes up after 2s, send connecting pkt again, everything is fine. As suggested by Stefano Garzarella, this fixes it by additional kicking the send_pkt worker in vhost_vsock_start once the virtio device is started. This makes the pending pkt sent again. After this patch, kata-runtime (with vsock enabled) boot time is reduced from 3s to 1s on a ThunderX2 arm64 server. [1] https://github.com/kata-containers/runtime/issues/1917 Reported-by: Ning Bo <n.b@live.com> Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jia He <justin.he@arm.com> Link: https://lore.kernel.org/r/20200501043840.186557-1-justin.he@arm.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2020-05-02virtio-blk: handle block_device_operations callbacks after hot unplugStefan Hajnoczi1-8/+78
A userspace process holding a file descriptor to a virtio_blk device can still invoke block_device_operations after hot unplug. This leads to a use-after-free accessing vblk->vdev in virtblk_getgeo() when ioctl(HDIO_GETGEO) is invoked: BUG: unable to handle kernel NULL pointer dereference at 0000000000000090 IP: [<ffffffffc00e5450>] virtio_check_driver_offered_feature+0x10/0x90 [virtio] PGD 800000003a92f067 PUD 3a930067 PMD 0 Oops: 0000 [#1] SMP CPU: 0 PID: 1310 Comm: hdio-getgeo Tainted: G OE ------------ 3.10.0-1062.el7.x86_64 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 task: ffff9be5fbfb8000 ti: ffff9be5fa890000 task.ti: ffff9be5fa890000 RIP: 0010:[<ffffffffc00e5450>] [<ffffffffc00e5450>] virtio_check_driver_offered_feature+0x10/0x90 [virtio] RSP: 0018:ffff9be5fa893dc8 EFLAGS: 00010246 RAX: ffff9be5fc3f3400 RBX: ffff9be5fa893e30 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff9be5fbc10b40 RBP: ffff9be5fa893dc8 R08: 0000000000000301 R09: 0000000000000301 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9be5fdc24680 R13: ffff9be5fbc10b40 R14: ffff9be5fbc10480 R15: 0000000000000000 FS: 00007f1bfb968740(0000) GS:ffff9be5ffc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000090 CR3: 000000003a894000 CR4: 0000000000360ff0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: [<ffffffffc016ac37>] virtblk_getgeo+0x47/0x110 [virtio_blk] [<ffffffff8d3f200d>] ? handle_mm_fault+0x39d/0x9b0 [<ffffffff8d561265>] blkdev_ioctl+0x1f5/0xa20 [<ffffffff8d488771>] block_ioctl+0x41/0x50 [<ffffffff8d45d9e0>] do_vfs_ioctl+0x3a0/0x5a0 [<ffffffff8d45dc81>] SyS_ioctl+0xa1/0xc0 A related problem is that virtblk_remove() leaks the vd_index_ida index when something still holds a reference to vblk->disk during hot unplug. This causes virtio-blk device names to be lost (vda, vdb, etc). Fix these issues by protecting vblk->vdev with a mutex and reference counting vblk so the vd_index_ida index can be removed in all cases. Fixes: 48e4043d4529 ("virtio: add virtio disk geometry feature") Reported-by: Lance Digby <ldigby@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200430140442.171016-1-stefanha@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2020-05-02Merge tag 'vfio-v5.7-rc4' of git://github.com/awilliam/linux-vfioLinus Torvalds1-5/+5
Pull VFIO fixes from Alex Williamson: - copy_*_user validity check for new vfio_dma_rw interface (Yan Zhao) - Fix a potential math overflow (Yan Zhao) - Use follow_pfn() for calculating PFNMAPs (Sean Christopherson) * tag 'vfio-v5.7-rc4' of git://github.com/awilliam/linux-vfio: vfio/type1: Fix VA->PA translation for PFNMAP VMAs in vaddr_get_pfn() vfio: avoid possible overflow in vfio_iommu_type1_pin_pages vfio: checking of validity of user vaddr in vfio_dma_rw
2020-05-02Merge tag 'arm64-fixes' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: "Add -fasynchronous-unwind-tables to the vDSO CFLAGS" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: vdso: Add -fasynchronous-unwind-tables to cflags
2020-05-02Merge tag 'io_uring-5.7-2020-05-01' of git://git.kernel.dk/linux-blockLinus Torvalds1-27/+31
Pull io_uring fixes from Jens Axboe: - Fix for statx not grabbing the file table, making AT_EMPTY_PATH fail - Cover a few cases where async poll can handle retry, eliminating the need for an async thread - fallback request busy/free fix (Bijan) - syzbot reported SQPOLL thread exit fix for non-preempt (Xiaoguang) - Fix extra put of req for sync_file_range (Pavel) - Always punt splice async. We'll improve this for 5.8, but wanted to eliminate the inode mutex lock from the non-blocking path for 5.7 (Pavel) * tag 'io_uring-5.7-2020-05-01' of git://git.kernel.dk/linux-block: io_uring: punt splice async because of inode mutex io_uring: check non-sync defer_list carefully io_uring: fix extra put in sync_file_range() io_uring: use cond_resched() in io_ring_ctx_wait_and_kill() io_uring: use proper references for fallback_req locking io_uring: only force async punt if poll based retry can't handle it io_uring: enable poll retry for any file with ->read_iter / ->write_iter io_uring: statx must grab the file table for valid fd
2020-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller153-2960/+6332
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-05-01 (v2) The following pull-request contains BPF updates for your *net-next* tree. We've added 61 non-merge commits during the last 6 day(s) which contain a total of 153 files changed, 6739 insertions(+), 3367 deletions(-). The main changes are: 1) pulled work.sysctl from vfs tree with sysctl bpf changes. 2) bpf_link observability, from Andrii. 3) BTF-defined map in map, from Andrii. 4) asan fixes for selftests, from Andrii. 5) Allow bpf_map_lookup_elem for SOCKMAP and SOCKHASH, from Jakub. 6) production cloudflare classifier as a selftes, from Lorenz. 7) bpf_ktime_get_*_ns() helper improvements, from Maciej. 8) unprivileged bpftool feature probe, from Quentin. 9) BPF_ENABLE_STATS command, from Song. 10) enable bpf_[gs]etsockopt() helpers for sock_ops progs, from Stanislav. 11) enable a bunch of common helpers for cg-device, sysctl, sockopt progs, from Stanislav. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-02Merge branch 'net-smc-extent-buffer-mapping-and-port-handling'David S. Miller11-196/+522
Karsten Graul says: ==================== net/smc: extent buffer mapping and port handling Add functionality to map/unmap and register/unregister memory buffers for specific SMC-R links and for the whole link group. Prepare LLC layer messages for the support of multiple links and extent the processing of adapter events. And add further small preparations needed for the SMC-R failover support. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-02selftests/bpf: Use reno instead of dctcpStanislav Fomichev2-4/+3
Andrey pointed out that we can use reno instead of dctcp for CC tests and drop CONFIG_TCP_CONG_DCTCP=y requirement. Fixes: beecf11bc218 ("bpf: Bpf_{g,s}etsockopt for struct bpf_sock_addr") Suggested-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200501224320.28441-1-sdf@google.com
2020-05-02net/smc: llc_add_link_work to handle ADD_LINK LLC requestsKarsten Graul2-2/+23
Introduce a work that is scheduled when a new ADD_LINK LLC request is received. The work will call either the SMC client or SMC server ADD_LINK processing. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-02net/smc: allocate index for a new linkKarsten Graul2-1/+25
Add smc_llc_alloc_alt_link() to find a free link index for a new link, depending on the new link group type. And update constants for the maximum number of links to 3 (2 symmetric and 1 dangling asymmetric link). These maximum numbers are the same as used by other implementations of the SMC-R protocol. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-02net/smc: introduce smc_pnet_find_alt_roce()Karsten Graul2-3/+17
Introduce a new function in smc_pnet.c that searches for an alternate IB device, using an existing link group and a primary IB device. The alternate IB device needs to be active and must have the same PNETID as the link group. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-02net/smc: remove DELETE LINK processing from smc_core.cKarsten Graul1-33/+0
Support for multiple links makes the former DELETE LINK processing obsolete which sent one DELETE_LINK LLC message for each single link. Remove this processing from smc_core.c. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-02net/smc: take link down instead of terminating the link groupKarsten Graul4-19/+13
Use the introduced link down processing in all places where the link group is terminated and take down the affected link only. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-02net/smc: add smcr_port_err() and smcr_link_down() processingKarsten Graul4-32/+98
Call smcr_port_err() when an IB event reports an inactive IB device. smcr_port_err() calls smcr_link_down() for all affected links. smcr_link_down() either triggers the local DELETE_LINK processing, or sends an DELETE_LINK LLC message to the SMC server to initiate the processing. The old handler function smc_port_terminate() is removed. Add helper smcr_link_down_cond() to take a link down conditionally, and smcr_link_down_cond_sched() to schedule the link_down processing to a work. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-02net/smc: add smcr_port_add() and smcr_link_up() processingKarsten Graul3-0/+88
Call smcr_port_add() when an IB event reports a new active IB device. smcr_port_add() will start a work which either triggers the local ADD_LINK processing, or send an ADD_LINK LLC message to the SMC server to initiate the processing. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-02net/smc: remember PNETID of IB device for later device matchingKarsten Graul2-0/+4
The PNETID is needed to find an alternate link for a link group. Save the PNETID of the link that is used to create the link group for later device matching. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-02net/smc: mutex to protect the lgr against parallel reconfigurationsKarsten Graul4-12/+34
Introduce llc_conf_mutex in the link group which is used to protect the buffers and lgr states against parallel link reconfiguration. This ensures that new connections do not start to register buffers with the links of a link group when link creation or termination is running. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-02net/smc: extend smc_llc_send_add_link() and smc_llc_send_delete_link()Karsten Graul3-46/+62
All LLC sends are done from worker context only, so remove the prep functions which were used to build the message before it was sent, and add the function content into the respective send function smc_llc_send_add_link() and smc_llc_send_delete_link(). Extend smc_llc_send_add_link() to include the qp_mtu value in the LLC message, which is needed to establish a link after the initial link was created. Extend smc_llc_send_delete_link() to contain a link_id and a reason code for the link deletion in the LLC message, which is needed when a specific link should be deleted. And add the list of existing DELETE_LINK reason codes. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-02net/smc: map and register buffers for a new linkKarsten Graul2-0/+62
Introduce support to map and register all current buffers for a new link. smcr_buf_map_lgr() will map used buffers for a new link and smcr_buf_reg_lgr() can be called to register used buffers on the IB device of the new link. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-02net/smc: unmapping of buffers to support multiple linksKarsten Graul2-17/+60
With the support of multiple links that are created and cleared there is a need to unmap one link from all current buffers. Add unmapping by link and by rmb. And make smcr_link_clear() available to be called from the LLC layer. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-02net/smc: multiple link support for rmb buffer registrationKarsten Graul3-35/+36
The CONFIRM_RKEY LLC processing handles all links in one LLC message. Move the call to this processing out of smcr_link_reg_rmb() which does processing per link, into smcr_lgr_reg_rmbs() which is responsible for link group level processing. Move smcr_link_reg_rmb() into module smc_core.c. >From af_smc.c now call smcr_lgr_reg_rmbs() to register new rmbs on all available links. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-02Merge branch 'Introduce-a-flow-gate-control-action-and-apply-IEEE'David S. Miller13-1/+2268
Po Liu says: ==================== Introduce a flow gate control action and apply IEEE Changes from V4: ---------------- 0001: Fix and modify according to Vlid Buslov suggestions: - Change spin_lock_bh() to spin_lock() since tcf_gate_act() already in software irq. - Remove spin lock protect in the ops->cleanup function. - Enable the CONFIG_DEBUG_ATOMIC_SLEEP and CONFIG_PROVE_LOCKING checking, then fix as kzalloc flag type and lock deadlock. - Change kzalloc() flag type from GFP_KERNEL to GFP_ATOMIC since function in the spin_lock protect. - Change hrtimer type from HRTIMER_MODE_ABS to HRTIMER_MODE_ABS_SOFT to avoid deadlock. 0002: Fix and modify according to Vlid Buslov suggestions: - Remove all rcu_read_lock protection since no rcu parameters. - Enable the CONFIG_DEBUG_ATOMIC_SLEEP and CONFIG_PROVE_LOCKING checking, then check kzalloc sleeping flag. - Change kzalloc to kcalloc for array memory alloc and change GFP_KERNEL flag to GFP_ATOMIC since function holding spin_lock protect. 0003: - No changes. 0004: - Commit comments rephrase act by Claudiu Manoil. Changes from V3: ---------------- 0001: Fix and modify according to Vlid Buslov: - Remove the struct gate_action and move the parameters to the struct tcf_gate align with tc_action parameters. This would not need to alloc rcu type memory with pointer. - Remove the spin_lock type entry_lock which do not needed anymore, will use the tcf_lock system provided. - Provide lockep protect for the status parameters in the tcf_gate_act(). - Remove the cycletime 0 input warning, return error directly. And: - Remove Qci related description in the Kconfig for gate action. 0002: - Fix rcu_read_lock protect range suggested by Vlid Buslov. 0003: - No changes. 0004: - Fix bug of gate maxoct wildcard condition not included. - Fix the pass time basetime calculation report by Vladimir Otlean. Changes from V2: 0001: No changes. 0002: No changes. 0003: No changes. 0004: Fix the vlan id filter parameter and add reject src mac FF-FF-FF-FF-FF-FF filter in driver. Changes from V1: ---------------- 0000: Update description make it more clear 0001: Removed 'add update dropped stats' patch, will provide pull request as standalone patches. 0001: Update commit description make it more clear ack by Jiri Pirko. 0002: No changes 0003: Fix some code style ack by Jiri Pirko. 0004: Fix enetc_psfp_enable/disable parameter type ack by test robot iprout2 command patches: Not attach with these serial patches, will provide separate pull request after kernel accept these patches. Changes from RFC: ----------------- 0000: Reduce to 5 patches and remove the 4 max frame size offload and flow metering in the policing offload action, Only keep gate action offloading implementation. 0001: No changes. 0002: - fix kfree lead ack by Jakub Kicinski and Cong Wang - License fix from Jakub Kicinski and Stephen Hemminger - Update example in commit acked by Vinicius Costa Gomes - Fix the rcu protect in tcf_gate_act() acked by Vinicius 0003: No changes 0004: No changes 0005: Acked by Vinicius Costa Gomes - Use refcount kernel lib - Update stream gate check code position - Update reduce ref names more clear iprout2 command patches: 0000: Update license expression and add gate id 0001: Add tc action gate man page -------------------------------------------------------------------- These patches add stream gate action policing in IEEE802.1Qci (Per-Stream Filtering and Policing) software support and hardware offload support in tc flower, and implement the stream identify, stream filtering and stream gate filtering action in the NXP ENETC ethernet driver. Per-Stream Filtering and Policing (PSFP) specifies flow policing and filtering for ingress flows, and has three main parts: 1. The stream filter instance table consists of an ordered list of stream filters that determine the filtering and policing actions that are to be applied to frames received on a specific stream. The main elements are stream gate id, flow metering id and maximum SDU size. 2. The stream gate function setup a gate list to control ingress traffic class open/close state. When the gate is running at open state, the flow could pass but dropped when gate state is running to close. User setup a bastime to tell gate when start running the entry list, then the hardware would periodiclly. There is no compare qdisc action support. 3. Flow metering is two rates two buckets and three-color marker to policing the frames. Flow metering instance are as specified in the algorithm in MEF10.3. The most likely qdisc action is policing action. The first patch introduces an ingress frame flow control gate action, for the point 2. The tc gate action maintains the open/close state gate list, allowing flows to pass when the gate is open. Each gate action may policing one or more qdisc filters. When the start time arrived, The driver would repeat the gate list periodiclly. User can assign a passed time, the driver would calculate a new future time by the cycletime of the gate list. The 0002 patch introduces the gate flow hardware offloading. The 0003 patch adds support control the on/off for the tc flower offloading by ethtool. The 0004 patch implement the stream identify and stream filtering and stream gate filtering action in the NXP ENETC ethernet driver. Tc filter command provide filtering keys with MAC address and VLAN id. These keys would be set to stream identify instance entry. Stream gate instance entry would refer the gate action parameters. Stream filter instance entry would refer the stream gate index and assign a stream handle value matches to the stream identify instance. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-02net: enetc: add tc flower psfp offload driverPo Liu5-15/+1300
This patch is to add tc flower offload for the enetc IEEE 802.1Qci(PSFP) function. There are four main feature parts to implement the flow policing and filtering for ingress flow with IEEE 802.1Qci features. They are stream identify(this is defined in the P802.1cb exactly but needed for 802.1Qci), stream filtering, stream gate and flow metering. Each function block includes many entries by index to assign parameters. So for one frame would be filtered by stream identify first, then flow into stream filter block by the same handle between stream identify and stream filtering. Then flow into stream gate control which assigned by the stream filtering entry. And then policing by the gate and limited by the max sdu in the filter block(optional). At last, policing by the flow metering block, index choosing at the fitering block. So you can see that each entry of block may link to many upper entries since they can be assigned same index means more streams want to share the same feature in the stream filtering or stream gate or flow metering. To implement such features, each stream filtered by source/destination mac address, some stream maybe also plus the vlan id value would be treated as one flow chain. This would be identified by the chain_index which already in the tc filter concept. Driver would maintain this chain and also with gate modules. The stream filter entry create by the gate index and flow meter(optional) entry id and also one priority value. Offloading only transfer the gate action and flow filtering parameters. Driver would create (or search same gate id and flow meter id and priority) one stream filter entry to set to the hardware. So stream filtering do not need transfer by the action offloading. This architecture is same with tc filter and actions relationship. tc filter maintain the list for each flow feature by keys. And actions maintain by the action list. Below showing a example commands by tc: > tc qdisc add dev eth0 ingress > ip link set eth0 address 10:00:80:00:00:00 > tc filter add dev eth0 parent ffff: protocol ip chain 11 \ flower skip_sw dst_mac 10:00:80:00:00:00 \ action gate index 10 \ sched-entry open 200000000 1 8000000 \ sched-entry close 100000000 -1 -1 Command means to set the dst_mac 10:00:80:00:00:00 to index 11 of stream identify module. Then setting the gate index 10 of stream gate module. Keep the gate open for 200ms and limit the traffic volume to 8MB in this sched-entry. Then direct the frames to the ingress queue 1. Signed-off-by: Po Liu <Po.Liu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>