summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-02-20net: phy: broadcom: Wire suspend/resume for BCM54810Florian Fainelli1-0/+16
The BCM54810 PHY can use the standard BMCR Power down suspend, but needs a custom resume routine which first clear the Power down bit, and then re-initializes the PHY. While in low-power mode, the PHY only accepts writes to the BMCR register. The datasheet clearly says it: Reads or writes to any MII register other than MII Control register (address 00h) while the device is in the standby power-down mode may cause unpredictable results. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20net: phy: broadcom: Have bcm54xx_adjust_rxrefclk() check for flagsFlorian Fainelli1-4/+1
bcm54xx_adjust_rxrefclk() already checks for PHY_BRCM_AUTO_PWRDWN_ENABLE and PHY_BRCM_DIS_TXCRXC_NOENRGY in order to set the appropriate bit. The situation is a bit more complicated with the flag PHY_BRCM_RX_REFCLK_UNUSED but essentially amounts to the same situation. The default setting for the 125MHz clock is to be on for all PHYs and we still treat BCM50610 and BCM50610M specifically with the polarity of the bit reversed. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20net: phy: broadcom: Allow BCM54810 to use bcm54xx_adjust_rxrefclk()Florian Fainelli1-1/+2
The function bcm54xx_adjust_rxrefclk() works correctly on the BCM54810 PHY, allow this device ID to proceed through. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20udp: rehash on disconnectWillem de Bruijn1-1/+5
As of the below commit, udp sockets bound to a specific address can coexist with one bound to the any addr for the same port. The commit also phased out the use of socket hashing based only on port (hslot), in favor of always hashing on {addr, port} (hslot2). The change broke the following behavior with disconnect (AF_UNSPEC): server binds to 0.0.0.0:1337 server connects to 127.0.0.1:80 server disconnects client connects to 127.0.0.1:1337 client sends "hello" server reads "hello" // times out, packet did not find sk On connect the server acquires a specific source addr suitable for routing to its destination. On disconnect it reverts to the any addr. The connect call triggers a rehash to a different hslot2. On disconnect, add the same to return to the original hslot2. Skip this step if the socket is going to be unhashed completely. Fixes: 4cdeeee9252a ("net: udp: prefer listeners bound to an address") Reported-by: Pavel Roskin <plroskin@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20net/tls: Fix to avoid gettig invalid tls recordRohit Maheshwari1-1/+19
Current code doesn't check if tcp sequence number is starting from (/after) 1st record's start sequnce number. It only checks if seq number is before 1st record's end sequnce number. This problem will always be a possibility in re-transmit case. If a record which belongs to a requested seq number is already deleted, tls_get_record will start looking into list and as per the check it will look if seq number is before the end seq of 1st record, which will always be true and will return 1st record always, it should in fact return NULL. As part of the fix, start looking each record only if the sequence number lies in the list else return NULL. There is one more check added, driver look for the start marker record to handle tcp packets which are before the tls offload start sequence number, hence return 1st record if the record is tls start marker and seq number is before the 1st record's starting sequence number. Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20sfc: remove unused variable 'efx_default_channel_type'YueHaibing1-1/+0
drivers/net/ethernet/sfc/efx.c:116:38: warning: efx_default_channel_type defined but not used [-Wunused-const-variable=] commit 83975485077d ("sfc: move channel alloc/removal code") left behind this, remove it. Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: 83975485077d ("sfc: move channel alloc/removal code") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20Merge branch 'hns3-next'David S. Miller5-4/+67
Huazhong Tan says: ==================== net: hns3: misc updates for -net-next This series includes some misc updates for the HNS3 ethernet driver. [patch 1] modifies an unsuitable print when setting dulex mode. [patch 2] adds some debugfs info for TC and DWRR. [patch 3] adds some debugfs info for loopback. [patch 4] adds a missing help info for QS shaper in debugfs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20net: hns3: add missing help info for QS shaper in debugfsYonglong Liu1-0/+1
HNS3 driver can dump QS shaper configs via debugfs, but missing help info in debugfs for this operation. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20net: hns3: add support for dump MAC ID and loopback status in debugfsYufeng Mo4-0/+60
The MAC ID and loopback status information are obtained from the hardware, which will be helpful for debugging. This patch adds support for these two items in debugfs. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20net: hns3: add enabled TC numbers and DWRR weight info in debugfsYonglong Liu1-3/+5
The actual enabled TC numbers and the DWRR weight of each TC may be helpful for debugging, so adds them into debugfs. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20net: hns3: modify an unsuitable print when setting unknown duplex to fibreGuangbin Huang1-1/+1
Currently, if device is in link down status and user uses 'ethtool -s' command to set speed but not specify duplex mode, the duplex mode passed from ethtool to driver is unknown value(255), and the fibre port will identify this value as half duplex mode and print "only copper port supports half duplex!". This message is confusing. So for fibre port, only the setting duplex is half, prints error and returns. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20mlxsw: Replace zero-length array with flexible-array memberGustavo A. R. Silva11-14/+14
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20phy: avoid unnecessary link-up delay in polling modePetr Oros2-4/+6
commit 93c0970493c71f ("net: phy: consider latched link-down status in polling mode") removed double-read of latched link-state register for polling mode from genphy_update_link(). This added extra ~1s delay into sequence link down->up. Following scenario: - After boot link goes up - phy_start() is called triggering an aneg restart, hence link goes down and link-down info is latched. - After aneg has finished link goes up. In phy_state_machine is checked link state but it is latched "link is down". The state machine is scheduled after one second and there is detected "link is up". This extra delay can be avoided when we keep link-state register double read in case when link was down previously. With this solution we don't miss a link-down event in polling mode and link-up is faster. Details about this quirky behavior on Realtek phy: Without patch: T0: aneg is started, link goes down, link-down status is latched T0+3s: state machine runs, up-to-date link-down is read T0+4s: state machine runs, aneg is finished (BMSR_ANEGCOMPLETE==1), here i read link-down (BMSR_LSTATUS==0), T0+5s: state machine runs, aneg is finished (BMSR_ANEGCOMPLETE==1), up-to-date link-up is read (BMSR_LSTATUS==1), phydev->link goes up, state change PHY_NOLINK to PHY_RUNNING With patch: T0: aneg is started, link goes down, link-down status is latched T0+3s: state machine runs, up-to-date link-down is read T0+4s: state machine runs, aneg is finished (BMSR_ANEGCOMPLETE==1), first BMSR read: BMSR_ANEGCOMPLETE==1 and BMSR_LSTATUS==0, second BMSR read: BMSR_ANEGCOMPLETE==1 and BMSR_LSTATUS==1, phydev->link goes up, state change PHY_NOLINK to PHY_RUNNING Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20selftests/bpf: Fix build of sockmap_ktls.cAlexei Starovoitov1-0/+1
The selftests fails to build with: tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c: In function ‘test_sockmap_ktls_disconnect_after_delete’: tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c:72:37: error: ‘TCP_ULP’ undeclared (first use in this function) 72 | err = setsockopt(cli, IPPROTO_TCP, TCP_ULP, "tls", strlen("tls")); | ^~~~~~~ Similar to commit that fixes build of sockmap_basic.c on systems with old /usr/include fix the build of sockmap_ktls.c Fixes: d1ba1204f2ee ("selftests/bpf: Test unhashing kTLS socket after removing from map") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200219205514.3353788-1-ast@kernel.org
2020-02-20bpf: Fix a potential deadlock with bpf_map_do_batchYonghong Song1-3/+31
Commit 057996380a42 ("bpf: Add batch ops to all htab bpf map") added lookup_and_delete batch operation for hash table. The current implementation has bpf_lru_push_free() inside the bucket lock, which may cause a deadlock. syzbot reports: -> #2 (&htab->buckets[i].lock#2){....}: __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:159 htab_lru_map_delete_node+0xce/0x2f0 kernel/bpf/hashtab.c:593 __bpf_lru_list_shrink_inactive kernel/bpf/bpf_lru_list.c:220 [inline] __bpf_lru_list_shrink+0xf9/0x470 kernel/bpf/bpf_lru_list.c:266 bpf_lru_list_pop_free_to_local kernel/bpf/bpf_lru_list.c:340 [inline] bpf_common_lru_pop_free kernel/bpf/bpf_lru_list.c:447 [inline] bpf_lru_pop_free+0x87c/0x1670 kernel/bpf/bpf_lru_list.c:499 prealloc_lru_pop+0x2c/0xa0 kernel/bpf/hashtab.c:132 __htab_lru_percpu_map_update_elem+0x67e/0xa90 kernel/bpf/hashtab.c:1069 bpf_percpu_hash_update+0x16e/0x210 kernel/bpf/hashtab.c:1585 bpf_map_update_value.isra.0+0x2d7/0x8e0 kernel/bpf/syscall.c:181 generic_map_update_batch+0x41f/0x610 kernel/bpf/syscall.c:1319 bpf_map_do_batch+0x3f5/0x510 kernel/bpf/syscall.c:3348 __do_sys_bpf+0x9b7/0x41e0 kernel/bpf/syscall.c:3460 __se_sys_bpf kernel/bpf/syscall.c:3355 [inline] __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:3355 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (&loc_l->lock){....}: check_prev_add kernel/locking/lockdep.c:2475 [inline] check_prevs_add kernel/locking/lockdep.c:2580 [inline] validate_chain kernel/locking/lockdep.c:2970 [inline] __lock_acquire+0x2596/0x4a00 kernel/locking/lockdep.c:3954 lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4484 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:159 bpf_common_lru_push_free kernel/bpf/bpf_lru_list.c:516 [inline] bpf_lru_push_free+0x250/0x5b0 kernel/bpf/bpf_lru_list.c:555 __htab_map_lookup_and_delete_batch+0x8d4/0x1540 kernel/bpf/hashtab.c:1374 htab_lru_map_lookup_and_delete_batch+0x34/0x40 kernel/bpf/hashtab.c:1491 bpf_map_do_batch+0x3f5/0x510 kernel/bpf/syscall.c:3348 __do_sys_bpf+0x1f7d/0x41e0 kernel/bpf/syscall.c:3456 __se_sys_bpf kernel/bpf/syscall.c:3355 [inline] __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:3355 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe Possible unsafe locking scenario: CPU0 CPU2 ---- ---- lock(&htab->buckets[i].lock#2); lock(&l->lock); lock(&htab->buckets[i].lock#2); lock(&loc_l->lock); *** DEADLOCK *** To fix the issue, for htab_lru_map_lookup_and_delete_batch() in CPU0, let us do bpf_lru_push_free() out of the htab bucket lock. This can avoid the above deadlock scenario. Fixes: 057996380a42 ("bpf: Add batch ops to all htab bpf map") Reported-by: syzbot+a38ff3d9356388f2fb83@syzkaller.appspotmail.com Reported-by: syzbot+122b5421d14e68f29cd1@syzkaller.appspotmail.com Suggested-by: Hillf Danton <hdanton@sina.com> Suggested-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Brian Vazquez <brianvv@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200219234757.3544014-1-yhs@fb.com
2020-02-20bpf: Do not grab the bucket spinlock by default on htab batch opsBrian Vazquez1-2/+22
Grabbing the spinlock for every bucket even if it's empty, was causing significant perfomance cost when traversing htab maps that have only a few entries. This patch addresses the issue by checking first the bucket_cnt, if the bucket has some entries then we go and grab the spinlock and proceed with the batching. Tested with a htab of size 50K and different value of populated entries. Before: Benchmark Time(ns) CPU(ns) --------------------------------------------- BM_DumpHashMap/1 2759655 2752033 BM_DumpHashMap/10 2933722 2930825 BM_DumpHashMap/200 3171680 3170265 BM_DumpHashMap/500 3639607 3635511 BM_DumpHashMap/1000 4369008 4364981 BM_DumpHashMap/5k 11171919 11134028 BM_DumpHashMap/20k 69150080 69033496 BM_DumpHashMap/39k 190501036 190226162 After: Benchmark Time(ns) CPU(ns) --------------------------------------------- BM_DumpHashMap/1 202707 200109 BM_DumpHashMap/10 213441 210569 BM_DumpHashMap/200 478641 472350 BM_DumpHashMap/500 980061 967102 BM_DumpHashMap/1000 1863835 1839575 BM_DumpHashMap/5k 8961836 8902540 BM_DumpHashMap/20k 69761497 69322756 BM_DumpHashMap/39k 187437830 186551111 Fixes: 057996380a42 ("bpf: Add batch ops to all htab bpf map") Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200218172552.215077-1-brianvv@google.com
2020-02-20igc: Add dump optionsSasha Neftin6-1/+338
Placeholder for debugging functionality. In this patch, we add some registers and rings summary dumps. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-20igc: Complete to commit Add legacy power management supportSasha Neftin1-0/+27
commit 9513d2a5dc7f ("igc: Add legacy power management support") Add power management resume and schedule suspend requests. Add power management get and put synchronization. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-20igc: make non-global functions staticChen Zhou1-2/+2
Fix sparse warning: drivers/net/ethernet/intel/igc/igc_ptp.c:512:6: warning: symbol 'igc_ptp_tx_work' was not declared. Should it be static? drivers/net/ethernet/intel/igc/igc_ptp.c:644:6: warning: symbol 'igc_ptp_suspend' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-20net: intel: e1000e: fix possible sleep-in-atomic-context bugs in ↵Jia-Ju Bai1-2/+2
e1000e_get_hw_semaphore() The driver may sleep while holding a spinlock. The function call path (from bottom to top) in Linux 4.19 is: drivers/net/ethernet/intel/e1000e/mac.c, 1366: usleep_range in e1000e_get_hw_semaphore drivers/net/ethernet/intel/e1000e/80003es2lan.c, 322: e1000e_get_hw_semaphore in e1000_release_swfw_sync_80003es2lan drivers/net/ethernet/intel/e1000e/80003es2lan.c, 197: e1000_release_swfw_sync_80003es2lan in e1000_release_phy_80003es2lan drivers/net/ethernet/intel/e1000e/netdev.c, 4883: (FUNC_PTR) e1000_release_phy_80003es2lan in e1000e_update_phy_stats drivers/net/ethernet/intel/e1000e/netdev.c, 4917: e1000e_update_phy_stats in e1000e_update_stats drivers/net/ethernet/intel/e1000e/netdev.c, 5945: e1000e_update_stats in e1000e_get_stats64 drivers/net/ethernet/intel/e1000e/netdev.c, 5944: spin_lock in e1000e_get_stats64 drivers/net/ethernet/intel/e1000e/mac.c, 1384: usleep_range in e1000e_get_hw_semaphore drivers/net/ethernet/intel/e1000e/80003es2lan.c, 322: e1000e_get_hw_semaphore in e1000_release_swfw_sync_80003es2lan drivers/net/ethernet/intel/e1000e/80003es2lan.c, 197: e1000_release_swfw_sync_80003es2lan in e1000_release_phy_80003es2lan drivers/net/ethernet/intel/e1000e/netdev.c, 4883: (FUNC_PTR) e1000_release_phy_80003es2lan in e1000e_update_phy_stats drivers/net/ethernet/intel/e1000e/netdev.c, 4917: e1000e_update_phy_stats in e1000e_update_stats drivers/net/ethernet/intel/e1000e/netdev.c, 5945: e1000e_update_stats in e1000e_get_stats64 drivers/net/ethernet/intel/e1000e/netdev.c, 5944: spin_lock in e1000e_get_stats64 (FUNC_PTR) means a function pointer is called. To fix these bugs, usleep_range() is replaced with udelay(). These bugs are found by a static analysis tool STCheck written by myself. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-20selftests/bpf: Change llvm flag -mcpu=probe to -mcpu=v3Yonghong Song1-2/+2
The latest llvm supports cpu version v3, which is cpu version v1 plus some additional 64bit jmp insns and 32bit jmp insn support. In selftests/bpf Makefile, the llvm flag -mcpu=probe did runtime probe into the host system. Depending on compilation environments, it is possible that runtime probe may fail, e.g., due to memlock issue. This will cause generated code with cpu version v1. This may cause confusion as the same compiler and the same C code generates different byte codes in different environment. Let us change the llvm flag -mcpu=probe to -mcpu=v3 so the generated code will be the same regardless of the compilation environment. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200219004236.2291125-1-yhs@fb.com
2020-02-20Merge branch 'bpf_read_branch_records'Alexei Starovoitov5-2/+309
Daniel Xu says: ==================== Branch records are a CPU feature that can be configured to record certain branches that are taken during code execution. This data is particularly interesting for profile guided optimizations. perf has had branch record support for a while but the data collection can be a bit coarse grained. We (Facebook) have seen in experiments that associating metadata with branch records can improve results (after postprocessing). We generally use bpf_probe_read_*() to get metadata out of userspace. That's why bpf support for branch records is useful. Aside from this particular use case, having branch data available to bpf progs can be useful to get stack traces out of userspace applications that omit frame pointers. Changes in v8: - Use globals instead of perf buffer - Call test_perf_branches__detach() before destroying skeleton - Fix typo in docs Changes in v7: - Const-ify and static-ify local var - Documentation formatting Changes in v6: - Move #ifdef a little to avoid unused variable warnings on !x86 - Test negative condition in selftest (-EINVAL on improperly configured perf event) - Skip positive condition selftest on setups that don't support branch records Changes in v5: - Rename bpf_perf_prog_read_branches() -> bpf_read_branch_records() - Rename BPF_F_GET_BR_SIZE -> BPF_F_GET_BRANCH_RECORDS_SIZE - Squash tools/ bpf.h sync into selftest commit Changes in v4: - Add BPF_F_GET_BR_SIZE flag - Return -ENOENT on unsupported architectures - Only accept initialized memory in helper - Check buffer size is multiple of sizeof(struct perf_branch_entry) - Use bpf skeleton in selftest - Add commit messages - Spelling and formatting Changes in v3: - Document filling unused buffer with zero - Formatting fixes - Rebase Changes in v2: - Change to a bpf helper instead of context access - Avoid mentioning Intel specific things ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-02-20selftests/bpf: Add bpf_read_branch_records() selftestDaniel Xu3-1/+244
Add a selftest to test: * default bpf_read_branch_records() behavior * BPF_F_GET_BRANCH_RECORDS_SIZE flag behavior * error path on non branch record perf events * using helper to write to stack * using helper to write to global On host with hardware counter support: # ./test_progs -t perf_branches #27/1 perf_branches_hw:OK #27/2 perf_branches_no_hw:OK #27 perf_branches:OK Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED On host without hardware counter support (VM): # ./test_progs -t perf_branches #27/1 perf_branches_hw:OK #27/2 perf_branches_no_hw:OK #27 perf_branches:OK Summary: 1/2 PASSED, 1 SKIPPED, 0 FAILED Also sync tools/include/uapi/linux/bpf.h. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200218030432.4600-3-dxu@dxuuu.xyz
2020-02-20bpf: Add bpf_read_branch_records() helperDaniel Xu2-1/+65
Branch records are a CPU feature that can be configured to record certain branches that are taken during code execution. This data is particularly interesting for profile guided optimizations. perf has had branch record support for a while but the data collection can be a bit coarse grained. We (Facebook) have seen in experiments that associating metadata with branch records can improve results (after postprocessing). We generally use bpf_probe_read_*() to get metadata out of userspace. That's why bpf support for branch records is useful. Aside from this particular use case, having branch data available to bpf progs can be useful to get stack traces out of userspace applications that omit frame pointers. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200218030432.4600-2-dxu@dxuuu.xyz
2020-02-20e1000e: fix missing cpu_to_le64 on buffer_addrBen Dooks (Codethink)1-1/+1
The following warning suggests there is a missing cpu_to_le64() in the e1000_flush_tx_ring() function (it is also the behaviour elsewhere in the driver to do cpu_to_le64() on the buffer_addr when setting it) drivers/net/ethernet/intel/e1000e/netdev.c:3813:30: warning: incorrect type in assignment (different base types) drivers/net/ethernet/intel/e1000e/netdev.c:3813:30: expected restricted __le64 [usertype] buffer_addr drivers/net/ethernet/intel/e1000e/netdev.c:3813:30: got unsigned long long [usertype] dma Signed-off-by: Ben Dooks (Codethink) <ben.dooks@codethink.co.uk> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-20ice: fix define for E822 backplane deviceBruce Allan3-4/+4
This product's name has changed; update the macro identifier accordingly. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-20ice: add support for E823 devicesBruce Allan3-9/+53
Add E823 device ids and convert conditional expressions to a more appropriate switch statement. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-20ice: add additional E810 device idBruce Allan2-0/+3
Add support for device id 0x159b. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-20ice: add backslash-n to stringsJesse Brandeburg5-18/+13
There were several strings found without line feeds, fix them by adding a line feed, as is typical. Without this lotsofmessagescanbejumbledtogether. This patch has known checkpatch warnings from long lines for the NL_* messages, because checkpatch doesn't know how to ignore them. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-20ice: increase PF reset wait timeout to 300 millisecondsJacob Keller1-1/+1
Increase the maximum time that the driver will wait for a PF reset from 200 milliseconds to 300 milliseconds, to account for possibility of a slightly longer than expected PF reset. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-20ice: Support XDP UMEM wake up mechanismKrzysztof Kazimierczak1-0/+18
Add support for a new AF_XDP feature that has already been introduced in upstreamed Intel NIC drivers. If a user space application signals that it might sleep using the new bind flag XDP_USE_NEED_WAKEUP, the driver will then set this flag if it has no more buffers on the NIC Rx ring and yield to the application. For Tx, it will set the flag if it has no outstanding Tx completion interrupts and return to the application. Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-20ice: SW DCB, report correct max TC valueDave Ertman1-8/+1
lldpad is using the value reported in the DCB config for max_tc as the max allowed number of TCs, not the current max. ICE driver was reporting it as current maximum TC. Change DCB_NL function to report maximum TC allowed by this device. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-20ice: Report correct DCB modeAvinash Dayanand1-3/+24
Add code to detect if DCB is in IEEE or CEE mode. Without this the code will always report as IEEE mode which is incorrect and confuses the user. Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com> Signed-off-by: Scott Register <scottx.register@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-20ice: Add DCBNL ops required to configure ETS in CEE for SW DCBAvinash JD1-0/+43
Couple of DCBNL ops are required for configuring ETS in SW DCB CEE mode. If these functions are not added, it'll break the CEE functionality. Signed-off-by: Avinash JD <avinash.dayanand@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-20ice: Always clear the QRXFLXP_CNTXT register for VF Rx queuesBrett Creeley2-2/+7
Currently when the PF reduces its number of channels via ethtool and then VFs are created there may be stale data for some of the Rx queues belonging to VFs. This happens when a VF reuses an Rx queue that was previously used by the PF. Specifically, the QRXFLXP_CNTXT register will have incorrect values. Fix this by always clearing the relevant values in the QRXFLXP_CNTXT register for VF queues. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-19ice: Fix for TCAM entry managementDan Nowlin1-14/+51
Order intermediate VSIG list correct in order to correctly match existing VSIG lists. When overriding pre-existing TCAM entries, properly delete the existing entry and remove it from the change/update list. Signed-off-by: Dan Nowlin <dan.nowlin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-19ice: update malicious driver detection event handlingPaul Greenwalt6-59/+150
Update the PF VFs MDD event message to rate limit once per second and report the total number Rx|Tx event count. Add support to print pending MDD events that occur during the rate limit. The use of net_ratelimit did not allow for per VF Rx|Tx granularity. Additional PF MDD log messages are guarded by netif_msg_[rx|tx]_err(). Since VF RX MDD events disable the queue, add ethtool private flag mdd-auto-reset-vf to configure VF reset to re-enable the queue. Disable anti-spoof detection interrupt to prevent spurious events during a function reset. To avoid race condition do not make PF MDD register reads conditional on global MDD result. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-19ice: Validate config for SW DCB mapAvinash Dayanand3-0/+47
Validate the inputs for SW DCB config received either via lldptool or pcap file. And don't apply DCB for bad bandwidth inputs. Without this patch, any config having bad inputs will cause the loss of link making PF unusable even after driver reload. Recoverable only via system reboot. Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-19ice: Wait for VF to be reset/ready before configurationBrett Creeley2-61/+76
The configuration/command below is failing when the VF in the xml file is already bound to the host iavf driver. pci_0000_af_0_0.xml: <interface type='hostdev' managed='yes'> <source> <address type='pci' domain='0x0000' bus='0xaf' slot='0x0' function='0x0'/> </source> <mac address='00:de:ad:00:11:01'/> </interface> > virsh attach-device domain_name pci_0000_af_0_0.xml error: Failed to attach device from pci_0000_af_0_0.xml error: Cannot set interface MAC/vlanid to 00:de:ad:00:11:01/0 for ifname ens1f1 vf 0: Device or resource busy This is failing because the VF has not been completely removed/reset after being unbound (via the virsh command above) from the host iavf driver and ice_set_vf_mac() checks if the VF is disabled before waiting for the reset to finish. Fix this by waiting for the VF remove/reset process to happen before checking if the VF is disabled. Also, since many functions for VF administration on the PF were more or less calling the same 3 functions (ice_wait_on_vf_reset(), ice_is_vf_disabled(), and ice_check_vf_init()) move these into the helper function ice_check_vf_ready_for_cfg(). Then call this function in any flow that attempts to configure/query a VF from the PF. Lastly, increase the maximum wait time in ice_wait_on_vf_reset() to 800ms, and modify/add the #define(s) that determine the wait time. This was done for robustness because in rare/stress cases VF removal can take a max of ~800ms and previously the wait was a max of ~300ms. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-19ice: Don't tell the OS that link is going downMichal Swiatkowski1-7/+0
Remove code that tell the OS that link is going down when user change flow control via ethtool. When link is up it isn't certain that link goes down after 0x0605 aq command. If link doesn't go down, OS thinks that link is down, but physical link is up. To reset this state user have to take interface down and up. If link goes down after 0x0605 command, FW send information about that and after that driver tells the OS that the link goes down. So this code in ethtool is unnecessary. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-19ice: Don't reject odd values of usecs set by userBrett Creeley2-12/+39
Currently if a user sets an odd [tx|rx]-usecs value through ethtool, the request is denied because the hardware is set to have an ITR granularity of 2us. This caused poor customer experience. Fix this by aligning to a register allowed value, which results in rounding down. Also, print a once per ring container type message to be clear about our intentions. Also, change the ITR_TO_REG define to be the bitwise and of the ITR setting and the ICE_ITR_MASK. This makes the purpose of ITR_TO_REG more obvious. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-19Merge branch 'tcp_v6_gso_csum_prep'David S. Miller15-101/+27
Heiner Kallweit says: ==================== net: core: add helper tcp_v6_gso_csum_prep Several network drivers for chips that support TSO6 share the same code for preparing the TCP header, so let's factor it out to a helper. A difference is that some drivers reset the payload_len whilst others don't do this. This value is overwritten by TSO anyway, therefore the new helper resets it in general. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-19vmxnet3: use new helper tcp_v6_gso_csum_prepHeiner Kallweit1-4/+1
Use new helper tcp_v6_gso_csum_prep in additional network drivers. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-19r8152: use new helper tcp_v6_gso_csum_prepHeiner Kallweit1-24/+2
Use new helper tcp_v6_gso_csum_prep in additional network drivers. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-19hv_netvsc: use new helper tcp_v6_gso_csum_prepHeiner Kallweit1-4/+1
Use new helper tcp_v6_gso_csum_prep in additional network drivers. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-19net: socionext: use new helper tcp_v6_gso_csum_prepHeiner Kallweit1-5/+1
Use new helper tcp_v6_gso_csum_prep in additional network drivers. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-19net: qcom/emac: use new helper tcp_v6_gso_csum_prepHeiner Kallweit1-5/+2
Use new helper tcp_v6_gso_csum_prep in additional network drivers. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-19ionic: use new helper tcp_v6_gso_csum_prepHeiner Kallweit1-4/+1
Use new helper tcp_v6_gso_csum_prep in additional network drivers. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-19jme: use new helper tcp_v6_gso_csum_prepHeiner Kallweit1-6/+1
Use new helper tcp_v6_gso_csum_prep in additional network drivers. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-19e1000(e): use new helper tcp_v6_gso_csum_prepHeiner Kallweit2-9/+2
Use new helper tcp_v6_gso_csum_prep in additional network drivers. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>