summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-03-09bnxt_en: Simplify __bnxt_poll_cqs_done().Michael Chan1-8/+7
Simplify the function by removing tha 'all' parameter. In the current code, the caller has to specify whether to update/arm both completion rings with the 'all' parameter. Instead of this, we can just update/arm all the completion rings that have been polled. By setting cpr->had_work_done earlier in __bnxt_poll_work(), we know which completion ring has been polled and can just update/arm all the completion rings with cpr->had_work_done set. This simplifies the function with one less parameter and works just as well. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09bnxt_en: Handle all NQ notifications in bnxt_poll_p5().Michael Chan1-3/+4
In bnxt_poll_p5(), the logic polls for up to 2 completion rings (RX and TX) for work. In the current code, if we reach budget polling the first completion ring, we will stop. If the other completion ring has work to do, we will handle it when NAPI calls us back. This is not optimal. We potentially leave an unproceesed entry in the NQ. When we are finally done with NAPI polling and re-enable interrupt, the remaining entry in the NQ will cause interrupt to be triggered immediately for no reason. Modify the code in bnxt_poll_p5() to keep looping until all NQ entries are handled even if the first completion ring has reached budget. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09net/sched: act_ct: fix lockdep splat in tcf_ct_flow_table_getEric Dumazet1-13/+11
Convert zones_lock spinlock to zones_mutex mutex, and struct (tcf_ct_flow_table)->ref to a refcount, so that control path can use regular GFP_KERNEL allocations from standard process context. This is more robust in case of memory pressure. The refcount is needed because tcf_ct_flow_table_put() can be called from RCU callback, thus in BH context. The issue was spotted by syzbot, as rhashtable_init() was called with a spinlock held, which is bad since GFP_KERNEL allocations can sleep. Note to developers : Please make sure your patches are tested with CONFIG_DEBUG_ATOMIC_SLEEP=y BUG: sleeping function called from invalid context at mm/slab.h:565 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 9582, name: syz-executor610 2 locks held by syz-executor610/9582: #0: ffffffff8a34eb80 (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:72 [inline] #0: ffffffff8a34eb80 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x3f9/0xad0 net/core/rtnetlink.c:5437 #1: ffffffff8a3961b8 (zones_lock){+...}, at: spin_lock_bh include/linux/spinlock.h:343 [inline] #1: ffffffff8a3961b8 (zones_lock){+...}, at: tcf_ct_flow_table_get+0xa3/0x1700 net/sched/act_ct.c:67 Preemption disabled at: [<0000000000000000>] 0x0 CPU: 0 PID: 9582 Comm: syz-executor610 Not tainted 5.6.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x188/0x20d lib/dump_stack.c:118 ___might_sleep.cold+0x1f4/0x23d kernel/sched/core.c:6798 slab_pre_alloc_hook mm/slab.h:565 [inline] slab_alloc_node mm/slab.c:3227 [inline] kmem_cache_alloc_node_trace+0x272/0x790 mm/slab.c:3593 __do_kmalloc_node mm/slab.c:3615 [inline] __kmalloc_node+0x38/0x60 mm/slab.c:3623 kmalloc_node include/linux/slab.h:578 [inline] kvmalloc_node+0x61/0xf0 mm/util.c:574 kvmalloc include/linux/mm.h:645 [inline] kvzalloc include/linux/mm.h:653 [inline] bucket_table_alloc+0x8b/0x480 lib/rhashtable.c:175 rhashtable_init+0x3d2/0x750 lib/rhashtable.c:1054 nf_flow_table_init+0x16d/0x310 net/netfilter/nf_flow_table_core.c:498 tcf_ct_flow_table_get+0xe33/0x1700 net/sched/act_ct.c:82 tcf_ct_init+0xba4/0x18a6 net/sched/act_ct.c:1050 tcf_action_init_1+0x697/0xa20 net/sched/act_api.c:945 tcf_action_init+0x1e9/0x2f0 net/sched/act_api.c:1001 tcf_action_add+0xdb/0x370 net/sched/act_api.c:1411 tc_ctl_action+0x366/0x456 net/sched/act_api.c:1466 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5440 netlink_rcv_skb+0x15a/0x410 net/netlink/af_netlink.c:2478 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x6b9/0x7d0 net/socket.c:2343 ___sys_sendmsg+0x100/0x170 net/socket.c:2397 __sys_sendmsg+0xec/0x1b0 net/socket.c:2430 do_syscall_64+0xf6/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4403d9 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffd719af218 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004403d9 RDX: 0000000000000000 RSI: 0000000020000300 RDI: 0000000000000003 RBP: 00000000006ca018 R08: 0000000000000005 R09: 00000000004002c8 R10: 0000000000000008 R11: 00000000000 Fixes: c34b961a2492 ("net/sched: act_ct: Create nf flow table per zone") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paul Blakey <paulb@mellanox.com> Cc: Jiri Pirko <jiri@mellanox.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09net: rmnet: set NETIF_F_LLTX flagTaehee Yoo1-0/+2
The rmnet_vnd_setup(), which is the callback of ->ndo_start_xmit() is allowed to call concurrently because it uses RCU protected data. So, it doesn't need tx lock. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09net: stmmac: dwmac1000: Disable ACS if enhanced descs are not usedRemi Pommarel1-1/+2
ACS (auto PAD/FCS stripping) removes FCS off 802.3 packets (LLC) so that there is no need to manually strip it for such packets. The enhanced DMA descriptors allow to flag LLC packets so that the receiving callback can use that to strip FCS manually or not. On the other hand, normal descriptors do not support that. Thus in order to not truncate LLC packet ACS should be disabled when using normal DMA descriptors. Fixes: 47dd7a540b8a0 ("net: add support for STMicroelectronics Ethernet controllers.") Signed-off-by: Remi Pommarel <repk@triplefau.lt> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09gre: fix uninit-value in __iptunnel_pull_headerEric Dumazet1-2/+10
syzbot found an interesting case of the kernel reading an uninit-value [1] Problem is in the handling of ETH_P_WCCP in gre_parse_header() We look at the byte following GRE options to eventually decide if the options are four bytes longer. Use skb_header_pointer() to not pull bytes if we found that no more bytes were needed. All callers of gre_parse_header() are properly using pskb_may_pull() anyway before proceeding to next header. [1] BUG: KMSAN: uninit-value in pskb_may_pull include/linux/skbuff.h:2303 [inline] BUG: KMSAN: uninit-value in __iptunnel_pull_header+0x30c/0xbd0 net/ipv4/ip_tunnel_core.c:94 CPU: 1 PID: 11784 Comm: syz-executor940 Not tainted 5.6.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 pskb_may_pull include/linux/skbuff.h:2303 [inline] __iptunnel_pull_header+0x30c/0xbd0 net/ipv4/ip_tunnel_core.c:94 iptunnel_pull_header include/net/ip_tunnels.h:411 [inline] gre_rcv+0x15e/0x19c0 net/ipv6/ip6_gre.c:606 ip6_protocol_deliver_rcu+0x181b/0x22c0 net/ipv6/ip6_input.c:432 ip6_input_finish net/ipv6/ip6_input.c:473 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] ip6_input net/ipv6/ip6_input.c:482 [inline] ip6_mc_input+0xdf2/0x1460 net/ipv6/ip6_input.c:576 dst_input include/net/dst.h:442 [inline] ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] ipv6_rcv+0x683/0x710 net/ipv6/ip6_input.c:306 __netif_receive_skb_one_core net/core/dev.c:5198 [inline] __netif_receive_skb net/core/dev.c:5312 [inline] netif_receive_skb_internal net/core/dev.c:5402 [inline] netif_receive_skb+0x66b/0xf20 net/core/dev.c:5461 tun_rx_batched include/linux/skbuff.h:4321 [inline] tun_get_user+0x6aef/0x6f60 drivers/net/tun.c:1997 tun_chr_write_iter+0x1f2/0x360 drivers/net/tun.c:2026 call_write_iter include/linux/fs.h:1901 [inline] new_sync_write fs/read_write.c:483 [inline] __vfs_write+0xa5a/0xca0 fs/read_write.c:496 vfs_write+0x44a/0x8f0 fs/read_write.c:558 ksys_write+0x267/0x450 fs/read_write.c:611 __do_sys_write fs/read_write.c:623 [inline] __se_sys_write fs/read_write.c:620 [inline] __ia32_sys_write+0xdb/0x120 fs/read_write.c:620 do_syscall_32_irqs_on arch/x86/entry/common.c:339 [inline] do_fast_syscall_32+0x3c7/0x6e0 arch/x86/entry/common.c:410 entry_SYSENTER_compat+0x68/0x77 arch/x86/entry/entry_64_compat.S:139 RIP: 0023:0xf7f62d99 Code: 90 e8 0b 00 00 00 f3 90 0f ae e8 eb f9 8d 74 26 00 89 3c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90 RSP: 002b:00000000fffedb2c EFLAGS: 00000217 ORIG_RAX: 0000000000000004 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000020002580 RDX: 0000000000000fca RSI: 0000000000000036 RDI: 0000000000000004 RBP: 0000000000008914 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline] kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127 kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:82 slab_alloc_node mm/slub.c:2793 [inline] __kmalloc_node_track_caller+0xb40/0x1200 mm/slub.c:4401 __kmalloc_reserve net/core/skbuff.c:142 [inline] __alloc_skb+0x2fd/0xac0 net/core/skbuff.c:210 alloc_skb include/linux/skbuff.h:1051 [inline] alloc_skb_with_frags+0x18c/0xa70 net/core/skbuff.c:5766 sock_alloc_send_pskb+0xada/0xc60 net/core/sock.c:2242 tun_alloc_skb drivers/net/tun.c:1529 [inline] tun_get_user+0x10ae/0x6f60 drivers/net/tun.c:1843 tun_chr_write_iter+0x1f2/0x360 drivers/net/tun.c:2026 call_write_iter include/linux/fs.h:1901 [inline] new_sync_write fs/read_write.c:483 [inline] __vfs_write+0xa5a/0xca0 fs/read_write.c:496 vfs_write+0x44a/0x8f0 fs/read_write.c:558 ksys_write+0x267/0x450 fs/read_write.c:611 __do_sys_write fs/read_write.c:623 [inline] __se_sys_write fs/read_write.c:620 [inline] __ia32_sys_write+0xdb/0x120 fs/read_write.c:620 do_syscall_32_irqs_on arch/x86/entry/common.c:339 [inline] do_fast_syscall_32+0x3c7/0x6e0 arch/x86/entry/common.c:410 entry_SYSENTER_compat+0x68/0x77 arch/x86/entry/entry_64_compat.S:139 Fixes: 95f5c64c3c13 ("gre: Move utility functions to common headers") Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09Merge branch 'bareudp-several-code-cleanup-for-bareudp-module'David S. Miller1-6/+11
Taehee Yoo says: ==================== bareudp: several code cleanup for bareudp module This patchset is to cleanup bareudp module code. 1. The first patch is to add module alias In the current bareudp code, there is no module alias. So, RTNL couldn't load bareudp module automatically. 2. The second patch is to add extack message. The extack error message is useful for noticing specific errors when command is failed. 3. The third patch is to remove unnecessary udp_encap_enable(). In the bareudp_socket_create(), udp_encap_enable() is called. But, the it's already called in the setup_udp_tunnel_sock(). So, it could be removed. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09bareudp: remove unnecessary udp_encap_enable() in bareudp_socket_create()Taehee Yoo1-3/+0
In the current code, udp_encap_enable() is called in bareudp_socket_create(). But, setup_udp_tunnel_sock() internally calls udp_encap_enable(). So, udp_encap_enable() is unnecessary. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09bareudp: print error message when command failsTaehee Yoo1-3/+10
When bareudp netlink command fails, it doesn't print any error message. So, users couldn't know the exact reason. In order to tell the exact reason to the user, the extack error message is used in this patch. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09bareudp: add module aliasTaehee Yoo1-0/+1
In the current bareudp code, there is no module alias. So, RTNL couldn't load bareudp module automatically. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09Merge branch 'cxgb4-chcr-ktls-tx-ofld-support-on-T6-adapter'David S. Miller17-17/+2506
Rohit Maheshwari says: ==================== cxgb4/chcr: ktls tx ofld support on T6 adapter This series of patches add support for kernel tls offload in Tx direction, over Chelsio T6 NICs. SKBs marked as decrypted will be treated as tls plain text packets and then offloaded to encrypt using network device (chelsio T6 adapter). This series is broken down as follows: Patch 1 defines a new macro and registers tls_dev_add and tls_dev_del callbacks. When tls_dev_add gets called we send a connection request to our hardware and to make HW understand about tls offload. Its a partial connection setup and only ipv4 part is done. Patch 2 handles the HW response of the connection request and then we request to update TCB and handle it's HW response as well. Also we save crypto key locally. Only supporting TLS_CIPHER_AES_GCM_128_KEY_SIZE. Patch 3 handles tls marked skbs (decrypted bit set) and sends it to ULD for crypto handling. This code has a minimal portion of tx handler, to handle only one complete record per skb. Patch 4 hanldes partial end part of records. Also added logic to handle multiple records in one single skb. It also adds support to send out tcp option(/s) if exists in skb. If a record is partial but has end part of a record, we'll fetch complete record and then only send it to HW to generate HASH on complete record. Patch 5 handles partial first or middle part of record, it uses AES_CTR to encrypt the partial record. If we are trying to send middle record, it's start should be 16 byte aligned, so we'll fetch few earlier bytes from the record and then send it to HW for encryption. Patch 6 enables ipv6 support and also includes ktls startistics. v1->v2: - mark tcb state to close in tls_dev_del. - u_ctx is now picked from adapter structure. - clear atid in case of failure. - corrected ULP_CRYPTO_KTLS_INLINE value. - optimized tcb update using control queue. - state machine handling when earlier states received. - chcr_write_cpl_set_tcb_ulp function is shifted to patch3. - un-necessary updating left variable. v2->v3: - add empty line after variable declaration. - local variable declaration in reverse christmas tree ordering. v3->v4: - replaced kfree_skb with dev_kfree_skb_any. - corrected error message reported by kbuild test robot <lkp@intel.com> - mss calculation logic. - correct place for Alloc skb check. - Replaced atomic_t with atomic64_t - added few more statistics counters. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09cxgb4/chcr: Add ipv6 support and statisticsRohit Maheshwari4-3/+149
Adding ipv6 support and ktls related statistics. v1->v2: - added blank lines at 2 places. v3->v4: - Replaced atomic_t with atomic64_t - added few necessary stat counters. Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09chcr: Handle first or middle part of recordRohit Maheshwari3-2/+489
This patch contains handling of first part or middle part of the record. When we get a middle record, we will fetch few already sent bytes to make packet start 16 byte aligned. And if the packet has only the header part, we don't need to send it for packet encryption, send that packet as a plaintext. v1->v2: - un-necessary updating left variable. v3->v4: - replaced kfree_skb with dev_kfree_skb_any. Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09chcr: handle partial end part of a recordRohit Maheshwari1-7/+304
TCP segment can chop a record in any order. Record can either be complete or it can be partial (first part which contains header, middle part which doesn't have header or TAG, and the end part which contains TAG. This patch handles partial end part of a tx record. In case of partial end part's, driver will send complete record to HW, so that HW will calculate GHASH (TAG) of complete packet. Also added support to handle multiple records in a segment. v1->v2: - miner change in calling chcr_write_cpl_set_tcb_ulp. - no need of checking return value of chcr_ktls_write_tcp_options. v3->v4: - replaced kfree_skb with dev_kfree_skb_any. Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09cxgb4/chcr: complete record tx handlingRohit Maheshwari8-4/+686
Added tx handling in this patch. This includes handling of segments contain single complete record. v1->v2: - chcr_write_cpl_set_tcb_ulp is added in this patch. v3->v4: - mss calculation logic. - replaced kfree_skb with dev_kfree_skb_any. - corrected error message reported by kbuild test robot <lkp@intel.com> Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09cxgb4/chcr: Save tx keys and handle HW responseRohit Maheshwari9-13/+391
As part of this patch generated and saved crypto keys, handled HW response of act_open_req and set_tcb_req. Defined connection state update. v1->v2: - optimized tcb update using control queue. - state machine handling when earlier states received. v2->v3: - Added one empty line after function declaration. Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09cxgb4/chcr : Register to tls add and del callbackRohit Maheshwari11-0/+499
A new macro is defined to enable ktls tx offload support on Chelsio T6 adapter. And if this macro is enabled, cxgb4 will send mailbox to enable or disable ktls settings on HW. In chcr, enabled tx offload flag in netdev and registered tls_dev_add and tls_dev_del. v1->v2: - mark tcb state to close in tls_dev_del. - u_ctx is now picked from adapter structure. - clear atid in case of failure. - corrected ULP_CRYPTO_KTLS_INLINE value. v2->v3: - add empty line after variable declaration. - local variable declaration in reverse christmas tree ordering. Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09ipvlan: do not add hardware address of master to its unicast filter listJiri Wiesner1-4/+1
There is a problem when ipvlan slaves are created on a master device that is a vmxnet3 device (ipvlan in VMware guests). The vmxnet3 driver does not support unicast address filtering. When an ipvlan device is brought up in ipvlan_open(), the ipvlan driver calls dev_uc_add() to add the hardware address of the vmxnet3 master device to the unicast address list of the master device, phy_dev->uc. This inevitably leads to the vmxnet3 master device being forced into promiscuous mode by __dev_set_rx_mode(). Promiscuous mode is switched on the master despite the fact that there is still only one hardware address that the master device should use for filtering in order for the ipvlan device to be able to receive packets. The comment above struct net_device describes the uc_promisc member as a "counter, that indicates, that promiscuous mode has been enabled due to the need to listen to additional unicast addresses in a device that does not implement ndo_set_rx_mode()". Moreover, the design of ipvlan guarantees that only the hardware address of a master device, phy_dev->dev_addr, will be used to transmit and receive all packets from its ipvlan slaves. Thus, the unicast address list of the master device should not be modified by ipvlan_open() and ipvlan_stop() in order to make ipvlan a workable option on masters that do not support unicast address filtering. Fixes: 2ad7bf3638411 ("ipvlan: Initial check-in of the IPVLAN driver") Reported-by: Per Sundstrom <per.sundstrom@redqube.se> Signed-off-by: Jiri Wiesner <jwiesner@suse.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09Merge branch 'net-allow-user-specify-TC-action-HW-stats-type'David S. Miller19-26/+230
Jiri Pirko says: ==================== net: allow user specify TC action HW stats type Currently, when user adds a TC action and the action gets offloaded, the user expects the HW stats to be counted and included in stats dump. However, since drivers may implement different types of counting, there is no way to specify which one the user is interested in. For example for mlx5, only delayed counters are available as the driver periodically polls for updated stats. In case of mlxsw, the counters are queried on dump time. However, the HW resources for this type of counters is quite limited (couple of thousands). This limits the amount of supported offloaded filters significantly. Without counter assigned, the HW is capable to carry millions of those. On top of that, mlxsw HW is able to support delayed counters as well in greater numbers. That is going to be added in a follow-up patch. This patchset allows user to specify one of the following types of HW stats for added action: immediate - queried during dump time delayed - polled from HW periodically or sent by HW in async manner disabled - no stats needed Note that if "hw_stats" option is not passed, user does not care about the type, just expects any type of stats. Examples: $ tc filter add dev enp0s16np28 ingress proto ip handle 1 pref 1 flower skip_sw dst_ip 192.168.1.1 action drop hw_stats disabled $ tc -s filter show dev enp0s16np28 ingress filter protocol ip pref 1 flower chain 0 filter protocol ip pref 1 flower chain 0 handle 0x1 eth_type ipv4 dst_ip 192.168.1.1 skip_sw in_hw in_hw_count 2 action order 1: gact action drop random type none pass val 0 index 1 ref 1 bind 1 installed 7 sec used 2 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 hw_stats disabled $ tc filter add dev enp0s16np28 ingress proto ip handle 1 pref 1 flower skip_sw dst_ip 192.168.1.1 action drop hw_stats immediate $ tc -s filter show dev enp0s16np28 ingress filter protocol ip pref 1 flower chain 0 filter protocol ip pref 1 flower chain 0 handle 0x1 eth_type ipv4 dst_ip 192.168.1.1 skip_sw in_hw in_hw_count 2 action order 1: gact action drop random type none pass val 0 index 1 ref 1 bind 1 installed 11 sec used 4 sec Action statistics: Sent 102 bytes 1 pkt (dropped 1, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 102 bytes 1 pkt backlog 0b 0p requeues 0 hw_stats immediate ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09sched: act: allow user to specify type of HW stats for a filterJiri Pirko4-0/+69
Currently, user who is adding an action expects HW to report stats, however it does not have exact expectations about the stats types. That is aligned with TCA_ACT_HW_STATS_TYPE_ANY. Allow user to specify the type of HW stats for an action and require it. Pass the information down to flow_offload layer. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09flow_offload: introduce "disabled" HW stats type and allow it in mlxswJiri Pirko2-1/+2
Introduce new type for disabled HW stats and allow the value in mlxsw offload. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09mlxsw: spectrum_acl: Ask device for rule stats only if counter was createdJiri Pirko2-10/+19
Set a flag in case rule counter was created. Only query the device for stats of a rule, which has the valid counter assigned. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09flow_offload: introduce "delayed" HW stats type and allow it in mlx5Jiri Pirko2-3/+7
Introduce new type for delayed HW stats and allow the value in mlx5 offload. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09flow_offload: introduce "immediate" HW stats type and allow it in mlxswJiri Pirko2-2/+4
Introduce new type for immediate HW stats and allow the value in mlxsw offload. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09mlxsw: restrict supported HW stats type to "any"Jiri Pirko1-4/+10
Currently don't allow actions with any other type to be inserted. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09mlxsw: spectrum_flower: Do not allow mixing HW stats types for actionsJiri Pirko1-0/+2
As there is one set of counters for the whole action chain, forbid to mix the HW stats types. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09flow_offload: check for basic action hw stats typeJiri Pirko12-11/+119
Introduce flow_action_basic_hw_stats_types_check() helper and use it in drivers. That sanitizes the drivers which do not have support for action HW stats types. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09ocelot_flower: use flow_offload_has_one_action() helperJiri Pirko1-1/+1
Instead of directly checking number of action entries, use flow_offload_has_one_action() helper. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09flow_offload: Introduce offload of HW stats typeJiri Pirko1-0/+3
Initially, pass "ANY" (struct is zeroed) to the drivers as that is the current implicit value coming down to flow_offload. Add a bool indicating that entries have mixed HW stats type. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-09Linux 5.6-rc5v5.6-rc5Linus Torvalds1-1/+1
2020-03-09Merge tag 'armsoc-fixes' of ↵Linus Torvalds43-91/+416
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Olof Johansson: "We've been accruing these for a couple of weeks, so the batch is a bit bigger than usual. Largest delta is due to a led-bl driver that is added -- there was a miscommunication before the merge window and the driver didn't make it in. Due to this, the platforms needing it regressed. At this point, it seemed easier to add the new driver than unwind the changes. Besides that, there are a handful of various fixes: - AMD tee memory leak fix - A handful of fixlets for i.MX SCU communication - A few maintainers woke up and realized DEBUG_FS had been missing for a while, so a few updates of that. ... and the usual collection of smaller fixes to various platforms" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (37 commits) ARM: socfpga_defconfig: Add back DEBUG_FS arm64: dts: socfpga: agilex: Fix gmac compatible ARM: bcm2835_defconfig: Explicitly restore CONFIG_DEBUG_FS arm64: dts: meson: fix gxm-khadas-vim2 wifi arm64: dts: meson-sm1-sei610: add missing interrupt-names ARM: meson: Drop unneeded select of COMMON_CLK ARM: dts: bcm2711: Add pcie0 alias ARM: dts: bcm283x: Add missing properties to the PWR LED tee: amdtee: fix memory leak in amdtee_open_session() ARM: OMAP2+: Fix compile if CONFIG_HAVE_ARM_SMCCC is not set arm: dts: dra76x: Fix mmc3 max-frequency ARM: dts: dra7: Add "dma-ranges" property to PCIe RC DT nodes bus: ti-sysc: Fix 1-wire reset quirk ARM: dts: r8a7779: Remove deprecated "renesas, rcar-sata" compatible value soc: imx-scu: Align imx sc msg structs to 4 firmware: imx: Align imx_sc_msg_req_cpu_start to 4 firmware: imx: scu-pd: Align imx sc msg structs to 4 firmware: imx: misc: Align imx sc msg structs to 4 firmware: imx: scu: Ensure sequential TX ARM: dts: imx7-colibri: Fix frequency for sd/mmc ...
2020-03-09Merge tag 'edac_urgent-2020-03-08' of ↵Linus Torvalds1-15/+7
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC fix from Borislav Petkov: "Error reporting fix for synopsys_edac: do not overwrite partial decoded error message (Sherry Sun)" * tag 'edac_urgent-2020-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/synopsys: Do not print an error with back-to-back snprintf() calls
2020-03-08Merge tag 'char-misc-5.6-rc5' of ↵Linus Torvalds5-8/+31
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc fixes from Greg KH: "Here are four small char/misc driver fixes for reported issues for 5.6-rc5. These fixes are: - binder fix for a potential use-after-free problem found (took two tries to get it right) - interconnect core fix - altera-stapl driver fix All four of these have been in linux-next for a while with no reported issues" * tag 'char-misc-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: binder: prevent UAF for binderfs devices II interconnect: Handle memory allocation errors altera-stapl: altera_get_note: prevent write beyond end of 'key' binder: prevent UAF for binderfs devices
2020-03-08Merge tag 'driver-core-5.6-rc5' of ↵Linus Torvalds5-30/+44
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core and debugfs fixes from Greg KH: "Here are four small driver core / debugfs patches for 5.6-rc3: - debugfs api cleanup now that all debugfs_create_regset32() callers have been fixed up. This was waiting until after the -rc1 merge as these fixes came in through different trees - driver core sync state fixes based on reports of minor issues found in the feature All of these have been in linux-next with no reported issues" * tag 'driver-core-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: driver core: Skip unnecessary work when device doesn't have sync_state() driver core: Add dev_has_sync_state() driver core: Call sync_state() even if supplier has no consumers debugfs: remove return value of debugfs_create_regset32()
2020-03-08Merge tag 'tty-5.6-rc5' of ↵Linus Torvalds8-29/+90
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial fixes from Greg KH: "Here are some small tty/serial fixes for 5.6-rc5 Just some small serial driver fixes, and a vt core fixup, full details are: - vt fixes for issues found by syzbot - serdev fix for Apple boxes - fsl_lpuart serial driver fixes - MAINTAINER update for incorrect serial files - new device ids for 8250_exar driver - mvebu-uart fix All of these have been in linux-next with no reported issues" * tag 'tty-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: tty: serial: fsl_lpuart: free IDs allocated by IDA Revert "tty: serial: fsl_lpuart: drop EARLYCON_DECLARE" serdev: Fix detection of UART devices on Apple machines. MAINTAINERS: Add missed files related to Synopsys DesignWare UART serial: 8250_exar: add support for ACCES cards tty:serial:mvebu-uart:fix a wrong return vt: selection, push sel_lock up vt: selection, push console lock down
2020-03-08Merge tag 'usb-5.6-rc5' of ↵Linus Torvalds12-117/+163
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB/PHY fixes from Greg KH: "Here are some small USB and PHY driver fixes for reported issues for 5.6-rc5. Included in here are: - phy driver fixes - new USB quirks - USB cdns3 gadget driver fixes - USB hub core fixes All of these have been in linux-next with no reported issues" * tag 'usb-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: dwc3: gadget: Update chain bit correctly when using sg list usb: core: port: do error out if usb_autopm_get_interface() fails usb: core: hub: do error out if usb_autopm_get_interface() fails usb: core: hub: fix unhandled return by employing a void function usb: storage: Add quirk for Samsung Fit flash usb: quirks: add NO_LPM quirk for Logitech Screen Share usb: usb251xb: fix regulator probe and error handling phy: allwinner: Fix GENMASK misuse usb: cdns3: gadget: toggle cycle bit before reset endpoint usb: cdns3: gadget: link trb should point to next request phy: mapphone-mdm6600: Fix timeouts by adding wake-up handling phy: brcm-sata: Correct MDIO operations for 40nm platforms phy: ti: gmii-sel: do not fail in case of gmii phy: ti: gmii-sel: fix set of copy-paste errors phy: core: Fix phy_get() to not return error on link creation failure phy: mapphone-mdm6600: Fix write timeouts with shorter GPIO toggle interval
2020-03-08pid: Fix error return value in some casesCorey Minyard1-0/+2
Recent changes to alloc_pid() allow the pid number to be specified on the command line. If set_tid_size is set, then the code scanning the levels will hard-set retval to -EPERM, overriding it's previous -ENOMEM value. After the code scanning the levels, there are error returns that do not set retval, assuming it is still set to -ENOMEM. So set retval back to -ENOMEM after scanning the levels. Fixes: 49cb2fc42ce4 ("fork: extend clone3() to support setting a PID") Signed-off-by: Corey Minyard <cminyard@mvista.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Andrei Vagin <avagin@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Adrian Reber <areber@redhat.com> Cc: <stable@vger.kernel.org> # 5.5 Link: https://lore.kernel.org/r/20200306172314.12232-1-minyard@acm.org [christian.brauner@ubuntu.com: fixup commit message] Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-08virtio_balloon: Adjust label in virtballoon_probeNathan Chancellor1-1/+1
Clang warns when CONFIG_BALLOON_COMPACTION is unset: ../drivers/virtio/virtio_balloon.c:963:1: warning: unused label 'out_del_vqs' [-Wunused-label] out_del_vqs: ^~~~~~~~~~~~ 1 warning generated. Move the label within the preprocessor block since it is only used when CONFIG_BALLOON_COMPACTION is set. Fixes: 1ad6f58ea936 ("virtio_balloon: Fix memory leaks on errors in virtballoon_probe()") Link: https://github.com/ClangBuiltLinux/linux/issues/886 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Link: https://lore.kernel.org/r/20200216004039.23464-1-natechancellor@gmail.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com>
2020-03-08virtio-blk: improve virtqueue error to BLK_STSHalil Pasic1-2/+7
Let's change the mapping between virtqueue_add errors to BLK_STS statuses, so that -ENOSPC, which indicates virtqueue full is still mapped to BLK_STS_DEV_RESOURCE, but -ENOMEM which indicates non-device specific resource outage is mapped to BLK_STS_RESOURCE. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Link: https://lore.kernel.org/r/20200213123728.61216-3-pasic@linux.ibm.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-03-08virtio-blk: fix hw_queue stopped on arbitrary errorHalil Pasic1-3/+5
Since nobody else is going to restart our hw_queue for us, the blk_mq_start_stopped_hw_queues() is in virtblk_done() is not sufficient necessarily sufficient to ensure that the queue will get started again. In case of global resource outage (-ENOMEM because mapping failure, because of swiotlb full) our virtqueue may be empty and we can get stuck with a stopped hw_queue. Let us not stop the queue on arbitrary errors, but only on -EONSPC which indicates a full virtqueue, where the hw_queue is guaranteed to get started by virtblk_done() before when it makes sense to carry on submitting requests. Let us also remove a stale comment. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Cc: Jens Axboe <axboe@kernel.dk> Fixes: f7728002c1c7 ("virtio_ring: fix return code on DMA mapping fails") Link: https://lore.kernel.org/r/20200213123728.61216-2-pasic@linux.ibm.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-03-08virtio_ring: Fix mem leak with vring_new_virtqueue()Suman Anna1-2/+2
The functions vring_new_virtqueue() and __vring_new_virtqueue() are used with split rings, and any allocations within these functions are managed outside of the .we_own_ring flag. The commit cbeedb72b97a ("virtio_ring: allocate desc state for split ring separately") allocates the desc state within the __vring_new_virtqueue() but frees it only when the .we_own_ring flag is set. This leads to a memory leak when freeing such allocated virtqueues with the vring_del_virtqueue() function. Fix this by moving the desc_state free code outside the flag and only for split rings. Issue was discovered during testing with remoteproc and virtio_rpmsg. Fixes: cbeedb72b97a ("virtio_ring: allocate desc state for split ring separately") Signed-off-by: Suman Anna <s-anna@ti.com> Link: https://lore.kernel.org/r/20200224212643.30672-1-s-anna@ti.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-03-08fscrypt: don't evict dirty inodes after removing keyEric Biggers1-0/+9
After FS_IOC_REMOVE_ENCRYPTION_KEY removes a key, it syncs the filesystem and tries to get and put all inodes that were unlocked by the key so that unused inodes get evicted via fscrypt_drop_inode(). Normally, the inodes are all clean due to the sync. However, after the filesystem is sync'ed, userspace can modify and close one of the files. (Userspace is *supposed* to close the files before removing the key. But it doesn't always happen, and the kernel can't assume it.) This causes the inode to be dirtied and have i_count == 0. Then, fscrypt_drop_inode() failed to consider this case and indicated that the inode can be dropped, causing the write to be lost. On f2fs, other problems such as a filesystem freeze could occur due to the inode being freed while still on f2fs's dirty inode list. Fix this bug by making fscrypt_drop_inode() only drop clean inodes. I've written an xfstest which detects this bug on ext4, f2fs, and ubifs. Fixes: b1c0ec3599f4 ("fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl") Cc: <stable@vger.kernel.org> # v5.4+ Link: https://lore.kernel.org/r/20200305084138.653498-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-03-08Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds15-59/+95
Pull rdma fixes from Jason Gunthorpe: "Nothing particularly exciting, some small ODP regressions from the mmu notifier rework, another bunch of syzkaller fixes, and a bug fix for a botched syzkaller fix in the first rc pull request. - Fix busted syzkaller fix in 'get_new_pps' - this turned out to crash on certain HW configurations - Bug fixes for various missed things in error unwinds - Add a missing rcu_read_lock annotation in hfi/qib - Fix two ODP related regressions from the recent mmu notifier changes - Several more syzkaller bugs in siw, RDMA netlink, verbs and iwcm - Revert an old patch in CMA as it is now shown to not be allocating port numbers properly" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/iwcm: Fix iwcm work deallocation RDMA/siw: Fix failure handling during device creation RDMA/nldev: Fix crash when set a QP to a new counter but QPN is missing RDMA/odp: Ensure the mm is still alive before creating an implicit child RDMA/core: Fix protection fault in ib_mr_pool_destroy IB/mlx5: Fix implicit ODP race IB/hfi1, qib: Ensure RCU is locked when accessing list RDMA/core: Fix pkey and port assignment in get_new_pps RMDA/cm: Fix missing ib_cm_destroy_id() in ib_cm_insert_listen() RDMA/rw: Fix error flow during RDMA context initialization RDMA/core: Fix use of logical OR in get_new_pps Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow"
2020-03-08net/mlx5: HW bit for goto chain offload supportEli Cohen1-1/+2
Add the HW bit definition indecating goto chain offload support. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-08net/mlx5: Expose link speed directlyMark Bloch1-1/+2
Expose port rate as part of the port speed register fields. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-08net/mlx5: Introduce TLS and IPSec objects enumsSaeed Mahameed2-2/+3
Expose the TLS encryption key general object type enum correctly, and add the IPSec encryption key general object type enum. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-08net/mlx5: Introduce egress acl forward-to-vport capabilityVu Pham1-1/+1
Add HCA_CAP.egress_acl_forward_to_vport field to check whether HW supports e-switch vport's egress acl to forward packets to other e-switch vport or not. By default E-Switch egress ACL forwards eswitch vports egress packets to their corresponding NIC/VF vports. With this cap enabled, the driver is allowed to alter this behavior and forward packets to arbitrary NIC/VF vports with the following limitations: a. Multiple processing paths are supported if all of the following conditions are met: - HCA_CAP.egress_acl_forward_to_vport is set ==1. - A destination of type Flow Table only appears once, as the last destination in the list. - Vport destination is supported if HCA_CAP.egress_acl_forward_to_vport==1. Vport must not be the Uplink. b. Flow_tag not supported. c. This table is only applicable after an FDB table is created. d. Push VLAN action is not supported. e. Pop VLAN action cannot be added concurrently to this table and FDB table. This feature will be used during port failover in bonding scenario where two VFs representors are bonded to handle failover egress traffic (VM's ingress/receive traffic). Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-07Merge tag 'io_uring-5.6-2020-03-07' of git://git.kernel.dk/linux-blockLinus Torvalds3-47/+38
Pull io_uring fixes from Jens Axboe: "Here are a few io_uring fixes that should go into this release. This contains: - Removal of (now) unused io_wq_flush() and associated flag (Pavel) - Fix cancelation lockup with linked timeouts (Pavel) - Fix for potential use-after-free when freeing percpu ref for fixed file sets - io-wq cancelation fixups (Pavel)" * tag 'io_uring-5.6-2020-03-07' of git://git.kernel.dk/linux-block: io_uring: fix lockup with timeouts io_uring: free fixed_file_data after RCU grace period io-wq: remove io_wq_flush and IO_WQ_WORK_INTERNAL io-wq: fix IO_WQ_WORK_NO_CANCEL cancellation
2020-03-07Merge tag 'block-5.6-2020-03-07' of git://git.kernel.dk/linux-blockLinus Torvalds6-41/+11
Pull block fixes from Jens Axboe: "Here are a few fixes that should go into this release. This contains: - Revert of a bad bcache patch from this merge window - Removed unused function (Daniel) - Fixup for the blktrace fix from Jan from this release (Cengiz) - Fix of deeper level bfqq overwrite in BFQ (Carlo)" * tag 'block-5.6-2020-03-07' of git://git.kernel.dk/linux-block: block, bfq: fix overwrite of bfq_group pointer in bfq_find_set_group() blktrace: fix dereference after null check Revert "bcache: ignore pending signals when creating gc and allocator thread" block: Remove used kblockd_schedule_work_on()
2020-03-07Merge tag 'media/v5.6-2' of ↵Linus Torvalds5-36/+22
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: - a fix for the media controller links in both hantro driver and in v4l2-mem2mem core - some fixes for the pulse8-cec driver - vicodec: handle alpha channel for RGB32 formats, as it may be used - mc-entity.c: fix handling of pad flags * tag 'media/v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: hantro: Fix broken media controller links media: mc-entity.c: use & to check pad flags, not == media: v4l2-mem2mem.c: fix broken links media: vicodec: process all 4 components for RGB32 formats media: pulse8-cec: close serio in disconnect, not adap_free media: pulse8-cec: INIT_DELAYED_WORK was called too late