Age | Commit message | Author | Files | Lines
2019-12-20 | net/tls: add helper for testing if socket is RX offloaded | Jakub Kicinski | 2 | -2/+12
There is currently no way for a driver to reliably check that the socket it has looked up is in fact RX offloaded. Add a helper. This allows drivers to catch misbehaving firmware. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20 | nfp: pass packet pointer to nfp_net_parse_meta() | Jakub Kicinski | 1 | -10/+8
Make nfp_net_parse_meta() take a packet pointer and return a drop/no drop decision. Right now it returns the end of metadata and caller compares it to the packet pointer. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20 | Merge branch 'nfp-ipv6-tunnel' | David S. Miller | 7 | -229/+893
John Hurley says: ==================== Add ipv6 tunnel support to NFP The following patches add support for IPv6 tunnel offload to the NFP driver. Patches 1-2 do some code tidy up and prepare existing code for reuse in IPv6 tunnels. Patches 3-4 handle IPv6 tunnel decap (match) rules. Patches 5-8 handle encap (action) rules. Patch 9 adds IPv6 support to the merge and pre-tunnel rule functions. v1->v2: - fix compiler warning when building without CONFIG_IPV6 set - Jakub Kicinski (patch 7) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20 | nfp: flower: update flow merge code to support IPv6 tunnels | John Hurley | 1 | -5/+23
Both pre-tunnel match rules and flow merge functions parse compiled match/action fields for validation. Update these validation functions to include IPv6 match and action fields. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20 | nfp: flower: support ipv6 tunnel keep-alive messages from fw | John Hurley | 4 | -0/+67
FW sends an update of IPv6 tunnels that are active in a given period. Use this information to update the kernel table so that neighbour entries do not time out when active on the NIC. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20 | nfp: flower: handle notifiers for ipv6 route changes | John Hurley | 2 | -68/+181
A notifier is used to track route changes in the kernel. If a change is made to a route that is offloaded to fw then an update is sent to the NIC. The driver tracks all routes that are offloaded to determine if a kernel change is of interest. Extend the notifier to track IPv6 route changes and create a new list that stores offloaded IPv6 routes. Modify the IPv4 route helper functions to accept varying address lengths. This way, the same core functions can be used to handle IPv4 and IPv6. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20 | nfp: flower: handle ipv6 tunnel no neigh request | John Hurley | 4 | -8/+116
When fw does not know the next hop for an IPv6 tunnel, it sends a request to the driver. Handle this request by doing a route lookup on the IPv6 address and offloading the next hop to the fw neighbour table. Similar functions already exist to handle IPv4 no neighbour requests. To avoid confusion, append these functions with the _ipv4 tag. There is no change in functionality with this. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20 | nfp: flower: modify pre-tunnel and set tunnel action for ipv6 | John Hurley | 3 | -30/+62
The IPv4 set tunnel action allows the setting of tunnel metadata such as the TTL and ToS values. The pre-tunnel action includes the destination IP address and is used to calculate the next hop from the neighbour table. Much of the IPv4 tunnel action code can be reused for IPv6 tunnels. Change the names of associated functions and structs to remove the IPv4 identifier and make minor modifications to support IPv6 tunnel actions. Ensure the pre-tunnel action contains the IPv6 address along with an identifying flag when an IPv6 tunnel action is required. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20 | nfp: flower: offload list of IPv6 tunnel endpoint addresses | John Hurley | 5 | -1/+141
Fw requires a list of IPv6 addresses that are used as tunnel endpoints to enable correct decap of tunneled packets. Store a list of IPv6 endpoints used in rules with a ref counter to track how many times it is in use. Offload the entire list any time a new IPv6 address is added or when an address is removed (ref count is 0). Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
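The reference-counting scheme described above can be sketched in user-space C. This is only an illustration under stated assumptions, not the driver's code: the names `endpoint_table`, `endpoint_get`, `endpoint_put`, `offload_list` and the fixed table size are all hypothetical, and "offloading" is modelled as a counter bump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_ENDPOINTS 4 /* hypothetical table size */

struct endpoint {
	uint8_t addr[16];  /* IPv6 tunnel endpoint address */
	unsigned int refs; /* number of rules using it; 0 means free slot */
};

struct endpoint_table {
	struct endpoint slots[MAX_ENDPOINTS];
	unsigned int offloads; /* how often the whole list was pushed */
};

/* Stand-in for sending the entire address list to firmware. */
static void offload_list(struct endpoint_table *t)
{
	t->offloads++;
}

static struct endpoint *find(struct endpoint_table *t, const uint8_t addr[16])
{
	for (int i = 0; i < MAX_ENDPOINTS; i++)
		if (t->slots[i].refs && !memcmp(t->slots[i].addr, addr, 16))
			return &t->slots[i];
	return NULL;
}

/* Take a reference on addr; returns false if the table is full. */
static bool endpoint_get(struct endpoint_table *t, const uint8_t addr[16])
{
	struct endpoint *e = find(t, addr);

	if (e) {
		e->refs++; /* already offloaded, nothing to send */
		return true;
	}
	for (int i = 0; i < MAX_ENDPOINTS; i++) {
		if (!t->slots[i].refs) {
			memcpy(t->slots[i].addr, addr, 16);
			t->slots[i].refs = 1;
			offload_list(t); /* new address: resend the list */
			return true;
		}
	}
	return false;
}

/* Drop a reference; the list is resent only when the count hits 0. */
static void endpoint_put(struct endpoint_table *t, const uint8_t addr[16])
{
	struct endpoint *e = find(t, addr);

	assert(e != NULL);
	if (--e->refs == 0)
		offload_list(t);
}
```

Two rules sharing one endpoint therefore trigger a single offload when the first rule is added and a single one when the last rule is removed.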
2019-12-20 | nfp: flower: compile match for IPv6 tunnels | John Hurley | 4 | -51/+246
IPv6 tunnel matches are now supported by firmware. Modify the NFP driver to compile these match rules. IPv6 matches are handled similarly to IPv4 tunnels, the difference being the address length. The type of tunnel is indicated by the same bitmap that is used in IPv4 with an extra bit signifying that the IPv6 variation should be used. Only compile IPv6 tunnel matches when the fw features symbol indicates that they are compatible with the currently loaded fw. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20 | nfp: flower: move udp tunnel key match compilation to helper function | John Hurley | 1 | -22/+35
IPv4 UDP and GRE tunnel match rule compile helpers share functions for compiling fields such as IP addresses. However, they handle fields such as tunnel IDs differently. Create new helper functions for compiling GRE and UDP tunnel key data. This is in preparation for supporting IPv6 tunnels where these new functions can be reused. This patch does not change functionality. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20 | nfp: flower: pass flow rule pointer directly to match functions | John Hurley | 1 | -49/+27
In kernel 5.1, the flow offload API was introduced along with a helper function to extract the flow_rule from the TC offload struct. Each of the match helper functions is passed the offload struct and extracts the flow rule to a local variable. Simplify the code while also removing the extra compat and local variable calls by extracting the rule once in the main match handler, and passing a reference to the rule directly to each helper. This patch does not change driver functionality. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20 | hdlcdrv: replace unnecessary assertion in hdlcdrv_register | Aditya Pakki | 1 | -2/+0
In hdlcdrv_register, failure to register the driver causes a crash. The three callers of hdlcdrv_register all pass valid pointers and do not fail. The patch eliminates the unnecessary BUG_ON assertion. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20 | nfc: s3fwrn5: replace the assertion with a WARN_ON | Aditya Pakki | 1 | -1/+4
In s3fwrn5_fw_recv_frame, if fw_info->rsp is not empty, the current code causes a crash via BUG_ON. However, s3fwrn5_fw_send_msg does not crash in such a scenario. The patch replaces the BUG_ON by returning the error to the callers and frees up skb. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20 | Merge branch 'macb-fix-probing-of-PHY-not-described-in-the-dt' | David S. Miller | 3 | -5/+31
Antoine Tenart says: ==================== net: macb: fix probing of PHY not described in the dt The macb Ethernet driver supports various ways of referencing its network PHY. When a device tree is used the PHY can be referenced with a phy-handle or, if connected to its internal MDIO bus, described in a child node. Some platforms omitted the PHY description while connecting the PHY to the internal MDIO bus and in such cases the MDIO bus has to be scanned "manually" by the macb driver. Prior to the phylink conversion the driver registered the MDIO bus with of_mdiobus_register and then in case the PHY couldn't be retrieved using dt or using phy_find_first (because registering an MDIO bus with of_mdiobus_register masks all PHYs) the macb driver was "manually" scanning the MDIO bus (like mdiobus_register does). The phylink conversion did break this particular case but reimplementing the manual scan of the bus in the macb driver wouldn't be very clean. The solution seems to be registering the MDIO bus based on whether the PHYs are described in the device tree or not. There are multiple ways to do this, none is perfect. I chose to check if any of the child nodes of the macb node was a network PHY and based on this to register the MDIO bus with the of_ helper or not. The drawback is that boards referencing the PHY through phy-handle would scan the entire MDIO bus of the macb at boot time (as the MDIO bus would be registered with mdiobus_register). For this solution to work properly of_mdiobus_child_is_phy has to be exported, which means the patch doing so has to be backported to -stable as well. Another possible solution could have been to simply check if the macb node has a child node by counting its sub-nodes. This isn't technically perfect, as there could be other sub-nodes (in practice this should be fine, fixed-link being taken care of in the driver).
We could also simply s/of_mdiobus_register/mdiobus_register/ but that could break boards using the PHY description in child node as a selector (which really would be not a proper way to do this...). The real issue here being having PHYs not described in the dt but we have dt backward compatibility, so we have to live with that. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20 | net: macb: fix probing of PHY not described in the dt | Antoine Tenart | 1 | -4/+23
This patch fixes the case where the PHY isn't described in the device tree. This is due to the way the MDIO bus is registered in the driver: whether the PHY is described in the device tree or not, the bus is registered through of_mdiobus_register. The function masks all the PHYs and only allows probing the ones described in the device tree. Prior to the Phylink conversion this was also done but later on in the driver the MDIO bus was manually scanned to circumvent the fact that the PHY wasn't described. This patch fixes it in a proper way, by registering the MDIO bus based on whether the PHY attached to a given interface is described in the device tree or not. Fixes: 7897b071ac3b ("net: macb: convert to phylink") Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20 | of: mdio: export of_mdiobus_child_is_phy | Antoine Tenart | 2 | -1/+8
This patch exports of_mdiobus_child_is_phy, making it possible to check whether a child node is a network PHY. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20 | net: mvpp2: cycle comphy to power it down | Russell King | 1 | -0/+10
Presently, at boot time, the comphys are enabled. For firmware compatibility reasons, the comphy driver does not power down the comphys at boot. Consequently, the ethernet comphys are left active until the network interfaces are brought through an up/down cycle. If the port is never used, the port wastes power needlessly. Arrange for the ethernet comphys to be cycled by the mvpp2 driver as if the interface went through an up/down cycle during driver probe, thereby powering them down. This saves: 270mW per 10G SFP+ port on the Macchiatobin Single Shot (eth0/eth1) 370mW per 10G PHY port on the Macchiatobin Double Shot (eth0/eth1) 160mW on the SFP port on either Macchiatobin flavour (eth3) Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20 | net: sfp: report error on failure to read sfp soft status | Russell King | 1 | -2/+9
Report a rate-limited error if we fail to read the SFP soft status, and preserve the current status in that case. This avoids I2C bus errors from triggering a link flap. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20 | netfilter: nft_tproxy: Fix port selector on Big Endian | Phil Sutter | 1 | -2/+2
On Big Endian architectures, u16 port value was extracted from the wrong parts of u32 sreg_port, just like commit 10596608c4d62 ("netfilter: nf_tables: fix mismatch in big-endian system") describes. Fixes: 4ed8eb6570a49 ("netfilter: nf_tables: Add native tproxy support") Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Florian Westphal <fw@strlen.de> Acked-by: Máté Eckl <ecklm94@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-20 | netfilter: ebtables: compat: reject all padding in matches/watchers | Florian Westphal | 1 | -17/+16
syzbot reported following splat: BUG: KASAN: vmalloc-out-of-bounds in size_entry_mwt net/bridge/netfilter/ebtables.c:2063 [inline] BUG: KASAN: vmalloc-out-of-bounds in compat_copy_entries+0x128b/0x1380 net/bridge/netfilter/ebtables.c:2155 Read of size 4 at addr ffffc900004461f4 by task syz-executor267/7937 CPU: 1 PID: 7937 Comm: syz-executor267 Not tainted 5.5.0-rc1-syzkaller #0 size_entry_mwt net/bridge/netfilter/ebtables.c:2063 [inline] compat_copy_entries+0x128b/0x1380 net/bridge/netfilter/ebtables.c:2155 compat_do_replace+0x344/0x720 net/bridge/netfilter/ebtables.c:2249 compat_do_ebt_set_ctl+0x22f/0x27e net/bridge/netfilter/ebtables.c:2333 [..] Because padding isn't considered during computation of ->buf_user_offset, "total" is decremented by fewer bytes than it should. Therefore, the first part of if (*total < sizeof(*entry) || entry->next_offset < sizeof(*entry)) will pass, -- it should not have. This causes oob access: entry->next_offset is past the vmalloced size. Reject padding and check that computed user offset (sum of ebt_entry structure plus all individual matches/watchers/targets) is same value that userspace gave us as the offset of the next entry. Reported-by: syzbot+f68108fed972453a0ad4@syzkaller.appspotmail.com Fixes: 81e675c227ec ("netfilter: ebtables: add CONFIG_COMPAT support") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
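The check described above (the computed size of an entry must equal the next-entry offset that userspace supplied, so no padding is tolerated) can be illustrated with a small stand-alone sketch. The struct `entry_view` and the function `entry_offset_ok` are hypothetical simplifications and do not mirror the real ebtables compat structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one rule entry supplied by userspace. */
struct entry_view {
	size_t header_size;       /* size of the fixed entry structure */
	const size_t *part_sizes; /* sizes of matches, watchers and target */
	size_t nparts;
	size_t next_offset;       /* offset of the next entry, as claimed */
};

/* Accept the entry only if header plus all parts add up exactly to the
 * claimed offset; any surplus would be unaccounted padding. */
static bool entry_offset_ok(const struct entry_view *e)
{
	size_t computed = e->header_size;

	assert(e->part_sizes != NULL || e->nparts == 0);
	for (size_t i = 0; i < e->nparts; i++) {
		if (e->part_sizes[i] > SIZE_MAX - computed)
			return false; /* sum would wrap around */
		computed += e->part_sizes[i];
	}
	return computed == e->next_offset;
}
```

With a 112-byte header and parts of 40 and 24 bytes, a claimed offset of 176 is accepted, while 180 (four bytes of padding) is rejected.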
2019-12-20 | selftests: netfilter: extend flowtable test script with dnat rule | Florian Westphal | 1 | -5/+34
NAT test currently covers snat (masquerade) only. Also add a dnat rule and then check that connecting to the to-be-dnated address works. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-20 | netfilter: nf_flow_table: fix big-endian integer overflow | Arnd Bergmann | 1 | -1/+1
In some configurations, gcc reports an integer overflow: net/netfilter/nf_flow_table_offload.c: In function 'nf_flow_rule_match': net/netfilter/nf_flow_table_offload.c:80:21: error: unsigned conversion from 'int' to '__be16' {aka 'short unsigned int'} changes value from '327680' to '0' [-Werror=overflow] mask->tcp.flags = TCP_FLAG_RST | TCP_FLAG_FIN; ^~~~~~~~~~~~ From what I can tell, we want the upper 16 bits of these constants, so they need to be shifted in cpu-endian mode. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
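The arithmetic behind the warning can be reproduced in user space. The sketch below is an assumption-laden illustration, not the kernel patch: `TCP_FLAG_RST_HOST`, `TCP_FLAG_FIN_HOST` and `tcp_flags_be32_to_be16` are hypothetical names, and `htonl`/`ntohl`/`htons` stand in for the kernel's byte-order helpers. On a big-endian machine the 32-bit constant is 0x00050000 (decimal 327680), so truncating it to 16 bits yields 0; shifting in cpu-endian order first keeps the upper 16 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Host-order values of the flag bits inside the 32-bit TCP header word
 * that holds data offset, flags and window. The kernel's TCP_FLAG_*
 * constants are the big-endian encodings of these values. */
#define TCP_FLAG_RST_HOST 0x00040000u
#define TCP_FLAG_FIN_HOST 0x00010000u

/* Hypothetical helper: reduce a big-endian 32-bit flag word to the
 * big-endian 16-bit field that carries only the flags. */
static uint16_t tcp_flags_be32_to_be16(uint32_t flags_be32)
{
	uint32_t host = ntohl(flags_be32); /* convert to cpu-endian first */

	/* Flag bits live in the upper half; the lower half is the window. */
	assert((host & 0xffffu) == 0);
	return htons((uint16_t)(host >> 16));
}
```

Because the shift happens on the host-order value, the result is the same on little-endian and big-endian machines.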
2019-12-20 | scsi: target/iblock: Fix protection error with blocks greater than 512B | Israel Rukshin | 1 | -1/+3
The sector size of the block layer is 512 bytes, but integrity interval size might be different (in case of 4K block size of the media). At the initiator side the virtual start sector is the one that was originally submitted by the block layer (512 bytes) for the Reftag usage. The initiator converts the Reftag to integrity interval units and sends it to the target. So the target virtual start sector should be calculated in integrity interval units. prepare_fn() and complete_fn() don't remap the Reftag correctly when using incorrect units of the virtual start sector, which leads to the following protection error at the device: "blk_update_request: protection error, dev sdb, sector 2048 op 0x0:(READ) flags 0x10000 phys_seg 1 prio class 0" To fix that, set the seed in integrity interval units. Link: https://lore.kernel.org/r/1576078562-15240-1-git-send-email-israelr@mellanox.com Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
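The unit conversion at the heart of this fix is a single shift. The sketch below assumes the interval size is given as a power-of-two exponent (12 for 4K media); `sector_to_interval` is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* block-layer sectors are 512 bytes */

/* Convert a 512-byte sector number into integrity interval units.
 * interval_exp is log2 of the integrity interval size in bytes. */
static uint64_t sector_to_interval(uint64_t sector, unsigned int interval_exp)
{
	assert(interval_exp >= SECTOR_SHIFT);
	return sector >> (interval_exp - SECTOR_SHIFT);
}
```

For the sector in the error message above, 2048 with 4096-byte intervals maps to 256, whereas with 512-byte intervals it stays 2048, which is why the mismatch only shows up on 4K media.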
2019-12-20 | scsi: libcxgbi: fix NULL pointer dereference in cxgbi_device_destroy() | Varun Prakash | 1 | -1/+2
If cxgb4i_ddp_init() fails then cdev->cdev2ppm will be NULL, so add a check for NULL pointer before dereferencing it. Link: https://lore.kernel.org/r/1576676731-3068-1-git-send-email-varun@chelsio.com Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-12-20 | scsi: lpfc: fix spelling mistakes of asynchronous | Colin Ian King | 2 | -6/+6
There are spelling mistakes of asynchronous in a lpfc_printf_log message and comments. Fix these. Link: https://lore.kernel.org/r/20191218084301.627555-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-12-20 | tracing: Have the histogram compare functions convert to u64 first | Steven Rostedt (VMware) | 1 | -2/+2
The compare functions of the histogram code would be specific for the size of the value being compared (byte, short, int, long long). It would reference the value from the array via the type of the compare, but the value was stored in a 64 bit number. This is fine for little endian machines, but for big endian machines, it would end up comparing zeros or all ones (depending on the sign) for anything but 64 bit numbers. To fix this, first dereference the value as a u64 then convert it to the type being compared. Link: http://lkml.kernel.org/r/20191211103557.7bed6928@gandalf.local.home Cc: stable@vger.kernel.org Fixes: 08d43a5fa063e ("tracing: Add lock-free tracing_map") Acked-by: Tom Zanussi <zanussi@kernel.org> Reported-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-12-20 | tracing: Avoid memory leak in process_system_preds() | Keita Suzuki | 1 | -1/+1
When failing in the allocation of filter_item, process_system_preds() goes to fail_mem, where the allocated filter is freed. However, this leads to a memory leak of filter->filter_string and filter->prog, which are allocated before and in process_preds(). This bug has been detected by kmemleak as well. Fix this by changing kfree to __free_filter. unreferenced object 0xffff8880658007c0 (size 32): comm "bash", pid 579, jiffies 4295096372 (age 17.752s) hex dump (first 32 bytes): 63 6f 6d 6d 6f 6e 5f 70 69 64 20 20 3e 20 31 30 common_pid > 10 00 00 00 00 00 00 00 00 65 73 00 00 00 00 00 00 ........es...... backtrace: [<0000000067441602>] kstrdup+0x2d/0x60 [<00000000141cf7b7>] apply_subsystem_event_filter+0x378/0x932 [<000000009ca32334>] subsystem_filter_write+0x5a/0x90 [<0000000072da2bee>] vfs_write+0xe1/0x240 [<000000004f14f473>] ksys_write+0xb4/0x150 [<00000000a968b4a0>] do_syscall_64+0x6d/0x1e0 [<000000001a189f40>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 unreferenced object 0xffff888060c22d00 (size 64): comm "bash", pid 579, jiffies 4295096372 (age 17.752s) hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 00 e8 d7 41 80 88 ff ff ...........A.... 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000b8c1b109>] process_preds+0x243/0x1820 [<000000003972c7f0>] apply_subsystem_event_filter+0x3be/0x932 [<000000009ca32334>] subsystem_filter_write+0x5a/0x90 [<0000000072da2bee>] vfs_write+0xe1/0x240 [<000000004f14f473>] ksys_write+0xb4/0x150 [<00000000a968b4a0>] do_syscall_64+0x6d/0x1e0 [<000000001a189f40>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 unreferenced object 0xffff888041d7e800 (size 512): comm "bash", pid 579, jiffies 4295096372 (age 17.752s) hex dump (first 32 bytes): 70 bc 85 97 ff ff ff ff 0a 00 00 00 00 00 00 00 p............... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace: [<000000001e04af34>] process_preds+0x71a/0x1820 [<000000003972c7f0>] apply_subsystem_event_filter+0x3be/0x932 [<000000009ca32334>] subsystem_filter_write+0x5a/0x90 [<0000000072da2bee>] vfs_write+0xe1/0x240 [<000000004f14f473>] ksys_write+0xb4/0x150 [<00000000a968b4a0>] do_syscall_64+0x6d/0x1e0 [<000000001a189f40>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Link: http://lkml.kernel.org/r/20191211091258.11310-1-keitasuzuki.park@sslab.ics.keio.ac.jp Cc: Ingo Molnar <mingo@redhat.com> Cc: stable@vger.kernel.org Fixes: 404a3add43c9c ("tracing: Only add filter list when needed") Signed-off-by: Keita Suzuki <keitasuzuki.park@sslab.ics.keio.ac.jp> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-12-20 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf | David S. Miller | 21 | -108/+269
Daniel Borkmann says: ==================== pull-request: bpf 2019-12-19 The following pull-request contains BPF updates for your *net* tree. We've added 10 non-merge commits during the last 8 day(s) which contain a total of 21 files changed, 269 insertions(+), 108 deletions(-). The main changes are: 1) Fix lack of synchronization between xsk wakeup and destroying resources used by xsk wakeup, from Maxim Mikityanskiy. 2) Fix pruning with tail call patching, untrack programs in case of verifier error and fix a cgroup local storage tracking bug, from Daniel Borkmann. 3) Fix clearing skb->tstamp in bpf_redirect() when going from ingress to egress which otherwise causes issues e.g. on fq qdisc, from Lorenz Bauer. 4) Fix compile warning of unused proc_dointvec_minmax_bpf_restricted() when only cBPF is present, from Alexander Lobakin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20 | bpf: Add further test_verifier cases for record_func_key | Daniel Borkmann | 3 | -24/+176
Expand dummy prog generation such that we can easily check on return codes and add a few more test cases to make sure we keep on tracking pruning behavior. # ./test_verifier [...] #1066/p XDP pkt read, pkt_data <= pkt_meta', bad access 1 OK #1067/p XDP pkt read, pkt_data <= pkt_meta', bad access 2 OK Summary: 1580 PASSED, 0 SKIPPED, 0 FAILED Also verified that JIT dump of added test cases looks good. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/df7200b6021444fd369376d227de917357285b65.1576789878.git.daniel@iogearbox.net
2019-12-20 | bpf: Fix record_func_key to perform backtracking on r3 | Daniel Borkmann | 1 | -1/+7
While testing Cilium with /unreleased/ Linus' tree under BPF-based NodePort implementation, I noticed a strange BPF SNAT engine behavior from time to time. In some cases it would do the correct SNAT/DNAT service translation, but at a random point in time it would just stop and perform an unexpected translation after SYN, SYN/ACK and stack would send a RST back. While initially assuming that there is some sort of a race condition in BPF code, adding trace_printk()s for debugging purposes at some point seemed to have resolved the issue auto-magically. Digging deeper on this Heisenbug and reducing the trace_printk() calls to an absolute minimum, it turns out that a single call would suffice to trigger / not trigger the seen RST issue, even though the logic of the program itself remains unchanged. Turns out the single call changed verifier pruning behavior to get everything to work. Reconstructing a minimal test case, the incorrect JIT dump looked as follows: # bpftool p d j i 11346 0xffffffffc0cba96c: [...] 21: movzbq 0x30(%rdi),%rax 26: cmp $0xd,%rax 2a: je 0x000000000000003a 2c: xor %edx,%edx 2e: movabs $0xffff89cc74e85800,%rsi 38: jmp 0x0000000000000049 3a: mov $0x2,%edx 3f: movabs $0xffff89cc74e85800,%rsi 49: mov -0x224(%rbp),%eax 4f: cmp $0x20,%eax 52: ja 0x0000000000000062 54: add $0x1,%eax 57: mov %eax,-0x224(%rbp) 5d: jmpq 0xffffffffffff6911 62: mov $0x1,%eax [...] Hence, unexpectedly, JIT emitted a direct jump even though retpoline based one would have been needed since in line 2c and 3a we have different slot keys in BPF reg r3. 
Verifier log of the test case reveals what happened: 0: (b7) r0 = 14 1: (73) *(u8 *)(r1 +48) = r0 2: (71) r0 = *(u8 *)(r1 +48) 3: (15) if r0 == 0xd goto pc+4 R0_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=ctx(id=0,off=0,imm=0) R10=fp0 4: (b7) r3 = 0 5: (18) r2 = 0xffff89cc74d54a00 7: (05) goto pc+3 11: (85) call bpf_tail_call#12 12: (b7) r0 = 1 13: (95) exit from 3 to 8: R0_w=inv13 R1=ctx(id=0,off=0,imm=0) R10=fp0 8: (b7) r3 = 2 9: (18) r2 = 0xffff89cc74d54a00 11: safe processed 13 insns (limit 1000000) [...] Second branch is pruned by verifier since considered safe, but issue is that record_func_key() couldn't have seen the index in line 3a and therefore decided that emitting a direct jump at this location was okay. Fix this by reusing our backtracking logic for precise scalar verification in order to prevent pruning on the slot key. This means verifier will track content of r3 all the way backwards and only prune if both scalars were unknown in state equivalence check and therefore poisoned in the first place in record_func_key(). The range is [x,x] in record_func_key() case since the slot always would have to be constant immediate. Correct verification after fix: 0: (b7) r0 = 14 1: (73) *(u8 *)(r1 +48) = r0 2: (71) r0 = *(u8 *)(r1 +48) 3: (15) if r0 == 0xd goto pc+4 R0_w=invP(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=ctx(id=0,off=0,imm=0) R10=fp0 4: (b7) r3 = 0 5: (18) r2 = 0x0 7: (05) goto pc+3 11: (85) call bpf_tail_call#12 12: (b7) r0 = 1 13: (95) exit from 3 to 8: R0_w=invP13 R1=ctx(id=0,off=0,imm=0) R10=fp0 8: (b7) r3 = 2 9: (18) r2 = 0x0 11: (85) call bpf_tail_call#12 12: (b7) r0 = 1 13: (95) exit processed 15 insns (limit 1000000) [...] And correct corresponding JIT dump: # bpftool p d j i 11 0xffffffffc0dc34c4: [...] 
21: movzbq 0x30(%rdi),%rax 26: cmp $0xd,%rax 2a: je 0x000000000000003a 2c: xor %edx,%edx 2e: movabs $0xffff9928b4c02200,%rsi 38: jmp 0x0000000000000049 3a: mov $0x2,%edx 3f: movabs $0xffff9928b4c02200,%rsi 49: cmp $0x4,%rdx 4d: jae 0x0000000000000093 4f: and $0x3,%edx 52: mov %edx,%edx 54: cmp %edx,0x24(%rsi) 57: jbe 0x0000000000000093 59: mov -0x224(%rbp),%eax 5f: cmp $0x20,%eax 62: ja 0x0000000000000093 64: add $0x1,%eax 67: mov %eax,-0x224(%rbp) 6d: mov 0x110(%rsi,%rdx,8),%rax 75: test %rax,%rax 78: je 0x0000000000000093 7a: mov 0x30(%rax),%rax 7e: add $0x19,%rax 82: callq 0x000000000000008e 87: pause 89: lfence 8c: jmp 0x0000000000000087 8e: mov %rax,(%rsp) 92: retq 93: mov $0x1,%eax [...] Also add an explicit env->allow_ptr_leaks check to fixup_bpf_calls() since backtracking is enabled under the former (direct jumps as well, but they use a different test). In case of only tracking different map pointers as in c93552c443eb ("bpf: properly enforce index mask to prevent out-of-bounds speculation"), pruning cannot make such short-cuts, nor can it if there are paths with scalar and non-scalar types as r3. mark_chain_precision() is only needed after we know that register_is_const(). If that is not the case, we already poison the key on the first path, and a non-const key in later paths does not match the scalar range in regsafe() either. Cilium NodePort testing passes fine as well now. Note, released kernels are not affected. Fixes: d2e4c1e6c294 ("bpf: Constant map key tracking for prog array pokes") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/ac43ffdeb7386c5bd688761ed266f3722bb39823.1576789878.git.daniel@iogearbox.net
2019-12-20 | bpf: Remove unnecessary assertion on fp_old | Aditya Pakki | 1 | -2/+0
The two callers of bpf_prog_realloc, bpf_patch_insn_single and bpf_migrate_filter, dereference the struct fp_old before passing it to the function. Thus the assertion checking fp_old is unnecessary and can be removed. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191219175735.19231-1-pakki001@umn.edu
2019-12-19 | Merge branch 'phylib-consolidation' | David S. Miller | 6 | -217/+129
Russell King says: ==================== phylib consolidation Over the last few releases, there has been a push to clean up and consolidate the phylib code. Some cases have been missed, and this series catches those cases. 1. Remove redundant .aneg_done initialisers; calling genphy_aneg_done() for clause 22 PHYs is the default when .aneg_done is not set. 2. Some PHY drivers manually set phydev->pause and phydev->asym_pause, but we have a helper for this - phy_resolve_aneg_pause(), introduced in 2d880b8709c0 ("net: phy: extract pause mode"). Use this in the lxt, marvell and uPD60620 drivers. Incidentally, this brings up the question whether marvell fiber mode is correctly interpreting and advertising the pause parameters. 3. Add a genphy_check_and_restart_aneg() helper, which complements the clause 45 version of this. This will be useful for PHY drivers that open code this logic (e.g. marvell.c) 4. Add a genphy_read_status_fixed() helper to read the fixed-mode status from a clause 22 PHY. lxt and marvell both contain copies of this code, so convert them over. 5. Arrange marvell driver to use genphy_read_lpa() for copper mode. This needs some rearrangement of the code in marvell_read_status_page_an(), but preserves using the PHY specific status register to derive the current negotiation results. 6. Simplify the marvell driver so we can use the genphy_read_status_fixed() helper directly rather than marvell_read_status_page_fixed(). 7. Use positive logic in the marvell driver to determine the link state, and get rid of the REGISTER_LINK_STATUS definition; we already have a definition for this. 8. The marvell driver reads the PHY specific status register multiple times when determining the status: once in marvell_update_link() and again in marvell_read_status_page_an(). This is a waste; rearrange to read the status register once, and pass its value into marvell_read_status_page_an(). We preserve using genphy_update_link() for the copper side. 9. 
The marvell driver was using private clause 37 definitions, but we have clause 37 definitions in uapi/linux/mii.h. Use the generic definitions. 10. Switch the marvell driver to use phy_modify_changed() to modify the fiber advertisement. 11. Switch the marvell driver to use genphy_check_and_restart_aneg() introduced above rather than open-coding this functionality. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19 | net: phy: marvell: use genphy_check_and_restart_aneg() | Russell King | 1 | -20/+1
Use the helper to check and restart autonegotiation for the marvell fiber page negotiation setting. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net: phy: marvell: use phy_modify_changed()Russell King1-16/+10
Use phy_modify_changed() to change the fiber advertisement register rather than open coding this functionality. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
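The read-modify-write pattern that this commit stops open-coding can be sketched in plain C. This is a user-space model, not the kernel helper: `struct fake_reg` and `reg_modify_changed()` are hypothetical names invented for illustration, and the real helper can additionally return a negative error code when the bus access fails.

```c
#include <stdint.h>

/* Hypothetical stand-in for one 16-bit PHY register. */
struct fake_reg {
    uint16_t value;
    unsigned int writes;   /* number of writes actually issued */
};

/*
 * Clear the bits in 'mask', set the bits in 'set', and write the result
 * back only if it differs from the current value.
 * Returns 1 if the register was written, 0 if it was already up to date.
 * (This model has no error path; that is an assumption of the sketch.)
 */
static int reg_modify_changed(struct fake_reg *reg, uint16_t mask, uint16_t set)
{
    uint16_t old = reg->value;
    uint16_t val = (uint16_t)((old & ~mask) | set);

    if (val == old)
        return 0;

    reg->value = val;
    reg->writes++;
    return 1;
}
```

A caller can use the return value to decide whether follow-up work, such as restarting autonegotiation, is needed at all.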
2019-12-19net: phy: marvell: use existing clause 37 definitionsRussell King1-18/+8
Use existing clause 37 advertising/link partner definitions rather than private ones for the advertisement registers. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net: phy: marvell: consolidate phy status readingRussell King1-43/+17
marvell_read_status_page_an() always reads the PHY status register, but marvell_update_link() has already done this. Rather than wastefully reading the register twice in quick succession, read it once in marvell_read_status_page() and use the result for both. This makes marvell_update_link() rather pointless, so move it into marvell_read_status_page(). Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net: phy: marvell: use positive logic for link stateRussell King1-4/+3
Rather than using negative logic: if (there is no link) set link = 0 else set link = 1 use the more natural positive logic: if (there is link) set link = 1 else set link = 0 Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net: phy: marvell: initialise link partner state earlierRussell King1-20/+5
Move the initialisation of the link partner state earlier, inside marvell_read_status_page(), so we don't have the same initialisation scattered amongst the other files. This is in a similar place to the genphy implementation, so would result in the same behaviour if a PHY read error occurs. This allows us to get rid of marvell_read_status_page_fixed(), which became a pointless wrapper around genphy_read_status_fixed(). Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net: phy: marvell: rearrange to use genphy_read_lpa()Russell King1-34/+32
Rearrange the Marvell PHY driver to use genphy_read_lpa() rather than open-coding this functionality. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net: phy: provide and use genphy_read_status_fixed()Russell King4-47/+41
There are two drivers and generic code which contain exactly the same code to read the status of a PHY operating without autonegotiation enabled. Rather than duplicate this code, provide a helper to read this information. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
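As a rough illustration of what reading the status of a PHY without autonegotiation involves, the sketch below decodes speed and duplex from a clause 22 basic mode control register value. The BMCR_* bit values match uapi/linux/mii.h; `struct fixed_status` and `decode_fixed_status()` are hypothetical names, and the sketch takes the register value as an argument instead of performing a bus read.

```c
#include <stdint.h>

/* Clause 22 basic mode control register bits (values as in mii.h). */
#define BMCR_SPEED1000 0x0040
#define BMCR_FULLDPLX  0x0100
#define BMCR_SPEED100  0x2000

/* Hypothetical result type for this sketch. */
struct fixed_status {
    int speed;        /* Mbit/s */
    int full_duplex;  /* 1 = full duplex, 0 = half duplex */
};

/* Derive the forced speed and duplex from a BMCR value. */
static struct fixed_status decode_fixed_status(uint16_t bmcr)
{
    struct fixed_status st;

    st.full_duplex = (bmcr & BMCR_FULLDPLX) ? 1 : 0;

    if (bmcr & BMCR_SPEED1000)
        st.speed = 1000;
    else if (bmcr & BMCR_SPEED100)
        st.speed = 100;
    else
        st.speed = 10;

    return st;
}
```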
2019-12-19net: phy: add genphy_check_and_restart_aneg()Russell King2-17/+32
Add a helper for checking and restarting autonegotiation, similar to the clause 45 variant. Use it in __genphy_config_aneg(). Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
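The decision such a helper makes can be modelled as a pure function. The sketch below assumes the usual clause 22 rule: restart if the advertisement changed, or if the control register shows autonegotiation disabled or the PHY isolated. `aneg_needs_restart()` is a hypothetical name, and the sketch only returns the decision; it does not touch hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Clause 22 basic mode control register bits (values as in mii.h). */
#define BMCR_ISOLATE  0x0400
#define BMCR_ANENABLE 0x1000

/*
 * Decide whether autonegotiation must be (re)started.
 * 'advert_changed' is true when the caller has just modified the
 * advertisement; otherwise the current BMCR value is inspected.
 */
static bool aneg_needs_restart(uint16_t bmcr, bool advert_changed)
{
    if (advert_changed)
        return true;

    /* Advertisement unchanged: restart only if aneg is off or the PHY is isolated. */
    if (!(bmcr & BMCR_ANENABLE) || (bmcr & BMCR_ISOLATE))
        return true;

    return false;
}
```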
2019-12-19net: phy: use phy_resolve_aneg_pause()Russell King3-14/+3
Several drivers code their own version of this, working from the LPA register, after setting the ethtool link partner advertisement bitmask. Use the generic function instead. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
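A simplified model of resolving pause settings from the link partner's advertisement is sketched below. It works directly on LPA register bits (values as in uapi/linux/mii.h) to stay self-contained, whereas the generic function works from the ethtool link partner advertisement bitmask; `struct pause_state` and `resolve_pause()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Link partner ability register pause bits (values as in mii.h). */
#define LPA_PAUSE_CAP  0x0400
#define LPA_PAUSE_ASYM 0x0800

/* Hypothetical holder for the two pause flags. */
struct pause_state {
    bool pause;
    bool asym_pause;
};

/*
 * Record what the link partner advertised. Pause frames only apply to
 * full-duplex links, so the state is left untouched on half duplex.
 */
static void resolve_pause(struct pause_state *st, uint16_t lpa, bool full_duplex)
{
    if (!full_duplex)
        return;

    st->pause = (lpa & LPA_PAUSE_CAP) != 0;
    st->asym_pause = (lpa & LPA_PAUSE_ASYM) != 0;
}
```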
2019-12-19net: phy: remove redundant .aneg_done initialisersRussell King2-7/+0
Remove initialisers that set .aneg_done to genphy_aneg_done - this is the default for clause 22 PHYs, so the initialiser is redundant. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19Merge ath-next from git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.gitKalle Valo30-302/+568
ath.git patches for v5.6.

Major changes:

wil6210
* support set_multicast_to_unicast cfg80211 operation
* support set_cqm_rssi_config cfg80211 operation

wcn36xx
* disable HW_CONNECTION_MONITOR as firmware is buggy
2019-12-19ath11k: Use sizeof_field() instead of FIELD_SIZEOF()Kees Cook1-1/+1
The FIELD_SIZEOF() macro was redundant, and is being removed from the kernel. Since commit c593642c8be0 ("treewide: Use sizeof_field() macro") this is one of the last users of the old macro, so replace it. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
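For readers unfamiliar with the macro, the sketch below shows how a sizeof_field()-style macro yields the size of a struct member without needing an instance. The macro body follows the common kernel form; `struct example_hdr` is a hypothetical type used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of a struct member; sizeof does not evaluate the null dereference. */
#define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))

/* Hypothetical structure for the example. */
struct example_hdr {
    uint8_t  addr[6];
    uint16_t type;
};

/* Usage: a buffer sized to hold exactly one 'addr' field. */
static uint8_t addr_copy[sizeof_field(struct example_hdr, addr)];
```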
2019-12-19net, sysctl: Fix compiler warning when only cBPF is presentAlexander Lobakin1-0/+2
proc_dointvec_minmax_bpf_restricted() was first introduced in commit 2e4a30983b0f ("bpf: restrict access to core bpf sysctls") under CONFIG_HAVE_EBPF_JIT. This ifdef was then removed in ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv allocations"), because a new sysctl, bpf_jit_limit, made use of it. Finally, this parameter became long instead of integer with fdadd04931c2 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K"), and so a new proc_dolongvec_minmax_bpf_restricted() was added. After this last change, proc_dointvec_minmax_bpf_restricted() is once again used only under CONFIG_HAVE_EBPF_JIT, but the corresponding ifdef was not brought back. So, in configurations like CONFIG_BPF_JIT=y && CONFIG_HAVE_EBPF_JIT=n, since v4.20 we have:

  CC net/core/sysctl_net_core.o
net/core/sysctl_net_core.c:292:1: warning: ‘proc_dointvec_minmax_bpf_restricted’ defined but not used [-Wunused-function]
  292 | proc_dointvec_minmax_bpf_restricted(struct ctl_table *table, int write,
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Suppress this by guarding it with CONFIG_HAVE_EBPF_JIT again.

Fixes: fdadd04931c2 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191218091821.7080-1-alobakin@dlink.ru
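The general shape of such a fix can be sketched as follows: a static helper that is only called under a configuration option must itself be compiled only under that option, or -Wunused-function fires when the option is off. HYPOTHETICAL_FEATURE, `restricted_handler()` and `handle()` are invented names for illustration and do not come from the kernel source.

```c
/*
 * HYPOTHETICAL_FEATURE stands in for a Kconfig symbol. Both the
 * definition and the only call site sit under the same guard, so the
 * helper is never "defined but not used".
 */
#ifdef HYPOTHETICAL_FEATURE
static int restricted_handler(int value)
{
    /* Clamp negative input; the body is arbitrary for this sketch. */
    return value < 0 ? 0 : value;
}
#endif

int handle(int value)
{
#ifdef HYPOTHETICAL_FEATURE
    return restricted_handler(value);
#else
    return value;
#endif
}
```

Removing the first guard while leaving HYPOTHETICAL_FEATURE undefined reproduces the class of warning quoted in the commit message when building with -Wunused-function.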
2019-12-19ath11k: explicitly cast wmi commands to their correct struct typeJohn Crispin1-3/+3
Three of the WMI command handlers were not casting to the right data type. Let's make the code consistent with the other handlers. Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-12-19wil6210: add support for set_cqm_rssi_configDedy Lansky4-0/+130
set_cqm_rssi_config() is used by the kernel to configure connection quality monitor RSSI threshold. wil6210 uses WMI_SET_LINK_MONITOR_CMDID to set the RSSI threshold to FW which in turn reports RSSI threshold changes with WMI_LINK_MONITOR_EVENTID. Signed-off-by: Dedy Lansky <dlansky@codeaurora.org> Signed-off-by: Maya Erez <merez@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-12-19wil6210: support set_multicast_to_unicast cfg80211 operationAhmad Masri3-1/+48
The wil6210 AP has a separate ring for transmitting multicast packets. Multicast packets are transmitted without an ack from the receiver side; therefore, the 802.11 spec defines some low MCS rates for multicast packets. However, there is no guarantee that these packets were really received and handled on the client side. Some applications that rely on multicast packets may prefer to transmit these packets as unicast to ensure reliability, and also to ensure better performance with high MCS rates. Multicast to unicast is done by duplicating multicast packets to all clients and changing the DA (multicast) to the MAC address of the client. See NL80211_CMD_SET_MULTICAST_TO_UNICAST for more info. Signed-off-by: Ahmad Masri <amasri@codeaurora.org> Signed-off-by: Maya Erez <merez@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
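The duplication step described above can be sketched in user-space C: one copy of the frame per client, with the destination address rewritten to that client's MAC. All types and names here (`struct mac`, `struct frame`, `mcast_to_ucast()`) are hypothetical and far simpler than a real driver's socket buffers and transmit rings.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical minimal address and frame types for this sketch. */
struct mac {
    uint8_t b[ETH_ALEN];
};

struct frame {
    uint8_t da[ETH_ALEN];   /* destination address */
    uint8_t payload[8];
};

/* A group (multicast/broadcast) address has the low bit of the first octet set. */
static int is_multicast(const uint8_t *addr)
{
    return addr[0] & 0x01;
}

/*
 * Expand one frame into per-client unicast copies.
 * 'out' must have room for max(n_clients, 1) frames.
 * Returns the number of frames written to 'out'.
 */
static size_t mcast_to_ucast(const struct frame *in,
                             const struct mac *clients, size_t n_clients,
                             struct frame *out)
{
    size_t i;

    if (!is_multicast(in->da)) {
        out[0] = *in;           /* already unicast: pass through unchanged */
        return 1;
    }

    for (i = 0; i < n_clients; i++) {
        out[i] = *in;                               /* duplicate the frame */
        memcpy(out[i].da, clients[i].b, ETH_ALEN);  /* rewrite DA to client MAC */
    }
    return n_clients;
}
```

The cost of this approach grows with the number of associated clients, which is the trade-off against sending a single low-rate multicast frame.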