Age | Commit message | Author | Files | Lines
2019-12-25 | Merge branch 's390-qeth-fixes' | David S. Miller | 4 | -38/+43
Julian Wiedmann says: ==================== s390/qeth: fixes 2019-12-23 please apply the following patch series for qeth to your net tree. This brings two fixes for errors during device initialization, deals with several issues in the vnicc control code, and adds a missing lock. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | s390/qeth: fix initialization on old HW | Julian Wiedmann | 1 | -3/+1
I stumbled over an old OSA model that claims to support DIAG_ASSIST, but then rejects the cmd to query its DIAG capabilities. In the old code this was ok, as the returned raw error code was > 0. Now that we translate the raw codes to errnos, the "rc < 0" causes us to fail the initialization of the device. The fix is trivial: don't bail out when the DIAG query fails. Such an error is not critical, we can still use the device (with a slightly reduced set of features). Fixes: 742d4d40831d ("s390/qeth: convert remaining legacy cmd callbacks") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | s390/qeth: vnicc Fix init to default | Alexandra Winter | 1 | -1/+3
During vnicc_init wanted_char should be compared to cur_char and not to QETH_VNICC_DEFAULT. Without this patch there is no way to enforce the default values as desired values. Note that a card is expected to come online with default values. This patch was tested with private card firmware. Fixes: caa1f0b10d18 ("s390/qeth: add VNICC enable/disable support") Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | s390/qeth: Fix vnicc_is_in_use if rx_bcast not set | Alexandra Winter | 1 | -2/+1
Symptom: After vnicc/rx_bcast has been manually set to 0, bridge_* sysfs parameters can still be set or written. Only occurs on HiperSockets, as OSA doesn't support changing rx_bcast. Vnic characteristics and bridgeport settings are mutually exclusive. rx_bcast defaults to 1, so manually setting it to 0 should disable bridge_* parameters. Instead it makes sense here to check the supported mask. If the card does not support vnicc at all, bridge commands are always allowed. Fixes: caa1f0b10d18 ("s390/qeth: add VNICC enable/disable support") Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | s390/qeth: fix false reporting of VNIC CHAR config failure | Alexandra Winter | 1 | -1/+0
Symptom: Error message "Configuring the VNIC characteristics failed" in dmesg whenever an OSA interface on z15 is set online. The VNIC characteristics get re-programmed when setting a L2 device online. This follows the selected 'wanted' characteristics - with the exception that the INVISIBLE characteristic unconditionally gets switched off. For devices that don't support INVISIBLE (ie. OSA), the resulting IO failure raises a noisy error message ("Configuring the VNIC characteristics failed"). For IQD, INVISIBLE is off by default anyways. So don't unnecessarily special-case the INVISIBLE characteristic, and thereby suppress the misleading error message on OSA devices. Fixes: caa1f0b10d18 ("s390/qeth: add VNICC enable/disable support") Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | s390/qeth: lock the card while changing its hsuid | Julian Wiedmann | 2 | -17/+28
qeth_l3_dev_hsuid_store() initially checks the card state, but doesn't take the conf_mutex to ensure that the card stays in this state while being reconfigured. Rework the code to take this lock, and drop a redundant state check in a helper function. Fixes: b333293058aa ("qeth: add support for af_iucv HiperSockets transport") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | s390/qeth: fix qdio teardown after early init error | Julian Wiedmann | 3 | -14/+10
qeth_l?_set_online() goes through a number of initialization steps, and on any error uses qeth_l?_stop_card() to tear down the residual state. The first initialization step is qeth_core_hardsetup_card(). When this fails after having established a QDIO context on the device (i.e. somewhere after qeth_mpc_initialize()), qeth_l?_stop_card() doesn't shut down this QDIO context again (since the card state hasn't progressed from DOWN at this stage). Even worse, we then call qdio_free() as the final teardown step to free the QDIO data structures - while some of them are still hooked into wider QDIO infrastructure such as the IRQ list. This is inevitably followed by use-after-frees and other nastiness. Fix this by unconditionally calling qeth_qdio_clear_card() to shut down the QDIO context, and also to halt/clear any pending activity on the various IO channels. Remove the naive attempt at handling the teardown in qeth_mpc_initialize(), it clearly doesn't suffice and we're handling it properly now in the wider teardown code. Fixes: 4a71df50047f ("qeth: new qeth device driver") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | Merge branch 'Simplify-IPv6-route-offload-API' | David S. Miller | 5 | -176/+238
Ido Schimmel says: ==================== Simplify IPv6 route offload API Motivation ========== This is the IPv6 counterpart of "Simplify IPv4 route offload API" [1]. The aim of this patch set is to simplify the IPv6 route offload API by making the stack a bit smarter about the notifications it is generating. This allows driver authors to focus on programming the underlying device instead of having to duplicate the IPv6 route insertion logic in their driver, which is error-prone. Details ======= Today, whenever an IPv6 route is added or deleted a notification is sent in the FIB notification chain and it is up to offload drivers to decide if the route should be programmed to the hardware or not. This is not an easy task as in hardware routes are keyed by {prefix, prefix length, table id}, whereas the kernel can store multiple such routes that only differ in metric / nexthop info. This series makes sure that only routes that are actually used in the data path are notified to offload drivers. This greatly simplifies the work these drivers need to do, as they are now only concerned with programming the hardware and do not need to replicate the IPv6 route insertion logic and store multiple identical routes. The route that is notified is the first route in the IPv6 FIB node, which represents a single prefix and length in a given table. In case the route is deleted and there is another route with the same key, a replace notification is emitted. Otherwise, a delete notification is emitted. Unlike IPv4, in IPv6 it is possible to append individual nexthops to an existing multipath route. Therefore, in addition to the replace and delete notifications present in IPv4, an append notification is also used. Testing ======= To ensure there is no degradation in route insertion rates, I averaged the insertion rate of 512k routes (/64 and /128) over 50 runs. Did not observe any degradation. Functional tests are available here [2]. 
They rely on route trap indication, which is added in a subsequent patch set. In addition, I have been running syzkaller for the past couple of weeks with debug options enabled. Did not observe any problems. Patch set overview ================== Patches #1-#7 gradually introduce the new FIB notifications Patch #8 converts mlxsw to use the new notifications Patch #9 remove the old notifications [1] https://patchwork.ozlabs.org/cover/1209738/ [2] https://github.com/idosch/linux/tree/fib-notifier ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | ipv6: Remove old route notifications and convert listeners | Ido Schimmel | 5 | -70/+21
Now that mlxsw is converted to use the new FIB notifications it is possible to delete the old ones and use the new replace / append / delete notifications. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | mlxsw: spectrum_router: Start using new IPv6 route notifications | Ido Schimmel | 1 | -149/+76
With the new notifications mlxsw does not need to handle identical routes itself, as this is taken care of by the core IPv6 code. Instead, mlxsw only needs to take care of inserting and removing routes from the device. Convert mlxsw to use the new IPv6 route notifications and simplify the code. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | ipv6: Handle multipath route deletion notification | Ido Schimmel | 1 | -0/+26
When an entire multipath route is deleted, only emit a notification if it is the first route in the node. Emit a replace notification in case the last sibling is followed by another route. Otherwise, emit a delete notification. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | ipv6: Handle route deletion notification | Ido Schimmel | 2 | -1/+44
For the purpose of route offload, when a single route is deleted, it is only of interest if it is the first route in the node or if it is sibling to such a route. In the first case, distinguish between several possibilities: 1. Route is the last route in the node. Emit a delete notification 2. Route is followed by a non-multipath route. Emit a replace notification for the non-multipath route. 3. Route is followed by a multipath route. Emit a replace notification for the multipath route. In the second case, only emit a delete notification to ensure the route is no longer used as a valid nexthop. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
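The deletion cases described above can be sketched as a small model. This is an illustrative Python sketch under stated assumptions, not the kernel implementation: a FIB node is modelled as an ordered list of routes, each route is a list of nexthop names (more than one nexthop standing in for a multipath route with siblings), and `delete_nexthop` with its returned event tuples is a hypothetical helper.

```python
from typing import List, Optional, Tuple

# Hypothetical model: a FIB node is an ordered list of routes; each route is
# a list of nexthop names (more than one nexthop means a multipath route).
Node = List[List[str]]
Event = Tuple[str, Tuple[str, ...]]


def delete_nexthop(node: Node, route_idx: int, nexthop: str) -> Optional[Event]:
    """Remove one nexthop and return the event an offload driver would see."""
    route = node[route_idx]
    route.remove(nexthop)
    is_first = route_idx == 0

    if route:
        # Siblings remain. Only siblings of the first route matter: tell the
        # driver to stop using this nexthop.
        return ("delete", (nexthop,)) if is_first else None

    # The whole route is gone from the node.
    del node[route_idx]
    if not is_first:
        return None  # the route was never the one used in the data path
    if not node:
        return ("delete", (nexthop,))  # case 1: it was the last route
    # Cases 2 and 3: the next route (single or multipath) takes over.
    return ("replace", tuple(node[0]))
```

Deleting a route that is neither first in the node nor a sibling of the first route produces no event in this model, which mirrors the statement that such routes are of no interest for offload.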
2019-12-25 | ipv6: Only Replay routes of interest to new listeners | Ido Schimmel | 1 | -0/+40
When a new listener is registered to the FIB notification chain it receives a dump of all the available routes in the system. Instead, make sure to only replay the IPv6 routes that are actually used in the data path and are of any interest to the new listener. This is done by iterating over all the routing tables in the given namespace, but from each traversed node only the first route ('leaf') is notified. Multipath routes are notified in a single notification instead of one for each nexthop. Add fib6_rt_dump_tmp() to do that. Later on in the patch set it will be renamed to fib6_rt_dump() instead of the existing one. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | ipv6: Notify multipath route if should be offloaded | Ido Schimmel | 1 | -0/+48
In a similar fashion to previous patches, only notify the new multipath route if it is the first route in the node or if it was appended to such route. The type of the notification (replace vs. append) is determined based on the number of routes added ('nhn') and the number of sibling routes. If the two do not match, then an append notification should be sent. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | ipv6: Notify route if replacing currently offloaded one | Ido Schimmel | 1 | -0/+7
Similar to the corresponding IPv4 patch, only notify the new route if it is replacing the currently offloaded one. Meaning, the one pointed to by 'fn->leaf'. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | ipv6: Notify newly added route if should be offloaded | Ido Schimmel | 1 | -0/+18
fib6_add_rt2node() takes care of adding a single route ('struct fib6_info') to a FIB node. The route in question should only be notified in case it is added as the first route in the node (lowest metric) or if it is added as a sibling route to the first route in the node. The first criterion can be tested by checking if the route is pointed to by 'fn->leaf'. The second criterion can be tested by checking the new 'notify_sibling_rt' variable that is set when the route is added as a sibling to the first route in the node. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
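The two criteria above can be illustrated with a minimal Python model. It is only a sketch under assumptions: `Route` and `FibNode` are hypothetical stand-ins for 'struct fib6_info' and a FIB node, "sibling of the first route" is simplified to "same metric as the first route", and the event names "replace" and "append" are assumed from the series description rather than taken from this commit.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Route:
    metric: int
    nexthop: str


@dataclass
class FibNode:
    # Routes sharing one {prefix, prefix length, table id}, sorted by metric.
    routes: List[Route] = field(default_factory=list)

    def add(self, rt: Route) -> Optional[str]:
        """Insert rt; return the event to send to offload drivers, if any."""
        idx = 0
        while idx < len(self.routes) and self.routes[idx].metric <= rt.metric:
            idx += 1
        self.routes.insert(idx, rt)

        if idx == 0:
            return "replace"   # rt is now the first route ('fn->leaf')
        if self.routes[0].metric == rt.metric:
            return "append"    # rt became a sibling of the first route
        return None            # rt is shadowed by a route with a lower metric
```

A route inserted behind a better one returns `None` here, so a driver listening to these events never has to store routes it would not program.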
2019-12-25 | net: fib_notifier: Add temporary events to the FIB notification chain | Ido Schimmel | 1 | -0/+2
Subsequent patches are going to simplify the IPv6 route offload API, which will only use three events - replace, delete and append. Introduce a temporary version of replace and delete in order to make the conversion easier to review. Note that append does not need a temporary version, as it is currently not used. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | Merge branch 'disable-neigh-update-for-tunnels-during-pmtu-update' | David S. Miller | 19 | -32/+58
Hangbin Liu says: ==================== disable neigh update for tunnels during pmtu update
When we set up a pair of gretap devices, ping between them to populate the neighbour cache, and then delete and recreate one side, we will never be able to ping6 the newly created gretap. The reason is that when we ping6 the remote via gretap, the call chain is:
gre_tap_xmit()
- ip_tunnel_xmit()
- tnl_update_pmtu()
- skb_dst_update_pmtu()
- ip6_rt_update_pmtu()
- __ip6_rt_update_pmtu()
- dst_confirm_neigh()
- ip6_confirm_neigh()
- __ipv6_confirm_neigh()
- n->confirmed = now
As the confirmed time is updated, in neigh_timer_handler() the check for the NUD_DELAY confirm time will pass and the neigh state will go back to NUD_REACHABLE. So the old/wrong mac address will be used again. If we do not update the confirmed time, the neigh state will go to neigh->nud_state = NUD_PROBE; then go to NUD_FAILED and re-create the neigh later, which is what IPv4 does. We couldn't remove the ip6_confirm_neigh() directly as we still need it for TCP flows. To fix it, we have to pass a bool parameter to dst_ops.update_pmtu() and only disable neighbor update for tunnels.
v5: No code change, update some commit descriptions
v4: No code change, update some commit descriptions
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in dst_ops.update_pmtu to control whether we should do neighbor confirm. Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | net/dst: do not confirm neighbor for vxlan and geneve pmtu update | Hangbin Liu | 1 | -1/+1
When an IPv6 tunnel PMTU update ends up calling __ip6_rt_update_pmtu(), we should not call dst_confirm_neigh(), as there is no two-way communication. So disable the neigh confirm for vxlan and geneve pmtu update. v5: No change. v4: No change. v3: Do not remove dst_confirm_neigh, but add a new bool parameter in dst_ops.update_pmtu to control whether we should do neighbor confirm. Also split the big patch to small ones for each area. v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu. Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path") Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path") Reviewed-by: Guillaume Nault <gnault@redhat.com> Tested-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | sit: do not confirm neighbor when do pmtu update | Hangbin Liu | 1 | -1/+1
When an IPv6 tunnel PMTU update ends up calling __ip6_rt_update_pmtu(), we should not call dst_confirm_neigh(), as there is no two-way communication. v5: No change. v4: No change. v3: Do not remove dst_confirm_neigh, but add a new bool parameter in dst_ops.update_pmtu to control whether we should do neighbor confirm. Also split the big patch to small ones for each area. v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu. Reviewed-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | vti: do not confirm neighbor when do pmtu update | Hangbin Liu | 2 | -2/+2
When an IPv6 tunnel PMTU update ends up calling __ip6_rt_update_pmtu(), we should not call dst_confirm_neigh(), as there is no two-way communication. Although vti and vti6 are immune to this problem because they are IFF_NOARP interfaces, as Guillaume pointed out, it still makes no sense to confirm the neighbour here. v5: Update commit description. v4: No change. v3: Do not remove dst_confirm_neigh, but add a new bool parameter in dst_ops.update_pmtu to control whether we should do neighbor confirm. Also split the big patch to small ones for each area. v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu. Reviewed-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | tunnel: do not confirm neighbor when do pmtu update | Hangbin Liu | 2 | -3/+3
When a tunnel PMTU update ends up calling __ip6_rt_update_pmtu(), we should not call dst_confirm_neigh(), as there is no two-way communication. v5: No change. v4: Update commit description. v3: Do not remove dst_confirm_neigh, but add a new bool parameter in dst_ops.update_pmtu to control whether we should do neighbor confirm. Also split the big patch to small ones for each area. v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu. Fixes: 0dec879f636f ("net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP") Reviewed-by: Guillaume Nault <gnault@redhat.com> Tested-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | net/dst: add new function skb_dst_update_pmtu_no_confirm | Hangbin Liu | 1 | -0/+9
Add a new function skb_dst_update_pmtu_no_confirm() for callers that need to update the PMTU but should not do neighbor confirm. v5: No change. v4: No change. v3: Do not remove dst_confirm_neigh, but add a new bool parameter in dst_ops.update_pmtu to control whether we should do neighbor confirm. Also split the big patch to small ones for each area. v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu. Reviewed-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | gtp: do not confirm neighbor when do pmtu update | Hangbin Liu | 1 | -1/+1
When an IPv6 tunnel PMTU update ends up calling __ip6_rt_update_pmtu(), we should not call dst_confirm_neigh(), as there is no two-way communication. Although GTP only supports IPv4 right now, and __ip_rt_update_pmtu() does not call dst_confirm_neigh(), we still set it to false to keep consistency with the IPv6 code. v5: No change. v4: No change. v3: Do not remove dst_confirm_neigh, but add a new bool parameter in dst_ops.update_pmtu to control whether we should do neighbor confirm. Also split the big patch to small ones for each area. v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu. Reviewed-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | ip6_gre: do not confirm neighbor when do pmtu update | Hangbin Liu | 1 | -1/+1
When we do an IPv6 GRE PMTU update, we currently also do a neigh confirm. This causes the neigh cache to be refreshed and set to REACHABLE before xmit. But if the remote mac address changed, e.g. the device is deleted and recreated, we will not be able to notice this and will still use the old mac address, as the neigh cache is REACHABLE. Fix this by disabling the neigh confirm when doing a PMTU update. v5: No change. v4: No change. v3: Do not remove dst_confirm_neigh, but add a new bool parameter in dst_ops.update_pmtu to control whether we should do neighbor confirm. Also split the big patch to small ones for each area. v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu. Reported-by: Jianlin Shi <jishi@redhat.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | net: add bool confirm_neigh parameter for dst_ops.update_pmtu | Hangbin Liu | 14 | -25/+42
The MTU update code is supposed to be invoked in response to real networking events that update the PMTU. In the IPv6 PMTU update function __ip6_rt_update_pmtu() we call dst_confirm_neigh() to update the neighbor confirmed time. But the tunnel code updates the PMTU before xmit, like:
- tnl_update_pmtu()
- skb_dst_update_pmtu()
- ip6_rt_update_pmtu()
- __ip6_rt_update_pmtu()
- dst_confirm_neigh()
If the tunnel remote dst mac address changed and we still do the neigh confirm, we will not be able to update the neigh cache and ping6 to the remote will fail. So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we should not be invoking dst_confirm_neigh() as we have no evidence of successful two-way communication at this point. On the other hand it is also important to keep the neigh reachability fresh for TCP flows, so we cannot remove this dst_confirm_neigh() call. To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu to choose whether we should do neigh update or not. I will add the parameter in this patch and set all the callers to true to comply with the previous way, and fix the tunnel code one by one in later patches.
v5: No change. v4: No change. v3: Do not remove dst_confirm_neigh, but add a new bool parameter in dst_ops.update_pmtu to control whether we should do neighbor confirm. Also split the big patch to small ones for each area. v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu. Suggested-by: David Miller <davem@davemloft.net> Reviewed-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
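The effect of the new bool parameter can be sketched with a toy model. This is a hedged Python illustration, not kernel code: `Neighbour`, `Dst` and `update_pmtu` are hypothetical stand-ins, the explicit `now` argument replaces the kernel's clock, and the two wrappers merely borrow the names of the skb_dst_update_pmtu() and skb_dst_update_pmtu_no_confirm() helpers mentioned in this series.

```python
from dataclasses import dataclass


@dataclass
class Neighbour:
    confirmed: float = 0.0   # time of the last reachability confirmation


@dataclass
class Dst:
    neigh: Neighbour
    mtu: int = 1500


def update_pmtu(dst: Dst, mtu: int, now: float, confirm_neigh: bool) -> None:
    """Stand-in for dst_ops.update_pmtu with the new bool parameter."""
    if confirm_neigh:
        # Roughly what the neighbour confirmation boils down to in this model.
        dst.neigh.confirmed = now
    if mtu < dst.mtu:
        dst.mtu = mtu


def skb_dst_update_pmtu(dst: Dst, mtu: int, now: float) -> None:
    # Regular callers keep the previous behaviour (confirm the neighbour).
    update_pmtu(dst, mtu, now, True)


def skb_dst_update_pmtu_no_confirm(dst: Dst, mtu: int, now: float) -> None:
    # Tunnel transmit paths: lower the MTU but leave the neighbour alone.
    update_pmtu(dst, mtu, now, False)
```

With the no-confirm variant the MTU still shrinks, but the neighbour's `confirmed` timestamp stays stale, so in this model nothing keeps a changed remote MAC address artificially fresh.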
2019-12-25 | r8169: move enabling EEE to rtl8169_init_phy | Heiner Kallweit | 1 | -11/+3
Simplify the code by moving the call to rtl_enable_eee() from the individual PHY configs to rtl8169_init_phy(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | r8169: remove MAC workaround in rtl8168e_2_hw_phy_config | Heiner Kallweit | 1 | -3/+0
Due to recent changes we don't need the call to rtl_rar_exgmac_set() any longer at this place. It's called from rtl_rar_set() which is called in rtl_init_mac_address() and rtl8169_resume(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | r8169: factor out rtl8168h_2_get_adc_bias_ioffset | Heiner Kallweit | 1 | -19/+20
Simplify and factor out this magic from rtl8168h_2_hw_phy_config() and name it based on the vendor driver. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | Merge branch 'ovs-mpls-actions' | David S. Miller | 4 | -9/+96
Martin Varghese says: ==================== New openvswitch MPLS actions for layer 2 tunnelling
The existing PUSH MPLS action inserts MPLS header between ethernet header and the IP header. Though this behaviour is fine for L3 VPN where an IP packet is encapsulated inside a MPLS tunnel, it does not satisfy the L2 VPN (l2 tunnelling) requirements. In L2 VPN the MPLS header should encapsulate the ethernet packet. The new mpls action ADD_MPLS inserts MPLS header at the start of the packet or at the start of the l3 header depending on the value of l3 tunnel flag in the ADD_MPLS arguments. POP_MPLS action is extended to support ethertype 0x6558.
OVS userspace changes
---------------------
Encap & Decap ovs actions are extended to support MPLS packet type. The encap & decap adds and removes MPLS header at the start of packet as depicted below.
Actions - encap(mpls(ether_type=0x8847)),encap(ethernet)
Incoming packet -> | ETH | IP | Payload |
1 Actions - encap(mpls(ether_type=0x8847)) [Kernel action - add_mpls:0x8847]
Outgoing packet -> | MPLS | ETH | Payload|
2 Actions - encap(ethernet) [Kernel action - push_eth]
Outgoing packet -> | ETH | MPLS | ETH | Payload|
Decapsulation:
Incoming packet -> | ETH | MPLS | ETH | IP | Payload |
Actions - decap(),decap(packet_type(ns=0,type=0))
1 Actions - decap() [Kernel action - pop_eth]
Outgoing packet -> | MPLS | ETH | IP | Payload|
2 Actions - decap(packet_type(ns=0,type=0)) [Kernel action - pop_mpls:0x6558]
Outgoing packet -> | ETH | IP | Payload
==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | openvswitch: New MPLS actions for layer 2 tunnelling | Martin Varghese | 3 | -6/+89
The existing PUSH MPLS action inserts MPLS header between ethernet header and the IP header. Though this behaviour is fine for L3 VPN where an IP packet is encapsulated inside a MPLS tunnel, it does not satisfy the L2 VPN (l2 tunnelling) requirements. In L2 VPN the MPLS header should encapsulate the ethernet packet. The new mpls action ADD_MPLS inserts MPLS header at the start of the packet or at the start of the l3 header depending on the value of l3 tunnel flag in the ADD_MPLS arguments. POP_MPLS action is extended to support ethertype 0x6558. Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
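The two placements of the MPLS header can be illustrated by treating a packet as a list of header names. This is a rough Python sketch under assumptions: the function names only echo the action names used above, the `l3_tunnel` argument stands in for the l3 tunnel flag, and real packets, labels and ethertypes are not modelled.

```python
from typing import List


def add_mpls(headers: List[str], l3_tunnel: bool) -> List[str]:
    """Insert an MPLS header at the L3 header or at the start of the packet."""
    if l3_tunnel and headers and headers[0] == "ETH":
        # L3 VPN style: MPLS goes between the ethernet and the L3 header.
        return [headers[0], "MPLS"] + headers[1:]
    # L2 VPN style: MPLS encapsulates the whole ethernet packet.
    return ["MPLS"] + headers


def push_eth(headers: List[str]) -> List[str]:
    return ["ETH"] + headers


def _pop(headers: List[str], name: str) -> List[str]:
    if not headers or headers[0] != name:
        raise ValueError(f"packet does not start with {name}")
    return headers[1:]


def pop_eth(headers: List[str]) -> List[str]:
    return _pop(headers, "ETH")


def pop_mpls(headers: List[str]) -> List[str]:
    return _pop(headers, "MPLS")
```

Chaining `push_eth(add_mpls(pkt, l3_tunnel=False))` on `["ETH", "IP", "Payload"]` gives the ETH, MPLS, ETH, IP, Payload stacking used for L2 tunnelling, and `pop_mpls(pop_eth(...))` undoes it.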
2019-12-25 | net: Rephrased comments section of skb_mpls_pop() | Martin Varghese | 1 | -1/+1
Rephrased comments section of skb_mpls_pop() to align it with comments section of skb_mpls_push(). Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | net: skb_mpls_push() modified to allow MPLS header push at start of packet. | Martin Varghese | 1 | -2/+6
The existing skb_mpls_push() implementation always inserts mpls header after the mac header. L2 VPN use cases requires MPLS header to be inserted before the ethernet header as the ethernet packet gets tunnelled inside MPLS header in those cases. Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | Merge tag 'rxrpc-fixes-20191220' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs | David S. Miller | 7 | -98/+85
David Howells says: ==================== rxrpc: Fixes
Here are a couple of bugfixes plus a patch that makes one of the bugfixes easier:
(1) Move the ping and mutex unlock on a new call from rxrpc_input_packet() into rxrpc_new_incoming_call(), which it calls. This means the lock-unlock section is entirely within the latter function. This simplifies patch (2).
(2) Don't take the call->user_mutex at all in the softirq path. Mutexes aren't allowed to be taken or released there and a patch was merged that caused a warning to be emitted every time this happened. Looking at the code again, it looks like taking the mutex isn't actually necessary, as the value of call->state will block access to the call.
(3) Fix the incoming call path to check incoming calls earlier to reject calls to RPC services for which we don't have a security key of the appropriate class. This avoids an assertion failure if YFS tries making a secure call to the kafs cache manager RPC service.
==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | net: dsa: bcm_sf2: Fix IP fragment location and behavior | Florian Fainelli | 1 | -3/+3
The IP fragment is specified through user-defined field as the first bit of the first user-defined word. We were previously trying to extract it from the user-defined mask which could not possibly work. The ip_frag is also supposed to be a boolean, if we do not cast it as such, we risk overwriting the next fields in CFP_DATA(6) which would render the rule inoperative. Fixes: 7318166cacad ("net: dsa: bcm_sf2: Add support for ethtool::rxnfc") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | sctp: fix err handling of stream initialization | Marcelo Ricardo Leitner | 1 | -15/+15
The fix in 951c6db954a1 fixed the issue reported there but introduced another. When the allocation fails within sctp_stream_init() it is okay/necessary to free the genradix. But it is also called when adding new streams, from sctp_send_add_streams() and sctp_process_strreset_addstrm_in(), and in those situations it cannot just free the genradix because by then it is a fully operational association. The fix here then is to only free the genradix in sctp_stream_init() and, on those other call sites, move on with what it already had and let the subsequent error handling handle it. Tested with the reproducers from this report and the previous one, with lksctp-tools and sctp-tests. Reported-by: syzbot+9a1bc632e78a1a98488b@syzkaller.appspotmail.com Fixes: 951c6db954a1 ("sctp: fix memleak on err handling of stream initialization") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | udp: fix integer overflow while computing available space in sk_rcvbuf | Antonio Messina | 1 | -1/+1
When the size of the receive buffer for a socket is close to 2^31, computing whether there is enough space in the buffer to copy a packet from the queue can hit an integer overflow. When a user sets net.core.rmem_default to a value close to 2^31, UDP packets are dropped because of this overflow. This can be visible, for instance, as a failure to resolve hostnames. This can be fixed by casting sk_rcvbuf (which is an int) to unsigned int, similarly to how it is done in TCP. Signed-off-by: Antonio Messina <amessina@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
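The overflow can be reproduced with a small Python emulation of 32-bit arithmetic. This is only a sketch under assumptions: the comparison shape (allocated memory `rmem` against packet `size` plus `rcvbuf`) and all function names are hypothetical, and the signed case models the two's-complement wraparound typically observed in practice rather than anything guaranteed by C.

```python
def to_int32(value: int) -> int:
    """Wrap an integer to a signed 32-bit value (two's complement)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def to_uint32(value: int) -> int:
    """Wrap an integer to an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


def drops_packet_signed(rmem: int, size: int, rcvbuf: int) -> bool:
    # Old behaviour: size + rcvbuf is evaluated as a signed int and can wrap
    # to a negative number, so the "buffer is full" test fires spuriously.
    return rmem > to_int32(size + rcvbuf)


def drops_packet_unsigned(rmem: int, size: int, rcvbuf: int) -> bool:
    # Fixed behaviour: rcvbuf is treated as unsigned int, so the sum and the
    # comparison are done in unsigned 32-bit arithmetic.
    return to_uint32(rmem) > to_uint32(size + to_uint32(rcvbuf))
```

For ordinary buffer sizes both variants agree; they diverge only when `rcvbuf` is close to 2^31, which is the configuration described in the commit message.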
2019-12-24 | io-wq: add cond_resched() to worker thread | Hillf Danton | 1 | -0/+2
Reschedule the current IO worker to reduce the risk of it becoming a CPU hog. Signed-off-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-23 | rseq/selftests: Clarify rseq_prepare_unload() helper requirements | Mathieu Desnoyers | 1 | -5/+7
The rseq.h UAPI now documents that the rseq_cs field must be cleared before reclaiming memory that contains the targeted struct rseq_cs, but also that the rseq_cs field must be cleared before reclaiming memory of the code pointed to by the rseq_cs start_ip and post_commit_offset fields. While we can expect that use of dlclose(3) will typically unmap both struct rseq_cs and its associated code at once, nothing would theoretically prevent a JIT from reclaiming the code without reclaiming the struct rseq_cs, which would erroneously allow the kernel to consider new code which is not a rseq critical section as a rseq critical section following a code reclaim. Suggested-by: Florian Weimer <fw@deneb.enyo.de> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Florian Weimer <fw@deneb.enyo.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@linux.ibm.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Paul Turner <pjt@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2019-12-23rseq/selftests: Fix: Namespace gettid() for compatibility with glibc 2.30Mathieu Desnoyers1-8/+10
glibc 2.30 introduces gettid() in public headers, which clashes with the internal static definition within rseq selftests. Rename gettid() to rseq_gettid() to eliminate this symbol name clash. Reported-by: Tommi T. Rantala <tommi.t.rantala@nokia.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Tommi T. Rantala <tommi.t.rantala@nokia.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@linux.ibm.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Paul Turner <pjt@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: <stable@vger.kernel.org> # v4.18+ Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
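A prefixed wrapper of the kind this rename describes might look like the sketch below. It is an illustration under assumptions, not the selftest source: only the name `rseq_gettid()` comes from the commit message, and the body (a raw `syscall(SYS_gettid)`) is an assumed implementation.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * The prefixed name cannot collide with the gettid() that glibc 2.30
 * exposes in its public headers. The body is an assumption for this
 * sketch: it issues the raw system call directly.
 */
static inline pid_t rseq_gettid(void)
{
	return (pid_t)syscall(SYS_gettid);
}
```

Because the helper is `static` and carries its own prefix, it builds the same way whether or not the C library declares `gettid()`.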
2019-12-23rseq/selftests: Turn off timeout settingMathieu Desnoyers1-0/+1
As the rseq selftests can run for a long period of time, disable the timeout that the general selftests have. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@linux.ibm.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Paul Turner <pjt@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2019-12-23kunit/kunit_tool_test: Test '--build_dir' option runSeongJae Park1-0/+8
This commit adds kunit tool test for the '--build_dir' option. Signed-off-by: SeongJae Park <sjpark@amazon.de> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Tested-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2019-12-23kunit: Rename 'kunitconfig' to '.kunitconfig'SeongJae Park3-10/+8
This commit renames 'kunitconfig' to '.kunitconfig' so that it can be automatically ignored by git and does not disturb people who want to type 'kernel/' by pressing only the 'k' and then 'tab' key. Signed-off-by: SeongJae Park <sjpark@amazon.de> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Tested-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2019-12-23kunit: Place 'test.log' under the 'build_dir'SeongJae Park3-4/+4
'kunit' writes the 'test.log' under the kernel source directory even though a 'build_dir' option is given. As users who use the option might expect the outputs to be placed under the specified directory, this commit modifies the logic to write the log file under the 'build_dir'. Signed-off-by: SeongJae Park <sjpark@amazon.de> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Tested-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2019-12-23kunit: Create default config in '--build_dir'SeongJae Park2-4/+11
If both '--build_dir' and '--defconfig' are given, the handling of '--defconfig' ignores '--build_dir' option. This commit modifies the behavior to respect '--build_dir' option. Reported-by: Brendan Higgins <brendanhiggins@google.com> Suggested-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: SeongJae Park <sjpark@amazon.de> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Tested-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2019-12-23kunit: Remove duplicated defconfig creationSeongJae Park1-3/+0
The '--defconfig' option is handled by 'main()' of 'kunit.py' but is handled again in the following 'run_tests()'. This commit removes the duplicated handling of the option in 'run_tests()'. Signed-off-by: SeongJae Park <sjpark@amazon.de> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Tested-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2019-12-23docs/kunit/start: Use in-tree 'kunit_defconfig'SeongJae Park1-2/+1
The kunit doc suggests that users get the default `kunitconfig` from an external git tree. However, the file is already located under `arch/um/configs/` in the kernel tree. Because the local file is easier to access and maintain, this commit updates the doc to use it. Signed-off-by: SeongJae Park <sjpark@amazon.de> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Tested-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2019-12-23selftests: livepatch: Fix it to do root uid check and skipShuah Khan2-3/+15
The livepatch test configures the system and debug environment to run tests. Some of these actions fail without root access, and the test dumps several permission denied messages before it exits. Fix it to check root uid and exit with skip code instead. Fix test-state.sh to call setup_config instead of set_dynamic_debug, as suggested by Petr Mladek <pmladek@suse.com>. Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2019-12-23selftests: firmware: Fix it to do root uid check and skipShuah Khan1-0/+6
The firmware test attempts to load test modules that require root access and fails. Fix it to check for root uid and exit with skip code instead. Before this fix: selftests: firmware: fw_run_tests.sh modprobe: ERROR: could not insert 'test_firmware': Operation not permitted You must have the following enabled in your kernel: CONFIG_TEST_FIRMWARE=y CONFIG_FW_LOADER=y CONFIG_FW_LOADER_USER_HELPER=y CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y not ok 1 selftests: firmware: fw_run_tests.sh # SKIP With this fix: selftests: firmware: fw_run_tests.sh skip all tests: must be run as root not ok 1 selftests: firmware: fw_run_tests.sh # SKIP Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
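The "check root uid and exit with skip code" pattern can be sketched in C as follows. This is an analogue for illustration only: the affected selftests are shell scripts, the helper name `root_check()` is hypothetical, and `KSFT_SKIP` is defined locally here to mirror the kselftest skip exit code of 4.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Mirrors the kselftest skip exit code; defined locally for this sketch. */
#define KSFT_SKIP 4

/*
 * Hypothetical helper: returns the exit code a test should use for the
 * given effective uid. Non-root callers get the skip code and a single
 * message instead of a series of permission errors.
 */
static int root_check(uid_t euid)
{
	if (euid != 0) {
		fprintf(stderr, "skip all tests: must be run as root\n");
		return KSFT_SKIP;
	}
	return 0;
}
```

A test program would call `root_check(geteuid())` before any setup that needs privileges and return the result immediately when it is non-zero.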
2019-12-23selftests: filesystems/epoll: fix build errorShuah Khan1-1/+1
epoll build fails to find pthread lib. Fix Makefile to use LDLIBS instead of LDFLAGS. LDLIBS is the right flag to use here with -l option when invoking ld. gcc -I../../../../../usr/include/ -lpthread epoll_wakeup_test.c -o .../tools/testing/selftests/filesystems/epoll/epoll_wakeup_test /usr/bin/ld: /tmp/ccaZvJUl.o: in function `kill_timeout': epoll_wakeup_test.c:(.text+0x4dd): undefined reference to `pthread_kill' /usr/bin/ld: epoll_wakeup_test.c:(.text+0x4f2): undefined reference to `pthread_kill' /usr/bin/ld: /tmp/ccaZvJUl.o: in function `epoll9': epoll_wakeup_test.c:(.text+0x6382): undefined reference to `pthread_create' /usr/bin/ld: epoll_wakeup_test.c:(.text+0x64d2): undefined reference to `pthread_create' /usr/bin/ld: epoll_wakeup_test.c:(.text+0x6626): undefined reference to `pthread_join' /usr/bin/ld: epoll_wakeup_test.c:(.text+0x684c): undefined reference to `pthread_tryjoin_np' /usr/bin/ld: epoll_wakeup_test.c:(.text+0x6864): undefined reference to `pthread_kill' /usr/bin/ld: epoll_wakeup_test.c:(.text+0x6878): undefined reference to `pthread_join' Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>