summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2016-11-13openvswitch: allow L3 netdev portsJiri Benc1-3/+6
Allow ARPHRD_NONE interfaces to be added to ovs bridge. Based on previous versions by Lorand Jakab and Simon Horman. Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-13openvswitch: add Ethernet push and pop actionsJiri Benc3-0/+82
It's not allowed to push Ethernet header in front of another Ethernet header. It's not allowed to pop Ethernet header if there's a vlan tag. This preserves the invariant that L3 packet never has a vlan tag. Based on previous versions by Lorand Jakab and Simon Horman. Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-13openvswitch: netlink: support L3 packetsJiri Benc1-61/+99
Extend the ovs flow netlink protocol to support L3 packets. Packets without OVS_KEY_ATTR_ETHERNET attribute specify L3 packets; for those, the OVS_KEY_ATTR_ETHERTYPE attribute is mandatory. Push/pop vlan actions are only supported for Ethernet packets. Based on previous versions by Lorand Jakab and Simon Horman. Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-13openvswitch: add processing of L3 packetsJiri Benc3-37/+101
Support receiving, extracting flow key and sending of L3 packets (packets without an Ethernet header). Note that even after this patch, non-Ethernet interfaces are still not allowed to be added to bridges. Similarly, netlink interface for sending and receiving L3 packets to/from user space is not in place yet. Based on previous versions by Lorand Jakab and Simon Horman. Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-13openvswitch: support MPLS push and pop for L3 packetsJiri Benc1-7/+11
Update Ethernet header only if there is one. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-13openvswitch: pass mac_proto to ovs_vport_sendJiri Benc3-14/+19
We'll need it to alter packets sent to ARPHRD_NONE interfaces. Change do_output() to use the actual L2 header size of the packet when deciding on the minimum cutlen. The assumption here is that what matters is not the output interface hard_header_len but rather the L2 header of the particular packet. For example, ARPHRD_NONE tunnels that encapsulate Ethernet should get at least the Ethernet header. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-13openvswitch: add mac_proto field to the flow keyJiri Benc4-11/+31
Use a hole in the structure. We support only Ethernet so far and will add a support for L2-less packets shortly. We could use a bool to indicate whether the Ethernet header is present or not but the approach with the mac_proto field is more generic and occupies the same number of bytes in the struct, while allowing later extensibility. It also makes the code in the next patches more self explaining. It would be nice to use ARPHRD_ constants but those are u16 which would be waste. Thus define our own constants. Another upside of this is that we can overload this new field to also denote whether the flow key is valid. This has the advantage that on refragmentation, we don't have to reparse the packet but can rely on the stored eth.type. This is especially important for the next patches in this series - instead of adding another branch for L2-less packets before calling ovs_fragment, we can just remove all those branches completely. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-13openvswitch: use hard_header_len instead of hardcoded ETH_HLENJiri Benc2-5/+8
On tx, use hard_header_len while deciding whether to refragment or drop the packet. That way, all combinations are calculated correctly: * L2 packet going to L2 interface (the L2 header len is subtracted), * L2 packet going to L3 interface (the L2 header is included in the packet lenght), * L3 packet going to L3 interface. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-13bpf, mlx4: fix prog refcount in mlx4_en_try_alloc_resources error pathDaniel Borkmann3-1/+20
Commit 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings scheme") added a bug in that the prog's reference count is not dropped in the error path when mlx4_en_try_alloc_resources() is failing from mlx4_xdp_set(). We previously took bpf_prog_add(prog, priv->rx_ring_num - 1), that we need to release again. Earlier in the call path, dev_change_xdp_fd() itself holds a reference to the prog as well (hence the '- 1' in the bpf_prog_add()), so a simple atomic_sub() is safe to use here. When an error is propagated, then bpf_prog_put() is called eventually from dev_change_xdp_fd() Fixes: 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings scheme") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-13net: ethernet: ti: davinci_cpdma: free memory while channel destroyIvan Khoronzhuk1-1/+1
While create/destroy channel operation memory is not freed. It was supposed that memory is freed while driver remove. But a channel can be created and destroyed many times while changing number of channels with ethtool. Based on net-next/master Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10Merge branch 'hns-fixes'David S. Miller14-142/+577
Salil Mehta says: ==================== Bug fixes & Code improvements in HNS driver This patch-set introduces some bug fixes and code improvements. These have been identified during internal review or testing of the driver by internal Hisilicon teams. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10net: hns: add the support to add/remove the ucast entry to/from tableKejian Yan7-0/+155
This patch adds the support to add or remove the unicast entries to the table and remove from the table. Reported-by: Daode Huang <huangdaode@hisilicon.com> Signed-off-by: Kejian Yan <yankejian@huawei.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10net: hns: add multicast tcam table clearKejian Yan7-0/+113
There is no clear operation before add a new multicast tcam table, so the tcam table will be overflow when add more entries. Reported-by: Daode Huang <huangdaode@hisilicon.com> Signed-off-by: Kejian Yan <yankejian@huawei.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10net: hns: modify tcam table of mask_keyQianqian Xie1-0/+7
The packets of wrong mac address(only the last bit is different) can be received in Big-endian by current definition of mask_key. Thus it needs to be modified to support Big-endian and ensure Big-endian normal. Signed-off-by: Qianqian Xie <xieqianqian@huawei.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10net: hns: modify tcam table of mac mc-entryQianqian Xie1-8/+17
The current definition of mac_mc_entry is only suitable for Little-endian. Thus it needs to modify tcam table of mac mc-entry to support both Little-endian and Big-endian. Signed-off-by: Qianqian Xie <xieqianqian@huawei.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10net: hns: modify tcam table of mac mc-portQianqian Xie1-5/+13
Little-endian is only supported by current tcam table to add or delete mac mc-port. This patch makes it support both Little-endian and Big-endian. Signed-off-by: Qianqian Xie <xieqianqian@huawei.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10net: hns: modify table index to get mac entryQianqian Xie1-4/+10
Big-endian is not supported by the current definition of table index to get mac entry. It needs to be modified to support both Little-endian and Big-endian. Signed-off-by: Qianqian Xie <xieqianqian@huawei.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10net: hns: modify tcam table of mac uc-entryQianqian Xie1-6/+12
The current definition of mac_uc_entry is only suitable for Little-endian. Thus it needs to modify tcam table of mac uc-entry to support both Little-endian and Big-endian. Signed-off-by: Qianqian Xie <xieqianqian@huawei.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10net: hns: modify tcam table and set mac keyQianqian Xie2-9/+20
The current definition of dsaf_drv_tbl_tcam_key is only suitable for Little-endian. If data is stored in Big-endian, this may lead to error in data use. Shift operation can make it work normally in both Big-endian and Little-endian. Signed-off-by: Qianqian Xie <xieqianqian@huawei.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10net: hns: modify buffer format of cpu data to le64Qianqian Xie1-3/+3
Hardware ring buffer data is stored in Little-endian. Thus cpu data should be modified to Little-endian. Signed-off-by: Qianqian Xie <xieqianqian@huawei.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10net: hns: fix to intimate the link-status change by adding LF/RF methodDaode Huang5-49/+37
In current scenario, when the interface is disabled we reset the XGMAC RX/TX functionality. This operation does not affects the PHY layer/SFP and which appears UP to the remote end(this behaviour is unlike GMAC). The result is remote end keeps on sending the packets which gets partly processed by XMAC and dropped. Since these are partly processed these appears as errored packets in the packet counter statistics. This patch fixes this behaviour and adds local-fault and remote-fault functionality which can be used to intimate the remote peer whenever the state of the interface changes. This patch also removes the existing hns_dsaf_xge_core_srst_by_port function which was being used to reset the RX/TX functionality at XGE Core. Reported-by: Jun He <hjat2005@huawei.com> Signed-off-by: Daode Huang <huangdaode@hisilicon.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10net: hns: modify ethtool statistics value errorQianqian Xie1-2/+2
This patch modify the gmac_rx_filt_pkt and gmac_rx_octets_total_filt statistics value. The two statistics is inconsistent with register, and just the opposite. Signed-off-by: Qianqian Xie <xieqianqian@huawei.com> Signed-off-by: Jun He <hjat2005@huawei.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10net: hns: delete redundant macro definitionQianqian Xie3-5/+1
This patch deletes redundant macro definitions in hns drivers. And change the .h file containing relation to make the layers more clearly Signed-off-by: Qianqian Xie <xieqianqian@huawei.com> Signed-off-by: Weiwei Deng <dengweiwei@huawei.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10net: hns: bug fix about restart auto-negotiationDaode Huang1-1/+2
When set auto-negotiation off and duplex half, if run "ethtool -r ethX" on port with phy, then the port will be failed to work. It should forbid to start auto-negotiation when auto-negotiate is off. This patch add the limited condition. Reported-by: Jinchuang Tian <tianjinchuang1@huawei.com> Signed-off-by: Daode Huang <huangdaode@hisilicon.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Reviewed-by: lipeng <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10net: hns: set default mac pause time to 0xffffDaode Huang1-1/+1
The default mac pause time set to 0xff which is too short for pausing, this patch change it to the max value 0xffff. Signed-off-by: Daode Huang <huangdaode@hisilicon.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Reviewed-by: lipeng <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10net: hns: fix for promisc mode in HNS driverKejian Yan4-4/+67
If set promisc mode when there is some traffic, The service nic will cause system halted. We reserve the last 6 tcam entry for the 6 ports. If promisc mode is enabled, we can config the relative tcam as fuzzy matching and set to be valid, or set the tcam to be invalid Signed-off-by: Kejian Yan <yankejian@huawei.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10net: hns: add fuzzy match of tcam table for hnsKejian Yan5-54/+118
Since there is not enough tcam table entries for vlan and multicast address, HNSv2 needs to add support of fuzzy matching of TCAM tables. To add fuzzy match of TCAM, we Add the property to mask the bits to be fuzzy matched Signed-off-by: Kejian Yan <yankejian@huawei.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10Doc: hisi: hns adds mc-mac-mask propertyKejian Yan1-0/+8
Since there is not enough tcam table entries for every vlan and multicast address, HNS needs to add support of fuzzy matching of TCAM tables. Adding the property to mask the bits to be fuzzy matched, so update the bindings document Signed-off-by: Kejian Yan <yankejian@huawei.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10tcp: remove unaligned accesses from tcp_get_info()Eric Dumazet1-6/+5
After commit 6ed46d1247a5 ("sock_diag: align nlattr properly when needed"), tcp_get_info() gets 64bit aligned memory, so we can avoid the unaligned helpers. Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10Merge tag 'batadv-next-for-davem-20161108-v2' of ↵David S. Miller22-255/+582
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== pull request for net-next: batman-adv 2016-11-08 v2 This feature and cleanup patchset includes the following changes: - netlink and code cleanups by Sven Eckelmann (3 patches) - Cleanup and minor fixes by Linus Luessing (3 patches) - Speed up multicast update intervals, by Linus Luessing - Avoid (re)broadcast in meshes for some easy cases, by Linus Luessing - Clean up tx return state handling, by Sven Eckelmann (6 patches) - Fix some special mac address handling cases, by Sven Eckelmann (3 patches) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10Merge branch 'PHC-freq-fine-tuning'David S. Miller4-16/+27
Richard Cochran says: ==================== PHC frequency fine tuning This series expands the PTP Hardware Clock subsystem by adding a method that passes the frequency tuning word to the the drivers without dropping the low order bits. Keeping those bits is useful for drivers whose frequency resolution is higher than 1 ppb. The appended script (below) runs a simple demonstration of the improvement. This test needs two Intel i210 PCIe cards installed in the same PC, with their SDP0 pins connected by copper wire. Measuring the estimated offset (from the ptp4l servo) and the true offset (from the PPS) over one hour yields the following statistics. | | Est. Before | Est. After | True Before | True After | |--------+---------------+---------------+---------------+---------------| | min | -5.200000e+01 | -1.600000e+01 | -3.100000e+01 | -1.000000e+00 | | max | +5.700000e+01 | +2.500000e+01 | +8.500000e+01 | +4.000000e+01 | | pk-pk: | +1.090000e+02 | +4.100000e+01 | +1.160000e+02 | +4.100000e+01 | | mean | +6.472222e-02 | +1.277778e-02 | +2.422083e+01 | +1.826083e+01 | | stddev | +1.158006e+01 | +4.581982e+00 | +1.207708e+01 | +4.981435e+00 | Here the numbers in units of nanoseconds, and the ~20 nanosecond PPS offset is due to input/output delays on the i210's external interface logic. With the series applied, both the peak to peak error and the standard deviation improve by a factor of more than two. These two graphs show the improvement nicely. http://linuxptp.sourceforge.net/fine-tuning/fine-est.png http://linuxptp.sourceforge.net/fine-tuning/fine-tru.png ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10ptp: dp83640: Use the high resolution frequency method.Richard Cochran1-7/+7
The dp83640 has a frequency resolution of about 0.029 ppb. This patch lets users of the device benefit from the increased frequency resolution when tuning the clock. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10ptp: igb: Use the high resolution frequency method.Richard Cochran1-8/+8
The 82580 and related devices offer a frequency resolution of about 0.029 ppb. This patch lets users of the device benefit from the increased frequency resolution when tuning the clock. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10ptp: Introduce a high resolution frequency adjustment method.Richard Cochran2-1/+12
The internal PTP Hardware Clock (PHC) interface limits the resolution for frequency adjustments to one part per billion. However, some hardware devices allow finer adjustment, and making use of the increased resolution improves synchronization measurably on such devices. This patch adds an alternative method that allows finer frequency tuning by passing the scaled ppm value to PHC drivers. This value comes from user space, and it has a resolution of about 0.015 ppb. We also deprecate the older method, anticipating its removal once existing drivers have been converted over. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Suggested-by: Ulrik De Bie <ulrik.debie-os@e2big.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10net: napi_hash_add() is no longer exportedEric Dumazet2-13/+1
There are no more users except from net/core/dev.c napi_hash_add() can now be static. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10bnxt_en: do not call napi_hash_add()Eric Dumazet1-1/+0
This is automatically done from netif_napi_add(), and we want to not export napi_hash_add() anymore in the following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Michael Chan <michael.chan@broadcom.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10bpf: Remove unused but set variablesTobias Klauser1-2/+0
Remove the unused but set variables min_set and max_set in adjust_reg_min_max_vals to fix the following warning when building with 'W=1': kernel/bpf/verifier.c:1483:7: warning: variable ‘min_set’ set but not used [-Wunused-but-set-variable] There is no warning about max_set being unused, but since it is only used in the assignment of min_set it can be removed as well. They were introduced in commit 484611357c19 ("bpf: allow access into map value arrays") but seem to have never been used. Cc: Josef Bacik <jbacik@fb.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10tc_act: Remove tcf_act macroYotam Gigi1-1/+0
tc_act macro addressed a non existing field, and was not used in the kernel source. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10Merge branch 'ipv6-sr'David S. Miller28-12/+2070
David Lebrun says: ==================== net: add support for IPv6 Segment Routing v5: - Check SRH validity when adding a new route with lwtunnels and when setting an IPV6_RTHDR socket option. - Check that hdr->segments_left is not out of bounds when processing an SR-enabled packet. - Add __ro_after_init attribute to seg6_genl_policy structure. - Add CONFIG_IPV6_SEG6_INLINE option to enable or disable direct header insertion. v4: - Change @cleanup in ipv6_srh_rcv() from int to bool - Move checksum helper functions into header file - Add common definition for SR TLVs - Add comments for HMAC computation algorithm - Use rhashtable to store HMAC infos instead of linked list - Remove packed attribute for struct sr6_tlv_hmac - Use dst cache only if CONFIG_DST_CACHE is enabled v3: - Fix compilation for CONFIG_IPV6={n,m} v2: - Remove packed attribute from sr6 struct and replaced unaligned 16-bit flags with two 8-bit flags. - SR code now included by default. Option CONFIG_IPV6_SEG6_HMAC exists for HMAC support (which requires crypto dependencies). - Replace "hidden" calls to mutex_{un,}lock to direct calls. - Fix reverse xmas tree coding style. - Fix cast-from-void*'s. - Update skb->csum to account for SR modifications. - Add dst_cache in seg6_output. Segment Routing (SR) is a source routing paradigm, architecturally defined in draft-ietf-spring-segment-routing-09 [1]. The IPv6 flavor of SR is defined in draft-ietf-6man-segment-routing-header-02 [2]. The main idea is that an SR-enabled packet contains a list of segments, which represent mandatory waypoints. Each waypoint is called a segment endpoint. The SR-enabled packet is routed normally (e.g. shortest path) between the segment endpoints. A node that inserts an SRH into a packet is called an ingress node, and a node that is the last segment endpoint is called an egress node. From an IPv6 viewpoint, an SR-enabled packet contains an IPv6 extension header, which is a Routing Header type 4, defined as follows: struct ipv6_sr_hdr { __u8 nexthdr; __u8 hdrlen; __u8 type; __u8 segments_left; __u8 first_segment; __u8 flag_1; __u8 flag_2; __u8 reserved; struct in6_addr segments[0]; }; The first 4 bytes of the SRH is consistent with the Routing Header definition in RFC 2460. The type is set to `4' (SRH). Each segment is encoded as an IPv6 address. The segments are encoded in reverse order: segments[0] is the last segment of the path, and segments[first_segment] is the first segment of the path. segments[segments_left] points to the currently active segment and segments_left is decremented at each segment endpoint. There exist two ways for a packet to receive an SRH, we call them encap mode and inline mode. In the encap mode, the packet is encapsulated in an outer IPv6 header that contains the SRH. The inner (original) packet is not modified. A virtual tunnel is thus created between the ingress node (the node that encapsulates) and the egress node (the last segment of the path). Once an encapsulated SR packet reaches the egress node, the node decapsulates the packet and performs a routing decision on the inner packet. This kind of SRH insertion is intended to use for routers that encapsulates in-transit packet. The second SRH insertion method, the inline mode, acts by directly inserting the SRH right after the IPv6 header of the original packet. For this method, if a particular flag (SR6_FLAG_CLEANUP) is set, then the penultimate segment endpoint must strip the SRH from the packet before forwarding it to the last segment endpoint. This insertion method is intended to use for endhosts, however it is also used for in-transit packets by some industry actors. Note that directly inserting extension headers may break several mechanisms such as Path MTU Discovery, IPSec AH, etc. For this reason, this insertion method is only available if CONFIG_IPV6_SEG6_INLINE is enabled. Finally, the SRH may contain TLVs after the segments list. Several types of TLVs are defined, but we currently consider only the HMAC TLV. This TLV is an answer to the deprecation of the RH0 and enables to ensure the authenticity and integrity of the SRH. The HMAC text contains the flags, the first_segment index, the full list of segments, and the source address of the packet. While SR is intended to use mostly within a single administrative domain, the HMAC TLV allows to verify SR packets coming from an untrusted source. This patches series implements support for the IPv6 flavor of SR and is logically divided into the following components: (1) Data plane support (patch 01). This patch adds a function in net/ipv6/exthdrs.c to handle the Routing Header type 4. It enables the kernel to act as a segment endpoint, by supporting the following operations: decrementation of the segments_left field, cleanup flag support (removal of the SRH if we are the penultimate segment endpoint) and decapsulation of the inner packet as an egress node. (2) Control plane support (patches 02..03 and 07..09). These patches enables to insert SRH on locally emitted and/or forwarded packets, both with encap mode and with inline mode. The SRH insertion is controlled through the lightweight tunnels mechanism. Furthermore, patch 08 enables the applications to insert an SRH on a per-socket basis, through the setsockopt() system call. The mechanism to specify a per-socket Routing Header was already defined for RH0 and no special modification was performed on this side. However, the code to actually push the RH onto the packets had to be adapted for the SRH specifications. (3) HMAC support (patches 04..06). These patches adds the support of the HMAC TLV verification for the dataplane part, and generation for the control plane part. Two hashing algorithms are supported (SHA-1 as legacy and SHA-256 as required by the IETF draft), but additional algorithms can be easily supported by simply adding an entry into an array. [1] https://tools.ietf.org/html/draft-ietf-spring-segment-routing-09 [2] https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-02 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10ipv6: sr: add documentation file for per-interface sysctlsDavid Lebrun1-0/+18
This patch adds documentation for some SR-related per-interface sysctls. Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10ipv6: sr: add support for SRH injection through setsockoptDavid Lebrun2-4/+85
This patch adds support for per-socket SRH injection with the setsockopt system call through the IPPROTO_IPV6, IPV6_RTHDR options. The SRH is pushed through the ipv6_push_nfrag_opts function. Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10ipv6: add source address argument for ipv6_push_nfrag_optsDavid Lebrun4-7/+9
This patch prepares for insertion of SRH through setsockopt(). The new source address argument is used when an HMAC field is present in the SRH, which must be filled. The HMAC signature process requires the source address as input text. Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10ipv6: sr: add calls to verify and insert HMAC signaturesDavid Lebrun2-0/+31
This patch enables the verification of the HMAC signature for transiting SR-enabled packets, and its insertion on encapsulated/injected SRH. Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10ipv6: sr: implement API to control SR HMAC structureDavid Lebrun1-0/+229
This patch provides an implementation of the genetlink commands to associate a given HMAC key identifier with an hashing algorithm and a secret. Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10ipv6: sr: add core files for SR HMAC supportDavid Lebrun10-0/+612
This patch adds the necessary functions to compute and check the HMAC signature of an SR-enabled packet. Two HMAC algorithms are supported: hmac(sha1) and hmac(sha256). In order to avoid dynamic memory allocation for each HMAC computation, a per-cpu ring buffer is allocated for this purpose. A new per-interface sysctl called seg6_require_hmac is added, allowing a user-defined policy for processing HMAC-signed SR-enabled packets. A value of -1 means that the HMAC field will always be ignored. A value of 0 means that if an HMAC field is present, its validity will be enforced (the packet is dropped is the signature is incorrect). Finally, a value of 1 means that any SR-enabled packet that does not contain an HMAC signature or whose signature is incorrect will be dropped. Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10ipv6: sr: add support for SRH encapsulation and injection with lwtunnelsDavid Lebrun9-1/+526
This patch creates a new type of interfaceless lightweight tunnel (SEG6), enabling the encapsulation and injection of SRH within locally emitted packets and forwarded packets. >From a configuration viewpoint, a seg6 tunnel would be configured as follows: ip -6 ro ad fc00::1/128 encap seg6 mode encap segs fc42::1,fc42::2,fc42::3 dev eth0 Any packet whose destination address is fc00::1 would thus be encapsulated within an outer IPv6 header containing the SRH with three segments, and would actually be routed to the first segment of the list. If `mode inline' was specified instead of `mode encap', then the SRH would be directly inserted after the IPv6 header without outer encapsulation. The inline mode is only available if CONFIG_IPV6_SEG6_INLINE is enabled. This feature was made configurable because direct header insertion may break several mechanisms such as PMTUD or IPSec AH. Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10ipv6: sr: add code base for control plane support of SR-IPv6David Lebrun7-2/+278
This patch adds the necessary hooks and structures to provide support for SR-IPv6 control plane, essentially the Generic Netlink commands that will be used for userspace control over the Segment Routing kernel structures. The genetlink commands provide control over two different structures: tunnel source and HMAC data. The tunnel source is the source address that will be used by default when encapsulating packets into an outer IPv6 header + SRH. If the tunnel source is set to :: then an address of the outgoing interface will be selected as the source. The HMAC commands currently just return ENOTSUPP and will be implemented in a future patch. Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)David Lebrun7-0/+284
Implement minimal support for processing of SR-enabled packets as described in https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-02. This patch implements the following operations: - Intermediate segment endpoint: incrementation of active segment and rerouting. - Egress for SR-encapsulated packets: decapsulation of outer IPv6 header + SRH and routing of inner packet. - Cleanup flag support for SR-inlined packets: removal of SRH if we are the penultimate segment endpoint. A per-interface sysctl seg6_enabled is provided, to accept/deny SR-enabled packets. Default is deny. This patch does not provide support for HMAC-signed packets. Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10net: mii: report 0 for unknown lp_advertisingArnd Bergmann1-0/+2
The newly introduced mii_ethtool_get_link_ksettings function sets lp_advertising to an uninitialized value when BMCR_ANENABLE is not set: drivers/net/mii.c: In function 'mii_ethtool_get_link_ksettings': drivers/net/mii.c:224:2: error: 'lp_advertising' may be used uninitialized in this function [-Werror=maybe-uninitialized] As documented in include/uapi/linux/ethtool.h, the value is expected to be zero when we don't know it, so let's initialize it to that. Fixes: bc8ee596afe8 ("net: mii: add generic function to support ksetting support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10xen-netback: prefer xenbus_scanf() over xenbus_gather()Jan Beulich1-6/+6
For single items being collected this should be preferred as being more typesafe (as the compiler can check format string and to-be-written-to variable match) and more efficient (requiring one less parameter to be passed). Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>